Nlpsolver knowledge representation principles: draft

Source: Lambda

--------

taxonomy

-------

The schema predicate is "isa", used like this:
["isa",class,object_in_class]

Note: no context argument is currently added to "isa" atoms.

Examples:

"John is a man.":     ["isa","man","c1_John"]  
"Bears are animals.": [["isa","bear","?:S2"], "=>", ["isa","animal","?:S2"]]  or, equivalently, ["or", ["-isa","bear","?:S2"], ["isa","animal","?:S2"]]
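The two forms of the bear rule are related by the standard equivalence of A => B with (not A) or B. A minimal Python sketch of that rewrite, assuming formulas are plain nested lists exactly as printed and that negation is written by prefixing the predicate name with "-"; the helper names are hypothetical, not taken from nlpsolver:

```python
# Hypothetical helpers (not nlpsolver code): rewrite a single-atom
# implication [A, "=>", B] into the clause ["or", -A, B].

def negate(atom):
    """Negate an atom by toggling the "-" prefix on its predicate name."""
    pred = atom[0]
    flipped = pred[1:] if pred.startswith("-") else "-" + pred
    return [flipped] + atom[1:]

def implication_to_clause(rule):
    """Rewrite [A, "=>", B], with A and B single atoms, as ["or", -A, B]."""
    antecedent, arrow, consequent = rule
    if arrow != "=>":
        raise ValueError("expected an implication of the form [A, '=>', B]")
    return ["or", negate(antecedent), consequent]

rule = [["isa", "bear", "?:S2"], "=>", ["isa", "animal", "?:S2"]]
print(implication_to_clause(rule))
# ['or', ['-isa', 'bear', '?:S2'], ['isa', 'animal', '?:S2']]
```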

Note: variables are anything starting with ?:, but I use a readability-enhancing convention:
?:S subject
?:O object
?:A action
?:Tense tense (past/present)
?:Fv situation number for a given tense
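Since a variable is recognised purely by the "?:" prefix, and the letters after it are only a readability convention, finding variables needs nothing more than a string test. A hypothetical Python sketch (the function names and the role table are illustrative assumptions, not parser code):

```python
# Hypothetical helpers (not nlpsolver code): recognise "?:" variables
# and list the variables of a formula written as nested lists.

# Naming convention from the list above; checked in order, first match wins.
CONVENTION = [
    ("?:Tense", "tense"),
    ("?:Fv", "situation number"),
    ("?:S", "subject"),
    ("?:O", "object"),
    ("?:A", "action"),
]

def is_var(term):
    """Any string starting with "?:" is a variable."""
    return isinstance(term, str) and term.startswith("?:")

def variables(term):
    """Return the variables of a term, in order of first occurrence."""
    found = []

    def walk(t):
        if is_var(t):
            if t not in found:
                found.append(t)
        elif isinstance(t, list):
            for sub in t:
                walk(sub)

    walk(term)
    return found

def role(var):
    """Read the intended role off the variable name (a convention only)."""
    for prefix, name in CONVENTION:
        if var.startswith(prefix):
            return name
    return "other"

clause = ["or", ["-isa", "bear", "?:S2"], ["isa", "animal", "?:S2"]]
print(variables(clause), role("?:S2"))
# ['?:S2'] subject
```

Variables outside the convention, such as "?:Ctxt" used for contexts further below, are still variables; they just get the role "other".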

--------
        
       
        "property"   permanent (big) and temporary (angry) 

------

The schema predicate is "prop", with these arguments:
["prop",actual_property,object,strength of property (1 weak / $generic if not indicated / 3 strong), class of property (as in "small bear"; $generic if missing), context]

About context: 
* In case the statement holds always, use a variable for the context, like ["prop","nice","c1_John","$generic","$generic","?:Ctxt"] 
* The context structure will be extended in the future with more parameters
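* The current context structure is ["$ctxt",past_pres_or_future,concrete_situation_number], where past_pres_or_future is currently either "Past" or "Pres", and concrete_situation_number numbers the situations within that tense (past, present and future are enumerated separately)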
        
"John is nice":          ["prop","nice","c1_John","$generic","$generic",["$ctxt","Pres",1]]     
"John is somewhat nice": ["prop","nice","c1_John",1,"$generic",["$ctxt","Pres",1]]
"John is very nice":     ["prop","nice","c1_John",3,"$generic",["$ctxt","Pres",1]]

"John is a big mouse":   ["prop","big","c1_John","$generic","mouse",["$ctxt","Pres",1]]
"John is a nice mouse":  ["prop","nice","c1_John","$generic","$generic",["$ctxt","Pres",1]]

Notice that in the last example ("nice mouse") "nice" is not considered to be class-related. Only a fixed list of property words
such as "big", "small", etc. is considered to be class-related.
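A hypothetical Python builder for "prop" atoms that follows the schema and the class-related rule above. The word set CLASS_RELATED is an assumed stand-in for the real fixed list (the text names only "big" and "small"), and mapping "somewhat"/"very" to 1/3 is read off the examples:

```python
# Hypothetical builder for "prop" atoms (not nlpsolver code).
GENERIC = "$generic"

# Assumed stand-in for the fixed list of class-related words;
# the text names only "big" and "small".
CLASS_RELATED = {"big", "small"}

# Strength codes as they appear in the examples above.
STRENGTH = {"somewhat": 1, "very": 3}

def ctxt(tense="Pres", situation=1):
    """Context term ["$ctxt", tense, situation_number]."""
    return ["$ctxt", tense, situation]

def make_prop(prop, obj, modifier=None, cls=None, context=None):
    """Build ["prop", prop, obj, strength, class, context].

    The class is kept only for class-related words, so "nice mouse"
    gets "$generic" while "big mouse" keeps "mouse".
    """
    strength = STRENGTH.get(modifier, GENERIC)
    klass = cls if cls is not None and prop in CLASS_RELATED else GENERIC
    return ["prop", prop, obj, strength, klass,
            ctxt() if context is None else context]

print(make_prop("big", "c1_John", cls="mouse"))
# ['prop', 'big', 'c1_John', '$generic', 'mouse', ['$ctxt', 'Pres', 1]]
```

Passing a variable such as "?:Ctxt" as the context gives the "holds always" form described in the context notes.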
        

---------        

        "hasa"    possessions and body parts

---------

The schema predicate is "rel2" with the first argument "have", for an undetermined number of things:
["rel2","have",object_having,what_does_it_have,context]
plus functions for single values, counted sets and measures:
["$theof1",class_of_object_had,object_who_has,context]
and
["$count",["$setof",logic_expression,object_who_has]]
  where logic_expression may contain pseudo-lambda-parameters $arg1,$arg2 etc  
and
["$count",["$measure1",type_of_measure,object_measured,unit_of_measurement,context]]
where "unit_of_measurement" may be "$generic" if not relevant, and the type words are limited,
currently: heavy, light, long, short, tall, short, wide, narrow, deep, shallow, warm, hot, cold, cost, cheap

Example for undetermined/unmeasured number of things:
"John has a car.": 

First we make a formula:

[exists,[?:O3],[and,[isa,car,?:O3],[rel2,have,c1_John,?:O3,[$ctxt,Pres,1]]]]

and normalize it to 

["isa","car","cs2"]
["rel2","have","c1_John","cs2",["$ctxt","Pres",1]]
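The step from the formula to the normalized atoms replaces the existentially quantified ?:O3 by a fresh Skolem constant "cs2"; under a forall, as in the elephant example, the Skolem term instead carries the universal variables, like ["cs1","?:S2"]. A minimal Python sketch of that replacement, assuming the caller supplies the cs number (how the parser actually numbers them is not described here); all names are hypothetical:

```python
# Hypothetical sketch (not nlpsolver code) of Skolemizing an outer
# ["exists", vars, body]. The number n is supplied by the caller.

def skolem_term(n, universals=()):
    """Return "cs<n>" alone, or ["cs<n>", *universals] under universal quantifiers."""
    name = f"cs{n}"
    return [name, *universals] if universals else name

def substitute(term, mapping):
    """Replace mapped variables throughout a nested-list term."""
    if isinstance(term, str):
        return mapping.get(term, term)
    if isinstance(term, list):
        return [substitute(t, mapping) for t in term]
    return term

def skolemize_exists(formula, first_n, universals=()):
    """Drop an outer exists, replacing each bound variable by a Skolem term."""
    tag, bound, body = formula
    if tag != "exists":
        raise ValueError('expected ["exists", vars, body]')
    mapping = {v: skolem_term(first_n + i, universals)
               for i, v in enumerate(bound)}
    return substitute(body, mapping)

def conjuncts(formula):
    """Split a top-level ["and", ...] into a list of separate atoms."""
    return formula[1:] if formula[0] == "and" else [formula]

f = ["exists", ["?:O3"],
     ["and", ["isa", "car", "?:O3"],
             ["rel2", "have", "c1_John", "?:O3", ["$ctxt", "Pres", 1]]]]
for atom in conjuncts(skolemize_exists(f, 2)):
    print(atom)
```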

Another example:

"Elephants have a trunk."
[forall,[?:S2],[[isa,elephant,?:S2],=>,[exists,[?:O1],[and,[isa,trunk,?:O1],[rel2,have,?:S2,?:O1,[$ctxt,Pres,1]]]]]]

and normalize it (observe skolemizing the "exists ?:O1" to ["cs1","?:S2"]):

["or", ["isa","trunk",["cs1","?:S2"]], ["-isa","elephant","?:S2"]]
["or",
    ["rel2","have","?:S2",["cs1","?:S2"],["$ctxt","Pres",1]],
    ["-isa","elephant","?:S2"],
    ["$block",["$","elephant",1],["$not",["rel2","have","?:S2",["cs1","?:S2"],["$ctxt","Pres",1]]]]],
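The same clause shape recurs for the bird, penguin, bear and trunk examples: the literal, the negated class test and a "$block" term. A hypothetical Python builder reproducing that shape, assuming the ["$", class, 1] part can be copied verbatim (its meaning is not spelled out here) and that for a negative literal the blocker holds the positive atom, as in the penguin example. The disjunct order is fixed arbitrarily, which does not matter inside a clause:

```python
# Hypothetical builder (not nlpsolver code) for the default-rule clause
# shape: [literal, not-class, blocker]. The ["$", cls, 1] marker is
# copied verbatim from the examples and treated as opaque.

def negate(lit):
    """Toggle the "-" prefix on the predicate name."""
    pred = lit[0]
    return [pred[1:] if pred.startswith("-") else "-" + pred] + lit[1:]

def default_clause(cls, var, literal):
    """Clause for "members of cls normally satisfy literal"."""
    if literal[0].startswith("-"):
        blocked = negate(literal)      # negative default: block the positive atom
    else:
        blocked = ["$not", literal]    # positive default: block its negation
    return ["or", literal, ["-isa", cls, var],
            ["$block", ["$", cls, 1], blocked]]

has_trunk = ["rel2", "have", "?:S2", ["cs1", "?:S2"], ["$ctxt", "Pres", 1]]
print(default_clause("elephant", "?:S2", has_trunk))
```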


A full example illustrating that for questions and conditions we cannot pre-skolemize, but have to
use the formula with quantifiers before the final normalization:

"John has a red car. John has a car?":

  [and
     [prop,red,cs2,$generic,$generic,[$ctxt,Pres,1]]
     [isa,car,cs2]
     [rel2,have,c1_John,cs2,[$ctxt,Pres,1]]] 
  [[$def0],<=>,[exists,[?:O3],[and,[isa,car,?:O3],[rel2,have,c1_John,?:O3,[$ctxt,Pres,?:Fv5]]]]]
  { @question: [$def0] }

which is clausified to:

[
{"@logic": ["prop","red","cs2","$generic","$generic",["$ctxt","Pres",1]]},
{"@logic": ["isa","car","cs2"]},
{"@logic": ["rel2","have","c1_John","cs2",["$ctxt","Pres",1]]},
{"@logic": ["or", ["isa","car","cs3"], ["-$def0"]]},
{"@logic": ["or", ["rel2","have","c1_John","cs3",["$ctxt","Pres","?:Fv5"]], ["-$def0"]]},
{"@logic": ["or",
    ["$def0"],
    ["-isa","car","?:O3"],
    ["-rel2","have","c1_John","?:O3",["$ctxt","Pres","?:Fv5"]]]},
{"@question": ["$def0"]}
]
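What a proof of $def0 has to find is a binding of ?:O3 (and of the situation variable ?:Fv5) that makes both question atoms true. The sketch below illustrates only that, using plain one-way matching of question atoms against ground facts with backtracking; it is an assumption-laden stand-in, not how nlpsolver answers questions (it clausifies and runs a prover). All names are hypothetical:

```python
# Hypothetical sketch (not nlpsolver code): answer an existential
# conjunctive question by matching its atoms against ground facts.
# Variables may occur only in the question (one-way matching).

def is_var(t):
    return isinstance(t, str) and t.startswith("?:")

def match(pattern, fact, binding):
    """Extend binding so that pattern equals fact, or return None."""
    if is_var(pattern):
        if pattern in binding:
            return binding if binding[pattern] == fact else None
        return {**binding, pattern: fact}
    if isinstance(pattern, list) and isinstance(fact, list):
        if len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            binding = match(p, f, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == fact else None

def answer(goals, facts, binding=None):
    """Yield every binding under which all goal atoms occur among the facts."""
    binding = {} if binding is None else binding
    if not goals:
        yield binding
        return
    first, rest = goals[0], goals[1:]
    for fact in facts:
        b = match(first, fact, binding)
        if b is not None:
            yield from answer(rest, facts, b)

facts = [
    ["prop", "red", "cs2", "$generic", "$generic", ["$ctxt", "Pres", 1]],
    ["isa", "car", "cs2"],
    ["rel2", "have", "c1_John", "cs2", ["$ctxt", "Pres", 1]],
]
question = [["isa", "car", "?:O3"],
            ["rel2", "have", "c1_John", "?:O3", ["$ctxt", "Pres", "?:Fv5"]]]
print(list(answer(question, facts)))
```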

Next the functional having:

[and
   [isa,elephant,the_c1_elephant]
   [rel2,have,the_c1_elephant,[$theof1,trunk,the_c1_elephant,[$ctxt,Pres,1]],[$ctxt,Pres,1]]
   [isa,trunk,[$theof1,trunk,the_c1_elephant,[$ctxt,Pres,1]]]
   [prop,heavy,[$theof1,trunk,the_c1_elephant,[$ctxt,Pres,1]],$generic,trunk,[$ctxt,Pres,1]]]

Next the countable having:

"John has three red cars":
[exists,[?:O1],
  [and,[=,3,[$count,[$setof,[and,[isa,car,$arg1],
                              [prop,red,$arg1,$generic,$generic,[$ctxt,Pres,1]]],c1_John]]],
       [=,3,[$count,?:O1],[$conf,1,False]],[prop,red,?:O1,$generic,$generic,[$ctxt,Pres,1]],
       [isa,car,?:O1],
       [rel2,have,c1_John,?:O1,[$ctxt,Pres,1]]]]
       
with the main statement there normalized as

["=",3,
   ["$count",["$setof",
                ["and",["isa","car","$arg1"],["prop","red","$arg1","$generic","$generic",["$ctxt","Pres",1]]],
                "c1_John"]]]
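To make the meaning of the counted statement concrete, here is a finite-model reading in Python. It rests on an assumption not stated above: that ["$setof", expr, owner] ranges over the objects x for which a ["rel2","have",owner,x,...] fact exists, keeping those for which expr holds with $arg1 replaced by x. The object ids x1..x5 are made-up test data:

```python
# Hypothetical finite-model reading (not nlpsolver code) of
# ["$count", ["$setof", expr, owner]].

def plug(term, value):
    """Replace the pseudo-lambda parameter $arg1 by value."""
    if term == "$arg1":
        return value
    if isinstance(term, list):
        return [plug(t, value) for t in term]
    return term

def holds(expr, facts):
    """An "and" holds if every conjunct does; an atom holds if it is a fact."""
    if expr[0] == "and":
        return all(holds(e, facts) for e in expr[1:])
    return expr in facts

def count_setof(count_term, facts):
    """Count the owner's possessions x for which expr holds with $arg1 := x."""
    _, (_, expr, owner) = count_term
    owned = []
    for f in facts:
        if len(f) == 5 and f[:3] == ["rel2", "have", owner] and f[3] not in owned:
            owned.append(f[3])
    return sum(1 for x in owned if holds(plug(expr, x), facts))

ctx = ["$ctxt", "Pres", 1]

def car(x, colour, owner):
    """Made-up facts for one car with a colour and an owner."""
    return [["isa", "car", x],
            ["prop", colour, x, "$generic", "$generic", ctx],
            ["rel2", "have", owner, x, ctx]]

facts = (car("x1", "red", "c1_John") + car("x2", "red", "c1_John")
         + car("x3", "red", "c1_John") + car("x4", "blue", "c1_John")
         + car("x5", "red", "c1_Eve"))

statement = ["=", 3, ["$count", ["$setof",
    ["and", ["isa", "car", "$arg1"],
            ["prop", "red", "$arg1", "$generic", "$generic", ctx]],
    "c1_John"]]]
print(statement[1] == count_setof(statement[2], facts))  # True
```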

Next the measurable having:

"Nile has the length 10 kilometers" or
"The length of Nile is 10 kilometers" etc
[and
   [rel2,have,c1_Nile,[$measure1,length,c1_Nile,kilometer,[$ctxt,Pres,1]],[$ctxt,Pres,1]]
   [isa,length,[$measure1,length,c1_Nile,kilometer,[$ctxt,Pres,1]]]
   [=,10,[$count,[$measure1,length,c1_Nile,kilometer,[$ctxt,Pres,1]]]]
   [isa,kilometer,[$measure1,length,c1_Nile,kilometer,[$ctxt,Pres,1]]]]

with the main statement there normalized as

["=",10,["$count",["$measure1","length","c1_Nile","kilometer",["$ctxt","Pres",1]]]]
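The measurable-having atoms all share one "$measure1" term, so they can be generated from a single description. A hypothetical Python builder mirroring the Nile example; leaving out the unit's isa atom when the unit is "$generic" is an assumption based on the Nile/Amazon comparison example, where no such atom appears:

```python
# Hypothetical builder (not nlpsolver code) for the measurable-having atoms.
GENERIC = "$generic"

def measure1(kind, obj, unit=GENERIC, context=None):
    """The measure term ["$measure1", kind, obj, unit, context]."""
    ctx = ["$ctxt", "Pres", 1] if context is None else context
    return ["$measure1", kind, obj, unit, ctx]

def having_measure(kind, obj, value, unit=GENERIC, context=None):
    """Atoms for "<obj> has the <kind> <value> <unit>", as in the Nile example."""
    ctx = ["$ctxt", "Pres", 1] if context is None else context
    m = measure1(kind, obj, unit, ctx)
    atoms = [["rel2", "have", obj, m, ctx],
             ["isa", kind, m],
             ["=", value, ["$count", m]]]
    if unit != GENERIC:
        # Only a real unit gets its own isa atom.
        atoms.append(["isa", unit, m])
    return atoms

for atom in having_measure("length", "c1_Nile", 10, "kilometer"):
    print(atom)
```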



----------
        
        "capability"   what can it do   (verbs?)
        
------

The schema predicates are can1 and can2:
[can1, verb_which_can_do, subject_who_can, action_or_capability_id, context]
[can2, verb_which_can_do, subject_who_can, object_of_action, action_or_capability_id, context]

and closely related predicates act1 and act2 with the same arguments, for actually doing something:
[act1, verb_which_can_do, subject_who_can, action_or_capability_id, context]
[act2, verb_which_can_do, subject_who_can, object_of_action, action_or_capability_id, context]

NB! The action verb (e.g. eat), the doer/subject, the context and, optionally, the object are present as arguments,
but location, helpers, qualities of the action, etc. are indicated separately as properties
of the action/capability id.

Example for can1:

"John can fly":
[exists,[?:A3],[can1,fly,c1_John,?:A3,[$ctxt,?:Tense4,1]]]

which is normalized to

["can1","fly","c1_John","cs2",["$ctxt","?:Tense2",1]]

Another example for can1:

"Birds can fly." 
[forall,[?:S1],[[isa,bird,?:S1],=>,[exists,[?:A2],[can1,fly,?:S1,?:A2,[$ctxt,?:Tense3,1]]]]]

which is normalized to

["or",
    ["-isa","bird","?:S1"],
    ["can1","fly","?:S1",["cs1","?:S1"],["$ctxt","?:Tense3",1]],
    ["$block",["$","bird",1],["$not",["can1","fly","?:S1",["cs1","?:S1"],["$ctxt","?:Tense3",1]]]]],
    
Yet another example for can1:

"Penguins cannot fly."
[forall,[?:S1],[[isa,penguin,?:S1],=>,[not,[exists,[?:A2],[can1,fly,?:S1,?:A2,[$ctxt,?:Tense3,1]]]]]]

which is normalized to

["or",
    ["-isa","penguin","?:S1"],
    ["-can1","fly","?:S1","?:A2",["$ctxt","?:Tense3",1]],
    ["$block",["$","penguin",1],["can1","fly","?:S1","?:A2",["$ctxt","?:Tense3",1]]]],
    
    
Example for can2:

"John can drive a car."
[and
   [isa,car,the_c2_car,[$conf,1,False]]
   [exists,[?:A2],[can2,drive,c1_John,the_c2_car,?:A2,[$ctxt,?:Tense3,1]]]]

which is normalized to
   
["isa","car","the_c2_car"]
["can2","drive","c1_John","the_c2_car","cs3",["$ctxt","?:Tense3",1]],

Another example for can2:

"Bears can eat honey."
[forall,[?:S2],[[isa,bear,?:S2],=>,[exists,[?:O1],[and,[isa,honey,?:O1],[exists,[?:A3],[can2,eat,?:S2,?:O1,?:A3,[$ctxt,?:Tense4,1]]]]]]]

which is normalized to

["or", ["isa","honey",["cs1","?:S2"]], ["-isa","bear","?:S2"]],
["or",
    ["can2","eat","?:S2",["cs1","?:S2"],["cs2","?:S2"],["$ctxt","?:Tense4",1]],
    ["-isa","bear","?:S2"],
    ["$block",["$","bear",1],["$not",["can2","eat","?:S2",["cs1","?:S2"],["cs2","?:S2"],["$ctxt","?:Tense4",1]]]]],


Full example illustrating properties of the action/verb:

"John can fly fast. John can fly?"

[
{"@logic": ["prop","fast","cs2","$generic","$generic",["$ctxt","?:Tense2",1]]},
{"@logic": ["can1","fly","c1_John","cs2",["$ctxt","?:Tense3",1]]},
{"@logic": ["or", ["-$def0"], ["can1","fly","c1_John","cs3",["$ctxt","cs4","?:Fv6"]]]},
{"@logic": ["or", ["$def0"], ["-can1","fly","c1_John","?:A4",["$ctxt","?:Tense5","?:Fv6"]]]},
{"@question": ["$def0"]}
]
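The pattern in this example (the verb atom refers to an action id, and qualities such as "fast" are "prop" atoms on that id) can be captured by a small builder. A hypothetical Python sketch covering can1/can2/act1/act2; for simplicity it gives every atom the same context, whereas the parser output above uses different tense variables for the two atoms:

```python
# Hypothetical builder (not nlpsolver code) for capability/action atoms.
PREDICATES = {"can1", "can2", "act1", "act2"}

def action_atoms(pred, verb, subject, action_id, context, obj=None, props=()):
    """Return prop atoms on the action id followed by the can/act atom itself."""
    if pred not in PREDICATES:
        raise ValueError(f"unknown predicate {pred!r}")
    two_place = pred.endswith("2")
    if two_place != (obj is not None):
        raise ValueError("can2/act2 need an object; can1/act1 take none")
    args = [verb, subject] + ([obj] if two_place else []) + [action_id, context]
    atoms = [["prop", p, action_id, "$generic", "$generic", context] for p in props]
    atoms.append([pred, *args])
    return atoms

for atom in action_atoms("can1", "fly", "c1_John", "cs2",
                         ["$ctxt", "?:Tense2", 1], props=["fast"]):
    print(atom)
```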



NB! There is also actually _doing_ something, represented with act1/act2:


"John drove the red car":
[and
   [prop,red,the_c2_car,$generic,$generic,[$ctxt,Past,1]]
   [isa,car,the_c2_car]
   [exists,[?:A2],[act2,drive,c1_John,the_c2_car,?:A2,[$ctxt,Past,1]]]]


------------

        "comparative"  arity 3    subject bigger subject2
        
------------

The schema predicate is rel2_than for non-measurable comparisons, and "=", "$less", "$lesseq", "$greater", "$greatereq" for measurable ones:
[rel2_than,property_compared,more_object,less_object,action_id,context]
["=", counted_measure1, counted_measure2]
where each "counted_measure" has the same structure/meaning as above for the "having" relation.

NB!! We should probably modify rel2_than to contain the somewhat/much distinction,
or add the distinction to the action id, or drop the action id.

Example for non-measurable comparison:

"John is nicer than Eve." 

[exists,[?:A1],[rel2_than,nice,c2_John,c1_Eve,?:A1,[$ctxt,?:Tense2,1]]]

which is normalized to

["rel2_than","nice","c2_John","c1_Eve","cs3",["$ctxt","?:Tense2",1]]

Example for measurable:

"The length of Nile is equal to the length of Amazon." 

which is normalized to

["rel2","have","c1_Nile",["$measure1","length","c1_Nile","$generic",["$ctxt","Pres",1]],["$ctxt","Pres",1]]
["isa","length",["$measure1","length","c1_Nile","$generic",["$ctxt","Pres",1]]]
["rel2","have","c2_Amazon",["$measure1","length","c2_Amazon","$generic",["$ctxt","Pres",1]],["$ctxt","Pres",1]]
["isa","length",["$measure1","length","c2_Amazon","$generic",["$ctxt","Pres",1]]]
["=",["$count",["$measure1","length","c1_Nile","$generic",["$ctxt","Pres",1]]],["$count",["$measure1","length","c2_Amazon","$generic",["$ctxt","Pres",1]]]]

where probably only the last one is actually needed and the rest can be skipped.
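Once numbers for some measures are known, the last atom, and the other measurable comparison predicates, can be checked directly. A hypothetical Python evaluator, assuming the known numbers are simply supplied in a table keyed by the measure term; the lengths 10 and 7 are illustrative numbers only:

```python
import operator

# Hypothetical evaluator (not nlpsolver code) for the measurable comparison
# predicates listed above, given known numbers for some measure terms.
COMPARATORS = {
    "=": operator.eq,
    "$less": operator.lt,
    "$lesseq": operator.le,
    "$greater": operator.gt,
    "$greatereq": operator.ge,
}

def key(term):
    """Nested lists are unhashable, so turn them into nested tuples."""
    if isinstance(term, list):
        return tuple(key(t) for t in term)
    return term

def value_of(side, known):
    """A side is either a plain number or ["$count", measure_term]."""
    if isinstance(side, (int, float)):
        return side
    if isinstance(side, list) and side[0] == "$count":
        return known[key(side[1])]
    raise ValueError(f"cannot evaluate {side!r}")

def evaluate(statement, known):
    op, left, right = statement
    return COMPARATORS[op](value_of(left, known), value_of(right, known))

def length_of(river):
    return ["$measure1", "length", river, "$generic", ["$ctxt", "Pres", 1]]

# Illustrative numbers only.
known = {key(length_of("c1_Nile")): 10, key(length_of("c2_Amazon")): 7}
equal = ["=", ["$count", length_of("c1_Nile")], ["$count", length_of("c2_Amazon")]]
print(evaluate(equal, known))  # False
```

The same evaluator also accepts the earlier form with a plain number on one side, such as ["=",10,["$count",...]].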

---------------

    "partof"     membership
    
---------------

The schema predicate is rel2_of in combination with "part", or "rel2" in combination with "in":
["rel2_of","part",what_is_the_part,who_has_the_part,action_relation_id,ctxt]
["rel2","in",what_is_in,in_what,context]

NB! Maybe the action_relation_id should be dropped, or maybe some sensible use can be found?
NB! Also, maybe a special relation should be created?

Example for "rel2_of"+"part":

"Trunks are a part of an elephant."
[forall,[?:S2],[[isa,trunk,?:S2],=>,[and,[isa,elephant,the_c1_elephant],
                                         [exists,[?:A3],[rel2_of,part,?:S2,the_c1_elephant,?:A3,[$ctxt,?:Tense4,1]]]]]]

normalized to

["or",
    ["rel2_of","part","?:S2","the_c1_elephant",["cs2","?:S2"],["$ctxt","?:Tense4",1]],
    ["-isa","trunk","?:S2"],
    ["$block",["$","trunk",1],["$not",["rel2_of","part","?:S2","the_c1_elephant",["cs2","?:S2"],["$ctxt","?:Tense4",1]]]]],
    
Example for "rel2"+"in":

"Elephants contain trunks"
[forall,[?:S2],[[isa,elephant,?:S2],=>,[exists,[?:O1],[and,[isa,trunk,?:O1],
                                                           [rel2,in,?:O1,?:S2,[$ctxt,Pres,1]]]]]]

which is normalized to

["or",
    ["rel2","in",["cs1","?:S2"],"?:S2",["$ctxt","Pres",1]],
    ["-isa","elephant","?:S2"],
    ["$block",["$","elephant",1],["$not",["rel2","in",["cs1","?:S2"],"?:S2",["$ctxt","Pres",1]]]]],


---------

        "subjectto"   what happens to it (can include events)
        
------------

I have not thought about this yet: it needs work ASAP.

------------

        "location"   where is it normally found
        
------------

For actual location the schema is "rel2" in combination with one of
"in","on","at","near","above","under":
["rel2",in_on_at_etc,object_in_location,object_where_is_located,context]

However, typical location needs more thought; see below.

Example:

"John is in a room."

[and
   [isa,room,the_c2_room]
   [rel2,in,c1_John,the_c2_room,[$ctxt,Pres,1]]]

which is normalized to

["isa","room","the_c2_room"]
["rel2","in","c1_John","the_c2_room",["$ctxt","Pres",1]]


NB! I propose representing the typical (generic) location like this, with a low probability and a blocker attached:

[forall,[?:S1,?:Ctxt],[[isa,dog,?:S1],=>,[exists,[?:O2],[rel2,in,?:S1,?:O2,?:Ctxt]]]]

This is currently not properly implemented in the parser.
Alternative ideas are also welcome.

-----------------
        
These need thought; no clear ideas yet.

Meta stuff: it is mostly clear how these can connect events (X, Y),
but mostly unclear how to combine e.g. "causes" with a property.
        "causes"     causes X
        "prevents"     Y prevents doing X
        "dependency"    X requires Y
        "usedfor"      subject is used for X
        "createdby"     subject is created by X
        "madeof"        subject is made of object (substance)
        "have_goal"    subject wants to do X / X to happen
        
------------


		"time"         X happens at time
    

-------------

Time is represented (a) in a context, and (b) like location above, with the words "in","at","on","during","before","after",
plus the "$time" constructor:
["rel2",in_at_etc,event,time_object,context]
where the constructed time element is used as a special typed variable:
[$time,type_of_time_indicator,time_indicator]
where the "type_of_time_indicator" can be "$generic".

Example:

"On Monday, John jumped in a house."

[exists,[?:A1],[and,[exists,[[$time,$generic,Monday]],
                       [rel2,on,?:A1,[$time,$generic,Monday],[$conf,1,False],[$ctxt,Past,1]]],
                    [exists,[?:O6],[and,[isa,house,?:O6,[$conf,1,False]],
                                               [rel2,in,?:A1,?:O6,[$conf,1,False],[$ctxt,Past,1]]]],
                    [act1,jump,c1_John,[$conf,1,True],?:A1,[$ctxt,Past,1]]]]

which is normalized as

[rel2,on,cs2,[$time,$generic,Monday],[$ctxt,Past,1]],
[isa,house,cs3],
[rel2,in,cs2,cs3,[$ctxt,Past,1]],
[act1,jump,c1_John,cs2,[$ctxt,Past,1]],
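A hypothetical Python builder for the time atoms, written with quoted strings like the other normalized examples; TIME_WORDS is the word list given above, and the function names are illustrative only:

```python
# Hypothetical builder (not nlpsolver code) for time atoms.
TIME_WORDS = {"in", "at", "on", "during", "before", "after"}

def time_term(indicator, kind="$generic"):
    """The typed time element ["$time", type_of_time_indicator, time_indicator]."""
    return ["$time", kind, indicator]

def time_atom(word, event, indicator, context, kind="$generic"):
    """["rel2", word, event, ["$time", ...], context] for a time word."""
    if word not in TIME_WORDS:
        raise ValueError(f"{word!r} is not one of the time words")
    return ["rel2", word, event, time_term(indicator, kind), context]

def is_time_term(term):
    """True for a constructed ["$time", kind, indicator] element."""
    return isinstance(term, list) and len(term) == 3 and term[0] == "$time"

atom = time_atom("on", "cs2", "Monday", ["$ctxt", "Past", 1])
print(atom, is_time_term(atom[3]))
```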

---------------

event roles
        "event_type"     stab
        "event_actor"    senators
        "event_theme"     Caesar
        "event_method"    brutally
        "event_instrument"   knife
        "event_type_modifier"  if type is go: go IN, go OUT, ...
        
--------------

These may need more thought, but for now we have:

* type,actor,theme are given as act1/act2 arguments, see above
* method and instrument are properties of the action id, indicated with "rel2"
  in combination with the actual word, like "with"; working out what the "with" means
  needs additional reasoning rules or procedural derivation of new facts.

Example:

"Senators stabbed Caesar with a knife in curia"

is normalized as

[isa,senator,the_c2_senator],
[isa,knife,the_c3_knife],
[isa,curia,the_c4_curia],
[rel2,in,the_c3_knife,the_c4_curia,[$ctxt,Pres,1]],
[isa,knife,cs6],
[rel2,with,cs5,cs6,[$ctxt,Past,1]],
[act2,stab,the_c2_senator,c1_Caesar,cs5,[$ctxt,Past,1]],

Note that the fact that there were several senators should be represented as well,
but currently this is not done for that example.
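The role mapping described above (type/actor/theme as act1/act2 arguments, prepositional modifiers as "rel2" atoms on the action id) can be sketched as a small Python function. The dictionary keys, including "modifiers", are illustrative names, not nlpsolver structures, and only word-plus-filler modifiers like ("with", knife) are handled:

```python
# Hypothetical mapping (not nlpsolver code) from event roles to atoms:
# type/actor/theme become act1/act2 arguments, while prepositional
# modifiers such as ("with", instrument) become rel2 atoms on the action id.

def event_atoms(event, action_id, context):
    atoms = []
    if "theme" in event:
        atoms.append(["act2", event["type"], event["actor"], event["theme"],
                      action_id, context])
    else:
        atoms.append(["act1", event["type"], event["actor"], action_id, context])
    for word, filler in event.get("modifiers", []):
        atoms.append(["rel2", word, action_id, filler, context])
    return atoms

stab = {"type": "stab", "actor": "the_c2_senator", "theme": "c1_Caesar",
        "modifiers": [("with", "cs6")]}
for atom in event_atoms(stab, "cs5", ["$ctxt", "Past", 1]):
    print(atom)
```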




-----------

These need further thinking:

event meta
		"event_parallel"    X and Y are simultaneous
		"event_after"		Y happens after X
		"event_content"		Y is subevent of X (may be broken, other mixed use in db)
special use
		"similar"	semantic similarity

---------

I am attaching the current small ruleset I am using while debugging the parser:
it is intentionally small.

very high level commonsense rules
	transitivity of "be"
	symmetry of "similar"
	inference using taxonomy of object (can leap |- can jump)
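The three rules can be illustrated with naive forward chaining to a fixpoint. This Python sketch makes several assumptions that the text does not fix: "be" and "similar" facts are taken to be 3-tuples (predicate, x, y), and the verb taxonomy behind "can leap |- can jump" is supplied as a hypothetical dict mapping a verb to a more general one. Facts are tuples so they can live in a set:

```python
# Hypothetical forward chaining (not nlpsolver code) with three rules:
# transitivity of "be", symmetry of "similar", and lifting can1/can2
# facts along an assumed verb taxonomy (leap -> jump).

def saturate(facts, verb_parents):
    """Apply the rules until no new fact appears; return the closed set."""
    facts = set(facts)
    while True:
        new = set()
        for f in facts:
            if f[0] == "similar":
                new.add(("similar", f[2], f[1]))
            if f[0] in ("can1", "can2") and f[1] in verb_parents:
                new.add((f[0], verb_parents[f[1]]) + f[2:])
        be = [f for f in facts if f[0] == "be"]
        for (_, x, y) in be:
            for (_, y2, z) in be:
                if y == y2:
                    new.add(("be", x, z))
        if new <= facts:
            return facts
        facts |= new

ctx = ("$ctxt", "Pres", 1)
facts = {("can1", "leap", "c1_John", "cs2", ctx), ("similar", "a", "b"),
         ("be", "x", "y"), ("be", "y", "z")}
closed = saturate(facts, {"leap": "jump"})
print(("can1", "jump", "c1_John", "cs2", ctx) in closed)  # True
```

A real ruleset would of course be given to the prover as clauses rather than hard-coded; the sketch only shows what the rules derive.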

--------