Knowledge Formalization (Teadmiste formaliseerimine) 2019


Name: Knowledge representation
Code: ITI8700
Lecturer: Tanel Tammet
Labs: Evelin Halling and Tanel Tammet
Contact: tanel.tammet@ttu.ee, 6203457, TTÜ ICT-426
Archives of previous years: 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, older.

NB! This is an archive from 2019, not the current course contents

Exam results

Here is the final result after the third exam: Knowledge Formalization final results 2019, with lab points and grades.

What the correction in the meantime consisted of: when summing the lab points there were two different interpretations: either the third lab is part of the 0..40 total, or it is a pure addition on top of the sum of the first two labs. The initial grade followed the first interpretation, the corrected grade the second. As a result, several grades went up.


The last exam takes place on Wednesday, 12 June at 12:00 in the IT building, room ICT-A2.

Here are the results of the first exam (lab points are not included): Knowledge Formalization first exam 2019

If you got below 30 points, please retake the exam. Everyone can take the exam twice.

Exam

All exams take place on Wednesdays at 12:00 (the lecture timeslot):

  • 22 May 12:00 in the economics building SOC-211A
  • 29 May 12:00 in the economics building SOC-213
  • 12 June 12:00 in the IT building ICT-A2

The exam is in written form; no materials may be used. The exam will last 3 hours, but most likely you can finish in ca 1.5 hours.

Here are notes on what will be asked in the ITI8700 exam, including materials to study.



Time, place, result

Lectures: Wednesdays 12:00-13:30 room U06A-229
Practical work: Wednesdays 14:00-15:30, room ICT-121, ICT-122
Practical work will give 40% and the exam 60% of the points underlying the final grade. The exam will consist of several small exercises.

The first practical work slot on 30 January at 14:00 will be used as a second conventional lecture of the day.

Some of the last lecture times at the end of the course may be used as additional seminars/labs.

Assumed background

You should have studied the course "Basics of AI and Machine Learning" or acquaint yourself with the logic and probability parts of that course on your own.

In particular, it is useful to read these course materials and exercises: logic AIMA book, wumpus world in AIMA book, uncertainty in AIMA book, probability models in AIMA book, prolog lab, bayes lab

Focus

The main focus of the course is on knowledge representation and reasoning (KR), on a spectrum from simple to very complex: representing and using knowledge in databases, sentences in natural language and commonsense knowledge.

The overall goal of the labwork is to understand the issues facing the task of building a natural-language question-answering system à la Watson, and to see and experiment with existing parts of a solution.

The course contains the following blocks:

  • Background and basics. Representing and using simple facts and rules.
  • Knowledge conveyed by natural sentences: both statistical methods like word2vec and logic-based methods.
  • General-knowledge databases: wikidata, wordnet, yago, conceptnet, nell, cyc.
  • Reasoners and question answering systems.
  • Context, time, location, events, causality.
  • Different kinds and dimensions of uncertain knowledge.
  • Indexes and search.

Check out this description of the whole KR area.

Books to use

Note that a noticeable part of the course content is not covered by these books: use the course materials and the links to papers, standards and tutorials provided.

Practical work

There are three labs. They are all steps in a single project: build a simple natural-language question-answering system.

The labs have to be presented to the course teachers and all students present at labwork time.

The labs can be prepared alone or in teams of two. The first lab task will be given on 6 February.

First, read the explanation of the overall task. The following labs are steps on the path to the end goal.

NB! You have to register for the lab as follows:

First, go to

https://ained.ttu.ee/

search for ITI8700, log in with your UNI-ID and enroll on the course page found: click the "enrol me" button.

Second, go to http://gitlab.cs.ttu.ee and log in.

Click "Create a project". The name of the project must be exactly: iti8700-2019. Visibility level: private.

Upload the code, presentation, examples etc. that you use for the presentation.



First lab

Deadline: 14 March (after this there will be a penalty).

If you manage to complete the task earlier, start on the second lab.

Please read about the task for the first lab.

Second lab

Deadline: 15 May. You have to give your presentation at the 15 May lab time at the latest (preferably already on 8 May).

The task in the second lab is to actually answer simple questions about an input text, using a small set of rules you write and a reasoner.

Please start by experimenting with a real prover: look at and run the Reasoner examples with gkc

We use the gkc prover for the examples (instructions are on the Reasoner examples ... page above), but you could also try out the old classic prover Otter, which actually works OK on Windows even though it is really old.

Make sure to get the latest release of gkc from the gkc releases page (currently delta): either download a binary or compile it using the basic instructions on the gkc github page. Also see the few examples in the Examples folder.

You can use a trivial NLP-to-reasoner parser in python nlp.py as a starting point.
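
If it helps, here is a minimal sketch of the reasoner-calling side of such a system (my sketch, not part of nlp.py or the course materials): it writes the clauses to a file, runs gkc as a subprocess, and picks the ans(...) literals out of the output. The regex over the output is an assumption; check the proof format your gkc version actually prints.

import re
import subprocess
import tempfile

def ask_gkc(clauses, gkc_path="gkc"):
    # Write the clauses to a temporary file and run gkc on it.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(clauses) + "\n")
        path = f.name
    out = subprocess.run([gkc_path, path], capture_output=True, text=True).stdout
    # Assumed: the proof output prints the derived ans(...) literal.
    return re.findall(r"ans\(([^)]*)\)", out)

print(ask_gkc([
    "father(john,pete).",
    "father(pete,mark).",
    "-father(X,pete) | ans(X).",
]))  # expected to print ['john']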

Have a look at the Lecture 12 materials.

Interesting large question-answering datasets and challenge problems: Stanford SQuAD, Allen Institute ARC2 challenge, Google Natural Questions, Fujitsu NLP challenge, Amazon QA dataset


Third lab

Deadline: 15 May. You have to give your presentation at the 15 May lab time at the latest (preferably already on 8 May).

The third lab is optional: it gives as many points as lab 1 or 2 on top of the normal lab total, so with it the split underlying the final result and the grade becomes practical work 60% and exam 40%.

The task in the third lab is the open-world scenario: answering questions based on wikipedia, using large downloaded rule sets à la yago, wordnet etc. together with a reasoner.

It is a plus to be able to give uncertain answers (likely, unlikely) and handle fuzzy properties like boy, girl, grownup, rich etc.
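
As a toy illustration of what "fuzzy" means here (my sketch, with arbitrary thresholds): a property like grownup holds to a degree rather than being simply true or false.

def grownup_degree(age):
    # 0 below 13, 1 above 18, linear in between (thresholds are arbitrary)
    if age <= 13:
        return 0.0
    if age >= 18:
        return 1.0
    return (age - 13) / 5.0

print(grownup_degree(16))  # 0.6: a borderline grownup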

Have a look at the Lecture 13 materials.


Lecture block 1: basics and representing simple facts.

Lectures 1 and 2: Overview of the course. Background and basics: SQL, logic, NLP

Lecture materials:


Lecture 2: RDF and RDFS (and OWL)

Lecture materials:
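
As a tiny illustration of the RDF data model itself (my sketch using the rdflib Python library, not part of the lecture materials): every statement is a subject-predicate-object triple, and RDFS adds vocabulary for describing classes and properties.

from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
# the fact father(john,pete) as a triple:
g.add((EX.john, EX.fatherOf, EX.pete))
# RDFS statements describe the vocabulary itself:
g.add((EX.john, RDF.type, EX.Person))
g.add((EX.Person, RDFS.subClassOf, EX.Agent))
print(g.serialize(format="turtle"))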

Lecture block 2: capturing meaning in natural language

Lecture 3: Intro to homework and NLP

Lecture materials:

Lecture 4: vector representation of words

This lecture will be given by Priit Järv.

Lecture materials:

Useful additional materials from (roughly) easier to more complex:

Probabilistic models:

Also interesting to read:
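
As a toy illustration of the vector idea (hand-made 3-dimensional vectors; real word2vec vectors are learned from corpora and have hundreds of dimensions):

import numpy as np

def cosine(a, b):
    # similarity of two word vectors via the angle between them
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.8]),
    "apple": np.array([0.1, 0.1, 0.9]),
}
print(cosine(vec["king"], vec["queen"]))  # higher: related words
print(cosine(vec["king"], vec["apple"]))  # lower: unrelated words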

Lecture block 3 and start of 4: large common-sense knowledge bases and reasoning

Lecture 5: First look into main large knowledge bases

We will have a look at the goals, main content and differences between:

Lecture 6: Big annotation systems and intro to rule reasoners

First, big annotation systems:

In connection, see also:

  • schema.org: property markup vocabulary suggested by Google, Microsoft and others.
  • json-ld: currently the most popular rdf syntax (in json), also recommended by Google; see the sketch below.
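
To make the json-ld bullet concrete, here is a minimal sketch of such markup, built with plain Python (the property names come from the schema.org vocabulary; the values are made up):

import json

# a minimal schema.org description in JSON-LD form, as it could be
# embedded in a web page inside <script type="application/ld+json">
course = {
    "@context": "https://schema.org",
    "@type": "Course",
    "name": "Knowledge representation",
    "courseCode": "ITI8700",
    "provider": {"@type": "Organization", "name": "TTÜ"},
}
print(json.dumps(course, indent=2, ensure_ascii=False))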

Second, intro to rule reasoners:

Additionally you may want to look at:

Lecture 7: rule reasoners

Please have a look at the following and run the same experiments yourself with the gkc prover:

Reasoner examples with gkc

We use the gkc prover for the examples (instructions are on the Reasoner examples ... page above), but you could also try out the old classic prover Otter, which actually works OK on Windows even though it is really old.


Lecture 8: starting to parse natural language for interaction with rule reasoners

We will start building a tiny question-answering tool, translating natural language to rule reasoner input.

Our first, very simple goal: translate the text "John is a father of Pete. Pete is a father of Mark. Who is the father of Pete?" to a rule reasoner input

father(john,pete).
father(pete,mark).
% the question becomes a query clause; ans(X) reports the X that was found:
-father(X,pete) | ans(X).

then run the reasoner, fetch the ans(john) literal and give the answer "John is".

You can use a trivial NLP-to-reasoner parser in python nlp.py as a starting point.
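
Purely as an illustration of the translation idea (the patterns below are mine, not taken from nlp.py), a few regular expressions already cover the example text:

import re

def text_to_clauses(text):
    # 'X is a father of Y.' becomes the fact father(x,y).
    # 'Who is the father of Y?' becomes a query clause with ans(X).
    clauses = []
    for m in re.finditer(r"(\w+) is a father of (\w+)\.", text):
        clauses.append("father(%s,%s)." % (m.group(1).lower(), m.group(2).lower()))
    for m in re.finditer(r"Who is the father of (\w+)\?", text):
        clauses.append("-father(X,%s) | ans(X)." % m.group(1).lower())
    return clauses

print(text_to_clauses(
    "John is a father of Pete. Pete is a father of Mark. "
    "Who is the father of Pete?"))

This prints exactly the three clauses above, ready to be fed to the reasoner.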

We will then proceed to gradually make questions and sentences more complex and add rules to be able to answer them.

In the final part of the course we will start looking at uncertain reasoning: right now we will only try to do simple strict reasoning.

We will have a look at potential tools for semantic parsing, from simpler to more complex:

Other interesting things to look at:

Lecture block 5: context and uncertainty

Lecture 9: intro to representing context and uncertainty

Blocks world.

Some axiom sets:

And some planning problems to solve from these axiom sets:

Also:

Alternative axiomatization

Next, starting uncertainty.

We will first look at default logic.
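
As a first intuition only (this is my toy closed-world reading, not an implementation of default logic), the classic example is that birds fly by default unless known to be abnormal:

birds = {"tweety", "polly"}
abnormal = {"polly"}  # polly is a penguin

def flies(x):
    # default rule: a bird flies unless it is known to be abnormal
    return x in birds and x not in abnormal

print(flies("tweety"))  # True: the default applies
print(flies("polly"))   # False: the default is blocked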

Lecture 10: various classical ideas for discrete reasoning with uncertainty

We will look at some discrete ways:

And start with numeric:

Lecture 11: various classical ideas for numeric reasoning with uncertainty

Classical numeric ways to reason about uncertainty:
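
As a one-formula reminder of the most classical of these, Bayes' rule (familiar from the assumed "bayes lab" background; the numbers below are made up):

# posterior P(H|E) = P(E|H) * P(H) / P(E)
p_h = 0.01              # prior: 1% have the property H
p_e_given_h = 0.9       # evidence likelihood if H holds
p_e_given_not_h = 0.05  # evidence likelihood if H does not hold

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 3))  # 0.154: H is still unlikely after one observation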

Some newer approaches:

Lecture 12: seminar-style work on the second lab

Background to start with:

  • Start with the example file p1.txt
father(john,pete).

% a second fact:
father(pete,mark).

% equivalent to (father(X,Y) & father(Y,Z)) => grandfather(X,Z).
-father(X,Y) | -father(Y,Z) | grandfather(X,Z).

-grandfather(john,X) | ans(X).
  • Run
    gkc p1.txt
    The proof should contain the answer literal ans(mark): john is the father of pete, who is the father of mark, so the grandfather rule applies.
  • Start with the trivial NLP-to-reasoner parser in python nlp.py .
  • Our initial goal is to process the text "John is a father of Andrew. Andrew is a man. Who is the father of Andrew?"


And here are examples of various ways to encode normal/abnormal cases (not really needed for the lab):

Syntax and command line:

  • gkc command line: gkc readme in github: see also Examples folder for a few trivial examples.
  • For the simple clausal syntax see the otter manual clause syntax.
  • For complex syntax options see the TPTP manual: the site seems to be down at times.

Lecture 13: seminar-style work on the third lab

Ways to get external data in addition to our own rules and data:

  • Raw data (hard to use): wikipedia, wikidata
  • Processed raw data (a bit easier): dbpedia (processed wikipedia/wikidata)
  • Well-formed and connected options (use one of these):

See how to run gkc on the Wordnet TPTP version.