Brainmaker

Nanos gigantium humeris insidentes!

两个Final Projects的方案

  • July 11, 2010 4:02 pm
  1. Equivalence between Reductive English and FOPC
  • Labeled as: Eq RE & FOPC
  • Description: Reduce the expressive power of natural language to the level that FOPC is equivalent.
  • Related knowledge:
    • Theory of FOPC / logic
    • Psychology
    • Language
    • Turing machine
  • Problems
    • 不确定这个思路是不是正确的
    • 涉及到的句型数量可能很多
    • 难以证明
  • 方案
    • 弄清FOPC表现能力不足的地方 By July 11
      • 看FOPC不足的论文,并从NLP来分析
    • Reduce Natural English to the same level
  • 一些想法
    • 我之前得出的结论并不是reductive english 和 FOL的等价性,而是 利用reductive english 来解决semantic role 歧义问题
  1. Semi-supervised Learning on NLP
  • Labeled as: ML on NLP
  • Description: Improvement or Implementation of Semi-supervised Semantic Role Labeling Using the Latent Words Language Model
  • Related Knowledge
    • Machine learning
    • NLP
  • 方向
    • Apply semi-supervised learning to certain corpus, and compare the result– basically similar to what we did in ML course
    • Improve others’ supervised learning to semi-supervised
      • 2007-Semi-Supervised Learning for Semantic Parsing using support vector machine.pdf
  • 方案:
    • 阅读2005-2010的Survey By Jul 11
      • 寻找improvement or implemantation,确定方向

Two project topics

  • July 11, 2010 3:49 pm
  1. Equivalence between Reductive English and FOPC
  • Labeled as: Eq RE & FOPC
  • Description: From the aspect of logic or math, prove that the FOL has the same expressive power of the converter I design
  • Related knowledge:
    • Theory of FOPC / logic
    • Psychology
    • Language
    • Turing machine
  • Pros and Cons
    • It’s a theoretical problem
    • It’s a good novel aspect
    • It might be too difficult
    • It might be even not correct
  • Evaluation
    • Academic Value: ▲▲▲▲△
    • Feasibility:▲▲△△△
    • Difficulty: ▲▲▲△△
  1. Semi-supervised Learning on NLP
  • Labeled as: ML on NLP
  • Description: Improvement or Implementation of Semi-supervised Semantic Role Labeling Using the Latent Words Language Model
  • Related Knowledge
    • Machine learning
    • NLP
  • Pros and cons
    • It’s popular research field and match lots of researcher.
    • The potential professors options are greater.
    • I don’t have enough background knowledge for semi-supervised Learning.
  • Evaluation
    • Academic value: ▲▲▲△△
    • Feasibility: ▲▲▲▲△
    • Difficulty:▲▲▲▲△
  1. Probabilistic disambiguation:
  • Labeled as: PD
  • Description: P1, the context-free probability of a meaning to a word W; if given probability P2 to word W2. Then the joint probability is P(E1|E2)
  • Related knowledge
    • Probability
    • Syntax and Grammar
  • Pros and Cons
    • It’s a theoretical problem
    • It looks like obvious works
    • There is some work.
  • Evaluation
    • Academic value: ▲▲▲△△
    • Feasibility: ▲▲▲▲△
    • Difficulty:▲▲▲△△

Comparision: Project 2 vs Project 3 Projct3 is vague and would be part of Project 2, thus ruling out Project 3.

Prolog for natural language processing

  • July 11, 2010 2:29 pm

Chapter 3: The Linguistic Background

3.6 Functional Grammar and unification

This section is a good introduction to the functional structures research, and the last paragraph serves as a brief survey on this field.

semisupervised learning on Semantic role labelling

  • July 10, 2010 2:40 pm

Semi-supervised Semantic Role Labeling Using the Latent Words Language Model

Brief Summary

  • July 10, 2010 1:05 pm

Paper to read
Toward the expressive power of natural language  in  Principles of Semantic Networks
Topic on expressive power
Automatic Labeling of Semantic Roles
A good paper on semantic labeling
一阶逻辑的不足
Mentioned of FOL being not able to express some mathematical problem
Semantic Structure — Jackendoff’s conceptual semantics
It’s a deep research and kinda complete.
Graph-based knowledge representation [electronic resource] : computational foundations of conceptual
Online, 2009, new idea
Knowledge Representation and Predicate Calculus
Let’s say it’s a simple survey on NLP
Natural Language, Knowledge Representation, and Logical Form
has a discussion on expressive power

Idea:
Simple English -> FOC base
English expressive power reduction -> FOL
Statistic NLP:一个子问题,由当前概率确定;如果联系上下文,则为在给定事件下,当前事件的概率
prolog reasons with every possible rule, but here, we attach probability to each rule, when reasoning, we apply with probability.Aquist, Hoepelman, and Rohrer 1980
Simple English -> Common Logic Controlled English
Statistics on Semantic role parser
Apply relation checking and statistics in semantic role parsing for disambugulity
conceptual semantics by jackendoff

Summary One:
The books are too old to give you a new idea, you should use books as foundation and read the magazine

Some remainder

  • July 9, 2010 10:53 am

About Probabilistic and Statistic

  • Principles of knowledge representation / edited by Gerhard Brewka

About the expressive power equivalence

About the implementation of the FOL base

Other project

  • UPenn Tree bank
  • stanford Part-of-speech tagger
  • Raphael SIR system

Probabilistic

  • July 9, 2010 10:25 am

Probabilistic linguistics / edited by Rens Bod, Jennifer Hay, and Stefanie Jannedy   P128.P73 P76 2003
Principles of knowledge representation / edited by Gerhard Brewka BD161 .B74 1996
Probabilistic Foundations of Reasoning with Conditionals / Judea Pearl, Moises Goldszmidt
Introduction to information retrieval / Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze QA76.9.T48 M26 2008  2008

Statistical language learning / Eugene Charniak   P98.5.S83 C47 1993

Statistical language models for information retrieval [electronic resource] / ChengXiang Zhai c2008 online

概率

  • July 9, 2010 2:29 am

既然很多NLU问题都是imperical, 为什么不直接表达为概率呢

  • 一个子问题,由当前概率确定
  • 如果联系上下文,则为在给定事件下,当前事件的概率

借概率与自然语言相关的书来看后,先静一天,把先前的思路重新整理

combination of statistics and predicate calculus

prolog reasons with every possible rule, but here, we attach probability to each rule, when reasoning, we apply with probability.Aquist, Hoepelman, and Rohrer 1980

July 09 paper to read

  • July 9, 2010 1:22 am

paper 1:

Natural Language, Knowledge Representation, and Logical Form
has a discussion on expressive power

paper 2

SIR: A COMPUTER PROGRAM FOR SEMANTIC INFORMATION RETRIEVAL

1968年,拉菲尔(B.Raphael)在美国麻省理工学院用LI SP语言建立了SIR系统,针对英语提出了24个匹配模式,把输入的英语句子与这些模式相匹配,从而识别输入句子的结构,在从存贮知识的数据库到回答问题的过程中,可以处理人们对话中常用的一些概念,如集合的包含关系、空间关系等等,并可进行简单逻辑推理,机器并能在对话中进行学习,记住已学过的知识,从事一些初步的智能活动。

Knowledge Representation and Predicate Calculus

  • July 8, 2010 10:17 pm

http://herselfsai.com/2007/02/knowledge-representation-and-predicate-calculus.html

One of the trickier parts of designing artificial intelligence is how to represent information to something that can not understand language. Knowledge representation is defined by its syntax and its facts (semantics) which together along with the rules for deducing new sentences make up the logic. Entailment is the relationship between current sentences and new sentences derived from the current sentences. A good knowledge representation language will combine the ease of spoken and written languages, like English, with the conciseness of a computer language, like C.

Propositional (boolean) Logic This consists of a syntax, the semantics and rules for deducing new sentences. Symbols are used to represent facts which may or may not be true. W may 72 represent the fact it is windy outside. The symbols are combined with boolean logic to generate new sentences. Facts can be true or false only. The program may know a statement is true or false or be uncertain as to its truth or falsity. The syntax consists of , v, => (if x then y), (equal), and (not). Backus-Naur Form (bnf) is the grammar used to put sentences together. The semantics are just the interpretation of a statement about a proposition. Truth tables are used to define the semantics of sentences or test validity of statements.

Truth table for AND

T and T = T

T and F = F

F and F = F

F and T = F

A model is a world in which a sentence is true under a particular interpreta tion. There can be several models at once that have the same interpretations. An inference rule is a rule that is derived from a pattern that is true all the time for specific cases.

First Order Logic (first-order predicate calculus)
This consists of objects, predicates on objects, connectives and quantifiers. Predicates are the relations between objects, or properties of the objects. Connectives and quantifiers allow for universal sentences. Relations between objects can be true or false as well as the objects themselves. The program may not know whether something is true or false or give it a probability of truth or falseness.

Procedural Representation
This method of knowledge representation encodes facts along with the sequence of operations for manipulation and processing of the facts. This is what expert systems are based on. Knowledge engineers question experts in a given field and code this information into a computer program. It works best when experts follow set procedures for problem solving, i.e. a doctor making a diagnoses.

The most popular of the Procedural Representations is the Declarative. In the Declarative Representation the user states facts, rules, and relationships. These represent pure knowledge. This is processed with hard coded procedures.

Relational Representation
Collections of knowledge are stored in table form. This is the method used for most commercial databases, Relational Databases. The information is manipulated with relational calculus using a language like SQL. This is a flexible way to store information but not good for storing complex relationships. Problems arise when more than one subject area is attempted. A new knowledge base from scratch has to be built for each area of expertise.

Hierarchical Representation
This is based on inherited knowledge and the relationships and shared attributes between objects. This good for abstracting or granulating knowledge. Java and C++ are based on this.

Semantic Net
A data graph structure is used and concrete and abstract knowledge is represented about a class of problems. Each one is designed to handle a specific problem. The nodes are the concepts features or processes. The edges are the relationships (is a, has a, begins, ends, duration, etc). The edges are bidirectional, backward edges are called ‘Hyper Types’ or ‘Back Links’. This allows backward and forward walking through the net. The reasoning part of the nets includes: expert systems; blackboard architecture; and a semantic net description of the problem. These are used for natural language parsing and databases. Predicate Logic and propositional Most of the logic done with AI is predicate logic. It used to represent objects, functions and relationships. Predicate logic allows representation of complex facts about things and the world. (If A then B). A ‘knowledge base’ is a set of facts about the world called ’sentences’. These are put in a form of ‘knowledge representation language’. The program will ‘ASK’ to get information from the knowledge base and ‘TELL’ to put information into the knowledge base. Using objects, relations between them, and their attributes almost all knowledge can be represented. It does not do well deriving new knowledge. The knowledge representation must take perceptions and turn them into sentences for the program to be able to use them, and it must take queries and put them into a form the program can understand.

Resolution and unification
Resolution: prove A true by proving A Not is false. Unification: take two predicate logic sentences and using substitutions make them the same. Unification is the single operation that can be done on data structures (expression trees) in Prolog. These are the techniques used to process predicate logic knowledge and the are the basis for Lisp and Prolog.

Frames
Each frame has a name and a set of attribute-value pairs called slots. The frame is a node in a semantic network. Hybrid frame systems are meant to over come serious limitations in current setups. They work much like an object oriented language. A frame contains an object, its attributes, relationships and its inherited attributes. This is much like Java classes. We have a main class and sub classes that have attributes, relationships, and methods for use.

A logic has a language, inference rules, and semantics. Two logical languages are propositional calculus and predicate calculus.

Propositional Calculus which is a descendant of boolean algebra is a language that can express constraints among objects, values of objects, and inferences about objects and values of objects.

The elements of propositonal calculus are:
Atoms: the smallest elements
Connectives: or, and, implies, not
Sentences: aka ‘well-formed formula’s, wffs

The legal wwfs
disjunction: or
conjunction: and
implication: implies
negation: not

Rules of inference are used to produce other wwfs
modus ponens: ( x AND ( x implies y) ) implies y
AND introduction: x, y implies ( x AND y )
AND commutativity: x AND y implies x
OR introduction: x, y implies ( x OR y )
NOT elimination: NOT ( NOT x ) implies x
resolution: combining rules of inference into one rule:
{ example: (x OR y) AND ( NOT y OR z ) = x OR z
Horn clauses: a clause having one TRUE literal, there are three types: a single atom (q); an implication or rule ( p AND q = r ); a set of negative literals ( p OR q => ) . These have linear time algorithms.

Definitions:
Semantics:
associations of elements of a language with the elements of the domain

Propositions:
a statement about an atom, example: The car is running. Car is the atom, ‘is running’ is the proposition.

interpretation:
is the association of the proposition with the atom

denotation:
in a given interpretation the proposition associated with the atom is the denotation

value:
TRUE, FALSE, given to an atom

knowledge base:
a collection of propositional calculus statements that are true in the domain

truth table:
a tablular format for representing states

satisfies:
a true statement under a given interpretation

model:
an interpretation that satisfies each statement in a set of statements.

validity:
a statement that is TRUE under all interpretations

equivalence:
statements are equivalent if their truth values are identical under all interpretations.

Examples:
DeMorgan’s Laws:
NOT (x OR y ) = ( NOT x ) AND ( NOT y )
NOT ( x AND y ) = ( Not x ) OR ( NOT y )

Contrapositive:
( x implies y ) = ( NOT x implies NOT y )
if ( x = y ) then ( x implies y ) = ( y implies x )

Propositional satisfiability, aka PSAT
a model formula that comprises the conjunction of all the statements in the set.

Predicate Calculus takes propositional calculus further by allowing statements about propositions as well as about objects. This is first order predicate calculus.
Contains:
object constants, term: strings of characters, xyz, linda, paris
relation constants: divided by, distance to/from, larger than
function constants: small, big, blue
functional expression: examples: distance(here, there); xyz;
worlds: can have infinite objects, functions on objects, relations over objects
interpretations: maps object constants into objects in the world
quantifiers: can be universal or for a selected object or group of objects

Predicate Calculus is used to express mathematical theories. It consists of sentences, inference rules and symbols. First-order predicate calculus symbols consist of variables about which a statement can be made, logic symbols (and, or, not, for all, there exists, implies) and punctuation ( ‘(‘, ‘)’ ).

If we have a set S in which all of the statements are true then S is a model. If S implies U then U is true for all models of S and NOT U is false for all models of S. If we make a set S’ which has all of the statements of S and the statement NOT U it is not a model. All statements in a model must be true. S’ is unsatisfiable since there is no way for the statements of S and the statement NOT U, both of which are in S’ to be true at the same time. This is used to prove formulas in theorem proving. To show S implies U is is sufficient to show S’= S, NOT U is unsatisfiable.

Resolution and unification

Resolution: prove A true by proving A Not is false.

Unification: take two predicate logic sentences and using substitutions make them the same. Unification is the single operation that can be done on data structures (expression trees) in Prolog. These are the techniques used to process predicate logic knowledge and the are the basis for Lisp and Prolog.

Resolution is one way to prove unsatisfiability.

More information:
First Order Predicate Calculus
Syntax of Predicate Calculus (pdf)
Knowledge Representation and Predicate Calculus (pdf)
Predicate Calculus (pdf)