Nanos gigantium humeris insidentes!

KiwiPredator Architecture

  • July 16, 2013 12:05 am

Feel free to check out my repo at





Parser is the component that converts a sentence into a parse tree. This is done by the Stanford Parser.


Extractor is the component that extracts the subject, predicate, and object from a sentence. It contains two main parts: a markup language called the tree-based regular expression language (TREL) together with its interpreter, and a Tregex engine which performs regular-expression matching on a parse tree.

TREL has two characteristics.

  • Markup Language
  • Inheritance

There are three main types of TREL rules: Splitting, Pruning, and Replacing.

  • Splitting: Mainly used for splitting one tree into several subtrees.
  • Pruning: Mainly used for pruning away uninteresting parts.
  • Replacing: Mainly used for replacing pronouns, tense, and number forms.
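As an illustrative sketch only (TREL itself is a custom markup language whose syntax is not shown here), the Pruning operation can be modeled in Python on a parse tree encoded as nested (label, children...) tuples with string leaves:

```python
def prune(tree, drop_labels):
    """The 'Pruning' operation: remove subtrees whose label is in drop_labels."""
    if isinstance(tree, str):          # a leaf token
        return tree
    label, *children = tree
    kept = tuple(prune(c, drop_labels) for c in children
                 if isinstance(c, str) or c[0] not in drop_labels)
    return (label,) + kept

sent = ("S",
        ("NP", ("DT", "the"), ("NN", "cat")),
        ("ADVP", ("RB", "quickly")),
        ("VP", ("VBD", "sat")))
print(prune(sent, {"ADVP"}))  # the adverbial phrase is dropped
```

Splitting and Replacing would walk the tree the same way, returning several subtrees or substituting nodes instead of dropping them.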


Evaluator reviews the hypothesis and either takes it as a new item of knowledge or discards it. In either case, it sends feedback to the TREL.


  • February 19, 2013 8:37 pm

I am switching to tree-based regular expressions.

The basic idea is to use Tregex to prune the parse tree down to the essence of a sentence.

Watson's architecture

  • June 23, 2011 3:09 am

The high-level architecture of IBM's DeepQA used in Watson


  • October 12, 2010 1:01 am


Nguyen Bach & Sameer Badaskar
Language Technologies Institute
Carnegie Mellon University

Original Source:

Supervised Relation Extraction

  • Formulate relation extraction as a supervised classification task.
  • Focused on feature-based and kernel methods


  • Structuring the information on the web
  • Involves annotating the unstructured text with
    • Entities
    • Relations between entities
  • Extracting semantic relations between entities in text


  • Example 1: “Bill Gates works at Microsoft Inc.”
    • Person-Affiliation(Bill Gates, Microsoft Inc)
  • Example 2: Located-In(CMU, Pittsburgh)
  • Higher order relations
    • Protein-Organism-Location
  • Entity tuple: entities are bound in a relation
    • r(e1,e2,…,en)


  • Question Answering: Ravichandran & Hovy (2002)
    • Extracting entities and relational patterns for answering factoid questions (Example: “When was Gandhi born?” amounts to looking for Born-In(Gandhi, ??) in the relational database)
  • Mining bio-medical texts
    • Protein binding relations useful for drug discovery
    • Detection of cancerous genes (“Gene X with mutation Y leads to malignancy Z”)


  • Datasets
    • Automatic Content Extraction (ACE)
    • Message Understanding Conference (MUC-7)
  • Supervised Approaches
    • Relation extraction as a classification task.
    • Precision, Recall and F1
  • Semi-supervised Approaches
    • Bootstrapping-based approaches result in the discovery of a large number of patterns and relations.
    • An approximate value of precision is computed by drawing a random sample and manually checking for actual relations.


  • Supervised approaches
    • Feature based
    • Kernel based
    • Concerns
  • Semi-supervised approaches
    • Bootstrapping
    • DIPRE, Snowball, KnowItAll, TextRunner
  • Higher-order relation extraction

Supervised Approaches (1)

  • Formulate the problem as a classification problem (in a discriminative framework)
  • Given a set of positive and negative training examples
  • Sentence: S = w1 w2 … e1 … wi … e2 … wn-1 wn

f_R(T(S)) = \begin{cases} +1 & \text{if } e_1 \text{ and } e_2 \text{ are related by } R \\ -1 & \text{otherwise} \end{cases}

Supervised Approaches (2)

  • fR(.) can be a discriminative classifier
    • SVM, Voted Perceptron, Log-linear model …
    • Can also be a multiclass classifier!
  • T(S) can be
    • A set of features extracted from the sentence
    • A structured representation of the sentence (labeled sequence, parse trees)
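As a concrete sketch of this formulation, f_R could be a simple perceptron over a bag of features T(S); the feature names and toy examples below are invented for illustration:

```python
def train_perceptron(data, epochs=10):
    """Train a binary perceptron; data is a list of (feature_set, label) pairs
    with labels in {+1, -1}. Returns (weights, bias)."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for feats, y in data:
            score = sum(w.get(t, 0.0) for t in feats) + b
            if (1 if score > 0 else -1) != y:   # mistake-driven update
                for t in feats:
                    w[t] = w.get(t, 0.0) + y
                b += y
    return w, b

def f_R(feats, w, b):
    """+1 if e1 and e2 are judged related by R, -1 otherwise."""
    return 1 if sum(w.get(t, 0.0) for t in feats) + b > 0 else -1

# toy training set: hypothetical features for Person-Affiliation
data = [({"between:works", "t1:PER", "t2:ORG"}, 1),
        ({"between:employed", "t1:PER", "t2:ORG"}, 1),
        ({"between:visited", "t1:PER", "t2:LOC"}, -1)]
w, b = train_perceptron(data)
print(f_R({"between:works", "t1:PER", "t2:ORG"}, w, b))  # prints 1
```

An SVM or log-linear model would plug into the same decision rule; only the training objective changes.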

Supervised Approaches (3)

  • Features
    • Define the feature set
    • Similarity metrics like cosine distance can be used
  • Structured Representations
    • Need to define the similarity metric (Kernel)
    • Kernel similarity is integral to classifiers like SVMs.

Supervised Approaches (4)


  • Kambhatla (2004), Zhou et al. (2005)
  • Given a sentence
    1. Perform Textual Analysis (POS, Parsing, NER)
    2. Extract
      • Words between and including entities
      • Types of entities (person, location, etc)
      • Number of entities between the two entities; whether both entities belong to the same chunk
      • Number of words separating the two entities
      • Path between the two entities in a parse tree
  • Textual Analysis involves POS tagging, dependency parsing etc.
  • What forms a good set of features ?
  • Choice of features guided by intuition and experiments.
  • An alternative is to use structured representations and define an appropriate similarity metric for the classifier!
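A toy sketch of a few of the lexical features listed above; the token spans and feature names are invented for illustration:

```python
def extract_features(tokens, e1_span, e2_span, e1_type, e2_type):
    """Build a small bag of features for an entity pair.
    Spans are (start, end) token indices, end exclusive."""
    between = tokens[e1_span[1]:e2_span[0]]     # words between the two entities
    feats = {f"between:{w.lower()}" for w in between}
    feats.add(f"types:{e1_type}-{e2_type}")     # entity-type pair
    feats.add(f"numbetween:{len(between)}")     # number of separating words
    return feats

toks = ["Bill", "Gates", "works", "at", "Microsoft", "Inc."]
print(sorted(extract_features(toks, (0, 2), (4, 6), "PER", "ORG")))
# ['between:at', 'between:works', 'numbetween:2', 'types:PER-ORG']
```

Parse-path and chunk features would be added the same way once a parser has been run over the sentence.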


  • A kernel K(x, y) defines the similarity between objects x and y implicitly in a higher-dimensional space
  • (x, y) can be strings: similarity is the number of common substrings (or subsequences) between x and y
    • Example: sim(cat, cant) > sim(cat, contact)
    • Excellent introduction to string kernels in Lodhi et al. (2002)
  • Extend string kernels to word sequences and parse trees for relation extraction
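The substring flavour of this idea can be sketched in a few lines (Lodhi et al. actually use weighted subsequences with a decay factor; this unweighted count is only a rough stand-in):

```python
def substring_kernel(x, y):
    """Naive kernel: count distinct substrings shared by strings x and y."""
    subs = lambda s: {s[i:j] for i in range(len(s))
                      for j in range(i + 1, len(s) + 1)}
    return len(subs(x) & subs(y))

# reproduces the ordering from the example above
print(substring_kernel("cat", "cant"), substring_kernel("cat", "contact"))  # 4 3
```

Even this crude count ranks "cant" closer to "cat" than "contact" is, because "ca" and the full shared characters survive intact.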

Kernels (Word Subsequences)

Word context around entities can be an indicator of a relation (Bunescu & Mooney, 2005a).

Each word is augmented with its POS, Generalized POS, chunk tag (NP, VP, etc), entity type (Person, Organization, none)

Kernels (Trees)

  • Tree kernels differ over types of trees used and attributes of nodes
  • Zelenko et al. (2003)
    • Use shallow parse trees. Each node contains
      • Entity-Role (Person, Organization, Location, None)
      • Text it subsumes
      • Chunk tag (NP, VP etc)
    • Tasks: organization-location, person-affiliation detection
    • A tree kernel with an SVM improves over a feature-based SVM for both tasks (F1 gains of 7% and 3%, respectively)
  • Culotta & Sorensen (2004)
    • Use dependency trees. Node attributes are
      • Word, POS, Generalized POS, Chunk tag, Entity type, Entity-level, Relation argument
  • These tree kernels are rigid: attributes of nodes must match exactly!


  • Bunescu & Mooney (2005b)
    • It is sufficient to use only the shortest path between the entities in a dependency tree.
    • Each word in the shortest path is augmented with POS, generalized POS, entity type, etc.
    • Structure of the dependency path is also encoded
    • Performs the best among all kernels

Kernels vs. Features

Supervised Approaches (Concerns)

  • Perform well, but difficult to extend to new relation types for want of labeled data
  • Difficult to extend to higher order relations
  • Textual analysis like POS tagging, shallow parsing, dependency parsing is a pre-requisite. This stage is prone to errors.

Rationales in Relation Extraction


  • Supervised approaches
    • Feature-based and kernel methods
  • Semi-supervised approaches
    • Bootstrapping
  • Higher-order relation extraction
  • Applications
    • Question-Answering
    • Mining biomedical text
  • Evaluation

A Shortest Path Dependency Kernel for Relation Extraction

  • August 23, 2010 8:53 pm

1. Introduction

Information Extraction (IE) is traditionally divided into three sub-problems: co-reference resolution, named entity recognition, and relation extraction.

In this paper we focus exclusively on extracting relations between predefined types of entities in the ACE corpus.

The use of automatically derived syntactic information can lead to significant improvements in extraction accuracy.

The amount of syntactic knowledge used in IE systems varies from part-of-speech only (Ray and Craven, 2001) to chunking (Ray and Craven, 2001) to shallow parse trees (Zelenko et al., 2003) to dependency trees derived from full parse trees (Culotta and Sorensen, 2004).

The performance, however, depends not only on the type of syntactic information used, but also on the details of the exact models using this information.

2. Dependency Graphs

Let e1 and e2 be two entities mentioned in the same sentence such that they are observed to be in a relationship R, i.e., R(e1, e2) = 1.

We assume that a relation is to be extracted only between entities mentioned in the same sentence, and that the presence or absence of a relation is independent of the text preceding or following the sentence.

A subset of these word-word dependencies capture the predicate-argument relations present in the sentence. Arguments  are connected to their target predicates either directly through an arc pointing to the predicate, or indirectly through a preposition or infinitive particle.

Word-word dependencies are typically categorized into two classes:

  • Local Dependencies
  • Non-local Dependencies

A context-free grammar parser can be used to extract local dependencies, which for each sentence form a dependency tree. Mildly context-sensitive formalisms such as Combinatory Categorial Grammar (CCG) (Steedman, 2000) model word-word dependencies more directly and can be used to extract both local and long-distance dependencies, giving rise to a directed acyclic graph.

3 The Shortest Path Hypothesis

The contribution of the sentence dependency graph to establishing the relationship R(e1, e2) is almost exclusively concentrated in the shortest path between e1 and e2 in the undirected version of the dependency graph.

If entities e1 and e2 are arguments of the same predicate, then the shortest path between them will pass through the predicate, which may be connected directly to the two entities, or indirectly through prepositions. If e1 and e2 belong to different predicate-argument structures that share a common argument, then the shortest path will pass through this argument.
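The hypothesis is easy to operationalize: breadth-first search over the undirected dependency graph. The (head, dependent) edges below are a hypothetical parse, invented for illustration:

```python
from collections import deque

def shortest_path(edges, src, dst):
    """Shortest path between two words in the undirected dependency graph."""
    adj = {}
    for head, dep in edges:
        adj.setdefault(head, set()).add(dep)
        adj.setdefault(dep, set()).add(head)
    prev, queue, seen = {}, deque([src]), {src}
    while queue:
        u = queue.popleft()
        if u == dst:                       # walk back through predecessors
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v); prev[v] = u; queue.append(v)
    return None

# hypothetical edges for "Protesters seized stations in Brcko"
edges = [("seized", "Protesters"), ("seized", "stations"),
         ("seized", "in"), ("in", "Brcko")]
print(shortest_path(edges, "Protesters", "Brcko"))
```

Here the entities are arguments of the same predicate "seized", so the path runs through it (and through the preposition "in"), exactly as described above.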

For the rest, check another paper from the same authors.

Note of Discovering Relations among Named Entities from Large Corpora

  • August 23, 2010 3:55 pm

Paper: Discovering Relations among Named Entities from Large Corpora

1. Introduction

Our method does not need the richly annotated corpora required for supervised learning — corpora which take great time and effort to prepare. It also does not need any instances of relations as initial seeds for weakly supervised learning.

Instead, we only need a named entity (NE) tagger to focus on the named entities which should be the arguments of relations. Recently developed named entity taggers work quite well and are able to extract named entities from text at a practically useful level.

3. Relation Discovery

3.2 Named entity tagging

Sekine proposed 150 types of named entities (Sekine et al., 2002). We use an extended named entity tagger (Sekine, 2001) in order to detect useful relations between extended named entities.

3.3 NE pairs and context

We define the co-occurrence of NE pairs as follows: two named entities are considered to co-occur if they appear within the same sentence and are separated by at most N intervening words. We also set a frequency threshold to remove pairs that occur too rarely.

3.4 Context similarity among NE pairs

We adopt a vector space model and cosine similarity in order to calculate the similarities between the set of contexts of NE pairs.

The cosine similarity cosine(θ) between context vectors α and β is calculated by the following formula.

 cosine(\theta) = \frac{\alpha \cdot \beta }{|\alpha||\beta|}

Cosine similarity varies from 1 to -1. A cosine similarity of 1 would mean these NE pairs have exactly the same context words with the NEs appearing predominantly in the same order, and a cosine similarity of -1 would mean these NE pairs have exactly the same context words with the NEs appearing predominantly in reverse order.
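A minimal sketch of this computation, with context vectors stored as word-to-weight dictionaries (the vectors below are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two context vectors (word -> weight dicts)."""
    dot = sum(wt * b.get(w, 0.0) for w, wt in a.items())
    na = math.sqrt(sum(wt * wt for wt in a.values()))
    nb = math.sqrt(sum(wt * wt for wt in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# hypothetical context vectors for two NE pairs
c1 = {"acquired": 2.0, "by": 1.0}
c2 = {"acquired": 2.0, "from": 1.0}
print(round(cosine(c1, c2), 2))  # 0.8
```

Signed weights (for NE order) would let the value go negative, matching the -1 to 1 range discussed above.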

3.5 Clustering NE pairs

We make clusters of NE pairs based on the similarity.

3.6 Labeling clusters

We simply count the frequency of the common words in all combinations of the NE pairs in the same cluster.
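A rough reading of that description as code (the cluster contexts are invented for illustration):

```python
from collections import Counter

def label_cluster(contexts):
    """Label a cluster by the most frequent word that is common to the
    contexts of all its NE pairs."""
    common = set.intersection(*(set(c) for c in contexts))
    counts = Counter(w for c in contexts for w in c if w in common)
    return counts.most_common(1)[0][0] if counts else None

# hypothetical contexts of three company-pair mentions in one cluster
contexts = [["was", "acquired", "by"], ["acquired", "by"], ["has", "acquired"]]
print(label_cluster(contexts))  # acquired
```

The common frequent word ("acquired") then serves as the human-readable name of the discovered relation.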

Note of An Integrated Approach for Relation Extraction

  • August 22, 2010 7:36 pm

From: An Integrated Approach for Relation Extraction from Wikipedia Texts


two kinds of patterns:

  1. dependency patterns from dependency analysis of texts in Wikipedia;
  2. surface patterns generated from highly redundant information from the web corpus.

1. Introduction

Currently the leading methods in unsupervised information extraction are based on collecting redundant information from a local corpus or using the Web as a corpus.

One of the main challenges and research interest for frequent pattern mining is how to abstract away from different surface realizations of semantic relations to discover discriminative patterns efficiently.

The common process is to generate linguistic patterns based on analysis of the syntactic, dependency, or shallow semantic structure of text, then train a model to identify entity pairs that stand in a relationship and classify them into pre-defined relationships. The advantage of these methods is using linguistic technologies to learn semantic information from different surface expressions.

2. Related Work

Discovering relations by clustering pairs of co-occurring entities represented as vectors of context features was introduced by (Hasegawa et al., 2004).

The field of Unsupervised Relation Identification (URI), the task of automatically discovering interesting relations between entities in large text corpora, was introduced by Hasegawa, Sekine et al. (2006).

Dependency patterns have the properties of being more accurate, less spam-prone and much less redundant than surface patterns from the web corpus.

4. Pattern combination approach for relation extraction

4.1 Overview of the approach

Our approach takes a set of Wikipedia articles as input and, for each article, outputs a list of concept pairs with relation labels.

There are four primary modules of our approach:

  • Text Pre-processor and Relation Candidate Generation
    Preprocesses Wikipedia articles to split text and filter sentences. For each article, it collects a set of concept pairs as relation candidates.
  • Web Context Collector
    Collects context information from the web corpus. For each relation candidate, it generates a set of ranked relational terms and a set of surface patterns.
  • Dependency Pattern Modeling
    Generates dependency patterns for each relation candidate from the corresponding sentences in Wikipedia articles.
  • Linear Clustering Algorithm
    Clusters relation candidates based on their context. Two sub-modules:

    • local clustering
    • global clustering

4.2 Text Preprocessor and relation candidate generation

  1. Consider all hyper-linked concepts in the article as related concepts, which may share a semantic relationship with the entitled concept.
  2. Use a linguistic parser to split the article text into sentences, then retain those sentences which contain both a reference to the entitled concept and one of the linked texts, for the dependency pattern modeling module.

4.3 Web Context Collection

4.4 Dependency pattern modeling


  1. shortest path dependency kernels outperform dependency tree kernels by offering a very condensed representation of the information needed to assess their relationship (Bunescu and Mooney, 2005)
  2. embedded structures of the linguistic representation are important for obtaining a good coverage of the pattern acquisition (Culotta and Sorensen, 2004; Zhang et al., 2006)

We generate dependency patterns as sub-paths from the shortest dependency path for each concept pair. The process of inducing dependency patterns proceeds in two steps:

  1. Shortest dependency path induction. From the dependency tree structure, we first induce the shortest dependency path connecting the entitled concept and the related concept.
  2. Dependency pattern generation. We then use a frequent tree-mining algorithm (Zaki et al., 2002) to generate dependency patterns from the shortest dependency path for relation clustering.

Note of Relation Extraction from Wikipedia Using Subtree Mining

  • August 8, 2010 10:56 am
  • Keywords that provide clues for each relation label are identified by a Keyword Extractor.
  • An Entity Classifier module classifies the entities into types to limit the available relations for entity pairs. The Relation Extractor extracts subtree features from a pair consisting of the principal entity and a mention of the secondary entity.
  • It then incorporates the subtree feature together with the entity type feature into a feature vector and classifies relations of the entity pairs using SVM-based classifiers.

Principal Entity Detector

We propose a simple but efficient technique to identify a set of referring expressions, denoted as F, which provides better results than those produced by the above coreference tools. We adopt (Morton, 2000) to classify the expressions in F into three types: (1) personal pronouns, (2) proper nouns, and (3) common nouns. Based on chunking information, the technique is as follows:

  1. Start with F = {}
  2. Select the first two chunks for F: the proper chunk of the article title and the first proper chunk in the first sentence of the article, if any. These are the first two names of the principal entity. If F is still empty, stop.
  3. For each remaining proper chunk p in the article, if p is derived from any expressions selected in (2), then F ← p. Proper chunk p1 is derived from proper chunk p2 if all its proper nouns appear in p2. These proper chunks are various identifiers of the principal entity.
  4. In the article, select c as the most frequent subjective pronoun, find c’ as its equivalent objective pronoun, and add them to F
  5. For each chunk p with the pattern [DT N1 … Nk], where DT is a determiner and the Ni‘s are common nouns, if p appears more frequently than all the pronouns selected in (4), then F ← p

Entity Classifier

The entity type is very useful for relation extraction.

We first identify year, month and date by directly examining their surface text. Types of other entities, including principal entities and secondary entities, are identified by classifying their corresponding articles.

We develop an SVM-based classifier for each remaining type using a one-against-all strategy.

We represent an article in the form of a feature vector and use the following features:

  • category feature: categories collected when tracing from the article up to k levels of its category structure
  • pronoun feature: the most frequent subjective pronoun in the article
  • singular noun feature: singular nouns of the first sentence of the article

Keyword Extractor

Our hypothesis in this research is that there exist some keywords that provide clues to the relationship between an entity pair.

  1. We map entities in such relations to those in sentences to collect sample sentences for each relationship.
  2. The tf-idf model is exploited to measure the relevance to each relationship of the words on the dependency path between the entity pair.
  3. We choose the keywords manually from lists of candidates ranked by relevance score with respect to each relation.

Subtree Feature from the Dependency Path

Read the paper: Bunescu, “Extracting relations from text: From word sequences to dependency paths.”

Supervised Learning for Relation Extraction

We formulate our problem of relation classification as a multiclass, multi-label problem in which one SVM-based classifier is dedicated to each relation.

We represent each mention of a secondary entity in a sentence with respect to a relation r as a feature vector receiving values 0 and 1. Feature vectors are created from the type of principal entity, type of secondary entity, and the mined subtree of the sentence. The number of slots for a subtree feature depends on the relation. The principal entity might be absent in a sentence and its type is unchanged for all sentences in an article.

Morton, T. 2000. Coreference for NLP applications.

Note Of Universal Networking Language-based Relation Extraction

  • August 6, 2010 4:01 pm

Paper: A Statistical Approach for Universal Networking Language-based Relation Extraction

Abstract: With the assumption that the positions of phrases in a sentence between which a relation exists have been identified, we focus on the problem of classifying the relation between the given phrases.

The UNL relation classifier was developed using statistical techniques applied to several lexical and syntactic features. In addition to the commonly used features, we also propose a new feature that reflects the actual semantic relation of two phrases independently of the words in between.

1. Introduction

Universal Networking Language (UNL) is defined as an artificial language able to represent information and knowledge described in natural languages [13].

  • Universal Words (UWs): UNL’s vocabulary.
  • Relations: define relationships between pairs of UWs.
  • Attributes: describe the subjectivity of sentences, including
    • time with respect to the speaker,
    • speaker’s view of aspect,
    • speaker’s view of reference,
    • speaker’s focus,
    • speaker’s attitudes and
    • speaker’s view point.

One of the important tasks to create UNL representations from natural language text is to extract relations between pairs of UWs. Relation extraction task can be divided into two subtasks:

  1. identifying pairs of UWs between which a relation is likely to exist
  2. identifying the relation label for the pairs.

In this paper, we focus on the second problem. That is, we develop a classifier that assigns a label to pairs of UWs between which a relation exists.

The problem: given an English text and a pair of phrases in the text between which there exists a relation, identify the type of relation for the pair.

In this work, we apply statistical techniques to train our classifier.

  1. a number of features of the two phrases will be extracted from the text to create some feature vectors.
  2. count the occurrences of the relations for each feature vector
  3. estimate the probability of each relation type given a feature vector

Dataset: Universal Networking Digital Language Foundation (UNDL)

3. Method

Each relation connects a source phrase and a destination phrase.

For each pair of source and destination phrases in a relation of the training set, we extracted linguistic features and created a feature vector.

  1. The number of occurrences of each relation label for each feature vector will be counted in the whole training set.
  2. In testing time, the probability for each relation label will be estimated given the feature vector associated with a pair of phrases.
  3. The relation label with highest probability will be chosen and assigned to the phrases.

A. Feature Selection

Phrase type, head word, voice, dependency path (from the Minipar parser), syntactic cross path

Details of the Syntactic Cross Path feature:

The string representing the path from the source position through the syntactic parse tree to the destination position. Procedure:

  • For the source and destination phrases, identify the highest nodes which still receive the head words from the phrases (H1, H2)
  • Identify the lowest common node that covers both phrases (C)
  • Assuming that the source phrase and the destination phrase are separated, there are three cases:
    • Node C is higher than nodes H1 and H2: trace upwards from H1 to C, then trace downwards to H2
    • Node H1 is higher than or equal to node C: trace downwards from C to H2
    • Node H2 is higher than or equal to node C: trace upwards from H1 to C

The advantage of the syntactic cross path feature is that it reflects the syntactic structure of the two concepts represented by the two phrases, independent of the words lying between them.



Example fragment: “Adopted a recommendation to this effect in 1964” (source phrase headed by “adopted”, destination phrase headed by “effect”):

  • Phrase type: VBD (POS tag of “adopted”), NN (POS tag of “effect”)
  • Head word: adopt, effect
  • Voice: Active, Unspecified
  • Dependency path: (V)obj v(N)mod v(Prep)pcomp-n v(N)
  • Syntactic cross path: VP v NP v PP(to) NP


B. Probability Prediction

We count the number of relations for each feature vector in the training data and estimate the probability of each relation given a feature vector at test time.

The conditional probability can be estimated as follows:

P(R|F) = \frac{\#(R,F)}{\#(F)}


  • R: relation label
  • F: feature vector
  • #(R,F): the number of relations with label R counted for feature vector F in the training data.
  • #(F): the number of relations which receive F as a feature vector, counted in the training data.
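A sketch of this relative-frequency estimate; the toy feature vectors and relation labels below are invented for illustration:

```python
from collections import Counter

def train_counts(examples):
    """Count #(R, F) and #(F) over (feature_vector, relation_label) pairs and
    return an estimator for P(R | F) = #(R, F) / #(F)."""
    joint, marginal = Counter(), Counter()
    for f, r in examples:
        joint[(r, f)] += 1
        marginal[f] += 1
    return lambda r, f: joint[(r, f)] / marginal[f] if marginal[f] else 0.0

# toy training set: feature vector is (phrase type 1, phrase type 2)
data = [(("NP", "VP"), "agt"), (("NP", "VP"), "agt"), (("NP", "VP"), "obj")]
p = train_counts(data)
print(round(p("agt", ("NP", "VP")), 3))  # 0.667
```

The full feature vector F would include head words, voice, and the two path features, which is exactly why the counts get sparse and back-off is needed.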

To estimate the probability in cases where the full feature vector is rarely or never seen in training, we relax the condition in the probability formula by dividing the general feature vector into smaller vectors. In other words, we reduce the dimension of the feature vector and estimate the conditional probability using the linear interpolation method proposed in [1].


  • Input: a training set including full information of relations
  • Output: estimated conditional probabilities P1, …, P8
  1. For each training fragment S; source phrase and destination phrase; relation label R, do:
    1. Receive the syntactic parse tree T from the Charniak parser and the dependency tree D from the Minipar parser for the fragment S.
    2. Based on the trees T and D, extract the features for the two phrases: pt1, pt2, hw1, hw2, voice1, voice2,depPath, SynCrossPath
    3. Increase the following counters by 1:
      1. c1(R, pt1), c2(R, pt2), …, c8(R, pt1, pt2, hw1, hw2, voice1, voice2, depPath, synCrossPath)
      2. t1(pt1), t2(pt2), …,  t8(pt1, pt2, hw1, hw2, voice1,  voice2, depPath,  synCrossPath)
        where ci(R,Fi) = #(R,Fi) and ti(Fi)=#(Fi)
  2. calculate all the conditional probabilities
    P_1(R|pt_1)=\frac{c_1(R,pt_1)}{t_1(pt_1)}, ...,
    P_8(R|pt_1, pt_2, hw_1, hw_2, voice_1, voice_2, depPath, synCrossPath) = \frac{c_8(R, pt_1, pt_2, hw_1, hw_2, voice_1, voice_2, depPath, synCrossPath)}{t_8(pt_1, pt_2, hw_1, hw_2, voice_1, voice_2, depPath, synCrossPath)}


  • Input: a fragment S; source phrase and destination phrase; all the conditional probabilities P1, …, P8
  • Output: relation label R
  1. Receive the syntactic parse tree T from the Charniak parser and the dependency tree D from the Minipar parser for the fragment S.
  2. Based on the trees T and D, extract the features for the two phrases: pt1, pt2, hw1, hw2, voice1, voice2, depPath, synCrossPath
  3. Estimate probabilities for  all relation labels by using linear interpolation as listed above.
  4. Choose the relation label R for the two phrases:
    R = \arg\max_{r} P(r \mid \text{the two phrases})
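The interpolation-plus-argmax step can be sketched as follows; the component estimates P_1..P_3, weights, and labels are invented for illustration (the paper uses eight components):

```python
def interpolated(r, components, weights):
    """P(R|F) ~ sum_i w_i * P_i(R|F_i): back off to lower-dimensional
    estimates when the full feature vector is sparse."""
    return sum(w * p[r] for w, p in zip(weights, components))

# hypothetical component estimates and interpolation weights
P = [{"agt": 0.5, "obj": 0.5},
     {"agt": 0.7, "obj": 0.3},
     {"agt": 0.9, "obj": 0.1}]
weights = [0.2, 0.3, 0.5]
labels = ["agt", "obj"]
best = max(labels, key=lambda r: interpolated(r, P, weights))  # argmax over r
print(best)  # agt
```

The weights would normally be tuned on held-out data so that richer feature vectors dominate when they have enough counts.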

A Novel Idea on Relation Extraction

  • August 6, 2010 3:22 pm

Treat the relation as a label and then use an SVM to train a classifier.