Nanos gigantium humeris insidentes!


  • October 12, 2010 1:01 am


Nguyen Bach & Sameer Badaskar
Language Technologies Institute
Carnegie Mellon University

Original Source:

Supervised Relation Extraction

  • Formulate relation extraction as a supervised classification task.
  • Focused on feature-based and kernel methods


  • Structuring the information on the web
  • Involves annotating the unstructured text with
    • Entities
    • Relations between entities
  • Extracting semantic relations between entities in text


  • Example 1: “Bill Gatesworks at Microsoft Inc.”
    • Person-Affiliation(Bill Gates, Microsoft Inc)
  • Example 2:Located-In(CMU, Pittsburgh)
  • Higher order relations
    • Protein-Organism-Location
  • Entity tuple: entities are bound in a relation
    • r(e1,e2,…,en)


  • Question Answering: Ravichandran & Hovy (2002)
    • Extracting entities and relational patterns for answering factoid questions (Example: “When was Gandhi born ?” amounts to looking for Born-In(Gandhi, ??) in the relational database)
  • Mining bio-medical texts
    • Protein binding relations useful for drug discovery
    • Detection of cancerous genes (“Gene X with mutation Y leads to malignancy Z”)


  • Datasets
    • Automatic Content Extraction (ACE)
    • Message Understanding Conference (MUC-7)
  • Supervised Approaches
    • Relation extraction as a classification task.
    • Precision, Recall and F1
  • Semi-supervised Approaches
    • Bootstrapping based approaches result in the discovery of large number of patterns and relations.
    • Approximate value of precision computed by drawing a random sample and manually checking for actual relations


  • Supervised approaches
    • Feature based
    • Kernel based
    • Concerns
  • Semi-supervised approaches
    • Bootstrapping
    • DIPRE, Snowball, KnowItAll, TextRunner
  • Higher-order relation extraction

Supervised Approaches (1)

  • Formulate the problem as a classification problem (in a discriminative framework)
  • Given a set of +ve and –ve training examples
  • Sentence : S = w1w2…e1…wi…e2…wn-1wn

fR(T(S)) ={ +1 ,  if e1 and e2 are related by R
-1  , otherwise

Supervised Approaches (2)

  • fR(.) can be a discriminative classifier
    • SVM, Voted Perceptron, Log-linear model …
    • Can also be a multiclass classifier!
  • T(S) can be
    • A set of features extracted from the sentence
    • A structured representation of the sentence (labeled sequence, parse trees)

Supervised Approaches (3)

  • Features
    • Define the feature set
    • Similarity metrics like cosine distance can be used
  • Structured Representations
    • Need to define the similarity metric (Kernel)
    • Kernel similarity is integral to classifiers like SVMs.

Supervised Approaches(4)


  • Khambhatla (2004), Zhou et. al. (2005)
  • Given a sentence
    • 1.Perform Textual Analysis (POS, Parsing, NER)
    • 2.Extract
      • Words between and including entities
      • Types of entities (person, location, etc)
      • Number of entities between the two entities, whether both entities belong to same chunk
      • # words separating the two entities
      • Path between the two entities in a parse tree
  • Textual Analysis involves POS tagging, dependency parsing etc.
  • What forms a good set of features ?
  • Choice of features guided by intuition and experiments.
  • Alternative is to use the structural representations and define an appropriate similarity metricfor the classifier!


  • Kernel K(x,y) defines similarity between objects x and y implicitly in a higher dimensional space
  • (x,y) can be Strings: similarity number of common substrings(or subsequences) between x and y
    • Example: sim(cat,cant) > sim(cat,contact)
    • Excellent introduction to string kernels in Lodhi et. al. (2002)
  • Extend string kernels to word sequences and parse trees for relation extraction

Kernels (Word Subsequences)

Word contextaround entities can be indicator of a relation -Bunescu& Mooney (2005a)

Each word is augmented with its POS, Generalized POS, chunk tag (NP, VP, etc), entity type (Person, Organization, none)

Kernels (Trees)

  • Tree kernels differ over types of trees used and attributes of nodes
  • Zelenko et. al. (2003)
    • Use shallow parse trees. Each node contains
      • Entity-Role (Person, Organization, Location, None)
      • Text it subsumes
      • Chunk tag (NP, VP etc)
    • Tasks: organization-location, person-affiliation detection
    • Tree kernel with SVM improves over feature based SVM for both tasks (F1 7% and 3% respectively)
  • Culotta & Sorensen (2004)
    • Use dependency trees. Node attributes are
      • Word, POS, Generalized POS, Chunk tag, Entity type, Entity-level, Relation argument
  • These tree kernels are rigid –attributes of nodes must match exactly!


  • Bunescu & Mooney (2005b)
    • Sufficient to use only the shortest path between entitiesin a dependency tree.
    • Each word in shortest path augmented with POS, Generalized POS, Entity type etc…
    • Structure of the dependency path is also encoded
    • Performs the best among all kernels

Kernels Vs Features

Supervised Approaches (Concerns)

  • Perform well but difficult to extend to new relation-types for want of labeled data
  • Difficult to extend to higher order relations
  • Textual analysis like POS tagging, shallow parsing, dependency parsing is a pre-requisite. This stage is prone to errors.

Rationales in Relation Extraction


  • Supervised approaches
    • Feature-based and kernel methods
  • Semi-supervised approaches
    • Bootstrapping
  • Higher-order relation extraction
  • Applications
    • Question-Answering
    • Mining biomedical text
  • Evaluation
Print Friendly