Brainmaker

Nanos gigantium humeris insidentes!

Notes on Discovering Relations among Named Entities from Large Corpora

  • August 23, 2010 3:55 pm

Paper: Discovering Relations among Named Entities from Large Corpora

1. Introduction

Our method does not need the richly annotated corpora required for supervised learning — corpora which take great time and effort to prepare. It also does not need any instances of relations as initial seeds for weakly supervised learning.

Instead, we only need a named entity (NE) tagger to focus on the named entities which should be the arguments of relations. Recently developed named entity taggers work quite well and are able to extract named entities from text at a practically useful level.

3. Relation Discovery

3.2 Named entity tagging

Sekine proposed 150 types of named entities (Sekine et al., 2002). We use an extended named entity tagger (Sekine, 2001) in order to detect useful relations between extended named entities.

3.3 NE pairs and context

We define the co-occurrence of NE pairs as follows: two named entities are considered to co-occur if they appear within the same sentence and are separated by at most N intervening words. NE pairs that co-occur only rarely are less reliable for learning relations, so we set a frequency threshold to remove them.
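
To make the co-occurrence rule concrete, here is a minimal sketch in Python. It is not the paper's implementation: the input format (sentences as lists of `(word, ne_type)` tuples with single-token NEs), the function name `collect_pair_contexts`, and the parameters `max_gap` (standing in for N) and `min_freq` (the frequency threshold) are all assumptions made for illustration.

```python
from collections import defaultdict

def collect_pair_contexts(sentences, max_gap=5, min_freq=2):
    """Collect intervening words for NE pairs that co-occur in a sentence.

    sentences: list of sentences, each a list of (word, ne_type) tuples,
               where ne_type is None for ordinary words (assumed format).
    max_gap:   maximum number of intervening words (the N in the notes).
    min_freq:  pairs seen fewer times than this are dropped.
    """
    contexts = defaultdict(list)
    for tokens in sentences:
        # Positions of the named entities in this sentence.
        ne_idx = [i for i, (_, ne_type) in enumerate(tokens) if ne_type is not None]
        for a in range(len(ne_idx)):
            for b in range(a + 1, len(ne_idx)):
                i, j = ne_idx[a], ne_idx[b]
                if j - i - 1 > max_gap:
                    break  # any later NE is even farther away
                pair = (tokens[i][0], tokens[j][0])
                between = [word for word, _ in tokens[i + 1:j]]
                contexts[pair].append(between)
    # Apply the frequency threshold to drop infrequent pairs.
    return {pair: ctx for pair, ctx in contexts.items() if len(ctx) >= min_freq}


# Toy data with made-up entity names.
toy = [
    [("AcmeCorp", "COMPANY"), ("acquired", None), ("BetaInc", "COMPANY")],
    [("AcmeCorp", "COMPANY"), ("bought", None), ("BetaInc", "COMPANY"), ("yesterday", None)],
    [("Northville", "CITY"), ("is", None), ("far", None), ("from", None), ("Southport", "CITY")],
]
print(collect_pair_contexts(toy, max_gap=5, min_freq=2))
```

On the toy data only the (AcmeCorp, BetaInc) pair survives the threshold, because the city pair is seen just once.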

3.4 Context similarity among NE pairs

We adopt a vector space model and cosine similarity to calculate the similarity between the context sets of NE pairs.

The cosine similarity cosine(θ) between context vectors α and β is calculated by the following formula.

 cosine(\theta) = \frac{\alpha \cdot \beta }{|\alpha||\beta|}

Cosine similarity ranges from -1 to 1. A cosine similarity of 1 means the two NE pairs have exactly the same context words with the NEs appearing predominantly in the same order; a cosine similarity of -1 means they have exactly the same context words with the NEs appearing predominantly in reverse order.
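
The formula can be sketched in a few lines of Python. This is only an illustration: context vectors are assumed to be sparse dicts mapping a context word to a weight, and negative weights are assumed to stand for contexts seen with the NEs in reverse order, which is one way the -1 case can arise. Returning 0 for a zero-length vector is also an assumption of the sketch.

```python
import math

def cosine(alpha, beta):
    """Cosine similarity between two sparse context vectors (dict: word -> weight)."""
    dot = sum(weight * beta.get(word, 0.0) for word, weight in alpha.items())
    norm_alpha = math.sqrt(sum(w * w for w in alpha.values()))
    norm_beta = math.sqrt(sum(w * w for w in beta.values()))
    if norm_alpha == 0.0 or norm_beta == 0.0:
        return 0.0  # assumption: treat an empty context as unrelated
    return dot / (norm_alpha * norm_beta)


same_order = {"acquire": 2.0, "buy": 1.0}
also_same = {"acquire": 4.0, "buy": 2.0}
reversed_order = {"acquire": -2.0, "buy": -1.0}

print(round(cosine(same_order, also_same), 6))       # 1.0
print(round(cosine(same_order, reversed_order), 6))  # -1.0
```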

3.5 Clustering NE pairs

We make clusters of NE pairs based on the similarity.
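
These notes do not record which clustering algorithm is used, so the following is only one possible reading: agglomerative clustering with complete linkage, stopped at a similarity threshold. The function name `complete_linkage_clusters`, the `similarity` callback, and the `threshold` parameter are hypothetical names chosen for this sketch.

```python
def complete_linkage_clusters(items, similarity, threshold):
    """Greedy agglomerative clustering with complete linkage.

    items:      list of NE pairs (any hashable objects).
    similarity: function(a, b) -> float, e.g. cosine similarity of contexts.
    threshold:  stop merging once the best cluster similarity falls below this.
    """
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: two clusters are only as similar as
                # their least similar pair of members.
                sim = min(similarity(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        if sim < threshold:
            break  # no remaining pair of clusters is similar enough
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy similarity table between three hypothetical NE pairs.
table = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.1,
    frozenset(("B", "C")): 0.2,
}
print(complete_linkage_clusters(["A", "B", "C"], lambda x, y: table[frozenset((x, y))], 0.5))
```

With a threshold of 0.5, A and B end up in one cluster and C stays on its own. The sketch recomputes all similarities on every pass, which is fine for a toy example but too slow for a large corpus.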

3.6 Labeling clusters

We simply count the frequency of the common words in all combinations of the NE pairs in the same cluster.
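
The counting step might look like the sketch below. It assumes each NE pair in a cluster comes with the set of its context words, and that a "common word" is one shared by both NE pairs in a combination; `label_cluster` and `top_k` are names invented for the example.

```python
from collections import Counter
from itertools import combinations

def label_cluster(cluster_contexts, top_k=3):
    """Count words shared between every combination of two NE pairs in a cluster.

    cluster_contexts: dict mapping an NE pair to the set of its context words.
    Returns the top_k (word, count) tuples, most frequent first.
    """
    counts = Counter()
    for (_, words_a), (_, words_b) in combinations(cluster_contexts.items(), 2):
        # A word is counted once for each combination in which it is shared.
        counts.update(set(words_a) & set(words_b))
    return counts.most_common(top_k)


# Toy cluster with made-up NE pairs.
cluster = {
    ("A", "B"): {"acquire", "buy"},
    ("C", "D"): {"acquire", "merge"},
    ("E", "F"): {"acquire", "buy"},
}
print(label_cluster(cluster, top_k=1))  # [('acquire', 3)]
```

Here "acquire" is shared by all three combinations, so it would be the natural label for this toy cluster.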
