Brainmaker

Nanos gigantium humeris insidentes!

Notes on Lexical Filtering for Overall Opinion Polarity Identification

  • August 1, 2010 10:20 am

F. Salvetti, S. Lewis, C. Reichenbach. Impact of Lexical Filtering on Overall Opinion Polarity Identification.

Flow

HTML documents were converted to plain text, tagged using the Brill tagger, and fed into filters and classifiers.

Basic Assumptions and Points:

Related Research

Research has demonstrated that there is a strong positive correlation between the presence of adjectives in a sentence and the presence of opinion (Wiebe, Bruce, & O'Hara 1999).

(Hatzivassiloglou & McKeown 1997) combined a log-linear statistical model that examined the conjunctions between adjectives (such as "and", "but", "or") with a clustering algorithm that grouped the adjectives into two sets, which were then labelled positive and negative.

Turney extracted n-grams based on adjectives (Turney 2002). To determine whether an adjective had positive or negative polarity, he used AltaVista and its NEAR operator. He compared the number of co-occurrences of the adjective under investigation NEAR the adjective 'excellent' with its co-occurrences NEAR 'poor', on the assumption that frequent occurrence NEAR 'excellent' implies positive polarity.
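Turney's score can be sketched as follows. This is only an illustration of the SO-PMI arithmetic: the four hit counts are assumed to have been retrieved already (the AltaVista NEAR operator is no longer available), and the smoothing constant is my addition to avoid division by zero.

```python
from math import log2

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor):
    """Turney-style SO-PMI: a positive score suggests positive polarity.

    The arguments are web hit counts; retrieving them (e.g. via a search
    engine's NEAR operator) is outside this sketch.
    """
    eps = 0.01  # smoothing to avoid division by zero for rare phrases
    return log2(((hits_near_excellent + eps) * (hits_poor + eps)) /
                ((hits_near_poor + eps) * (hits_excellent + eps)))
```

A phrase that co-occurs with 'excellent' a hundred times more often than with 'poor' gets a score near log2(100) ≈ 6.6.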

Corpus

The Cornell data set consists of 27,000 movie reviews in HTML form, using 35 different rating scales such as A…F or 1…10 in addition to the common 5-star system. We divided the reviews into two classes (positive and negative) and took 100 reviews from each class as the test set.

Methodology

Features for analysis

Three basic approaches for handling this kind of data pre-processing come to mind:

  • Leave the data as-is: each word is represented by itself
  • Part-of-speech tagging: each word is enriched with a POS tag, as determined by a standard tagging technique (such as the Brill tagger (Brill 1995))
  • Perform POS tagging and parsing (using e.g. the Penn Treebank (Marcus, Santorini, & Marcinkiewicz 1994)), which raises severe performance issues

We thus focus our analysis in this paper on POS-tagged data (sentences consisting of words enriched with information about their parts of speech).

We thus make the following assumptions about our test and training data:

  1. All words are transformed into upper case,
  2. All words are stemmed,
  3. All words are transformed into (word, POS) tuples by POS tagging (notation: word/POS).

All of these are computationally easy to achieve (with a reasonable amount of accuracy) using the Brill tagger.
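The three assumptions above can be sketched as a toy pipeline. The lookup tagger and the crude suffix stemmer below are placeholders of my own standing in for the Brill tagger and a real stemmer; only the shape of the output (word/POS tuples) follows the notes.

```python
# Placeholder tag lookup; a real system would run the Brill tagger.
TOY_TAGS = {"THIS": "DT", "MOVIE": "NN", "IS": "COP", "GREAT": "JJ"}

def stem(word):
    # Crude placeholder stemming: strip a trailing 'S' from long words.
    return word[:-1] if word.endswith("S") and len(word) > 4 else word

def preprocess(sentence):
    tokens = [w.upper() for w in sentence.split()]        # 1. upper case
    tokens = [stem(w) for w in tokens]                    # 2. stemming
    return [(w, TOY_TAGS.get(w, "NN")) for w in tokens]   # 3. (word, POS) tuples
```

For example, `preprocess("this movie is great")` yields tuples such as ('GREAT', 'JJ'), with the custom COP tag for the copula as described below.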

Experiments

Setting

  • Data: Cornell
  • Part-of-speech tagger: Brill tagger (Brill 1995)
  • WordNet: 1.7.13

Part of Speech Filters

Any portion that does not contribute to the overall opinion polarity (OvOP) is noise. To reduce noise, filters were developed that use POS tags to do the following.

  1. Introduce custom parts of speech where the tagger does not provide the desired specificity (negation and copula)
  2. Remove the words that are least likely to contribute to the polarity of a review (determiners, prepositions, etc.)
  3. Reduce parts of speech that introduce unnecessary variance to their POS tags only

The POS filters are not designed to reduce the effects of conflicting polarity. They are only designed to reduce the effect of lack of polarity.

One design principle of the filter rules is that they filter out parts of speech that do not contribute to the semantic orientation and keep the parts of speech that do contribute such meaning. Based on analysis of movie review texts, we devised “filter rules” that take Brill-tagged text as input and return less noisy, more concentrated sentences that have a combination of words and word/POS-tag pairs removed from the original. A summary of the filter rules defined in this experiment is shown in Table 2.

Table 2: Summary of POS filter rules
POS r1 r2 r3 r4 r5
JJ K K K K K
RB D K K K K
VBG K K K K D
VBN K K K K D
NN G G G G G
VBZ D D K K D
CC D D D K K
COP K K K K K

K: keep, D: drop, G: generalize

Wiebe et al., as well as other researchers, showed that subjectivity is especially concentrated in adjectives (Wiebe, Bruce, & O'Hara 1999; Turney & Littman 2003). Therefore, no adjectives or their tags were removed, nor were copula verbs or negative markers. However, noisy information such as determiners, foreign words, prepositions, modal verbs, possessives, particles, and interjections was removed from the text stream. Other parts of speech, such as nouns and verbs, were removed but their POS tags were retained.
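One filter pass can be sketched directly from Table 2. The dictionary below encodes the r1 column (K = keep the word/tag pair, D = drop it, G = keep the tag only); treating tags absent from the rule as dropped is my assumption, not something the notes state.

```python
# r1 column of Table 2: keep/drop/generalize per POS tag.
RULE_R1 = {"JJ": "K", "RB": "D", "VBG": "K", "VBN": "K",
           "NN": "G", "VBZ": "D", "CC": "D", "COP": "K"}

def pos_filter(tagged_tokens, rule):
    out = []
    for word, tag in tagged_tokens:
        action = rule.get(tag, "D")       # unknown tags dropped (assumption)
        if action == "K":
            out.append(f"{word}/{tag}")   # keep the word/POS pair
        elif action == "G":
            out.append(tag)               # generalize: POS tag only
        # "D": drop entirely
    return out
```

Applied to [("MOVIE", "NN"), ("IS", "COP"), ("GREAT", "JJ")], rule r1 keeps the copula and adjective but generalizes the noun to its bare NN tag.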


WordNet filtering

generalization

===============Summary by me===============

There is a strong positive correlation between the presence of adjectives in a sentence and the presence of opinion. (paper to read)

Turney extracted n-grams based on adjectives (Turney 2002). To determine whether an adjective had positive or negative polarity, he used AltaVista and its NEAR operator. (paper to read)

plain text --(Brill tagger)--> Result 1 --(POS filter)--> Result 2 --(classifiers)--> Result 3

About the POS filter

  1. Introduce custom parts of speech where the tagger does not provide the desired specificity (negation and copula)
  2. Remove the words that are least likely to contribute to the polarity of a review (determiners, prepositions, etc.)
  3. Reduce parts of speech that introduce unnecessary variance to their POS tags only

Replication of Nguyen's RE 2007 — Related Work

  • July 31, 2010 2:42 pm

Comparison:

Extraction Using Lexical Information:

5. A. Culotta, A. McCallum, and J. Betz. Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text. In Proceedings of the HLT-NAACL-2006, 2006.
6. S. Brin. Extracting Patterns and Relations from the World Wide Web. In Proceedings of the 1998 International Workshop on the Web and Databases, pages 172-183, 1998.
7. E. Agichtein and L. Gravano. Snowball: Extracting Relations from Large Plain-Text Collections. In the 5th ACM International Conference on Digital Libraries (ACM DL), 2000.
8. D. Ravichandran and E. H. Hovy. Learning Surface Text Patterns for a Question Answering System. In Proceedings of the ACL-2002, pages 41-47, 2002.

Extraction Using Hard Matching of Dependency Paths

9. R. C. Bunescu and R. J. Mooney. Extracting Relations from Text: From Word Sequences to Dependency Paths. In "Text Mining and Natural Language Processing", Anne Kao and Steve Poteet (eds.), forthcoming book, 2006.


  • follow [5] to define the entity
  • three types of principal entities [10]


11. D. Lin. Dependency-Based Evaluation of Minipar. In Proceedings of the Workshop on the Evaluation of Parsing Systems, 1st International Conference on Language Resources and Evaluation, 1998.
12. J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach. IEEE Transactions on Knowledge and Data Engineering, 16(10), 2004.
13. M. Palmer, D. Gildea and P. Kingsbury. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1), pages 71-106, 2005.
14. C. F. Baker, C. J. Fillmore, and J. B. Lowe. The Berkeley FrameNet Project. In Proceedings of the COLING/ACL-98, pages 86-90, 1998.
15. D. Gildea and D. Jurafsky. Automatic Labeling of Semantic Roles. Computational Linguistics, 28(3), pages 245-288, 2002.
16. P. Koomen, V. Punyakanok, D. Roth, and W. Yih. Generalized Inference with Multiple Semantic Role Labeling Systems. In Proceedings of the CoNLL, pages 181-184, 2005.

Un Yong Nahm. Text Mining with Information Extraction. Ph.D. Thesis. Department of Computer Sciences, University of Texas at Austin, 2004.
C. J. Fillmore and C. F. Baker. Frame Semantics for Text Understanding. In Proceedings of WordNet and Other Lexical Resources Workshop, NAACL, 2001.
C. Sutton and A. McCallum. An Introduction to Conditional Random Fields for Relational Learning. In Introduction to Statistical Relational Learning, Lise Getoor and Ben Taskar (eds.), MIT Press, 2006.

Replication of Nguyen's RE 2007 — Tool

  • July 31, 2010 2:28 pm
  • Entity Detector
    • use: technique based on the nature of Wikipedia (algorithm given)
    • option: co-reference tools in the LingPipe library and in the OpenNLP tool set
  • Derive dependency trees
    • Minipar parser
  • Mining sequential patterns
  • PA structure
    • use: SNoW-based Semantic Role Labeler — here


  1. articles should be processed to 
    1. remove the HTML tags,
    2. extract hyperlinks which point to other Wikipedia articles.
  2. processed in parallel to anchor all occurrences of
    1. principal entities (algorithm provided, or OpenNLP)
    2. secondary entities (simple)
  3. Sentence Selector 
    1. chooses sentences which contain the principal entity and at least one secondary entity (simple)
    2. Each such pair becomes a relation candidate.
  4. The trainer receives articles with HTML tags to
    1. identify summary sections (simple) and
    2. extract ground truth relations annotated by human editors (seems difficult)
  5. Previously selected sentences that contain entity pairs from ground truth relations are
    1. identified as training data.
  6. The trainer will 
    1. learn the key patterns with respect to each relation (based on Bunescu's work, or Minipar, PrefixSpan, SNoW)
  7. During testing, for each sentence and entity pair in it, the Relation Extractor will
    1. identify the descriptive label and then
    2. output the final results.

Replication of Nguyen's RE 2007 — Procedure

  • July 31, 2010 11:11 am

Paper: Exploiting Syntactic and Semantic Information for Relation Extraction from Wikipedia

For background and assumptions, see the note here.

This paper has two versions; the replication is based on the second version.

Abstract: Frequent subsequences are mined from the path between an entity pair in the syntactic and semantic structure in order to discover key patterns reflecting the relationship between the pair. The relations between entities are treated as multiclass labels, and the entity pairs and labels can be passed to an SVM for training.

4. Extract Relations from Wikipedia

4.1 Relation Extraction Framework

  1. articles should be processed to remove the HTML tags and extract hyperlinks which point to other Wikipedia articles, then passed to the pre-processor: Sentence Splitter, Tokenizer and Phrase Chunker
  2. processed in parallel to anchor all occurrences of principal entities and secondary entities. The Secondary Entity Detector simply labels the appropriate surface text of the hyperlinks as secondary entities.
  3. The Sentence Selector chooses only sentences which contain the principal entity and at least one secondary entity. Each such pair becomes a relation candidate.
  4. The trainer receives articles with HTML tags to identify summary sections and extract ground truth relations annotated by human editors.
  5. Previously selected sentences that contain entity pairs from ground truth relations are identified as training data.
  6. The trainer will learn the key patterns with respect to each relation.
  7. During testing, for each sentence and entity pair in it, the Relation Extractor will identify the descriptive label and then output the final results.

Principal Entity Detector

  • Most of the pronouns in an article refer to the principal entity.
  • The first sentence of the article is often used to briefly define the principal entity.

We use rules to identify a set of referents to the principal entity, including three types [10]:

  • pronouns ("he", "him", "they", …)
  • proper nouns (e.g., Bill Gates, William Henry Gates, Microsoft, …)
  • common nouns (the company, the software, …)

Steps to collect the referents for D include:

  1. Start with D = {}
  2. Select the first two chunks: the proper chunk of the article title and the first proper chunk in the first sentence of the article, if any. We define a proper chunk as a chunk that contains at least one proper noun, i.e., a word with an NNP or NNPS tag. We call the proper chunks selected so far (up to two) in D the names of the principal entity.
  3. If D contains no items, return D and stop the algorithm. Otherwise, continue.
  4. Get a list of all other proper chunks of the article. For each proper chunk p, if p is derived from any of the names, then add p to D. We say a proper chunk p1 is derived from a proper chunk p2 if the set of all proper nouns of p1 is a subset of the set of all proper nouns of p2.
  5. From the article, select c as the most frequent pronoun, find c' as its equivalent pronoun, and add them to D. For example, 'he' is equivalent to 'him', 'she' is equivalent to 'her', etc. We call c and c' pronoun referents.
  6. Select all the chunks with the pattern [DT N1 … Nk] where DT is a determiner and Nk is a common noun. Only those chunks which appear more frequently than the pronoun referents are added to D.
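Part of the algorithm above can be sketched in a few lines. This covers only steps 2 and 4 (collecting names, then adding proper chunks derived from them); pronoun and common-noun referents are omitted, and the chunk representation as (word, tag) pairs is my assumption.

```python
# A chunk is a list of (word, tag) pairs; proper nouns carry NNP/NNPS tags.
def proper_nouns(chunk):
    return {w for w, t in chunk if t in ("NNP", "NNPS")}

def derived_from(p1, p2):
    # p1 is derived from p2 if p1's proper nouns are a subset of p2's.
    return proper_nouns(p1) <= proper_nouns(p2)

def collect_referents(names, other_chunks):
    D = list(names)                                    # step 2: the names
    for p in other_chunks:                             # step 4: derived chunks
        if any(derived_from(p, name) for name in names):
            D.append(p)
    return D
```

With the name chunk [William/NNP, Henry/NNP, Gates/NNP], the chunk [Gates/NNP] is derived from it and is added, while [Microsoft/NNP] is not.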


Supported by the nature of Wikipedia, our technique performs better than the co-reference tools in the LingPipe library and in the OpenNLP tool set. All occurrences of the collected referents are labeled as the principal entity.

Training Data Builder

The builder examines whether the pair is in the ground truth relation set; if so, it attaches the relation label to the pair and creates a new training sentence for the relation.

We define a training sentence as a six-tuple (s, l1, r1, l2, r2, rel), where s is the sentence itself, (l1, r1) and (l2, r2) indicate the token-based boundaries of the principal entity and secondary entity, and rel indicates the relation label between the entity pair. With a sentence s selected from the above procedure, we receive the identifiers of the entities and search for a ground truth relation r such that r.ep is the identifier of the principal entity and r.es is the identifier of the secondary entity. Then, we create a new training sentence from s and r.

For a relation r, the purpose of building training data is to collect the sentences that exactly express r. To reduce noise in the training data, it is necessary to eliminate from the ground truth set the pairs which hold more than one relation.
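The Training Data Builder described above can be sketched as follows. The representation of the ground truth set as a mapping from entity-id pairs to label lists is my assumption; the skip-ambiguous-pairs check implements the noise-reduction rule in the previous paragraph.

```python
from collections import namedtuple

# The six-tuple (s, l1, r1, l2, r2, rel) from the definition above.
TrainingSentence = namedtuple("TrainingSentence",
                              ["s", "l1", "r1", "l2", "r2", "rel"])

def build_training(sentence, bounds, ep, es, ground_truth):
    # ground_truth maps (principal_id, secondary_id) -> list of labels.
    labels = ground_truth.get((ep, es), [])
    if len(labels) != 1:   # absent, or ambiguous (more than one relation): skip
        return None
    (l1, r1), (l2, r2) = bounds
    return TrainingSentence(sentence, l1, r1, l2, r2, labels[0])
```

A pair holding two relations in the ground truth set produces no training sentence, matching the noise-reduction rule.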

4.2 Learning Patterns with Dependency Path

In this section, we explain our first method for extracting relations using syntactic information.

Following the idea in [9], we assume that the shortest dependency path tracing from a principal entity through the dependency tree to a secondary entity gives a concrete syntactic structure expressing the relation between the pair.

Key patterns are learned from the dependency paths for each relationship as follows:

  1. Derive dependency trees of the training sentences with the Minipar parser [11] and extract the paths between entity pairs.
  2. Transform the paths into sequences, which are in turn decomposed into subsequences.
  3. From the subsequence collection of a relation r, identify the frequent subsequences for r.
  4. During testing, the dependency path between an entity pair in a novel sentence is also converted into a sequence and matched against the previously mined subsequences.

Sequential Representation of Dependency Path

A word together with its Part-Of-Speech tag will be an element of the sequence.

Example


  1. For each word, excluding the first and the last ones in the dependency path, we consider the combination of its base form and its part-of-speech (POS) tag as an element of the sequence.
  2. For the first and the last word, only the POS tag is used as a sequence element.
  3. For a dependency relation, a pair of direction and relation label is considered as an element of the sequence.
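Rules 1-3 above can be sketched as a conversion function. The representation of a path as parallel lists of nodes (word, POS) and edges (direction, label), and the string encodings, are my assumptions about an otherwise unspecified data layout.

```python
def path_to_sequence(nodes, edges):
    """Convert a dependency path to its sequential representation.

    nodes: [(word, pos), ...] along the path; edges: [(direction, label), ...]
    between consecutive nodes.
    """
    seq = []
    last = len(nodes) - 1
    for i, (word, pos) in enumerate(nodes):
        if i in (0, last):
            seq.append(pos)                    # rule 2: POS only at the ends
        else:
            seq.append(f"{word}/{pos}")        # rule 1: base form + POS
        if i < len(edges):
            direction, label = edges[i]
            seq.append(f"{direction}{label}")  # rule 3: direction + label
    return seq
```

For the path Gates/NNP <-subj- found/VB -obj-> Microsoft/NNP this yields ["NNP", "<-subj", "found/VB", "->obj", "NNP"]: the entity words at the ends are generalized to their tags, as rule 2 requires.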

Learning Key Patterns as Mining Frequent Sequence

PrefixSpan, introduced in [12], is known as an efficient method for mining sequential patterns. A sequence s = <s1 s2 … sn>, where each si is an itemset, is called a subsequence of a sequence p = <p1 p2 … pm> if there exist integers 1 <= j1 < j2 < … < jn <= m such that s1 ⊆ p_{j1}, …, sn ⊆ p_{jn}.
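The subsequence test above simplifies considerably when each itemset holds a single item, which is the case for our path sequences; a minimal sketch under that assumption:

```python
def is_subsequence(s, p):
    """True if s is a subsequence of p (single-item itemsets only)."""
    it = iter(p)
    # `item in it` advances the iterator, so matches stay in order.
    return all(item in it for item in s)
```

For example, ["NNP", "found/VB"] is a subsequence of ["NNP", "<-subj", "found/VB", "->obj", "NNP"], because both items occur in order.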

In this research, we use the implementation of PrefixSpan developed by Taku Kudo.

From here on, "sequence database" denotes the set of sequences converted from dependency paths with respect to a relation.

Weighting The Patterns

It is necessary to assign each mined pattern a weight with respect to a relation in order to estimate its relevance. Factors:

  • Length of the pattern: if two paths share a long common subpattern, it is more likely that the paths express the same relationship.
  • Support of the pattern: the number of sequences that contain the pattern. A pattern with high support is more likely to be a key pattern.
  • Amount of lexical information: although the sequences contain both words and dependency relations from the original dependency path, we found that word-based items are more important.
  • Number of sequence databases in which the pattern appears: if the pattern can be found in many sequence databases, it is likely that the pattern is common, and it should not be a key pattern of any relation.

 

Therefore, weight of a pattern with respect to a relation r is calculated as:

w_r(p) = \frac{irf(p) \times support_{D_r}(p) \times l(p) \times e^{lex(p)}}{|D_r|}

  • D_r is the sequence database of r; support_{D_r}(p) is the support of p in D_r.
  • irf(p) is the Inverted Relation Frequency of p, calculated as \log\frac{|R|}{|M(p)|}, where R is the set of relations and M(p) is the set of sequence databases in which p occurs.
  • l(p) is the length of p; lex(p) is the number of word-based items in p.
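The weighting formula above translates directly into code; the argument names are mine, mapping one-to-one onto the quantities just defined.

```python
from math import log, exp

def pattern_weight(support, length, lex, db_size, num_relations, num_dbs_with_p):
    """w_r(p) = irf(p) * support(p) * l(p) * e^{lex(p)} / |D_r|."""
    irf = log(num_relations / num_dbs_with_p)   # inverted relation frequency
    return irf * support * length * exp(lex) / db_size
```

Note that a pattern occurring in every sequence database gets irf = log(1) = 0 and thus zero weight, which implements the last factor in the list above: ubiquitous patterns are not key patterns.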

Relation Selection

Given a novel sentence and the anchors of an entity pair in it, we predict the appropriate relation of the pair. We extract the dependency path P, transform P into a sequential representation, and then accumulate the scores of its subsequences for each relation r:

L_r(P) = \displaystyle\sum_{p \in S(P)} w_r(p)

  • L_r(P) is the likelihood score that P expresses relation r
  • S(P) is the set of all subsequences of the sequential representation of P
  • The appropriate relation is the one giving the highest score to P:

R = \arg\displaystyle\max_r L_r(P)
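The relation selection step can be sketched as follows. Storing mined patterns as tuples in a per-relation weight dictionary is my assumption, and subsequence enumeration is assumed to happen elsewhere; the code only shows the accumulate-and-argmax logic of the two equations above.

```python
def select_relation(subsequences, weights):
    """Pick the relation whose mined patterns best cover P's subsequences.

    subsequences: subsequences of P's sequential representation, as tuples;
    weights: relation -> {pattern tuple: w_r(p)}.
    """
    scores = {r: sum(w.get(p, 0.0) for p in subsequences)   # L_r(P)
              for r, w in weights.items()}
    return max(scores, key=scores.get)                       # argmax_r
```

A pair whose path contains a heavily weighted "founder" pattern is labelled founder even if a weak pattern of another relation also matches.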


4.3 Learning Patterns with Dependency Path and Semantic Role

In this method, the only additional step is to augment the dependency trees with PA structure; all the other steps are the same as in the method of Section 4.2.

Frame Semantics theory: a frame defines relationships between a predicate and its participants in a context, which form a Predicate-Argument (PA) structure [13].

We use the SNoW-based Semantic Role Labeler [16], a state-of-the-art system for the SRL task which conforms to the definitions of PropBank and the CoNLL-2005 shared task on SRL. Since the SRL task only assigns roles to constituents or phrases, without indicating which primitive concept plays the role, we still use dependency parsing information to further analyze the phrases. We combine the two information sources by integrating semantic role information into the dependency parse tree of a sentence as follows:

  • For each predicate P and each of its roles R, identify the headwords of the two phrases.
  • Place the semantic relation between the headwords into the dependency tree. The relation is directed, with the headword of P as its head, the headword of R as its tail, and R as its label.

5. Experimental Setting

5.1 Data

  • real Wikipedia dump from Aug 10, 2006
  • only articles that include summary sections

Summary sections have templates such as Infobox_Company, Infobox_Senator, and Infobox_Celebrity.


[5] Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text.

[9] Extracting Relations from Text: From Word Sequences to Dependency Paths.

[10] Coreference for NLP Applications.

[11] Dependency-Based Evaluation of Minipar.

[12] Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach.

[13] The Proposition Bank: An Annotated Corpus of Semantic Roles.

[16] Generalized Inference with Multiple Semantic Role Labeling Systems.

Classifier

  • July 31, 2010 12:27 am

Learning and Data Mining

Both data mining and machine learning are techniques related to the processing of large amounts of data.

  • Data mining tries to obtain patterns or models from the data collected.
  • Machine learning is the basic part that the different types of existing classifiers have in common. The basic idea of learning is to use perceptions not only to act but also to improve the agent's ability to act in the future.

Learning techniques usually fall into the following categories:

Supervised learning

Supervised learning involves learning a function from tagged examples, establishing a correspondence between inputs and desired outputs; the learning system tries to tag (classify) a set of vectors by choosing one of several categories (classes).
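As a minimal illustration of this idea (a toy of my own, not one of the methods discussed elsewhere in these notes), a one-nearest-neighbour classifier learns from tagged vectors and assigns a new vector the class of its closest training example:

```python
def nn_classify(train, x):
    """One-nearest-neighbour: return the label of the closest example.

    train: list of (vector, label) pairs; x: the vector to classify.
    """
    def dist(a, b):
        # Squared Euclidean distance; the square root is not needed for argmin.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda vl: dist(vl[0], x))[1]
```

For instance, with training points (0, 0) labelled "neg" and (5, 5) labelled "pos", the query (4, 4) is classified "pos" because it lies closer to the positive example.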

Unsupervised learning

Unsupervised learning consists of learning from input patterns with no output values specified. The main problem with this technique is how to decide among all the patterns provided. The system takes the input objects as a set of random variables and builds a density model for the data set.

Semi-supervised learning

Semi-supervised learning combines the previous two techniques, because in some cases it can be very difficult to tag or classify all the data. The aim is to combine tagged and untagged data to improve modeling, although it is not always helpful, and there are several methods for doing so.

Reinforcement Learning

Reinforcement learning is a way of learning by observing the world.

The idea of learning consists in building a function with the observed behaviour as its input and output. Learning methods can be understood as a search through a space of hypotheses to find an appropriate function.

Inductive Methods

  • July 30, 2010 11:35 pm

Hidden Markov Model
Decision tree learning
Nearest Neighbor Algorithm
Conditional Random Field : a very good website
Naive Bayes : a theoretical introduction
Kernel Method
Support Vector Machine
Maximum Entropy : a link

Semi supervised learning


To do: collect the summary below into Brainmaker

  1. Find the paper by each method's original proposer
  2. Find one or two mature applications
    1. in any area
    2. restricted to NLP
  3. Find mature implementations
    1. SVM: SVMlight

Ranking:


Support Vector Machine
Naive Bayes : a theoretical introduction
Kernel Method
———————————————–
Hidden Markov Model
Conditional Random Field : a very good website
Maximum Entropy : a link
Decision tree learning
Nearest Neighbor Algorithm


Book to borrow

  • July 30, 2010 5:28 pm

V. Vapnik. Statistical Learning Theory. John Wiley, 1998.

I had returned it. Now borrow it again.

A Survey on Relation Extraction

  • July 30, 2010 5:18 pm

shared library

  • July 30, 2010 10:38 am

Q. I’ve just performed a client only install of InterBase 7.5 and whenever I start any of the command line utilities (iblicense, gstat, gfix, isql) I get the error:

error while loading shared libraries: libgds.so: cannot load shared object file: No such file or directory

What do I do?

A. The simplest solution is to create a symbolic link to libgds.so in /usr/lib. Example:

ln -s /opt/interbase/lib/libgds.so.0 /usr/lib/libgds.so

Another solution is to set your LD_LIBRARY_PATH to point to the lib directory where InterBase is installed. Example:

LD_LIBRARY_PATH=/opt/interbase/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

Note that the client install only considers InterBase servers running on the default port of 3050. If the server you are connecting to is running on a different port, you must edit your services file (/etc/services) to indicate the correct label and port. For additional information see the InterBase 7.5 Operations Guide.
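An /etc/services entry for a non-default port would look like the fragment below. The service label "gds_db" and the port 3051 are illustrative assumptions; check the InterBase 7.5 Operations Guide for the label your installation expects.

```shell
# Example /etc/services line (label and port are illustrative):
gds_db          3051/tcp        # InterBase server on a non-default port
```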