Nanos gigantium humeris insidentes!

Ted Talk: Filter Bubble

  • May 4, 2014 1:25 am

CF: user based vs Item based

  • April 3, 2014 3:14 am

  • A user profile normally contains fewer ratings than a product profile!
  • User-based CF – similarity between users is dynamic; pre-computing user neighborhoods can lead
    to poor predictions

    • Because the overlap between users' profiles is small, the similarity
      between two users can change when only a few ratings change
  •  Item-based CF – similarity between items is more static
  • This enables pre-computing of item-item similarity => prediction process involves only a table lookup for
    the similarity values & computation of the weighted sum.
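
A minimal sketch of the item-based idea above, assuming cosine similarity over co-rated users and a tiny made-up ratings table (the data, the function names and the choice of cosine are all illustrative assumptions, not taken from the original post). The item-item table is built once; prediction is then only lookups plus a weighted sum.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5},
    "carol": {"A": 1, "B": 5},
}

def item_vectors(ratings):
    """Invert the user->item table into item -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(u, v):
    """Cosine similarity of two sparse rating vectors (dicts)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def precompute_similarities(ratings):
    """Offline step: build the item-item similarity table."""
    items = item_vectors(ratings)
    return {(i, j): cosine(items[i], items[j])
            for i in items for j in items if i != j}

def predict(user, target, ratings, sim):
    """Online step: table lookups plus a similarity-weighted sum."""
    num = den = 0.0
    for item, r in ratings[user].items():
        if item == target:
            continue
        s = sim.get((item, target), 0.0)
        num += s * r
        den += abs(s)
    return num / den if den else None

sim = precompute_similarities(ratings)
print(predict("carol", "C", ratings, sim))
```

Because the table only depends on item columns, which tend to hold many ratings, it can be refreshed occasionally rather than on every new rating.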

Query understanding

  • March 23, 2014 7:50 am
  1. as a classification problem: query classification
    1. Classification
      1. topic: use query topic taxonomies
      2. intents:
        1. navigational
        2. informational
        3. transactional
    2.   Solutions
      1. ad hoc threshold based — rule based
      2. machine learning
        1. classify session (not query)
        2. classify the click-through data (not query)
    3.  data
      1. aid with click through data
      2. session
    4. survey
  2. query intent as a recommendation problem
    1. people who do this mean that
query expansion
Query classification has usually been performed along two main “dimensions”: “topic” and “intent”.
topic: movie, travel, news, etc.
intent: three main underlying intents, namely
  1. “navigational” (the user wants to reach a particular website),
  2. “informational” (the user wants to find a piece of information on the Web), and
  3. “transactional” (the user wants to perform a web-mediated task).
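
A rough sketch of the "ad hoc, rule based" route for the three intents, assuming a hypothetical cue-word list and a crude URL pattern (both invented here for illustration; a real system would lean on session and click-through data as noted in the outline).

```python
import re

# Hypothetical cue words; not from any published taxonomy.
TRANSACTIONAL_CUES = {"buy", "download", "order", "purchase"}
# Crude assumption: a token that looks like a site name signals navigation.
URL_LIKE = re.compile(r"^www\.|\.(com|org|net)$")

def classify_intent(query):
    """Return 'navigational', 'transactional' or 'informational'."""
    tokens = query.lower().split()
    if any(URL_LIKE.search(t) for t in tokens):
        return "navigational"
    if any(t in TRANSACTIONAL_CUES for t in tokens):
        return "transactional"
    # Fallback: everything else is treated as informational.
    return "informational"

for q in ["www.example.com", "buy running shoes", "how do magnets work"]:
    print(q, "->", classify_intent(q))
```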

If a person is born deaf, which language do they think in?

  • February 11, 2014 8:35 pm

Learning Topics

  • January 16, 2014 12:44 am


  • bootstrapping test
  • jackknifing test
  • permutation test

Learning Models

  • Regression
    • OLS
    • GLM
      • linear component
      • error structures
      • link functions
  • Classification
    • SVM
  • Ordinal Classification

Goodness of Fit

  • Pearson Chi-square test
  • Root mean square error


  • t-test
  • Chi-squared test

Four Assumptions Of Multiple Regression That Researchers Should Always Test

  • December 9, 2013 11:29 pm



Linear Regression, Bridge Regression and Lasso

  • December 6, 2013 12:03 am

Recommendation System

  • November 4, 2013 6:22 am


The distinction between the physical and on-line worlds has been called the long tail phenomenon, and it is suggested in Fig. 9.2. The vertical axis represents popularity (the number of times an item is chosen). The items are ordered on the horizontal axis according to their popularity. Physical institutions provide only the most popular items to the left of the vertical line, while the corresponding on-line institutions provide the entire range of items: the tail as well as the popular items.


The long tail: physical institutions can only provide what is popular,
while on-line institutions can make everything available

There are two basic architectures for a recommendation system:

1. Content-Based systems focus on properties of items. Similarity of items is determined by measuring the similarity in their properties.

2. Collaborative-Filtering systems focus on the relationship between users and items. Similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.

1. Content-based

Item Profile: In a content-based system, we must construct for each item a profile, which is a record or collection of records representing important characteristics of that item.

User Profile:

We not only need to create vectors describing items; we need to create vectors with the same components that describe the user’s preferences.

With profile vectors for both users and items, we can estimate the degree to which a user would prefer an item by computing the cosine distance between the user’s and item’s vectors.
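
A small sketch of that comparison, assuming dense profile vectors with the same components for users and items. The three-component vectors are made-up examples, and "cosine distance" is taken here to mean the angle between the vectors, so a smaller angle suggests a stronger preference.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an empty profile
    return dot / (norm_u * norm_v)

def cosine_distance_degrees(u, v):
    """Angle between the vectors in degrees (0 means same direction)."""
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))  # guard rounding
    return math.degrees(math.acos(c))

# Hypothetical profiles over three made-up features.
user_profile = [1.0, 0.0, 0.5]
item_a = [1.0, 0.0, 0.0]
item_b = [0.0, 1.0, 0.0]

print(cosine_distance_degrees(user_profile, item_a))
print(cosine_distance_degrees(user_profile, item_b))
```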

2. Collaborative Filtering

Measure Similarity of Users

Cluster users or items

Naive Bayes classifier Probabilistic model

  • October 14, 2013 8:08 pm

Abstractly, the probability model for a classifier is a conditional model
p(C \vert F_1,\dots,F_n)
over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F_1 through F_n. The problem is that if the number of features n is large or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.
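
The reformulation can be sketched as follows, using the same symbols as above. The first line is Bayes' theorem; the second adds the "naive" assumption that each feature F_i is conditionally independent of the others given C, which is the standard naive Bayes step rather than something spelled out in this excerpt.

```latex
p(C \vert F_1,\dots,F_n)
  = \frac{p(C)\, p(F_1,\dots,F_n \vert C)}{p(F_1,\dots,F_n)}

p(C \vert F_1,\dots,F_n) \propto p(C) \prod_{i=1}^{n} p(F_i \vert C)
```

The denominator does not depend on C, so it can be dropped when comparing classes; only the n one-feature tables p(F_i \vert C) and the prior p(C) need to be estimated.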

Probit Models — An Application Example

  • October 14, 2013 6:01 pm