Brainmaker

Nanos gigantium humeris insidentes! (Dwarfs standing on the shoulders of giants!)

Automatic summarization

  • May 3, 2015 10:09 am

http://www.cis.upenn.edu/~nenkova/1500000015-Nenkova.pdf

https://github.com/miso-belica/sumy/blob/dev/sumy/summarizers/lsa.py

  1. Extractive summarization
  2. Abstractive summarization

http://www.jatit.org/volumes/Vol59No1/7Vol59No1.pdf

  1. Structure-based
  2. Semantic-based
    1. Framework for abstractive summarization using text-to-text generation
    2. Semantic graph reduction approach for abstractive text summarization
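To make the extractive idea concrete, here is a minimal Python sketch of an extractive summarizer. It is not the LSA method from the sumy module linked above; it swaps in a simpler word-frequency scoring: each sentence is scored by the average document frequency of its content words, and the top k sentences are returned in their original order. The stopword list and the regex-based sentence splitter are illustrative assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use a fuller list).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "that", "this", "for", "on", "with", "as", "by", "be"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, k=2):
    """Extractive summary: the k highest-scoring sentences, in original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average document frequency of the sentence's content words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

An abstractive summarizer, by contrast, would generate new sentences instead of selecting existing ones, which this sketch does not attempt.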

Learn to make Pie

  • April 10, 2015 12:29 am

The recipe: https://www.math.hmc.edu/funfacts/ffiles/20010.5.shtml

http://math.stackexchange.com/questions/261694/working-out-digits-of-pi

Question: Can we learn to make it?
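One concrete "recipe" for making pi digit by digit is a spigot algorithm. The sketch below is Gibbons' unbounded spigot algorithm written as a Python generator; it is not necessarily the method described at the linked recipe. It uses only exact integer arithmetic, so every digit it yields is final, at the cost of integers that keep growing.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi one at a time (Gibbons' unbounded spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The candidate digit n is now certain: emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough precision yet: fold in the next term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def first_digits(count):
    """Return the first `count` digits of pi as a list of ints."""
    return list(islice(pi_digits(), count))
```

For example, "".join(map(str, first_digits(10))) gives "3141592653". Whether a model can *learn* such a procedure, rather than have it written by hand, remains the open question above.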

Introduction to Deep learning

  • April 4, 2015 2:07 am

Recommendation Systems

  • April 3, 2015 11:31 pm

http://java.dzone.com/articles/recommendation-engine-models

http://www.groupes.polymtl.ca/inf6304/Presentations/20133/matrix-fac.pdf

  1. Collaborative Filtering
    1. Neighbor-based
    2. Model-based (latent factor)
  2. Content-based Filtering
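A minimal sketch of neighbor-based (user-based) collaborative filtering, assuming a toy in-memory ratings dictionary (RATINGS is hypothetical data). Similarity is cosine over co-rated items, and a missing rating is predicted as the similarity-weighted average of the item's ratings from the k most similar users. Practical systems usually mean-center ratings (or use Pearson correlation) first; that step is omitted here for brevity.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 3, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_user_based(ratings, user, item, k=2):
    """Similarity-weighted average rating of `item` from the k nearest users.

    Returns None when no other user has rated the item.
    """
    neighbors = [(cosine(ratings[user], r), r[item])
                 for other, r in ratings.items()
                 if other != user and item in r]
    neighbors = sorted(neighbors, reverse=True)[:k]  # most similar first
    total = sum(sim for sim, _ in neighbors)
    if not total:
        return None
    return sum(sim * rating for sim, rating in neighbors) / total
```

Here predict_user_based(RATINGS, "alice", "D") lands between carol's 2 and bob's 4, pulled toward bob because his past ratings are closer to alice's.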

Using A/B tests for other purposes

  • April 3, 2015 9:52 pm

A/B testing is widely used in customer-facing systems to measure the significance of a hypothesis. However, most applications focus on revenue or other short-term metrics. As a result, for applications such as personalized search engines and personalized recommendation engines, the results become more and more monotonous.

We should apply A/B testing to check whether the personalized content truly interests the user. The test should cover two aspects:

  1. Replace the recommendations with random items (i.e., do not recommend the item): will the customer eventually discover the item some other way and find it interesting?
  2. Once the item is discovered (via personalized recommendations or spontaneous discovery), does the user show significantly more interest in it than in other items seen along the way (e.g., the random items shown before the discovery)?

The philosophy behind this is the same as in Organic Training Data: we are making the rich richer, and that is what we should avoid.
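One way to run the first experiment is to hold out a random group of users for whom the item is replaced by random items, then compare how often each group eventually engages with the item. The sketch below is a standard two-sided two-proportion z-test on such counts; the metric (users who eventually engage with the item) and the function name are assumptions for illustration, not an established setup.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two engagement rates.

    Uses the pooled-proportion standard error and the normal approximation,
    which is reasonable when the counts are not tiny. Returns (z, p_value).
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups are all-success or all-failure: no observable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Standard normal CDF via erf; double the upper tail for a two-sided test.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)
```

For example, with hypothetical counts, two_proportion_z_test(95, 1000, 120, 1000) compares a random holdout (group a, 9.5% engagement) with the personalized group (group b, 12%); a small p-value suggests the recommendation itself, not just eventual discovery, is driving the interest.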

Organic Training Data

  • April 3, 2015 9:34 pm

With personalized search or recommendation systems, users are fed more and more monotonous content that they might not have consumed on their own. We call the resulting consumption data non-organic. When this data is reused for training, we are making the rich richer.

Thus, for training, we should prefer spontaneous consumption events over events driven by the personalized system.
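A minimal sketch of the filtering step, assuming each logged event carries a hypothetical source field that records whether the consumption came from the personalized system. organic_events drops personalized events outright; training_weights is a softer variant we assume here, which down-weights them instead of discarding them.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user: str
    item: str
    source: str  # hypothetical field, e.g. "browse", "search", "recommendation"


# Assumed set of sources that originate from the personalized system.
PERSONALIZED_SOURCES = {"recommendation", "personalized_search"}


def organic_events(events):
    """Keep only spontaneous (organic) consumption events for training."""
    return [e for e in events if e.source not in PERSONALIZED_SOURCES]


def training_weights(events, non_organic_weight=0.0):
    """Per-event sample weights: 1.0 for organic events, a smaller weight otherwise.

    non_organic_weight=0.0 is equivalent to training on organic events only.
    """
    return [1.0 if e.source not in PERSONALIZED_SOURCES else non_organic_weight
            for e in events]
```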

Introduction to Latent Factor CF

  • April 3, 2015 9:10 pm

Matrix factorization

http://hpi.de/fileadmin/user_upload/fachgebiete/naumann/lehre/SS2011/Collaborative_Filtering/pres1-matrixfactorization.pdf
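A minimal sketch of latent factor CF via matrix factorization, trained with stochastic gradient descent on the observed ratings only. Each user u and item i get a k-dimensional vector (p_u, q_i), the predicted rating is their dot product, and we minimize the sum of (r_ui - p_u . q_i)^2 plus an L2 penalty on the vectors. The learning rate, regularization, and epoch count are illustrative assumptions rather than tuned values, and the linked slides may present a different variant (for example, with user and item bias terms).

```python
import random
from math import sqrt


def predict_rating(P, Q, u, i):
    """Predicted rating: dot product of user and item latent factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=1000, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) ~ r_ui.

    ratings: list of (user_index, item_index, rating) for observed entries only.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observed ratings in a new order each epoch
        for u, i, r in data:
            err = r - predict_rating(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on (r - p.q)^2 + reg * (|p|^2 + |q|^2); factor 2 folded into lr.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    """Root-mean-square error over the given (user, item, rating) triples."""
    return sqrt(sum((r - predict_rating(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))
```

Once trained, predict_rating fills in the unobserved cells of the rating matrix, and items with the highest predicted ratings become recommendation candidates.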

Towards diverse recommendation

  • April 3, 2015 7:23 pm

Maximal Marginal Relevance (MMR): reduce redundancy while maintaining query relevance when re-ranking retrieved documents (Carbonell and Goldstein, 1998).
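MMR re-ranks greedily: at each step it picks the candidate that maximizes lam * relevance minus (1 - lam) * (maximum similarity to the items already picked). A minimal sketch, assuming the caller supplies the relevance scores and an item-item similarity function (both hypothetical inputs here):

```python
def mmr_rerank(candidates, relevance, similarity, k, lam=0.7):
    """Greedy Maximal Marginal Relevance re-ranking.

    candidates: list of item ids
    relevance:  dict item -> relevance to the query or user (higher is better)
    similarity: function (a, b) -> similarity between two items
    lam:        trade-off; 1.0 = pure relevance, 0.0 = pure diversity
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            # Redundancy = similarity to the closest already-selected item.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lam=1.0 this reduces to sorting by relevance; lowering lam lets a less relevant but different item displace a near-duplicate of something already chosen.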

How to use ratings

  • November 11, 2014 8:27 pm
  • http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
  • http://www.evanmiller.org/bayesian-average-ratings.html
  • http://www.evanmiller.org/ranking-items-with-star-ratings.html
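The first linked article argues for sorting by the lower bound of the Wilson score confidence interval for the fraction of positive ratings, rather than by the raw average. A minimal Python sketch of that formula, assuming binary (positive/negative) ratings and a 95% confidence level (z = 1.96):

```python
from math import sqrt


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for the positive-rating fraction.

    z=1.96 corresponds to 95% confidence. Items with no ratings score 0.
    """
    if total == 0:
        return 0.0
    p = positive / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / denom
```

For example, an item with 5 positive ratings out of 5 scores about 0.57, below an item with 100 out of 110 (about 0.84), even though its average is higher: few ratings mean a wide interval and therefore a low lower bound.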

Some thoughts about Recommendations

  • August 13, 2014 7:21 pm

Popularity is meaningless: the wisdom of the average person is only informative when it comes to entertainment. Personalization based on personal interest only ends up converging on a small, concentrated subset. If you keep narrowing down the scope, you end up with the top 10 best sellers in that genre.

What makes a new generation of recommendation systems?

  1. Some level of diversity: users get aesthetically tired of more of the same.
  2. High-quality guiding sources: just as when you follow some hotshots on Twitter, you get information from them, not from average people.
  3. Editor's picks for you require a human editor: the editor need not pick everything, but should provide sources and probably quality assurance.
  4. 80/20: some portion of human work is better than a purely machine-driven system. That 20% need not be repetitive work; it could be collaboration, such as providing sources.
  5. Design a new strategy: do not optimize over crap; think from scratch.
  6. Cocktail strategy: mimic how people explore new things, with a little bit of everything: best sellers, trending items, personal interests, people who share similar tastes, and something different.
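The cocktail strategy could be sketched as a quota-based blend of candidate lists. This is only one possible reading of the idea: the source names, the quota fractions, and the de-duplication rule below are all assumptions for illustration.

```python
def cocktail(sources, quotas, k):
    """Blend ranked candidate lists from several sources into one list of k items.

    sources: dict name -> ranked list of item ids (e.g. "best_sellers",
             "trending", "personal", "something_different")
    quotas:  dict name -> fraction of the k slots reserved for that source
    """
    picked, seen = [], set()

    def take(name, n):
        # Append up to n not-yet-picked items from one source, in rank order.
        for item in sources[name]:
            if len(picked) >= k or n == 0:
                return
            if item not in seen:
                picked.append(item)
                seen.add(item)
                n -= 1

    # First pass: honour each source's quota (rounded down).
    for name in sources:
        take(name, int(quotas.get(name, 0) * k))
    # Second pass: fill any leftover slots from the remaining candidates, in source order.
    for name in sources:
        take(name, k - len(picked))
    return picked
```

Shuffling or interleaving the blended list before display, and tuning the quotas per user, are natural extensions not shown here.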