The recipe: https://www.math.hmc.edu/funfacts/ffiles/20010.5.shtml
Question: Can we learn to make it?
- Collaborative Filtering
- Model-based (latent factor)
- Content-based Filtering
A/B testing is widely used in customer-facing systems to measure whether a hypothesis holds with statistical significance. But most applications focus on revenue or other short-term impact metrics. In applications like personalized search engines and personalized recommendation engines, the results are getting more and more monotonous.
We should apply A/B tests to see whether the personalized content truly interests the user. This involves two questions:
- If we replace a recommendation with random items (that is, we do not recommend that item), will the customer eventually discover it, and find it interesting, some other way?
- Once the item is discovered (via personalized recommendations or spontaneous discovery), does the user show significantly more interest in it than in other items encountered along the way (e.g. the random items seen before discovering it)?
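The second question can be checked with a standard two-proportion z-test. Below is a minimal sketch, assuming "interest" is reduced to a yes/no engagement signal per discovery. The function name and the example counts are made up for illustration, and a real experiment would also need proper randomization and a sample size chosen up front.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: the two engagement rates are equal."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one observation")
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all successes or all failures: no evidence of a difference
    z = (rate_a - rate_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: engagements with the target item vs. with random items.
z, p = two_proportion_z_test(success_a=60, n_a=100, success_b=40, n_b=100)
print(f"z={z:.2f}, p={p:.4f}")
```

A small p-value here only says the two rates differ. Whether the interest is organic still depends on how the item was discovered, which is the first question.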
The philosophy behind this is the same as for Organic Training Data: we are making the rich richer, and that is what we should avoid.
With personalized search or recommendation systems, users are fed more and more monotonous content that they might not have consumed on their own. We call the resulting events non-organic data. When this data is reused for training, we make the rich richer.
Thus we should train on spontaneous consumption events rather than on events generated by the personalized system.
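As a rough sketch of what this filtering could look like, assume every consumption event is logged with a label saying where it came from. The `Event` type, the `source` field and the label values below are hypothetical, not taken from any real system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    user_id: str
    item_id: str
    source: str  # hypothetical label for how the user reached the item

# Assumption: these labels mark spontaneous (organic) discovery.
# Anything else, e.g. "recommendation" or "personalized_search", is non-organic.
ORGANIC_SOURCES = frozenset({"browse", "direct", "external_link"})

def organic_events(events):
    """Keep only events that were not produced by the personalized system."""
    return [e for e in events if e.source in ORGANIC_SOURCES]

log = [
    Event("u1", "i1", "recommendation"),
    Event("u1", "i2", "browse"),
    Event("u2", "i1", "direct"),
]
training_data = organic_events(log)
print([e.item_id for e in training_data])
```

The hard part in practice is the labeling itself: the logging has to record the source at the moment of consumption, because it cannot be reconstructed reliably afterwards.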
Popularity is meaningless; the wisdom of the average person is informative only when it comes to entertainment. Personalization based on personal interest alone would only converge on a small, concentrated subset. If you keep narrowing down, you end up in the top 10 best sellers of that genre.
What makes a new generation of recommendation systems?
- Some level of diversity: users get aesthetically tired of seeing more of the same.
- High-quality guiding sources: just like when you follow some hot shots on Twitter, you obtain information from them, not from average people.
- Editor's picks for you require a human editor: the editor need not pick everything, but needs to provide sources and probably quality assurance.
- 80/20: some portion of human work is better than pure machine. That 20% need not be repetitive work; it could be collaboration, such as providing sources.
- Designing a new strategy: do not optimize over crap. Think from scratch.
- Cocktail strategy: follow how people explore new things, with a little bit of everything: best sellers, trending, personal interest, people who share similar taste, something different.
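One possible reading of the cocktail strategy is a simple quota-based blend of several ranked lists. This is only a sketch under that assumption: the function name, the source names and the shares are invented for illustration, and a real mixer would probably shuffle or re-rank the result instead of keeping the sources in blocks.

```python
def cocktail_mix(sources, shares, k):
    """Blend ranked item lists into one list of at most k unique items.

    sources: dict mapping a source name to its ranked list of item ids
    shares:  dict mapping a source name to its relative share of the k slots
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    picked, seen = [], set()
    # First pass: give each source its proportional quota of slots.
    for name, ranked in sources.items():
        quota = int(k * shares.get(name, 0) / total)
        taken = 0
        for item in ranked:
            if taken >= quota or len(picked) >= k:
                break
            if item not in seen:
                seen.add(item)
                picked.append(item)
                taken += 1
    # Second pass: fill any leftover slots, going through the sources in order.
    for ranked in sources.values():
        for item in ranked:
            if len(picked) >= k:
                return picked
            if item not in seen:
                seen.add(item)
                picked.append(item)
    return picked

sources = {
    "best_sellers": ["a", "b", "c"],
    "trending": ["b", "d"],
    "personal_interest": ["e", "f", "g"],
    "something_different": ["z"],
}
shares = {"best_sellers": 1, "trending": 1, "personal_interest": 2, "something_different": 1}
print(cocktail_mix(sources, shares, k=5))
```

Keeping a fixed share for "something different" is what stops the mix from collapsing back into the best sellers of one genre.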