recommender


The prediction module, based on the Slope-One algorithm, can also compute popular items.

In this implementation, popular items are not simply the most frequent items. Each user’s ratings define a relative order over a subset of items. The prediction module computes a global order over all items by merging these per-user orders. This amounts to running the Slope-One algorithm for a virtual user who would rate all items with the same score.

Popular-item recommendation is useful for new users!
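
To make that description concrete, here is what the ratings of such a virtual user would look like (the item IDs are made up; this is only an illustration of the idea, not an actual adviserl call path):

% A virtual user rating every known item with the same score, e.g. 1.
% Slope-One predictions computed from this rating list induce a single
% global order over the items, which is the "popularity" order.
AllItemIDs = [2, 4, 5, 12],
VirtualUserRatings = [{ItemID, 1} || ItemID <- AllItemIDs].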

Did I say adviserl needs documentation and testing? … well … I implemented something else instead: a (too) simple HTTP API based on inets. It enables rating items and retrieving recommendations. Hopefully it will also display a few details about the state of the adviserl application. If enabled in the config file, adviserl will start inets at startup.

Because CF needs data to learn from, small examples to illustrate adviserl are not easy to find. Thus the first “real” example is already a misuse of adviserl: it uses the CF algorithm as an IR tool!

Anyway, here we go: I created a tag recommender from my delicious bookmarks (this is nothing new, delicious already displays related tags, but it is just a toy example application).

This is done by considering each bookmark as a source (a user) and each tag as an item: each time a tag is associated with a bookmark, this is translated as “the bookmark rates the tag with a score of 1”. The complete code is:
% User, Password and Options are assumed to be bound beforehand.
application:start(adviserl),
{ok, DeliciousPID} = deli_posts:start_link(),
gen_server:call(DeliciousPID, {login, User, Password}),
{ok, Posts, _Status} = gen_server:call(DeliciousPID, {get_posts, User, Options}, infinity),
io:format("Loading posts", []),
lists:foreach(
    fun(#delipost{href=HRef, tags=Tags}) ->
        io:format(".", []),
        lists:foreach(
            fun(Tag) -> adviserl:rate(HRef, Tag, {1, no_data}) end,
            Tags
        )
    end,
    Posts
),
io:format("~n", []).
Getting a recommendation for a few keywords is then:
Keywords = ["erlang", "concurrency"],
KeywordIDs = lists:map(fun(K) -> adv_items:id_from_key(K) end, Keywords),
Ratings = lists:map(fun(ID) -> {ID, 1} end, KeywordIDs),
Rec0 = adviserl:recommend_all(Ratings),
lists:map(fun({ID,_}) -> {ok,K} = adv_items:key_from_id(ID), K end, Rec0).

(a lot of this code is about formatting and conversion; hopefully this will be handled in the next API release).

This delicious toy example can be run with keywords.sh in the delicious example folder.

Yeah, I know, we could do the same more easily with a few statistics (and R) and no CF … but (1) I needed a small example and (2) this could be extended to use different user accounts (see the sketch below).
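
For point (2), here is a hedged sketch of what loading several accounts could look like, reusing the deli_posts and adviserl:rate calls from above (the account list and the empty options list are made-up placeholders):

% Hypothetical extension: aggregate the bookmarks of several delicious
% accounts so that tag co-occurrence is learned across users.
Accounts = [{"user1", "password1"}, {"user2", "password2"}],
lists:foreach(
    fun({User, Password}) ->
        gen_server:call(DeliciousPID, {login, User, Password}),
        {ok, Posts, _Status} = gen_server:call(DeliciousPID, {get_posts, User, []}, infinity),
        % one rating per (bookmark, tag) pair, exactly as in the single-account case
        [adviserl:rate(HRef, Tag, {1, no_data})
            || #delipost{href=HRef, tags=Tags} <- Posts, Tag <- Tags]
    end,
    Accounts).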

Hey! I should try to use citeUlike instead of delicious for the next example!

Until now, to access a remote node running adviserl, you could use the native Erlang RPC:
erlang> rpc:call(adviserl_node@localhost, adviserl, rate, [1, 2, {3,nodata}]).

I added a gen_server API to do exactly the same thing:
erlang> gen_server:call({adv_api, adviserl_node@localhost}, {rate, 1, 2, {3,nodata}}).

This is not simpler, but not more complicated either, and hopefully it will be more flexible when it comes to distribution (the API may then become a global process).
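
To make that last remark concrete (nothing of this exists yet, I am only reusing the adv_api name as an illustration): a globally registered API process could be called from any connected node without naming the node explicitly:

erlang> gen_server:call({global, adv_api}, {rate, 1, 2, {3,nodata}}).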

I also added a bunch of shell scripts to start, stop, rate, or run predictions from the command line. The start/stop script (adv.sh) is quite useful, while the others are convenient when debugging. Those scripts have yet to be documented 😦 but are more or less straightforward and print a minimal help message when bad options or bad arguments are given.
shell> ./bin/adv.sh start
shell> ./bin/adv-rate.sh 1 2 2 # user 1 rates item 2 with score 2
shell> ./bin/adv-rate.sh 1 3 3 # ...
shell> ./bin/adv-rate.sh 2 3 3
shell> ./bin/adv-rate.sh 2 4 4
shell> ./bin/adv-rate.sh 3 3 3
shell> ./bin/adv-getratings.sh 1
3 3
2 2
shell> ./bin/adv-recommendall-source.sh 3 # prediction for user 3
4 4.00000
2 2.00000

(To check the numbers: user 3 only rated item 3 with score 3; user 2 rated item 3 with 3 and item 4 with 4, so the average deviation from item 3 to item 4 is +1 and item 4 is predicted at 3 + 1 = 4; similarly, user 1 gives item 2 a deviation of -1 with respect to item 3, hence the prediction of 2.)

I think my next post on adviserl will show a more useful example! 🙂

I have no idea what a good way to discover and learn new things is. Surely the “science of learning” has some models and advice that may have been useful, but I chose to follow the lazy way. To learn about collaborative filtering, I googled.

The first source of information reported by the search engine is (of course?) Wikipedia: collaborative filtering. Then I passed over two links in the Google results (general information) to direct myself toward specialized articles through this great personal page. I navigated around it a bit and, with the help of a few more Google searches, I got lost … on my way to getting lost, at least I found some articles that seem fundamental: Reporting and evaluating choices in a virtual community of use; GroupLens: an open architecture for collaborative filtering of netnews; Social information filtering: algorithms for automating ‘word of mouth’. My machine learning background was appreciated at this stage.

But wow! That’s a lot of information: did I really understand what I read? It was time to get my hands dirty, so I chose one system that looked relatively simple and tackled it at the implementation level: the Slope-One algorithm.

Then I found a recent article about the Google News personalization recommender system: I still don’t feel comfortable comparing the different algorithms, but at least this great article puts things in a real context (taking into account the big scalability problem). It also pointed me toward a wonderful survey: Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions.

I have enough (too much) information for now, so I guess that (1) this is why I feel a bit confused and (2) it is a good time to write all that stuff down to try to organize it: a description of the field and a classification of the algorithms, the evaluation of recommender systems (see for example here and here) and, last but not least, the scalability problem. And I will try to use citeUlike and its recommendations 🙂

Hopefully, after that, I can go further with “Incorporating contextual information in recommender systems using a multidimensional approach” (available here), but before that I still have to find a survey of available recommender systems. Any recommendation? 😉

Did I miss something important?

A long time ago I wrote a very simple Slope-One implementation (a collaborative filtering algorithm): it was easy and fulfilled all my needs … which at the time were to learn CF and Erlang ;).
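
For the record, here is a minimal, self-contained sketch of the weighted Slope-One prediction. This is not the adviserl code, just an illustration; the module name and the data representation (ratings as a list of {User, [{Item, Score}]}) are mine:

-module(slope_one_sketch).
-export([predict/3]).

%% Average rating deviation between items J and I, over the users who rated both.
deviation(J, I, Ratings) ->
    Diffs = [proplists:get_value(J, R) - proplists:get_value(I, R)
             || {_User, R} <- Ratings,
                proplists:is_defined(J, R),
                proplists:is_defined(I, R)],
    case Diffs of
        [] -> undefined;
        _  -> {lists:sum(Diffs) / length(Diffs), length(Diffs)}
    end.

%% Weighted Slope-One prediction of item J for a user described by UserRatings,
%% a list of {Item, Score} pairs.
predict(J, UserRatings, Ratings) ->
    Terms = [{(Dev + Score) * Card, Card}
             || {I, Score} <- UserRatings,
                I =/= J,
                {Dev, Card} <- [deviation(J, I, Ratings)]],
    case Terms of
        [] -> undefined;
        _  ->
            {Num, Den} = lists:foldl(
                fun({N, C}, {AccN, AccC}) -> {AccN + N, AccC + C} end,
                {0, 0},
                Terms),
            Num / Den
    end.

With the ratings of the shell-script example above (user 1: items 2→2 and 3→3; user 2: items 3→3 and 4→4; user 3: item 3→3), predict(4, [{3, 3}], Ratings) gives 4.0, which is consistent with the adv-recommendall-source.sh output.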

Then I realized that it could be more than fun and could even become useful as a simple recommender system. So I wrote it as an OTP application, published under the GPLv3 license. Lots (lots!) of things still have to be done, but the basics look like this:

% start the application: by default it uses the Slope-One data/algorithm
application:start(sasl),
application:start(adviserl),
% add some rating in the system
adviserl:rate(1, 2,  {3, no_rating_data}), % user 1 rates item 2 with value 3 (no data)
adviserl:rate(1, 4,  {5, no_rating_data}), % ...
adviserl:rate(2, 2,  {1, no_rating_data}),
adviserl:rate(2, 5,  {8, "damn good!"}), % any data term can be associated with the rating value
adviserl:rate(3, 4,  {3, no_rating_data}),
adviserl:rate(3, 5,  {2, no_rating_data}),
adviserl:rate(3, 12, {2, no_rating_data}),
% some debug output to "see" the data
adv_ratings:print_debug(), % display the ratings per user
adv_items:print_debug(), % display a covisitation matrix
% try some predictions
adviserl:recommend_all(1), % prediction for user 1
adviserl:recommend_all(2), % ... for user 2
adviserl:recommend_all(3),
adviserl:recommend_all(4),
adviserl:recommend_all([]), % for any user without rating!
adviserl:recommend_all([{2,5}]), % for any user having those ratings
adviserl:recommend_all([{4,5}]), % idem
adviserl:recommend_all([{2,5},{4,5}]), % idem with multiple ratings
adviserl:recommend_all([{3,5}]), % ... even if item is unknown
% update on the fly
IncreaseRating = fun({R, Data}) -> {R + 1, Data} end,
DefaultRating = {1, no_data},
adviserl:rate(1, 2, {7, now()}), % user 1 changes the rating of item 2 from 3 to 7, adding data
adviserl:rate(1, 2, IncreaseRating, DefaultRating), % update from 7 to 8 with a function
adviserl:rate(1, 42, IncreaseRating, DefaultRating), % rates item 42 at 1 (the default)
ok_lah.

Among the main points on the pseudo roadmap:

  • API to call adviserl functions through process messages
  • data persistence
  • data distribution
  • algorithm distribution