adviserl


The prediction module using Slope-One algorithm can compute popular items.

In this implementation, popular items are not only the most frequent items. Each user’s ratings defines some kind of relative order of a subset of items. The prediction module compute some kind of global order for all items by merging all “user subset’s orders”. This is actually using Slope-One algorithm for a user which would rate all items with same score.

Popular items recommendation is useful for new user!

Did I say adviserl need doc and testing … well … I did implemented something else: a (too) simple HTTP API based on inets. It enable rating of items and retrieving recommendations. Hopefully it will also display few info about state of adviserl application. If set in config file, adviserl will start inets at startup.

adviserl is now version 0.2 thanks to its new default backend: Mnesia.

Mnesia

All 4 data services (sources, items, ratings and predictions) can now store everything into Mnesia tables, which help to (let’s hope …) ensure consistency. Using Mnesia will also ease to distribute the application later.

The main difficulty was the backup of adviserl data because all services are independent. So adviserl only provides tools (adv_mnesia:backup/0, adv_util:save_files/0) and the user of adviserl need to know want he want to backup and when (no logic is implemented to backup or restore data). There is no function to restore a backup when adviserl is running, but for a shortcoming, it is easy to stop and restart adviserl.

For now, adviserl start mnesia if needed, but cannot stop mnesia when adviserl stop. Still wondering if mnesia could/should be an included application (using mnesia_sup module).

API changes

The main interface has change a little bit to facilitate the use of ID or key to identify items and sources. By default, ‘keys’ are used to identify sources and items (adviserl:rate/3,4). New functions enable to directly use ID (adviserl:rate_id/3,4).

Also new functions enable to add rating asynchronously (adviserl:async_rate/3,4 and adviserl:async_rate_id/3,4), meaning that recommendation will not use those ratings until specified (adv_predictions:init/0): this enable to quickly add a lot of ratings (without expensive update of recommendation data), then purposely trigger a re-initialization of recommendation data.

Conclusion

That’s all for now.

I hope I can work on using multiple prediction algorithms, but it may be better to first implement some regression tests and write a minimal documentation 😉

I’m looking forward to play with recommender algorithms 🙂

I added a “welcoming bug” feature in adviserl: a simple save_files/0 function to backup the recommender data in a persistent manner.

That was quick to write, and I really need it now that I begin to look at a lot of data. But it’s the wrong way to do it: I need to synchronize the multiple servers before to make or restore the backup. Still wondering if I should have done that in a branch …

Anyway, it’s here, and it works: if the application anvironment define few files where to save data, those files will be read and written when application start or stop respectively (you can use load_files/0 or save_files/0 whenever you want, but be careful that servers should not process anything else at the same time to avoid inconsistent state).

Mnesia backend is becoming urgent now … it’s the next point on the roadmap 🙂

Because CF need data to learn, small examples to illustrate adviserl are not easy to find. And thus the first “real” example is already a mis-use of adviserl: it uses the CF algorithm as an IR tool!

Anyway, here we go, I created a tag recommender with my delicious bookmarks (this is not new, delicious already display related tags, but this is just an application example toy).

This is done by considering each bookmark as a source (a user) and each tag as an item: each time a tag is associated with a bookmark, this is translated as “the bookmark rate the tag with a score of 1”. The complete code is this:
application:start(adviserl),
{ok, DeliciousPID} = deli_posts:start_link(),
gen_server:call(DeliciousPID, {login, User, Password}),
{ok, Posts, _Status} = gen_server:call(DeliciousPID, {get_posts, User, Options}, infinity),
io:format("Loading posts", []),
lists:foreach(
fun(#delipost{href=HRef,tags=Tags}) ->
io:format(".", []),
lists:foreach(
fun(Tag) -> adviserl:rate(HRef,Tag,{1,no_data}) end,
Tags
)
end,
Posts
),
io:format("~n", []).
Getting a recommendation for few keywords is then:
Keywords = ["erlang", "concurrency"],
KeywordIDs = lists:map(fun(K) -> adv_items:id_from_key(K) end, Keywords),
Ratings = lists:map(fun(ID) -> {ID, 1} end, IDs),
Rec0 = adviserl:recommend_all(Ratings),
lists:map(fun({ID,_}) -> {ok,K} = adv_items:key_from_id(ID), K end, Rec0).

(lot of this code is about format and conversion, hopefully it will be done in next API release).

This delicious example toy can be run by keywords.sh in delicious example folder.

Yeah, I know, we can do the same more easily with few statistics (and R) and no CF … but (1) I needed a small example and (2) this could be extended to use different user accounts.

Hey! I should try to use citeUlike instead of delicious for the next example!

I (began to) wrote a delicious API to implement a toy example for adviserl (thus the code is here). More about the toy example in next blog post.

Haven’t found anywhere a delicious API library for Erlang … but haven’t really look for it because I wanted to avoid dependency. So I wrote one, organised in 2 parts.

  • The first part (deli_api) is able to:
    • connect to the delicious server and retrieve data (posts) using inets http client;
    • convert those data (posts) in Erlang terms using xmerl (discard XML as different delicious API (rss, api, …) have different format of the same data);
  • The second part is a gen_server (deli_posts) which accept those requests:
    • {login, User, Password}, return ok
    • logout, return ok
    • {get_posts, User, [Option]} with Option=no_update, return {ok, [Post], archive|uptodate}

The server will maintain a local cache (a DETS file) and download from delicious server only if needed and requested. The login is only used to retrieve data from delicious server: data put in cache can always be accessed without password.

I may like to use this API in order to write a plugin for SharedCopy: any shared URL could be automatically posted in delicious (seems easy with Yaws but I need a web hosting for that).

The recommender system adviserl manage all data (item and user, or source) through identifiers.

Keeping those identifiers opaque for the whole system is possible, but as many prediction algorithms use integer identifiers (for matrix operations), adviserl provides an API in input of the system to generate integer ID if needed. Thus all identifiers in adviserl are integers, with an automatic conversion if needed.
erl> rate(5, 1, {3,no_data}). # user 5 rate item 1 with score 3 (no rating data)
erl> rate("bill", "microsoft", {5, no_data}). # user "bill" rate item "microsoft" ...
erl> adv_items:id_from_key("microsoft").
2 # item "microsoft" got ID 2
erl> adv_source:id_from_key("bill").
1 # source "bill" got ID 1

The API is also usefull to keep data about items or sources, which open the possibility to content-based recommendation algorithms. For this, it’s enough to call the item/source API before to set related ratings. Any item/source has an integer ID, a external key (any Erlang term), and a data term (a property list in the following example).
erl> adv_items:insert_new("linux", [{tags, ["oss", "os"]}]).
{true,3} # bool if inserted or existing, integer is ID
erl> adv_items:object_from_id(3).
{3,"linux",[{tags,["oss","os"]}]} # {ID, Key, Data}
erl> adviserl:rate("linus", "linux", {6,no_data}).

Until now, to access a remote node running adviserl, you could use the native Erlang RPC:
erlang> rpc:call(adviserl_node@localhost, adviserl, rate, [1, 2, {3,nodata}]).

I added a gen_server API to do exactly the same thing:
erlang> gen_server:call({adv_api, adviserl_node@localhost}, {rate, 1, 2, {3,nodata}}).

This is not simpler, but not more complicated either, and hopefully this will be more flexible when coming to distribution (the API may then become a global process).

I also added a bunch of shell script to start, stop, rate, or run prediction from command line. The start-stop script (adv.sh) is quite usefull, while the others are convenient when debugging. Those scripts have yet to be documented 😦 but are more or less straightforward and have a minimum help message when bad options or bad arguments are given.
shell> ./bin/adv.sh start
shell> ./bin/adv-rate.sh 1 2 2 # user 1 rate item 2 with score 2
shell> ./bin/adv-rate.sh 1 3 3 # ...
shell> ./bin/adv-rate.sh 2 3 3
shell> ./bin/adv-rate.sh 2 4 4
shell> ./bin/adv-rate.sh 3 3 3
shell> ./bin/adv-getratings.sh 1
3 3
2 2
shell> ./bin/adv-recommendall-source.sh 3 # prediction for user 3
4 4.00000
2 2.00000

I think my next post on adviserl will show a more useful example! 🙂

Long time ago I wrote a very simple Slope-One implementation (collaborative filtering algorithm): this was easy and fulfilled all my needs … which was then to learn CF and Erlang ;).

Then I realized that it could be more than fun and could even become useful as a simple recommender system. So I wrote it as an OTP application and it is published under GPLv3 license. Lots (lots!) of things still have to be done but the basic are like that:

% start application: by default use Slope-One data/algorithm
application:start(sasl),
application:start(adviserl),
% add some rating in the system
adviserl:rate(1, 2,  {3, no_rating_data}), % user 1 rate item 2 with value 3 (no data)
adviserl:rate(1, 4,  {5, no_rating_data}), % ...
adviserl:rate(2, 2,  {1, no_rating_data}),
adviserl:rate(2, 5,  {8, "damn good!"}), % any data term can be associated to rating value
adviserl:rate(3, 4,  {3, no_rating_data}),
adviserl:rate(3, 5,  {2, no_rating_data}),
adviserl:rate(3, 12, {2, no_rating_data}),
% some debug output to "see" the data
adv_ratings:print_debug(), % display the ratings per user
adv_items:print_debug(), % display a covisitation matrix
% try some predictions
adviserl:recommend_all(1), % prediction for user 1
adviserl:recommend_all(2), % ... for user 2
adviserl:recommend_all(3),
adviserl:recommend_all(4),
adviserl:recommend_all([]), % for any user without rating!
adviserl:recommend_all([{2,5}]), % for any user having those ratings
adviserl:recommend_all([{4,5}]), % idem
adviserl:recommend_all([{2,5},{4,5}]), % idem with multiple ratings
adviserl:recommend_all([{3,5}]), % ... even if item is unknown
% update on the fly
IncreaseRating = fun({R, Data}) -> {R + 1, Data} end,
DefaultRating = {1, no_data},
adviserl:rate(1, 2, {7, now()}), % user 1 change rating of item 2 from 3 to 7, adding data
adviserl:rate(1, 2, IncreaseRating, DefaultRating), % update from 7 to 8 with function
adviserl:rate(1, 42, IncreaseRating, DefaultRating), % rate item 42 at 1 (default)
ok_lah.

Among the main points on the pseudo roadmap:

  • API to call adviserl functions through process messages
  • data persistence
  • data distribution
  • algorithm distribution