July 2007

I didn’t know this was such a popular problem when I coded a sieve of Eratosthenes using the stream design from SICP.

In fact, I saw this video about NewSqueak, which presents an example of a concurrent algorithm for the sieve, and in this related discussion on the plan9 group I also found a Haskell implementation (more related to the stream design than to the concurrent design, IMHO). I also had a look at libtask, which offers a C version of the concurrent sieve.

That’s already a lot, so one more shouldn’t be necessary … but I like this example, so here is my Erlang implementation, using a spawn for each prime in the sieve.
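In this design, each prime that is discovered gets its own process, which filters out its multiples and forwards the survivors down the chain. Here is a minimal, self-contained sketch of the idea (module and function names are mine, not necessarily those of the original code):

```erlang
-module(sieve).
-export([primes/1]).

%% Return the list of primes up to Max.
primes(Max) ->
    Collector = self(),
    First = spawn(fun() -> filter_loop(Collector) end),
    feed(2, Max, First),
    First ! done,
    gather([]).

%% Feed the candidates 2..Max into the head of the filter chain.
feed(N, Max, Pid) when N =< Max ->
    Pid ! N,
    feed(N + 1, Max, Pid);
feed(_, _, _) ->
    ok.

%% A fresh filter: the first number it receives is prime (it survived
%% every earlier filter), so report it and spawn the next filter.
filter_loop(Collector) ->
    receive
        done ->
            Collector ! done;
        Prime ->
            Collector ! {prime, Prime},
            Next = spawn(fun() -> filter_loop(Collector) end),
            filter(Prime, Next)
    end.

%% An active filter: drop multiples of its prime, forward the rest.
filter(Prime, Next) ->
    receive
        done ->
            Next ! done;
        N when N rem Prime =:= 0 ->
            filter(Prime, Next);
        N ->
            Next ! N,
            filter(Prime, Next)
    end.

%% Collect the reported primes until the 'done' marker comes through.
gather(Acc) ->
    receive
        done -> lists:reverse(Acc);
        {prime, P} -> gather([P | Acc])
    end.
```

Calling sieve:primes(20) spawns one filter process per prime found and returns [2,3,5,7,11,13,17,19].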


Because CF needs data to learn, small examples to illustrate adviserl are not easy to find. And thus the first “real” example is already a misuse of adviserl: it uses the CF algorithm as an IR tool!

Anyway, here we go: I created a tag recommender from my delicious bookmarks (this is nothing new, delicious already displays related tags, but it is just a toy application example).

This is done by considering each bookmark as a source (a user) and each tag as an item: each time a tag is associated with a bookmark, this is translated as “the bookmark rates the tag with a score of 1”. The complete code is this:
{ok, DeliciousPID} = deli_posts:start_link(),
ok = gen_server:call(DeliciousPID, {login, User, Password}),
{ok, Posts, _Status} = gen_server:call(DeliciousPID, {get_posts, User, Options}, infinity),
io:format("Loading posts", []),
lists:foreach(
    fun(#delipost{href=HRef, tags=Tags}) ->
        io:format(".", []),
        lists:foreach(fun(Tag) -> adviserl:rate(HRef, Tag, {1,no_data}) end, Tags)
    end,
    Posts),
io:format("~n", []).
Getting recommendations for a few keywords is then:
Keywords = ["erlang", "concurrency"],
KeywordIDs = lists:map(fun(K) -> adv_items:id_from_key(K) end, Keywords),
Ratings = lists:map(fun(ID) -> {ID, 1} end, KeywordIDs),
Rec0 = adviserl:recommend_all(Ratings),
lists:map(fun({ID,_}) -> {ok,K} = adv_items:key_from_id(ID), K end, Rec0).

(A lot of this code is about formatting and conversion; hopefully this will be handled in the next API release.)

This delicious toy example can be run with keywords.sh in the delicious example folder.

Yeah, I know, we could do the same more easily with a few statistics (and R) and no CF … but (1) I needed a small example and (2) this could be extended to use different user accounts.

Hey! I should try to use citeUlike instead of delicious for the next example!

I began to write a delicious API client to implement a toy example for adviserl (thus the code is here). More about the toy example in the next blog post.

I haven’t found a delicious API library for Erlang anywhere … but I haven’t really looked for one, because I wanted to avoid dependencies. So I wrote one, organised in two parts.

  • The first part (deli_api) is able to:
    • connect to the delicious server and retrieve data (posts) using the inets http client;
    • convert those data (posts) into Erlang terms using xmerl (discarding the XML, as the different delicious APIs (rss, api, …) have different formats for the same data);
  • The second part is a gen_server (deli_posts) which accepts these requests:
    • {login, User, Password}, returns ok
    • logout, returns ok
    • {get_posts, User, [Option]} with Option=no_update, returns {ok, [Post], archive|uptodate}

The server maintains a local cache (a DETS file) and downloads from the delicious server only if needed and requested. The login is only used to retrieve data from the delicious server: data put in the cache can always be accessed without a password.
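The cache-or-fetch decision can be sketched like this (a hypothetical fragment, not the actual deli_posts code: the table layout, the FetchFun callback, and the meaning I give to the archive/uptodate atoms are my own assumptions for illustration):

```erlang
-module(post_cache).
-export([get_posts/3]).

%% Look the user's posts up in the DETS table; on a miss, call FetchFun
%% (standing in for the real HTTP request to delicious) and cache the
%% result. The third element of the reply says where the data came from:
%% 'uptodate' for a fresh download, 'archive' for the local cache.
get_posts(Table, User, FetchFun) ->
    case dets:lookup(Table, User) of
        [{User, Posts}] ->
            {ok, Posts, archive};
        [] ->
            {ok, Posts} = FetchFun(User),
            ok = dets:insert(Table, {User, Posts}),
            {ok, Posts, uptodate}
    end.
```

With this layout, a second call for the same user never touches the network, which matches the "download only if needed" behaviour described above.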

I would like to use this API to write a plugin for SharedCopy: any shared URL could be automatically posted to delicious (this seems easy with Yaws, but I would need web hosting for that).

The recommender system adviserl manages all data (items and users, or sources) through identifiers.

Keeping those identifiers opaque to the whole system would be possible, but as many prediction algorithms use integer identifiers (for matrix operations), adviserl provides an API at the input of the system to generate integer IDs if needed. Thus all identifiers in adviserl are integers, with an automatic conversion if needed.
erl> adviserl:rate(5, 1, {3,no_data}). % user 5 rates item 1 with score 3 (no rating data)
erl> adviserl:rate("bill", "microsoft", {5, no_data}). % user "bill" rates item "microsoft" ...
erl> adv_items:id_from_key("microsoft").
2 % item "microsoft" got ID 2
erl> adv_source:id_from_key("bill").
1 % source "bill" got ID 1
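Such a key ↔ integer ID mapping can be sketched with two ETS tables, one per direction, plus a counter for fresh IDs (this is my own illustration of the mechanism, not adviserl's actual implementation):

```erlang
-module(idmap).
-export([new/0, id_from_key/2, key_from_id/2]).

%% Create a mapping: one table per direction, one for the ID counter.
new() ->
    Fwd = ets:new(fwd, [set, public]),
    Rev = ets:new(rev, [set, public]),
    Ctr = ets:new(ctr, [set, public]),
    ets:insert(Ctr, {next, 0}),
    {Fwd, Rev, Ctr}.

%% Return the integer ID for Key, allocating the next one on first use.
id_from_key({Fwd, Rev, Ctr}, Key) ->
    case ets:lookup(Fwd, Key) of
        [{Key, ID}] ->
            ID;
        [] ->
            ID = ets:update_counter(Ctr, next, 1),
            true = ets:insert(Fwd, {Key, ID}),
            true = ets:insert(Rev, {ID, Key}),
            ID
    end.

%% Translate an ID back to its external key.
key_from_id({_Fwd, Rev, _Ctr}, ID) ->
    case ets:lookup(Rev, ID) of
        [{ID, Key}] -> {ok, Key};
        [] -> undefined
    end.
```

Looking up an existing key is then idempotent: asking twice for "bill" returns the same ID, while each new key gets the next integer.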

The API is also useful for keeping data about items or sources, which opens the door to content-based recommendation algorithms. For this, it’s enough to call the item/source API before setting the related ratings. Any item/source has an integer ID, an external key (any Erlang term), and a data term (a property list in the following example).
erl> adv_items:insert_new("linux", [{tags, ["oss", "os"]}]).
{true,3} % bool: inserted or already existing; the integer is the ID
erl> adv_items:object_from_id(3).
{3,"linux",[{tags,["oss","os"]}]} % {ID, Key, Data}
erl> adviserl:rate("linus", "linux", {6,no_data}).

Until now, to access a remote node running adviserl, you could use the native Erlang RPC:
erl> rpc:call(adviserl_node@localhost, adviserl, rate, [1, 2, {3,no_data}]).

I added a gen_server API to do exactly the same thing:
erl> gen_server:call({adv_api, adviserl_node@localhost}, {rate, 1, 2, {3,no_data}}).

This is not simpler, but not more complicated either, and hopefully it will be more flexible when it comes to distribution (the API may then become a global process).

I also added a bunch of shell scripts to start, stop, rate, or run predictions from the command line. The start-stop script (adv.sh) is quite useful, while the others are convenient when debugging. Those scripts have yet to be documented 😦 but they are more or less straightforward and print a minimal help message when given bad options or arguments.
shell> ./bin/adv.sh start
shell> ./bin/adv-rate.sh 1 2 2 # user 1 rates item 2 with score 2
shell> ./bin/adv-rate.sh 1 3 3 # ...
shell> ./bin/adv-rate.sh 2 3 3
shell> ./bin/adv-rate.sh 2 4 4
shell> ./bin/adv-rate.sh 3 3 3
shell> ./bin/adv-getratings.sh 1
3 3
2 2
shell> ./bin/adv-recommendall-source.sh 3 # prediction for user 3
4 4.00000
2 2.00000

I think my next post on adviserl will show a more useful example! 🙂

From a programmer perspective, how easy (difficult) is it to mix OO and functional programming?

Earlier this year, I was reading SICP and Programming Pearls at the same time; last month, I was writing both OO Python code and functional Erlang code. Those experiences are probably the best programming feeling I’ve had in a while: I’m not writing code, I am coding abstractions.

But I’m still wondering about mixing both styles/paradigms in the same project or the same source file. My intuition is that it may be nice to work with different styles for different parts of a program, but I would really be afraid to introduce that kind of mix into a project maintained by multiple programmers. Any experience to share on this subject?

Perhaps I should give Nemerle a try. Any other?

I have no idea what a good way to discover and learn new things is. Surely the “science of learning” has some models and advice that might have been useful, but I chose to follow the lazy way. To learn about collaborative filtering, I googled.

The first source of information reported by the search engine is (of course?) Wikipedia: collaborative filtering. Then I skipped two links in the Google results (general information) to direct myself toward specialized articles through this great personal page. I navigated around it a bit, and with the help of a few more Google searches I got lost … on my way to being lost, at least I figured out some articles that seem fundamental: Reporting and evaluating choices in a virtual community of use; GroupLens: an open architecture for collaborative filtering of netnews; Social information filtering: algorithms for automating ‘word of mouth’. My machine learning background was appreciated at this stage.

But wow! That’s a lot of information: did I really understand what I read? It was time to get my hands dirty, so I chose one system that looked relatively simple and tackled it at the implementation level: the Slope-One algorithm.
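For reference, weighted Slope-One predicts a user's score for item j by averaging, over the items i the user has already rated, the quantity dev(j,i) + r(u,i), where dev(j,i) is the mean rating difference between j and i over all users who rated both, weighted by how many such users there are. A minimal sketch of it (module and function names are mine, not adviserl's code):

```erlang
-module(slope_one).
-export([deviations/1, predict/3]).

%% Ratings is a list of {User, Item, Score} tuples.
%% Returns a map #{ {ItemJ, ItemI} => {SumOfDeviations, Count} }.
deviations(Ratings) ->
    %% Group the ratings by user ...
    ByUser = lists:foldl(
        fun({User, Item, Score}, Acc) ->
            maps:update_with(User, fun(L) -> [{Item, Score} | L] end,
                             [{Item, Score}], Acc)
        end, #{}, Ratings),
    %% ... then accumulate Sj - Si for every item pair each user rated.
    maps:fold(
        fun(_User, ItemScores, Acc0) ->
            lists:foldl(
                fun({J, Sj}, Acc1) ->
                    lists:foldl(
                        fun({I, Si}, Acc2) when I =/= J ->
                                maps:update_with({J, I},
                                    fun({Sum, N}) -> {Sum + Sj - Si, N + 1} end,
                                    {Sj - Si, 1}, Acc2);
                           (_, Acc2) ->
                                Acc2
                        end, Acc1, ItemScores)
                end, Acc0, ItemScores)
        end, #{}, ByUser).

%% Predict a score for item J from the user's known {Item, Score} pairs,
%% weighting each deviation by the number of users behind it.
predict(J, UserScores, Devs) ->
    {Num, Den} = lists:foldl(
        fun({I, Si}, {N, D}) ->
            case maps:find({J, I}, Devs) of
                {ok, {Sum, Count}} ->
                    {N + (Sum / Count + Si) * Count, D + Count};
                error ->
                    {N, D}
            end
        end, {0.0, 0}, UserScores),
    case Den of 0 -> undefined; _ -> Num / Den end.
```

On the classic textbook example (Alice rates item1=1.0 and item2=1.5, Bob rates item1=2.0), the deviation of item2 over item1 is 0.5, so Bob's predicted score for item2 is 2.5.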

Then I found a recent article about the Google News personalization recommender system: I still don’t feel comfortable comparing the different algorithms, but at least this great article puts things in a real context (taking into account the big scalability problem). It also pointed me toward a wonderful survey: Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions.

I have enough (too much) information for now, so I guess that (1) this is why I feel a bit confused and (2) it is a good time to write all that stuff down to try to organize it: field description and algorithm classifications, evaluation of recommender systems (see for example here and here), and last but not least the scalability problem. And I will try to use citeUlike and its recommendations 🙂

Hopefully, after that, I can go further with “Incorporating contextual information in recommender systems using a multidimensional approach” (available here), but before that I still have to find a survey of available recommender systems. Any recommendation? 😉

Did I miss something important?
