ocaml


Lately I’ve been playing a bit … well, that’s a programmer kind of game: I wrote yet other implementations (with yet other bugs I assume) of some classic data-structures, in Ocaml.

Those are all about string set or string map, and are purely functional (thus persistent).

The code is in the project ocaml-stringset (http://github.com/khigia/ocaml-stringset/tree/master). There is a TST (ternary search tree), a Collapsed TST, a Trie (bootstrap version of Chris Okasaki) and maybe other will be added. Most of the code is in file/module StringSet so it is easy to embed in a project. There is one example file for each datastructure (as well as some unit test … and the unit tests of TST iss pretty bad as it test the internal structure, not the API).

Haven’t been as far as doing stress tests though it would be nice to compare the different behaviours.

Update: those algo are implemented for string, but we could certainly provide a generic implementation for any datatype that is iterable collection and whose items are comparable. I may try to make some generic version later on.

Update: most of those tree algo implementation are NOT tail recursive; this would probably need continuations … I’ll add this if/when needed (in production environment, it should probably be done anyway except if the use case is restricted and ensure a limit in recursion depth).

(that time I learnt the lesson: never promise what the next post will be about coz I often not deliver)

Optimization algo need to explore the space of solutions, and this exploration is a major part of the algo! Let’s focus on one simple case: exploring in random order two finite dimensions. For example, let two integer variables, V1 taking value in interval [1, 100] and V2 taking value in interval [1, 20]: how to explore all the possible pairs (v1,v2) in random order?

A simple (almost-)solution is given by the following python code:

[(e1,e2) for e1 in shuffle(range([1,100)), for e2 in shuffle(range(1,20))]

which is in fact the cartesian product of the shuffles of V1 and V2 intervals … but this is ordered by values of the first shuffle and thus not random.

OK, now is the bad news: I have no solution completely space efficient to propose. My best effort is a solution which compute at least two shuffles, one on each dimension (kind of O(n + m) for space) and is even not random … just random enough for most of the cases 😛

So how? the solution is describe in this post http://weblog.raganwald.com/2007/02/haskell-ruby-and-infinity.html and especialy the section about the tabular view of the cartesian product. Did you read it? so the proposed solution is to navigate the cross-product table along its diagonal instead of row by row … simply smart isn’t it? It give “impression” of randomness 😛

Ocaml code for this algo is here: http://github.com/khigia/ocaml-anneal/tree/master/walks.ml (function pair_permutation_random_seq); It uses extensively an ad-hoc stream implementation (Seq module) to perform the walk lazyly. It was a good example to test the stream implementation!

I’ve been using this python implementation of simulated annealing to deduce Singapore bus service from a database of bus stops (see previous posts about Google Map API).

I wanted to play a little bit more with this algo and decided to port it to OCaml: you can find the code on GitHub.

As a first try, I wrote a direct code translation. Only few points differ:

  • The OCaml code use a stream implementation (lazy list) … (mostly equivalent to my previous Erlang implementation).
  • The OCaml code uses array structure where Python uses list.
  • The OCaml implementation of function reversed_section do only one array reverse in all cases.

Result is quite ok. Without any optimization on the algo, the OCaml native code performed quite faster (around 10 times faster), meaning that solving the TSP problem for SG bus services took 40 minutes for OCaml version where the Python code ran for hours (of course we could improve this version too).

The algo itself can be improve (I guestimate the time for SG map problem can be reduce by one more order of magnitude with the same simulated annealing approach).

But in coming post (code is there, need time to clean it), I’ll be looking at other interesting problems I found inside this one, especially on how to walk in solutions space in quasi-random order.

New example in ocamerl lib of erlocaml project: a tiny map-reduce-like (1) word-count program (word-count is the simple example often used to illustrate map-reduce principle).

First part of this example is an ocaml distributed erlang node [l52]. It is composed of one permanent registered (‘wc’) mbox [l40] which upon request create other mboxes implementing the mappers [l19] (on request, a mapper read a text file [l8] and send it word by word to reducer [l29]).

Second part of this map-reduce example is an erlang module implementing two functionalities. One of them is the reduce part implementation [l34] done by a process which accept messages from mappers and update a dictionary accordingly.
Second responsibility of this module is orchestration of the map-reduce flow [l90], consisting of running all mappers [l66], waiting end of all those executions [l79], and collecting result of all reducers [l95].

Assuming EPMD is running, this script run an ocaml node and an erlang node, and run a simple map-reduce to count words in two test files.

As usual, not sure it’s a useful example, but it was sure fun to write 🙂

(1) this is not a map-reduce lib … there is no failure case taken care of, processes pipe is mostly done through messages over network (while map-reduce is mainly meant to use an efficient filesystem based communication), etc!

Little intro step-by-step to the ocaml-erlang message exchange mechanism offered by ocamerl (a lib of erlocaml).

Subject of the following code is a wonderfull application … named “echo server”!

  1. In the interpretor shell of ocaml, we create a hidden erlang node and an activity to echo all received message (both on console as well as in respond to the caller).
  2. In the erlang interpretor shell, we send few message and compare them to the received ones.

The following assumes code is running on machine “comp1”.

Ocaml shell:

(* getting Ocamerl available in toplevel *)
ocaml> #use "topfind";; (* ocamerl used findlib for install *)
ocaml> #thread;; (* ocamerl is multithreaded app *)
ocaml> #require "ocamerl";;

(* creating/declaring a node, assuming epmd is running *)
ocaml> let o1 = Ocamerl.Enode.run "o1" ~cookie:"cookie";;
val o1 : Ocamerl.Enode.t = <abstr>

(* creating/declaring a mbox, equivalent of erlang process *)
ocaml> let m = Ocamerl.Enode.create_mbox o1;;
val m : Ocamerl.Enode.Mbox.t = <abstr>
ocaml> Ocamerl.Enode.register_mbox o1 m "echoer";; (* give it a name *)
- : unit = ()

ocaml> Ocamerl.Enode.Mbox.create_activity m (fun msg -> match msg with
    | Ocamerl.Eterm.ET_tuple [|pid; any;|] ->
        Printf.eprintf "MSG:%s\n%!" (Ocamerl.Eterm.to_string msg);
        Ocamerl.Enode.send o1 pid any
    | _ ->
        () (* drop unexpected msg *)
);;
- : unit = ()

Erlang shell:

# starting erlang node with same cookie
erl -sname e1 -setcookie cookie

% check connection
erl> pong = net_adm:ping(o1@comp1).
pong

% utility to print whatever is in message queue
erl> F = fun() -> receive W -> io:format("got back: ~w~n", [W]) after 1 -> error end end.
#Fun<erl_eval.20.67289768>

% some tests ... send data, received it back
erl> {echo1, o1@comp1} ! {self(), {1,2,"test"}}.
{<0.37.0>,{1,2,"test"}}
erl> F().
got back: {1,2,[116,101,115,116]}
ok
% in the mean time, ocaml shell also display the data

That’s it! A wonderfull echo server 🙂

Amongst things on the to-do list:

  • Should not have to create a mbox and set its activity separately (need some wrapper)
  • Could have an onode ocaml toplevel which run one node by default and offer direct interface (e.g. “send”).

Yet another useless example for erlocaml, but this time it is not completely silly … it does something 🙂

In fact the project eocarve uses:

Idea is simple: an ocaml node run and provide a API for Erlang node to call the seamcarving library. See eocarve wiki for details/example.

Aim of this project was mainly to demonstrate use of ocamerl lib … however it may be usefull for an Erlang web app which would need seamcarving (heavy weight for CPU!). Had fun to integrate those lib in same app.

Minor updates for erlocaml (more exactly on ocamerl) … mostly code cleaning and minor refactoring!

Only one new feature: ability for Erlang processes to send message to unregistered ocaml processes. An example of that is “ex_node_mult” which generalize “ex_node_double” by dynamically creating ocaml process to perform a multiplication … yep I know, not very useful

% erlang code using the ocaml node
{byn, OcamlNode} ! {self(), 2},
By2 = receive M -> M after 1 -> error,
{byn, OcamlNode} ! {self(), 3},
By3 = receive M -> M after 1 -> error,
P1 = make_ref(),
P2 = make_ref(),
By2 ! {self(), P1, 21},
By3 ! {self(), P1, 21},
ok = receive {P1, 42} -> ok after 1 -> error end,
ok = receive {P2, 63} -> ok after 1 -> error end

With this feature, ocamerl begin to be usable … and that’s exactly what I will be doing next: experiment with more useful (at least less silly) examples! Some ideas are:

  • tiny mapreduce example with ocaml workers (erlang doing the network part);
  • using the ocaml image processing lib from erlang (not best example as lot of data need to be exchange … this is task to be solved for future erlocaml development);
  • others???

Next Page »