August 2008


Lately I’ve been playing a bit … well, that’s a programmer kind of game: I wrote yet other implementations (with yet other bugs I assume) of some classic data-structures, in Ocaml.

Those are all about string set or string map, and are purely functional (thus persistent).

The code is in the project ocaml-stringset (http://github.com/khigia/ocaml-stringset/tree/master). There is a TST (ternary search tree), a Collapsed TST, a Trie (bootstrap version of Chris Okasaki) and maybe other will be added. Most of the code is in file/module StringSet so it is easy to embed in a project. There is one example file for each datastructure (as well as some unit test … and the unit tests of TST iss pretty bad as it test the internal structure, not the API).

Haven’t been as far as doing stress tests though it would be nice to compare the different behaviours.

Update: those algo are implemented for string, but we could certainly provide a generic implementation for any datatype that is iterable collection and whose items are comparable. I may try to make some generic version later on.

Update: most of those tree algo implementation are NOT tail recursive; this would probably need continuations … I’ll add this if/when needed (in production environment, it should probably be done anyway except if the use case is restricted and ensure a limit in recursion depth).

(that time I learnt the lesson: never promise what the next post will be about coz I often not deliver)

Optimization algo need to explore the space of solutions, and this exploration is a major part of the algo! Let’s focus on one simple case: exploring in random order two finite dimensions. For example, let two integer variables, V1 taking value in interval [1, 100] and V2 taking value in interval [1, 20]: how to explore all the possible pairs (v1,v2) in random order?

A simple (almost-)solution is given by the following python code:

[(e1,e2) for e1 in shuffle(range([1,100)), for e2 in shuffle(range(1,20))]

which is in fact the cartesian product of the shuffles of V1 and V2 intervals … but this is ordered by values of the first shuffle and thus not random.

OK, now is the bad news: I have no solution completely space efficient to propose. My best effort is a solution which compute at least two shuffles, one on each dimension (kind of O(n + m) for space) and is even not random … just random enough for most of the cases 😛

So how? the solution is describe in this post http://weblog.raganwald.com/2007/02/haskell-ruby-and-infinity.html and especialy the section about the tabular view of the cartesian product. Did you read it? so the proposed solution is to navigate the cross-product table along its diagonal instead of row by row … simply smart isn’t it? It give “impression” of randomness 😛

Ocaml code for this algo is here: http://github.com/khigia/ocaml-anneal/tree/master/walks.ml (function pair_permutation_random_seq); It uses extensively an ad-hoc stream implementation (Seq module) to perform the walk lazyly. It was a good example to test the stream implementation!