Warning: long post ahead … but mostly code 🙂

This post contains neither new ideas nor deep thoughts: it is just a recap of my attempt to read the AVI file format (or rather the RIFF file format, since I do not parse the AVI data, only the document structure). Let’s jump straight into the code, starting with this simple module header!


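-module(avi).  % hypothetical module name
-export([go/0, go/1]).
-include_lib("kernel/include/file.hrl").  % defines the #file_info{} record

dbg(Level, Template, Args) ->
    Indent = lists:flatten(lists:duplicate(Level, "  ")),
    io:format(Indent ++ Template, Args).

go() ->
    go("test.avi").  % hypothetical default file name

go(Filename) ->
    {ok, #file_info{size=Size}} = file:read_file_info(Filename),
    {ok, IODev} = file:open(Filename, [read, binary]),
    {ok, Parts} = walk_data(0, [], IODev, 0, Size),
    ok = file:close(IODev),
    {ok, Parts}.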

So, dbg is a crappy function to print debug messages … yeah, the old-fashioned way, but it is simple enough for a blog post! go is the main entry point and calls the ‘real’ code: walk_data, which builds and returns a list of AVI structures. Its first parameter is the nesting level, used to indent debug messages meaningfully; the second one is an accumulator for the recursion to come; the last three are the file handle and the byte range (From, To) to walk.

I mainly used this short document: an AVI file is a (nested) sequence of two kinds of structures, LISTs and CHUNKs. More precisely, a mandatory RIFF-AVI LIST comes first, optionally followed by several RIFF-AVIX LISTs. Let’s walk those structures:
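If, like me, you only have a big test file at hand, a tiny synthetic one makes the walker easier to debug. Here is a minimal sketch that encodes the nesting described above: every structure starts with a FourCC and a little-endian 32-bit size, and a RIFF header is laid out exactly like a LIST. The module riff_sample and its functions are hypothetical, and the result is not a playable AVI: payloads are dummy bytes.

```erlang
-module(riff_sample).
-export([chunk/2, list/2, riff/2, build/0, write/1]).

%% A chunk: FourCC, little-endian 32-bit data size, the data, then one pad
%% byte when the data size is odd (the pad byte is not counted in the size).
chunk(FourCC, Data) when byte_size(FourCC) =:= 4 ->
    Size = byte_size(Data),
    Pad = case Size rem 2 of 0 -> <<>>; 1 -> <<0>> end,
    <<FourCC/binary, Size:32/little-unsigned, Data/binary, Pad/binary>>.

%% A LIST: "LIST", size, list type, then the already encoded sub-parts.
list(Type, Parts) ->
    container(<<"LIST">>, Type, Parts).

%% A RIFF header has the same layout as a LIST.
riff(Type, Parts) ->
    container(<<"RIFF">>, Type, Parts).

%% The size field counts the 4-byte type plus all the sub-parts.
container(Id, Type, Parts) when byte_size(Type) =:= 4 ->
    Body = iolist_to_binary(Parts),
    Size = 4 + byte_size(Body),
    <<Id/binary, Size:32/little-unsigned, Type/binary, Body/binary>>.

%% Toy document: an hdrl list, an odd-sized chunk (to exercise padding)
%% and a movi list, all with dummy payloads.
build() ->
    riff(<<"AVI ">>,
         [list(<<"hdrl">>, [chunk(<<"avih">>, binary:copy(<<0>>, 8))]),
          chunk(<<"JUNK">>, <<"odd">>),
          list(<<"movi">>, [chunk(<<"00dc">>, <<1, 2, 3, 4>>)])]).

write(Filename) ->
    file:write_file(Filename, build()).
```

For example, riff_sample:write("tiny.avi") (hypothetical file name) produces a 76-byte file that the walker can go through in a blink.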

walk_data(Level, Parts, File, From, To) when From < To ->
    case chunk_or_list(File, From) of
        avichunk ->
            {ok, Part, NextPos} = walk_chunk(Level, File, From, To),
            walk_data(Level, [Part|Parts], File, NextPos, To);
        avilist ->
            {ok, Part, NextPos} = walk_list(Level, File, From, To),
            walk_data(Level, [Part|Parts], File, NextPos, To);
        Error ->
            {error, "maybe unexpected EOF", Error}
    end;
walk_data(_Level, Parts, _File, _From, _To) ->
    {ok, lists:reverse(Parts)}.

chunk_or_list(File, Pos) ->
    case file:pread(File, Pos, 4) of
        {ok, <<"RIFF">>} -> avilist;   % a RIFF header is laid out like a LIST
        {ok, <<"LIST">>} -> avilist;
        {ok, _FourCC} -> avichunk;     % any other FourCC starts a CHUNK
        eof -> eof;
        {error, Reason} -> {error, Reason}
    end.

The walk is straightforward: from position From to To, accumulating results in reverse order (I love this [Head|Tail] list notation … was Prolog the first to use it?). chunk_or_list reads a few bytes (the FourCC header) to guess the kind of the next structure (CHUNK or LIST) in the file; that structure is then loaded, and the walk continues right after it.
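The same walk can also be done on a binary already in memory, letting Erlang’s bit syntax decode the headers instead of calling file:pread. This is an alternative sketch, handy for testing on small data, not the file-based code above; the module name riff_walk is hypothetical and, for brevity, the returned tuples leave out the positions.

```erlang
-module(riff_walk).
-export([walk/1]).

%% Returns {{avilist, Id, FourCC}, SubParts} for RIFF/LIST structures and
%% {avichunk, FourCC, Size} for chunks, in file order.
walk(Bin) ->
    walk(Bin, []).

walk(<<>>, Acc) ->
    lists:reverse(Acc);
walk(<<Id:4/binary, Size:32/little-unsigned, Rest/binary>>, Acc)
  when Id =:= <<"RIFF">>; Id =:= <<"LIST">> ->
    %% The list FourCC is counted in Size, so the nested data is Size - 4 bytes.
    DataSize = Size - 4,
    <<FourCC:4/binary, Data:DataSize/binary, Tail/binary>> = Rest,
    walk(Tail, [{{avilist, Id, FourCC}, walk(Data)} | Acc]);
walk(<<FourCC:4/binary, Size:32/little-unsigned, Rest/binary>>, Acc) ->
    %% Chunk data is padded to an even length; the pad byte is not in Size.
    Pad = Size rem 2,
    <<_Data:Size/binary, _Pad:Pad/binary, Tail/binary>> = Rest,
    walk(Tail, [{avichunk, FourCC, Size} | Acc]).
```

Reading the whole file with file:read_file/1 and calling riff_walk:walk/1 is fine for small files; for a big file the pread approach avoids loading everything into memory.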

walk_list(Level, File, From, _To) ->
    case read_list_header(File, From) of
        {ok, AviList={avilist, List, FourCC, DataPos, DataSize}, NextPos} ->
            dbg(Level, "read list header (pos=~p, next=~p): List=~p FourCC=~p~n", [From, NextPos, List, FourCC]),
            {ok, SubPart} = case FourCC of
                <<"movi">> ->
                    % do not walk the (big) audio/video data
                    dbg(Level, "... skipping list FourCC=~p...~n", [FourCC]),
                    {ok, []};
                _ ->
                    walk_data(Level + 1, [], File, DataPos, DataPos + DataSize)
            end,
            {ok, {AviList, SubPart}, NextPos};
        eof ->
            dbg(Level, "end of file~n", []),
            eof;
        Error ->
            Error
    end.

read_list_header(File, Pos) ->
    % 'LIST' (or 'RIFF'), 32-bit little-endian size, then the list FourCC
    case file:pread(File, [{Pos, 4}, {Pos + 4, 4}, {Pos + 8, 4}]) of
        {ok, [List, <<Size:4/little-unsigned-integer-unit:8>>, FourCC]} ->
            % Size counts the 4-byte FourCC, so the nested data is Size - 4 bytes
            {ok, {avilist, List, FourCC, Pos + 12, Size - 4}, Pos + 8 + Size};
        {ok, [eof, eof, eof]} ->
            eof;
        _ ->
            {error, "no list header to read, but not empty data~n"}
    end.

To walk a LIST, read its header (remember that the 4-byte FourCC field is counted in the size field …), walk the nested data (reusing walk_data), and return the LIST representation: a 2-tuple with the header first (it could be a record) and then the list of sub-parts. There is also an optional test that skips the movi list, which holds the actual audio/video data, because my test file is kind of big. Walking a CHUNK is much the same.
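About the header that “could be a record”: an Erlang record is just a tuple whose first element is the record name, so records declared with the same field order already match the tuples built by read_list_header and read_chunk_header, with no conversion needed. A small sketch using them to list the FourCCs of the Parts returned by walk_data, depth first; the module, the record field names and fourccs/1 are hypothetical.

```erlang
-module(avi_records).
-export([fourccs/1]).

%% Field order mirrors {avilist, List, FourCC, DataPos, DataSize}
%% and {avichunk, FourCC, DataPos, DataSize}.
-record(avilist, {id, fourcc, data_pos, data_size}).
-record(avichunk, {fourcc, data_pos, data_size}).

%% Parts is the list found in {ok, Parts}, as returned by walk_data/5.
fourccs(Parts) ->
    lists:flatmap(fun part_fourccs/1, Parts).

%% A LIST is a 2-tuple {Header, SubParts}; a CHUNK is the header alone.
part_fourccs({#avilist{fourcc = FourCC}, SubParts}) ->
    [FourCC | fourccs(SubParts)];
part_fourccs(#avichunk{fourcc = FourCC}) ->
    [FourCC].
```

Since the movi list is skipped, its sub-parts list is empty and none of its chunks show up.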

walk_chunk(Level, File, From, _To) ->
    case read_chunk_header(File, From) of
        {ok, AviChunk={avichunk, FourCC, DataPos, DataSize}, NextPos} ->
            % for stream data, FourCC = <<_StreamNumber:2/binary, _DataType:2/binary>>
            dbg(Level,  "read chunk header (pos=~p, next=~p): FourCC=~p DataSize=~p~n", [From, NextPos, FourCC, DataSize]),
            % chunk_spy/4 comes in a follow-up post
            chunk_spy(FourCC, File, DataPos, DataSize),
            {ok, AviChunk, NextPos};
        eof ->
            dbg(Level, "end of file~n", []),
            eof;
        Error ->
            Error
    end.

read_chunk_header(File, Pos) ->
    % FourCC, then 32-bit little-endian data size
    case file:pread(File, [{Pos, 4}, {Pos + 4, 4}]) of
        {ok, [FourCC, <<Size:4/little-unsigned-integer-unit:8>>]} ->
            NextPos = Pos + 8 + Size,
            % chunk data is padded to an even (word) boundary
            PaddedNextPos = NextPos + (NextPos rem 2),
            {ok, {avichunk, FourCC, Pos + 8, Size}, PaddedNextPos};
        {ok, [eof, eof]} ->
            eof;
        _ ->
            {error, "no chunk header to read, but not empty data~n"}
    end.

It is similar to LIST, without nested data. Also, this went wrong on my first attempt: I found on this page that CHUNK data is padded to a word (2-byte) boundary (grr), and that the pad byte is not counted in the size field.
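The padding rule can be written in two ways: read_chunk_header rounds the end position up to an even offset, while the RIFF specification phrases it as one pad byte after odd-sized data. A small sketch comparing both, where Pos is the position of the chunk header and Size the value read from its size field (module and function names are hypothetical):

```erlang
-module(riff_pad).
-export([next_by_pos/2, next_by_size/2]).

%% Round the end of the chunk up to an even offset.
next_by_pos(Pos, Size) ->
    NextPos = Pos + 8 + Size,
    NextPos + (NextPos rem 2).

%% Add one pad byte when the data size is odd.
next_by_size(Pos, Size) ->
    Pos + 8 + Size + (Size rem 2).
```

Both agree whenever the chunk starts at an even offset, which always holds in a well-formed file: everything starts at offset 0 and every structure occupies an even number of bytes. They only diverge on a misaligned chunk, e.g. next_by_pos(1, 10) gives 20 while next_by_size(1, 10) gives 19.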

But that’s all it takes to read a well-formatted RIFF file. And for those wondering about the chunk_spy function, keep reading this blog :).