Warning: long post ahead … but mostly code
This post contains absolutely no ideas or deep thoughts: it is just a recap of my attempt to read the AVI file format (or rather the RIFF file format, since I do not parse the AVI data itself, only the document structure). Let’s go straight to the code with this simple module header!
-module(avir).
-compile([export_all]).
-include_lib("kernel/include/file.hrl").

%% Print a debug message, indented by Level spaces.
dbg(Level, Template, Args) ->
    Indent = lists:flatten(lists:duplicate(Level, " ")),
    io:format(Indent ++ Template, Args).

go() ->
    go("test.avi").

%% Walk the whole file, from offset 0 to its size, and return its parts.
go(Filename) ->
    {ok, #file_info{size=Size}} = file:read_file_info(Filename),
    {ok, IODev} = file:open(Filename, [read, binary]),
    {ok, Parts} = walk_data(0, [], IODev, 0, Size),
    ok = file:close(IODev),
    {ok, Parts}.
So, dbg is a crude function to print debug messages … yes, the old-fashioned way, but it is simple enough for a post! go is the main entry point and calls the ‘real’ code: the approach is to call the walk_data function, which builds and returns a list of AVI structures. Its first parameter is the nesting level, used to print messages with meaningful indentation, and the second is an accumulator for the recursion to come.
I mainly relied on this short document: AVI is a (nested) sequence of two kinds of structure, either LIST or CHUNK. More precisely, first comes a mandatory RIFF-AVI LIST, then multiple (and optional) RIFF-AVIX LISTs. Let’s walk those structures:
walk_data(Level, Parts, File, From, To) when From < To ->
    case chunk_or_list(File, From) of
        avichunk ->
            {ok, Part, NextPos} = walk_chunk(Level, File, From, To),
            walk_data(Level, [Part|Parts], File, NextPos, To);
        avilist ->
            {ok, Part, NextPos} = walk_list(Level, File, From, To),
            walk_data(Level, [Part|Parts], File, NextPos, To);
        Error ->
            {error, "maybe unexpected EOF", Error}
    end;
walk_data(_Level, Parts, _File, _From, _To) ->
    {ok, lists:reverse(Parts)}.

chunk_or_list(File, Pos) ->
    case file:pread(File, Pos, 4) of
        {ok, <<"RIFF">>} ->
            avilist;
        {ok, <<"LIST">>} ->
            avilist;
        {ok, _FourCC} ->
            avichunk;
        eof ->
            eof
    end.
The walk is straightforward: it goes from position From to To, accumulating results in reverse order (I love this [head|tail] list notation … was Prolog the first to use it?). chunk_or_list reads a few bytes (the FourCC header) to guess the kind of the next structure in the file (CHUNK or LIST); that structure is then loaded, and the walk continues.
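The FourCC test can also be tried without any file at hand. Here is a minimal sketch, assuming the bytes are already in memory as a binary; the module name fourcc_sketch and the function kind/1 are made up for illustration and are not part of avir.

```erlang
-module(fourcc_sketch).
-export([kind/1]).

%% Classify a structure from its first four bytes, mirroring
%% chunk_or_list/2 but matching on an in-memory binary instead of
%% calling file:pread/3.
kind(<<"RIFF", _/binary>>) -> avilist;
kind(<<"LIST", _/binary>>) -> avilist;
kind(<<_FourCC:4/binary, _/binary>>) -> avichunk;
%% Fewer than four bytes left: treated here like the end of the file.
kind(_) -> eof.
```

Calling fourcc_sketch:kind(<<"LIST", 0:32>>) takes the second clause, while any other four-byte tag such as <<"avih">> falls through to the chunk clause.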
walk_list(Level, File, From, To) ->
    case read_list_header(File, From) of
        {ok, AviList={avilist, List, FourCC, DataPos, DataSize}, NextPos} ->
            dbg(Level, "read list header (pos=~p, next=~p): List=~p FourCC=~p~n", [From, NextPos, List, FourCC]),
            {ok, SubPart} = case FourCC of
                <<"movi">> ->
                    dbg(Level, "... skipping list FourCC=~p...~n", [FourCC]),
                    {ok, []};
                _ ->
                    walk_data(Level + 1, [], File, DataPos, DataPos + DataSize)
            end,
            {ok, {AviList, SubPart}, NextPos};
        eof ->
            dbg(Level, "end of file~n", []),
            eof
    end.

read_list_header(File, Pos) ->
    case file:pread(File, [{Pos, 4}, {Pos + 4, 4}, {Pos + 8, 4}]) of
        {ok, [List, <<Size:4/little-unsigned-integer-unit:8>>, FourCC]} ->
            {ok, {avilist, List, FourCC, Pos + 12, Size - 4}, Pos + 8 + Size};
        {ok, [eof, eof, eof]} ->
            eof;
        _ ->
            {error, "no list header to read, but not empty data~n"}
    end.
To walk a LIST, read the header (remember that the length of the FourCC field is counted in the data size …), walk the nested data (this reuses walk_data), and return the LIST representation: a 2-tuple with the header first (it could be a record) and then a list of sub-parts. There is an otherwise useless test that skips the movi list, which holds the real data, because my test file is rather big. Walking a CHUNK is much the same.
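To make the size arithmetic of read_list_header concrete, here is a minimal sketch that decodes the same 12-byte header from an in-memory binary. The module name list_header_sketch, the functions parse/2 and example/0, the {error, truncated} result and the hand-built "hdrl" sample are all assumptions made for illustration.

```erlang
-module(list_header_sketch).
-export([parse/2, example/0]).

%% Decode a LIST (or RIFF) header found at byte offset Pos inside Bin.
%% On success, returns the same tuple shape as read_list_header/2.
parse(Bin, Pos) ->
    case Bin of
        <<_:Pos/binary, List:4/binary, Size:32/little-unsigned,
          FourCC:4/binary, _/binary>> ->
            %% Size counts the FourCC too, so the nested data is Size - 4
            %% bytes long and starts 12 bytes after the header start.
            {ok, {avilist, List, FourCC, Pos + 12, Size - 4}, Pos + 8 + Size};
        _ ->
            {error, truncated}
    end.

%% A hand-built LIST of type "hdrl" with 8 bytes of payload: Size = 4 + 8.
example() ->
    parse(<<"LIST", 12:32/little, "hdrl", 0:64>>, 0).
```

With this sample the data starts at offset 12, is 8 bytes long, and the next structure would start at offset 20, which is also the total length of the sample binary.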
walk_chunk(Level, File, From, To) ->
    case read_chunk_header(File, From) of
        {ok, AviChunk={avichunk, FourCC, DataPos, DataSize}, NextPos} ->
            %FourCC = <<_StreamNumber:2/binary, _DataType:2/binary>>},
            dbg(Level, "read chunk header (pos=~p, next=~p): FourCC=~p DataSize=~p~n", [From, NextPos, FourCC, DataSize]),
            chunk_spy(FourCC, File, DataPos, DataSize),
            {ok, AviChunk, NextPos};
        eof ->
            dbg(Level, "end of file~n", []),
            eof
    end.

read_chunk_header(File, Pos) ->
    case file:pread(File, [{Pos, 4}, {Pos + 4, 4}]) of
        {ok, [FourCC, <<Size:4/little-unsigned-integer-unit:8>>]} ->
            NextPos = Pos + 8 + Size,
            PaddedNextPos = NextPos + (NextPos rem 2),
            {ok, {avichunk, FourCC, Pos + 8, Size}, PaddedNextPos};
        {ok, [eof, eof]} ->
            eof;
        _ ->
            {error, "no chunk header to read, but not empty data~n"}
    end.
This is similar to LIST, but without nested data. It also went wrong at the first attempt: I found on this page that CHUNK data is padded to a word boundary (grr).
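The padding rule is easy to check with plain arithmetic. Below is a minimal sketch; the module name pad_sketch and the function next_pos/2 are made up for illustration. It pads on the data size rather than on the absolute position as read_chunk_header does, and the two only agree when the chunk starts at an even offset, which is an assumption here.

```erlang
-module(pad_sketch).
-export([next_pos/2]).

%% Offset of the structure that follows a chunk starting at Pos and
%% declaring Size bytes of data: 8 bytes of header (FourCC + size),
%% the data itself, and one pad byte when Size is odd.
next_pos(Pos, Size) ->
    Pos + 8 + Size + (Size rem 2).
```

For a chunk at offset 0, 10 bytes of data put the next structure at offset 18, while 11 bytes put it at offset 20 rather than 19.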
But that’s all it takes to read a well-formatted RIFF file. And for those wondering about the chunk_spy function, keep reading this blog.
December 16, 2007 at 5:36 am
Sorry sorry! I promised a follow-up to this post … and never wrote it. In fact I did write some code to transform some parts of the AVI file (mainly in the header section).
My goal was to be able to encode AVI files on Linux for my Samsung T9. In fact, Samsung uses common free codecs, but they add some data to the header that the T9 checks to see whether the AVI file was encoded by the Samsung tool. I did find which header is changed, but it is apparently not a constant value (like a version or whatever), and I did not spend time finding out how to calculate this header field.
I can still publish the writing part of my AVI walk if somebody is interested.
December 27, 2007 at 11:19 pm
Did you figure out how to convert an avi to play on the Samsung T9 using freely available Linux tools?
December 28, 2007 at 2:15 am
No, I haven’t found how to do that!
I know how to use mencoder to create the right file format, but Samsung added some protection to the T9 … I don’t know why??? (any comment is welcome on that subject!)
I bought the Samsung because of its UMS support; I thought it would be easy to use with Linux, but it isn’t. I’m a bit puzzled that Samsung uses free codecs and open source software, yet closes everything so that I cannot use it with Linux!
That’s why I would not recommend buying a Samsung. The quality is good, but you may have a better chance with an Apple or Sony player.