22 September 2006

Erlang: String processing optimiziation

I wish to share an experience which I had today when writing code.

I had a following function, which splits binary on enter character. I used it to split binaries to separate lines.

Original code looked like this -- which is obvious dumb solution for
that task.


break_on_nl(B) -> break_on_nl(B, []).
break_on_nl(<<"\n", Tail/binary>>, Res) -> {lists:reverse(Res), Tail};
break_on_nl(<<>>, Res) -> {lists:reverse(Res), <<>>};
break_on_nl(<<C, Tail/binary>>, Res) -> break_on_nl(Tail, [C|Res]).


I.e. just cutting one character a time, checking that we reached \n and returning collected string and remaining tail if we reached it.

After some occasional loon on this code I realized, that I can greatly improve it. Instead rebuilding binary to list and then reversing list I could just check every byte of binary, check if it is an Enter char, and then return binary splitted to 2 parts.

Code like that:


break_on_nl1(B) -> break_on_nl1(0, B).
break_on_nl1(Len, Bin) when Len == size(Bin) ->
{Bin, <<>>};
break_on_nl1(Len, Bin) ->
<<Msg:Len/binary, Symb, Tail/binary>> = Bin,
case Symb of
$\n -> {Msg, Tail};
_ -> break_on_nl1(Len+1, Bin)
end.


Main advantage is that I do not modify original binary and it passed
from one call to next one without copying.

This little code rewrite gave me 2 times speed increase, which was quite essential, because I have strong timing constrains for that code.

No comments: