I read a paper today titled "Regular Expression Matching Can Be Simple And Fast". I thought his results were interesting enough to compare to my own regular expression engine. (Yes, I'm dorky enough to have my own engine.) I call my program "chrep", which is short for "Church's Regular Expression Parser".
Below, I've tested four engines with the string "aaaaaaaaaaaaaaaaaaaa" against the expression "a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa" (which should always return a successful match). (Note: Sorry if you aren't able to read the entire command. My CSS-foo sucks and you probably have to read the source code.)
jcchurch@linux:~> time echo "aaaaaaaaaaaaaaaaaaaa" | chrep 'a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa' [aaaaaaaaaaaaaaaaaaaa] in : aaaaaaaaaaaaaaaaaaaa real 0m5.137s user 0m2.964s sys 0m0.012s
jcchurch@linux:~> time echo "aaaaaaaaaaaaaaaaaaaa" | egrep 'a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa' aaaaaaaaaaaaaaaaaaaa real 0m0.055s user 0m0.008s sys 0m0.008s
jcchurch@linux:~> time echo "aaaaaaaaaaaaaaaaaaaa" | awk '/a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa/{print}' aaaaaaaaaaaaaaaaaaaa real 0m0.049s user 0m0.004s sys 0m0.004s
jcchurch@linux:~> time echo "aaaaaaaaaaaaaaaaaaaa" | perl -ne'print if /a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa/' aaaaaaaaaaaaaaaaaaaa real 0m0.393s user 0m0.328s sys 0m0.008s
Perl takes almost a half second. That's horribly slow compared to grep and awk, but still 13 times faster than my parser.
My parser ('cause I know you are wondering) is a simple interpretative, recursive decent parser. There is no compiling of the expression, which I know would speed things up. I'm glad to see that I'm not too far behind the competition, and this will encourage me to make those benchmarks.
1 comment:
It's insane that yours is already that good :-)
Post a Comment