Java Regex Primer

For want of a better place to announce it: I’ve just put together a small tutorial on Java’s regex support (aimed at people already familiar with regex in other languages, or those learning alongside a detailed regex tutorial). Essentially it’s a quick description of the Matcher class, complete with examples.

http://tanksoftware.com/tutes/article-regexprimer.pdf
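For a quick taste of what working with the Matcher class looks like, here is a minimal sketch. It is not taken from the PDF; the class name MatcherDemo, the method findPairs, and the key=number pattern are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {
    // Hypothetical pattern for illustration: a word, '=', then digits.
    // Compile once and reuse; Pattern objects are immutable and thread-safe.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    // Collects every "key=number" pair found in the input as "key:number".
    public static List<String> findPairs(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher m = KEY_VALUE.matcher(input);
        while (m.find()) {                       // find() advances to the next match
            pairs.add(m.group(1) + ":" + m.group(2)); // groups 1 and 2 are the captures
        }
        return pairs;
    }

    // matches() succeeds only if the ENTIRE input fits the pattern.
    public static boolean isSinglePair(String input) {
        return KEY_VALUE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(findPairs("width=640, height=480")); // [width:640, height:480]
        System.out.println(isSinglePair("width=640"));             // true
        System.out.println(isSinglePair("width=640, height=480")); // false
    }
}
```

The thing that tends to trip people up coming from other languages is the difference between find(), which scans for the next match anywhere in the input, and matches(), which requires the whole input to match.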

Java regex is as fast as, if not faster than, regex in other languages too. I wrote some code last year that parsed nine thousand letters (HTML), extracting the bodies into a file. It took only 2 minutes to complete (and this was off a network drive too!).

Cheers,

Will.

Nice tutorial - I never really understood regex :-[

Personally I don’t care about the issue of topicality…

I always wanted to know something about regex. Does it always build an optimized automaton, or does it use some sort of ugly backtracking method?

I’m not quite sure, but I do know that it is incredibly fast. In a real-world application of Java regex, I parsed 9,000 HTML files off the network, extracting (and saving) their body text. It took less than 120 seconds! In comparison, the automated conversion of those files from Word documents to HTML ran overnight (not using Java regex ;)).
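This is not the original code, just a rough sketch of how that kind of body extraction might look with Pattern and Matcher. The class name BodyExtractor, the method extractBody, and the pattern itself are assumptions for illustration; a regex like this is fine for simple, well-formed files but is no substitute for a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyExtractor {
    // Assumed pattern: DOTALL lets '.' span line breaks, and
    // CASE_INSENSITIVE accepts <BODY> as well as <body>.
    // The reluctant (.*?) stops at the first closing tag.
    private static final Pattern BODY = Pattern.compile(
            "<body[^>]*>(.*?)</body>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed text between the body tags,
    // or null if the input has no body element.
    public static String extractBody(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<html><BODY class=\"x\">\nHello,\nworld\n</BODY></html>";
        System.out.println(extractBody(html)); // prints "Hello," and "world" on two lines
    }
}
```

Compiling the Pattern once and reusing it for every file matters when you are looping over thousands of documents, since compilation is the expensive part.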

Will.

Comparing it with Word is not fair to Word. ;D You may want to check out JavaCC, because it builds a simplified automaton for the lexer part and it is much more flexible. From what you are saying, Java regex most probably works with an NDFA (nondeterministic finite automaton).