Updated 20 July 2012 with GitHub link
I’ve done it! Ye mighty auto tokenizer allows you to define a grammar and tokenize strings. I had never written one before, looked at other people’s code, or read anything on how to do this, so there is a chance I’ve used a well-known pattern without knowing it.
You feed it a token grammar and a string, and this son-of-a-gun will give you the next token. Warning: like any declarative language, this will do exactly what you tell it to do, so do not give it faulty grammars (like ones that will accept nothing)!
// Each rule is assembled from a list of child nodes.
// letter() and number() are helper methods not shown in this snippet.
ArrayList<Node> alphaNumericList = new ArrayList<Node>();
ArrayList<Node> sentenceList = new ArrayList<Node>();
ArrayList<Node> anotherWordList = new ArrayList<Node>();
ArrayList<Node> wordList = new ArrayList<Node>();
ArrayList<Node> whitespaceList = new ArrayList<Node>();
Node whitespace;
Node word;
Node anotherWord;
Node sentence;

// whitespace: one space followed by a repetition of spaces
whitespaceList.add(TerminalFactory.createTerminalString(" "));
whitespaceList.add(RepetitionFactory.createRepetition(TerminalFactory.createTerminalString(" ")));
whitespace = ListFactory.createList(whitespaceList);

// alphaNumericList is filled here but not used by the sentence grammar below
alphaNumericList.add(number());
alphaNumericList.add(letter());

// word: one letter followed by a repetition of letters
wordList.add(letter());
wordList.add(RepetitionFactory.createRepetition(letter()));
word = ListFactory.createList(wordList);

// anotherWord: whitespace followed by a word
anotherWordList.add(whitespace);
anotherWordList.add(word);
anotherWord = ListFactory.createList(anotherWordList);

// sentence: a word, a repetition of anotherWord, then an optional full stop
sentenceList.add(word);
sentenceList.add(RepetitionFactory.createRepetition(anotherWord));
sentenceList.add(OptionFactory.createOptional(TerminalFactory.createTerminalString(".")));
sentence = ListFactory.createList(sentenceList);

System.out.println(Parser.parse("The dirty fairies are dead", sentence));
System.out.println(Parser.parse("The dirty fairies are dead.", sentence));
System.out.println(Parser.parse("The dirty fai.ries are dead.", sentence));
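Output (each number equals the count of characters the sentence grammar can consume from the start of the string):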
26
27
14
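As a sanity check on those numbers, the sentence grammar can be approximated with a regular expression. This is only a sketch: it assumes letter() matches a single ASCII letter and that a repetition means "zero or more", neither of which is confirmed by the snippet above. The class name RegexCrossCheck and the method consumed are made up for this illustration and are not part of the project.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical cross-check, not part of the mighty-parser code.
class RegexCrossCheck {
    // word (whitespace word)* "."?  with word = letter+ and whitespace = " "+
    // Assumption: letter() in the post matches one ASCII letter.
    private static final Pattern SENTENCE =
            Pattern.compile("[A-Za-z]+( +[A-Za-z]+)*\\.?");

    // Returns how many characters match from the start of the input, or -1 if none do.
    static int consumed(String input) {
        Matcher m = SENTENCE.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(consumed("The dirty fairies are dead"));   // 26
        System.out.println(consumed("The dirty fairies are dead."));  // 27
        System.out.println(consumed("The dirty fai.ries are dead.")); // 14
    }
}
```

The third string stops at 14 because "The dirty fai" is followed by the optional full stop, after which the grammar has nothing left to match.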
Legal disclaimer:
By downloading the said file, knowingly or not, you agree to have no rights to its code or your knowledge of the knowledge gained upon mentally processing it. You have no copying rights, understanding rights, or right to process any thoughts derived from the knowledge of the said file. You are however given the right to live and breathe under the condition you do so without violation of any stated rule in this disclaimer.
Code available at https://github.com/keldon85-avasopht/mighty-parser
As for the approach, I’ve never done anything like this before; I just looked at the rules for grammars and created factories for them. The parser does most of the work, but you can see how it works from the [poorly (un)commented] code.
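For anyone curious what "a factory per grammar rule" might look like in miniature, here is a rough sketch. It is not the code from the repository: the names Rule, Rules and SketchDemo are invented for this example, letter() is assumed to match one ASCII letter, and the matching is greedy with no backtracking, which may or may not be how the real parser behaves.

```java
import java.util.List;

// Hypothetical sketch; none of these names come from the mighty-parser repository.
interface Rule {
    // Returns the index just past the match that starts at pos, or -1 if nothing matches.
    int match(String input, int pos);
}

final class Rules {
    private Rules() {}

    // Terminal: matches a literal string.
    static Rule terminal(String text) {
        return (in, pos) -> in.startsWith(text, pos) ? pos + text.length() : -1;
    }

    // Assumed behaviour of letter(): one ASCII letter.
    static Rule letter() {
        return (in, pos) -> {
            if (pos >= in.length()) return -1;
            char c = in.charAt(pos);
            boolean ascii = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            return ascii ? pos + 1 : -1;
        };
    }

    // List: every part must match, one after another.
    static Rule sequence(List<Rule> parts) {
        return (in, pos) -> {
            int p = pos;
            for (Rule part : parts) {
                p = part.match(in, p);
                if (p < 0) return -1;
            }
            return p;
        };
    }

    // Repetition: zero or more matches, greedy.
    static Rule repetition(Rule inner) {
        return (in, pos) -> {
            int p = pos;
            while (true) {
                int next = inner.match(in, p);
                if (next <= p) return p; // stop on failure or on an empty match
                p = next;
            }
        };
    }

    // Option: matches the inner rule if possible, otherwise consumes nothing.
    static Rule optional(Rule inner) {
        return (in, pos) -> {
            int next = inner.match(in, pos);
            return next < 0 ? pos : next;
        };
    }
}

class SketchDemo {
    // Rebuilds the sentence grammar from the post with the sketch rules.
    static Rule sentence() {
        Rule word = Rules.sequence(List.of(Rules.letter(), Rules.repetition(Rules.letter())));
        Rule space = Rules.terminal(" ");
        Rule whitespace = Rules.sequence(List.of(space, Rules.repetition(space)));
        Rule anotherWord = Rules.sequence(List.of(whitespace, word));
        return Rules.sequence(List.of(
                word, Rules.repetition(anotherWord), Rules.optional(Rules.terminal("."))));
    }

    public static void main(String[] args) {
        Rule sentence = sentence();
        System.out.println(sentence.match("The dirty fairies are dead", 0));   // 26
        System.out.println(sentence.match("The dirty fairies are dead.", 0));  // 27
        System.out.println(sentence.match("The dirty fai.ries are dead.", 0)); // 14
    }
}
```

The check in repetition that stops on an empty match is one way to guard against the "faulty grammar" problem mentioned earlier: repeating a rule that consumes nothing would otherwise loop forever.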