[Renamed] Ye mighty lexer compiler!

Have you tried http://www.antlr.org/ ?
I’ve used it a lot and it is a very convenient and efficient solution for this type of problem.
Especially given that it comes with a grammar design/interpret/debug tool.

That does seem interesting; my code currently works as a lex/tokenizer compiler, but I’m also looking to make it generate parse trees. In fact it is practically capable of doing that! What you have here though is the code to do so yourself; you can examine it and see how simple these programs are. Note that there is an error I’ve updated on my workstation but not here. Since I’m at work I have no access to it.

The nodes should [also] correspond to Von Neumann architecture, where the NODE structure can compile to code/commands. Of course I have no current ambitions for that, though it would be fun. Second to fun, it would be nice to have the code interpret the grammar in text, i.e. you could give it this text stream …

 program = block "." .
 
 block =
     ["const" ident "=" number {"," ident "=" number} ";"]
     ["var" ident {"," ident} ";"]
     {"procedure" ident ";" block ";"} statement .
 
 statement =
     [ident ":=" expression
     | "call" ident
     | "begin" statement {";" statement} "end"
     | "if" condition "then" statement
     | "while" condition "do" statement
     ] .
 
 condition =
     "odd" expression
     | expression ("="|"#"|"<"|"<="|">"|">=") expression
     .
 
 expression = ["+"|"-"] term {("+"|"-") term} .
 
 term = factor {("*"|"/") factor} .
 
 factor = ident | number | "(" expression ")" .

… and then use the following code …

NODE term_grammar = grab_identity ("term");

… I think I’ll do this some weekend or something.

Oh yeah one more thing, full source code is available at the Mighty Parser github!

Lazy as I am, could you post something that compiles, and that walks the generated tree ?

It’s lots easier to mess with something that works :slight_smile:

It compiles and walks the tree; just import the Jar into Eclipse. In the Parser class there is a boolean to debug_output the tree walking. I’ve made some additions in the office as I tend to do tool researching out of hours, so when I get back into the office I’ll give you the update that’s got it screwing around with a particular type of config file I use. It lexes it, and I’ll even make it parse the file (on Monday at around 18:20 GMT after I finish work).

If there is trouble compiling, try creating a java project (in Eclipse) and importing into the src(source) folder. If you don’t use Eclipse then maybe someone could cook up an ant makefile (I have no idea how easy/hard they are).

So if that code isn’t enough (at the moment) it should be at 18:20GMT on Monday - maybe over the weekend if I go in! So I guess I’ll have to do something to make this more ‘visible’; and outputting to the console is so pre 2k, I’ll output to a JFrame or something. I will [eventually] make it read the grammars from text in the not too distant future!

Edit: not today, I’ve got a ticket for a film screening 8) (wootages)

CFGs are indeed more powerful then regular expresions, in fact, infinitely times more powerful. See http://en.wikipedia.org/wiki/Chomsky_hierarchy

Type-3, or Regular languages, described by RE, are only a subset of CFG (Type-2). Very simple example of language you can’t describe with RE is a^nba^n (that is, some number of a, then b, and then same number of a again).

EDIT: wow, I’m a necromancer… :smiley:

Bring out your dead.

Generally, if the grammar is simple enough, I write descent parsers otherwise I like packrat/PEG.

wow finaly ! I will have wait for this answer nearly for 3 years now (ayway interresting)

Every time I try to use a damned parser generator I fail. I’ve put serious effort into it multiple times. Tried JavaCC and antlr. I just don’t seem to be able to wrap my head around how to approach solving ambiguity problems.

keldon85’s API looks nice and straightforward. I would worry that it silently ignores ambiguity though.

Although this topic has not been posted in for 180 days, I believe responding here is justified as there are unanswered questions.

I’ve uploaded the code to github now, but you should be aware that another library (parboiled) does the same job, and goes further to return a parse tree and provides a nice way to express your grammars.

[quote]wow finaly ! I will have wait for this answer nearly for 3 years now (ayway interresting)
[/quote]
Oh, I thought I pretty much responded to your question (though I didn’t quote you). Sorry about that.

[quote]keldon85’s API looks nice and straightforward. I would worry that it silently ignores ambiguity though.
[/quote]
It most likely does.


I ended up using it for lexing in a scripting system by the way.