Java regex pattern for finding text between [[ ]]

Hiah.

I’m implementing a Lua text editor in my engine and want syntax highlighting that works with Lua. Most of it works so far, but I can’t wrap my head around multiline strings.

Unlike Java, where multiline comments are written between /* and */, Lua puts multiline strings between [[ and ]]. I’m not much of a regex expert and was hoping someone could help me out with this.

Maybe I’m missing something, but wouldn’t this just be:

[[.*]]

Note that the brackets [ and ] are special characters in regex, so you have to escape them:

\[\[.*\]\]

And of course, backslashes are a special character in Java, so you have to escape them:

String regex = "\\[\\[.*\\]\\]";

More info can be found in the documentation for the Pattern class.
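A minimal sketch of the escaping described above, for the single-line case only. The class name EscapeDemo and the helper firstMatch are made up for illustration; Pattern.quote is a standard alternative to escaping the delimiters by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // Hand-escaped version: each bracket is escaped for regex,
    // and each backslash is escaped again for the Java string literal.
    static final Pattern ESCAPED = Pattern.compile("\\[\\[.*\\]\\]");

    // Same idea, letting Pattern.quote escape the delimiters for us.
    static final Pattern QUOTED =
            Pattern.compile(Pattern.quote("[[") + ".*" + Pattern.quote("]]"));

    // Hypothetical helper: returns the first match in the input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "local s = [[hello]] -- single-line case";
        System.out.println(firstMatch(ESCAPED, line)); // [[hello]]
        System.out.println(firstMatch(QUOTED, line));  // [[hello]]
    }
}
```

Note that neither pattern crosses a line break, because . does not match line terminators by default.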

If that’s not what you’re looking for, can you please post a small example showing the kinds of Strings you’re matching and what you’re trying to do with the regex?

Geez, regex appears to be a programming language within itself
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#sum

@KevinWorkman
Tried your idea out, but unfortunately there’s no effect

For comparison, here’s what I found online for finding a string surrounded by " marks:

"\"([^\"\\\\]|\\\\.)*\""

Try this:

\[\[(.|\n)*\]\]

with \ escaped:

\\[\\[(.|\\n)*\\]\\]

that should work.

you can try it out here

Did you add the multiline option to the Pattern?


Pattern regex = Pattern.compile("--\\[\\[.+?\\]\\]", Pattern.MULTILINE);

Otherwise, regex will stop at a single line.

that won’t work.


import java.util.regex.Pattern;

public class MyClass {
    public static void main(String[] args) {
        String str = "hello[[ abc \n def \n ghi ]] world";
        // SHC's pattern: '.' does not match '\n' here, and str contains no "--" anyway
        Pattern regex = Pattern.compile("--\\[\\[.+?\\]\\]", Pattern.MULTILINE);
        String shc = regex.matcher( str ).replaceAll( "" );
        // phased's pattern: (.|\n) matches line feeds explicitly
        String phased = str.replaceAll("\\[\\[(.|\\n)*\\]\\]", "");
        System.out.println("SHC:" + shc);
        System.out.println("Phased:" + phased);
    }
}

gives the output:


SHC:hello[[ abc 
 def 
 ghi ]] world
Phased:hello world
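For what it’s worth, the flag that lets . cross line breaks is Pattern.DOTALL. Pattern.MULTILINE only changes where ^ and $ match. Here is a small sketch using the same test string as above; the "--" prefix is dropped because the string has none, and the names DotallDemo and strip are just for illustration.

```java
import java.util.regex.Pattern;

public class DotallDemo {
    // DOTALL lets '.' match line terminators; the reluctant .*? stops at the first "]]".
    static final Pattern LONG_BRACKET =
            Pattern.compile("\\[\\[.*?\\]\\]", Pattern.DOTALL);

    // MULTILINE only affects ^ and $, so '.' still stops at a line break.
    static final Pattern MULTILINE_ONLY =
            Pattern.compile("\\[\\[.*?\\]\\]", Pattern.MULTILINE);

    // Hypothetical helper: removes every match of p from the input.
    static String strip(Pattern p, String input) {
        return p.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String str = "hello[[ abc \n def \n ghi ]] world";
        System.out.println("DOTALL:" + strip(LONG_BRACKET, str));      // DOTALL:hello world
        System.out.println("MULTILINE:" + strip(MULTILINE_ONLY, str)); // input comes back unchanged
    }
}
```

DOTALL also avoids the (.|\n)* alternation, which can throw a StackOverflowError on long inputs in Java’s regex engine.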

Couldn’t seem to get yours to work, phased.

Though from you guys’ responses, I managed to write this:

"(\\[\\[)(.|\\R)*(\\]\\])"

Which seems to get the job done perfectly… until there are two sets.

With two blocks, everything from the first “[[” to the last “]]” gets highlighted, because the second “[[” is treated as part of the inner text of the first block.

Hmm…

Here is a fix.

Without testing it, this may be the fix for you:

"(\\[\\[)(.|\\R)*?(\\]\\])"

The ? after the * makes the quantifier non-greedy, so the match stops at the first ]] instead of the last one.
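A quick sketch of the difference, using the two patterns from this thread on a script with two blocks. The class name GreedyDemo, the helper findAll and the sample script are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Greedy: (.|\R)* grabs as much as it can, so it runs to the LAST "]]".
    static final Pattern GREEDY = Pattern.compile("(\\[\\[)(.|\\R)*(\\]\\])");

    // Reluctant: (.|\R)*? grabs as little as it can, so it stops at the FIRST "]]".
    static final Pattern RELUCTANT = Pattern.compile("(\\[\\[)(.|\\R)*?(\\]\\])");

    // Hypothetical helper: collects every match of p in the input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String script = "a = [[one]]\nb = 2\nc = [[two\nlines]]";
        System.out.println(findAll(GREEDY, script).size());    // 1
        System.out.println(findAll(RELUCTANT, script).size()); // 2
    }
}
```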

It seems \r is the line break in your editor instead of \n, so I guess that is why mine does not work for you.

The other problem is that it will also detect [[ ]] blocks inside " quoted strings, so you will need to work out how to get around that too.
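One possible way around that, sketched under my own assumptions rather than taken from anyone’s editor: put both string forms into a single alternation, so whichever one starts first in the text wins and its contents are skipped. The quoted-string half is the pattern posted earlier in the thread; the class name, group names and tokens helper are hypothetical. Single-quoted strings and leveled brackets such as [==[ ]==] are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringTokens {
    // Alternative 1 (group "quoted"): a "-quoted string with backslash escapes.
    // Alternative 2 (group "block"):  a [[ ... ]] block, matched reluctantly.
    // DOTALL lets '.' cross line breaks inside a block.
    static final Pattern STRINGS = Pattern.compile(
            "(?<quoted>\"(?:[^\"\\\\]|\\\\.)*\")|(?<block>\\[\\[.*?\\]\\])",
            Pattern.DOTALL);

    // Hypothetical helper: lists each token as "kind:text" in source order.
    static List<String> tokens(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = STRINGS.matcher(src);
        while (m.find()) {
            String kind = m.group("quoted") != null ? "quoted" : "block";
            out.add(kind + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "x = \"not [[a block]]\"\ny = [[real \"block\"]]";
        for (String t : tokens(src)) {
            System.out.println(t);
        }
    }
}
```

Because the matcher resumes after each token, the [[ inside the quoted string and the quotes inside the block are never examined on their own.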

Don’t forget to include lots and LOTS of unit tests if you want that regex to be anything but a black box when you (or some unlucky other) come back to it in a few months’ time.
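As a rough starting point, here is a framework-free sketch of such checks against the non-greedy pattern from this thread. The class name, the helpers and the particular cases are my own picks, and a real project would more likely use a test framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongBracketChecks {
    static final Pattern BLOCK = Pattern.compile("(\\[\\[)(.|\\R)*?(\\]\\])");

    // Counts how many [[ ... ]] blocks the pattern finds in the input.
    static int countMatches(String input) {
        Matcher m = BLOCK.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Runs a table of input -> expected block count; returns the number of failures.
    static int runChecks() {
        Map<String, Integer> cases = new LinkedHashMap<>();
        cases.put("no blocks here", 0);
        cases.put("[[one]]", 1);
        cases.put("[[a]] x [[b]]", 2);           // two sets must stay separate
        cases.put("[[line1\nline2]]", 1);        // Unix line break
        cases.put("[[line1\r\nline2]]", 1);      // Windows line break
        cases.put("[[unterminated", 0);          // no closing ]]
        int failures = 0;
        for (Map.Entry<String, Integer> c : cases.entrySet()) {
            if (countMatches(c.getKey()) != c.getValue()) {
                failures++;
                System.out.println("FAILED: " + c.getKey());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + runChecks());
    }
}
```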

Yes, that’ll be important.

A bare \r is unlikely, unless we’ve gone back significantly in time, but \r\n is likely. I’d missed the addition of \R in Java 8 - nice!

https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#lineending

\R in Java’s regex is any combination of linebreak characters… I think…?
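To pin that down a bit: per the Pattern docs linked above, \R matches one linebreak sequence, either \r\n as a pair or a single linebreak character such as \n or \r. A tiny sketch to check it (Java 8 or later; the class and method names are made up):

```java
public class LineBreakDemo {
    // True if the whole string is exactly one linebreak sequence according to \R.
    static boolean isSingleLineBreak(String s) {
        return s.matches("\\R");
    }

    public static void main(String[] args) {
        System.out.println(isSingleLineBreak("\n"));   // true
        System.out.println(isSingleLineBreak("\r"));   // true
        System.out.println(isSingleLineBreak("\r\n")); // true, the pair counts as one break
        System.out.println(isSingleLineBreak("\n\n")); // false, that is two breaks
        System.out.println(isSingleLineBreak("x"));    // false
    }
}
```

So (.|\R) covers both Unix and Windows line endings without having to list them.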

It all depends on which one I check first. If I check for multiline strings before I check for quoted strings, then the multiline one will be captured. But even IF it doesn’t work, they’re the same color… so the user will never know! :)

Your fixed version works perfectly, though!

Regex was a nightmare before I learned it, but now, it’s a lucid nightmare.

But in all seriousness, regex is a very useful skill and I would recommend anyone take the time to learn it.