I know this isn’t really game-related, but I am hoping there is enough Java expertise here to point me in the right direction.
I need to scan many blocks of text and identify every URL present, whether it appears in an anchor tag, an image tag, or plain text. I believe I can handle URLs inside valid HTML markup just fine. The plain-text portion is what will give me trouble, specifically attempts to hide the URL within other HTML markup such as size or color. I can’t really show an example of the color trick, but basically the surrounding text is made close enough to the background color to appear invisible.
aaaaawww.foo.orgaaaa
aaaaawww.foo.orgaaaaa
Regular expressions are not really an option because speed is critical: I need to process 300-500 blocks of text per second, where the smallest block would be roughly the size of this post.
Pointers to open source projects already handling something similar would be perfect, but, if I need to roll my own solution, parsing advice is welcome as well.
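In case it helps frame the question, here is a rough sketch of the kind of single-pass scan I have in mind if I do end up rolling my own. It is only a starting point under some loud assumptions: anything between < and > is treated as a token boundary (so attribute values containing > will confuse it), a plain-text token counts as a URL only if it starts with "www." or contains "://", and trailing punctuation is not trimmed. URLs inside anchor and image tags are deliberately skipped here since I plan to handle those separately. The names UrlScanner, findUrls, and looksLikeUrl are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class UrlScanner {

    /**
     * Walks the text once. Whitespace and anything inside <...> act as
     * token boundaries; each plain-text token is then checked cheaply.
     */
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        int n = text.length();
        int start = -1;        // start index of the current token, -1 if none
        boolean inTag = false; // true while between '<' and '>'
        for (int i = 0; i <= n; i++) {
            // A trailing space sentinel flushes the last token.
            char c = i < n ? text.charAt(i) : ' ';
            if (inTag) {
                if (c == '>') inTag = false;
                continue;
            }
            boolean boundary = c == '<' || Character.isWhitespace(c);
            if (!boundary) {
                if (start < 0) start = i;
                continue;
            }
            if (start >= 0) {
                if (looksLikeUrl(text, start, i)) {
                    urls.add(text.substring(start, i));
                }
                start = -1;
            }
            if (c == '<') inTag = true;
        }
        return urls;
    }

    /** Assumed heuristic: "www." prefix, or "://" with text on both sides. */
    private static boolean looksLikeUrl(String s, int from, int to) {
        if (to - from > 4 && s.regionMatches(true, from, "www.", 0, 4)) {
            return true;
        }
        for (int i = from + 1; i + 3 < to; i++) {
            if (s.charAt(i) == ':' && s.charAt(i + 1) == '/' && s.charAt(i + 2) == '/') {
                return true;
            }
        }
        return false;
    }
}
```

Calling UrlScanner.findUrls on a block where tiny-font filler surrounds www.foo.org should return just that host, because the font tags split the filler from the URL. It would not catch a URL that has markup inserted in the middle of it, which is part of why I am asking for better ideas.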
Thanks in advance.