HTML text "cleaner" and/or applet editor

I need two things for JGF at the moment, both to support online editing of HTML source code.

The easy one is “something to edit the HTML source embedded in a webpage”. I can search for HTML applet editors myself (easy), although I’d appreciate any suggestions for particularly good ones if you know of any (there are an awful lot out there).

The hard one is “something to post-process the HTML source, strip everything bad, and convert manually formatted plain text into HTML”. The main problem is … I have no idea what Google terms to search for. Everything I’ve tried drew a blank :(. The main things I can think of that it needs to do are:

  • strip all nasty stuff, basically all scripting etc (even “cunningly disguised scripting” that is non-obvious)
  • detect “blank lines” and reformat the preceding and following text into two paragraphs by wrapping each in P tags
  • detect “probable website URLs” and replace them with A HREF links
  • …etc
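
To make the bullet points above concrete, here is a rough sketch of the kind of thing I mean, not a finished solution. It takes the blunt route of escaping every HTML special character instead of trying to tell good tags from bad ones, so no scripting can survive, but no legitimate markup survives either. The class name PlainTextToHtml, the method names and the URL pattern are all made up for illustration, and only http/https links are recognised.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from any existing library.
final class PlainTextToHtml {

    // Assumption: a "probable URL" starts with http:// or https://, runs until
    // whitespace, '<', '>' or '"', and does not end in sentence punctuation.
    private static final Pattern URL =
            Pattern.compile("https?://[^\\s<>\"]*[^\\s<>\".,;:!?)]");

    // A "blank line" is two or more line breaks, optionally separated by spaces or tabs.
    private static final Pattern BLANK_LINES =
            Pattern.compile("(?:\\r?\\n[ \\t]*){2,}");

    // Escape the five characters that matter in HTML text and attribute values.
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Escape the text and wrap each probable URL in an A HREF link.
    // URLs are located in the raw text first, so '&' in a query string
    // is escaped once, inside the link, and does not cut the URL short.
    public static String linkify(String raw) {
        Matcher m = URL.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escape(raw.substring(last, m.start())));
            String url = escape(m.group());
            out.append("<a href=\"").append(url).append("\">")
               .append(url).append("</a>");
            last = m.end();
        }
        out.append(escape(raw.substring(last)));
        return out.toString();
    }

    // Split on blank lines and wrap every non-empty chunk in P tags.
    public static String toHtml(String raw) {
        StringBuilder out = new StringBuilder();
        for (String para : BLANK_LINES.split(raw.trim())) {
            if (para.isEmpty()) {
                continue;
            }
            out.append("<p>").append(linkify(para)).append("</p>\n");
        }
        return out.toString();
    }
}
```

Calling PlainTextToHtml.toHtml(userText) on a submitted highscore comment would give back paragraphs with links and with any tags shown as literal text. What it does not do is the harder part of the question: letting a safe subset of real HTML through while stripping the rest.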

All this stuff is commonly done by forum software and many other things, but I’ve never seen or heard of a particular open-source Java library for it (surely there must be one, somewhere??)

PS I can move this into off-topic if anyone thinks it is a bit unrelated to games. Server-side post-processing of HTML is useful for a lot of games (e.g. any with online highscores or other user-enterable data) but perhaps that’s a bit tenuous.