Looking for a simple html parser to help render speech bubble chat text

Hi,

I’d like to render chat text in a little rectangle, kind of like a speech bubble. It would be cool if the text could contain HTML tags so that it could be rendered in bold, in different colours, with line breaks, and other little things like that.

I know I can do that with an AWT/Swing component like JTextPane or something, but I want to render the text using pure java2D for performance reasons.

Do you think I should parse the HTML tags myself, or is there a good lightweight package that can do it better? For example, if the package could call a listener method when it spots a BOLD tag, I could set the java.awt.Graphics object to a bold style and then continue rendering the text using java.awt.Graphics.drawString(text, x, y).

Thanks heaps! 8)
Keith
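
For reference, parsing the tags yourself could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name BubbleText, its Run type and the parse/draw methods are made up for this example, it understands only <b>, <i> and <br>, and it ignores colours, entities and word wrapping.

```java
import java.awt.Font;
import java.awt.Graphics2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: splits very simple markup into styled runs and draws them. */
final class BubbleText {

    /** One stretch of text that shares a single style. */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;
        final boolean newlineAfter;

        Run(String text, boolean bold, boolean italic, boolean newlineAfter) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
            this.newlineAfter = newlineAfter;
        }
    }

    /** Understands only b, i and br tags; any other tag is silently dropped. */
    static List<Run> parse(String markup) {
        List<Run> runs = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        boolean bold = false;
        boolean italic = false;
        int i = 0;
        while (i < markup.length()) {
            char c = markup.charAt(i);
            int close = (c == '<') ? markup.indexOf('>', i) : -1;
            if (close < 0) {
                // Plain character, or a stray '<' with no closing '>'
                text.append(c);
                i++;
                continue;
            }
            String tag = markup.substring(i + 1, close).trim().toLowerCase(Locale.ROOT);
            i = close + 1;
            boolean isBreak = tag.equals("br") || tag.equals("br/") || tag.equals("br /");
            // Flush the text collected so far, using the style that was active for it
            if (text.length() > 0 || isBreak) {
                runs.add(new Run(text.toString(), bold, italic, isBreak));
                text.setLength(0);
            }
            switch (tag) {
                case "b": bold = true; break;
                case "/b": bold = false; break;
                case "i": italic = true; break;
                case "/i": italic = false; break;
                default: break; // br or an unsupported tag
            }
        }
        if (text.length() > 0) {
            runs.add(new Run(text.toString(), bold, italic, false));
        }
        return runs;
    }

    /** Draws the runs left to right, starting a new line after each break. */
    static void draw(Graphics2D g, List<Run> runs, int left, int top) {
        Font base = g.getFont();
        int lineHeight = g.getFontMetrics(base).getHeight();
        int x = left;
        int y = top + g.getFontMetrics(base).getAscent();
        for (Run r : runs) {
            int style = (r.bold ? Font.BOLD : 0) | (r.italic ? Font.ITALIC : 0);
            g.setFont(base.deriveFont(style));
            if (!r.text.isEmpty()) {
                g.drawString(r.text, x, y);
                x += g.getFontMetrics().stringWidth(r.text);
            }
            if (r.newlineAfter) {
                x = left;
                y += lineHeight;
            }
        }
        g.setFont(base); // restore the caller's font
    }
}
```

Usage would be something like BubbleText.draw(g2d, BubbleText.parse("Hi <b>there</b><br>bye"), 4, 4) from the render loop. Parsing once and keeping the list of runs around avoids re-parsing every frame.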

Ooops, I should have googled harder before posting this… I’ve found this project:

http://htmlparser.sourceforge.net/

The Lexer is able to spit out the tags in order, and has different classes for each tag so I can even ignore the ones that I’m not worrying about. Great!

http://htmlparser.sourceforge.net/javadoc/org/htmlparser/lexer/Lexer.html

[s]What’s wrong with a JLabel? ???

http://java.sun.com/docs/books/tutorial/uiswing/components/html.html[/s]

[quote]I know I can do that with an AWT/Swing component like JTextPane or something, but I want to render the text using pure java2D for performance reasons.
[/quote]
Are you sure it’s going to change anything performance-wise? Won’t just calling printComponent work?

Thanks for pointing that out, but because I’m rendering in my own thread, I can’t render any Swing components without synchronizing with Swing’s Event Dispatch Thread, otherwise you get weird deadlocks and exceptions. Also, Swing components are painted pretty inefficiently - backgrounds are painted on top of each other which kills the FPS when you render the component every frame.
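
One way to sidestep the per-frame cost, whichever renderer ends up producing the bubble, is to paint it into an off-screen image only when the text changes and blit that image every frame. A minimal sketch, assuming a hypothetical CachedBubble class and a painter callback that are not part of any library:

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.function.BiConsumer;

/** Hypothetical helper: repaints the bubble only when its text changes. */
final class CachedBubble {

    private final int width;
    private final int height;
    private final BiConsumer<Graphics2D, String> painter; // does the expensive drawing
    private BufferedImage cache;
    private String cachedText;
    private int repaints;

    CachedBubble(int width, int height, BiConsumer<Graphics2D, String> painter) {
        this.width = width;
        this.height = height;
        this.painter = painter;
    }

    /** Call once per frame from the render thread. */
    void draw(Graphics2D frame, String text, int x, int y) {
        if (cache == null || !text.equals(cachedText)) {
            // Text changed: rebuild the off-screen image
            cache = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
            Graphics2D g = cache.createGraphics();
            try {
                painter.accept(g, text);
            } finally {
                g.dispose();
            }
            cachedText = text;
            repaints++;
        }
        // Cheap path: copy the cached pixels into the frame
        frame.drawImage(cache, x, y, null);
    }

    /** How many times the expensive painter has run so far. */
    int repaintCount() {
        return repaints;
    }
}
```

With this, the expensive painter runs once per distinct message rather than once per frame.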

Swing has an HTML parser. No CSS. This extracts the text only:


package ui.parsers;

import java.awt.Color;
import java.net.URI;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.swing.text.AttributeSet;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.StyledDocument;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTML.Attribute;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.Element;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import util.swing.BufferedStyledDocumentBuilder;

/**
 * Parses HTML to extract the text and images.
 * Saves a list of the current names of <a name="names"></a>
 * on the attribute HTML.Attribute.NAME for the relevant attribute sets.
 * The locations of in-document links are stored in the document properties.
 * 
 * So if you want to follow a link, access the clicked attribute set
 * and get the name you want from it:
 * Object key = att.getAttribute(HTML.Attribute.HREF);
 * Then get the correct index:
 * Integer i = (Integer) doc.getProperty(key);
 * 
 * Saves any URI that references a file (file:/ ...)
 * on the attribute HTML.Attribute.ARCHIVE for the relevant attribute sets.
 * As before:
 * URI key = (URI) att.getAttribute(HTML.Attribute.ARCHIVE);
 * 
 * @author i30817
 *
 */
public class HTMLCallBack extends ParserCallback {

    private final BufferedStyledDocumentBuilder builder;
    private final MutableAttributeSet memory = new SimpleAttributeSet();
    private final List<String> hrefList = new LinkedList<String>();
    private final Map<String, Integer> nameMap = new HashMap<String, Integer>();
    private boolean linkVisited = false;

    public HTMLCallBack(DefaultStyledDocument d) {
        super();
        builder = new BufferedStyledDocumentBuilder(d);
    }

    @Override
    public void handleEndOfLineString(String eof) {
        //they say flush() is called after this by the parser.
        //They are lying liars who lie.
        builder.commit();
        commitNames();
    }

    @Override
    public void handleText(char[] text, int position) {
        //System.out.println("handleText("+new String(text)+")");

        builder.append(text, memory);
        //Only erase <a> attributes if we encountered real text.
        //This means that we can't start and end a link on whitespace.
        if (linkVisited) {
            boolean needsCleanup = false;

            for (int i = text.length - 1; i > -1; i--) {
                if (text[i] != ' ') {
                    needsCleanup = true;
                    break;
                }
            }
            if (needsCleanup) {
                memory.removeAttribute(StyleConstants.Underline);
                memory.removeAttribute(Attribute.HREF);
                memory.removeAttribute(Attribute.ARCHIVE);
                memory.removeAttribute(StyleConstants.Foreground);
                linkVisited = false;
            }
        }
    }

    @Override
    public void handleEndTag(Tag t, int position) {
        //System.out.println("handleEndTag(" + t.toString() + ")");
        eraseMemory(t);
        if (t.isBlock()) {
            builder.appendSpace(memory);
        }
        if (t.breaksFlow()) {
            builder.appendEnd(memory);
        }
    }

    @Override
    public void handleSimpleTag(Tag t, MutableAttributeSet a, int p) {
        //System.out.println("handleSimpleTag(" + t.toString() + ")");
        if (t == Tag.IMG) {
            String attribute = (String) a.getAttribute(Attribute.ALT);
            //couldn't care less about images that are not links
            if (attribute != null && memory.isDefined(Attribute.HREF)) {
                handleText(attribute.toCharArray(), p);
                handleText(new char[]{' '}, p + attribute.length());
            }
        }

        if (t.isBlock()) {
            builder.appendSpace(memory);
        }
        if (t.breaksFlow()) {
            builder.appendEnd(memory);
        }
    }

    @Override
    public void handleStartTag(Tag t, MutableAttributeSet a, int pos) {
        setMemory(t, a, pos);
    }

    @Override
    public void handleError(String errorMsg, int pos) {
        //System.out.println("handleError(" + errorMsg + ")");
    }

    public StyledDocument getDocument() {
        return builder.getDocument();
    }

    private void commitNames() {
        for (String hrefName : hrefList) {
            Integer i = nameMap.get(hrefName);
            if (i == null) {
                continue;
            }

            Element e = builder.getDocument().getCharacterElement(i);
            AttributeSet set = e.getAttributes();
            //Lists are needed because of a weird edge case:
            //two hrefs from hrefList can point to the same place
            //when the names have no "real" text between them,
            //so they shadow each other. Then if you move the text around
            //it will break when rebuilding.
            if (set.isDefined(Attribute.NAME)) {
                @SuppressWarnings("unchecked")
                List<String> names = (List<String>) set.getAttribute(Attribute.NAME);
                names.add(hrefName);
            } else {
                SimpleAttributeSet s2 = new SimpleAttributeSet();
                List<String> list = new LinkedList<String>();
                list.add(hrefName);
                s2.addAttribute(Attribute.NAME, list);
                builder.getDocument().setCharacterAttributes(e.getStartOffset(), 1, s2, false);
            }
            builder.getDocument().putProperty(hrefName, i);
        }
    }

    private void setMemory(Tag t, MutableAttributeSet attributes, int pos) {

        String attribute = (String) attributes.getAttribute(Attribute.ID);
        addNameAttribute(attribute);
        //TODO: still buggy
        //addStyles((String) attributes.getAttribute(Attribute.STYLE));

        if (t == Tag.BODY) {
            //Bug fix for swing parser stupidity
            builder.clear();
        } else if (t == Tag.I || t == Tag.EM) {
            StyleConstants.setItalic(memory, true);
        } else if (t == Tag.B) {
            StyleConstants.setBold(memory, true);
        } else if (t == Tag.U) {
            StyleConstants.setUnderline(memory, true);
        } else if (t == Tag.S || t == Tag.STRIKE) {
            StyleConstants.setStrikeThrough(memory, true);
        } else if (t == Tag.FONT) {

            attribute = (String) attributes.getAttribute(Attribute.FACE);
            if (attribute != null) {
                StyleConstants.setFontFamily(memory, attribute);
            }
            Color c = decodeColor((String) attributes.getAttribute(Attribute.COLOR));
            if (c != null) {
                StyleConstants.setForeground(memory, c);
            }
        } else if (t == Tag.A) {
            attribute = (String) attributes.getAttribute(Attribute.NAME);
            addNameAttribute(attribute);

            attribute = (String) attributes.getAttribute(Attribute.HREF);

            if (attribute == null) {
                return;
            }
            //Only file: links and in-document anchors are kept; other hrefs are ignored
            int index = attribute.indexOf('#');
            //May be null if the caller never set the "filename" property
            String filename = (String) getDocument().getProperty("filename");
            if (index == -1 && attribute.startsWith("file:/")) {
                //outside file link
                memory.addAttribute(Attribute.ARCHIVE, URI.create(attribute));
                linkVisited = true;
            } else if ((index == 0 && attribute.length() > 1) || (filename != null && attribute.startsWith(filename))) {
                //inside file link
                attribute = attribute.substring(index + 1);
                hrefList.add(attribute);
                memory.addAttribute(Attribute.HREF, attribute);
                StyleConstants.setUnderline(memory, true);
                StyleConstants.setForeground(memory, Color.BLUE);
                linkVisited = true;
            }
        }
    }

    private void eraseMemory(Tag t) {
        //TODO Still buggy
        //removeStyles();

        if (t == Tag.I || t == Tag.EM) {
            memory.removeAttribute(StyleConstants.Italic);
        } else if (t == Tag.B) {
            memory.removeAttribute(StyleConstants.Bold);
        } else if (t == Tag.U) {
            memory.removeAttribute(StyleConstants.Underline);
        } else if (t == Tag.S || t == Tag.STRIKE) {
            memory.removeAttribute(StyleConstants.StrikeThrough);
        } else if (t == Tag.FONT) {
            memory.removeAttribute(StyleConstants.FontFamily);
            memory.removeAttribute(StyleConstants.Foreground);
        }
    }

    private void addNameAttribute(String name) {
        //only save the first time we encounter a name...
        if (name == null || nameMap.containsKey(name)) {
            return;
        }
        nameMap.put(name, builder.getLength());
    }

    private Color decodeColor(String code) {
        Color c = null;
        if (code != null) {
            code = code.trim();
            try {
                //Look up a named constant such as Color.red. The field is static,
                //so the instance passed to get() is ignored.
                c = (Color) Color.class.getField(code.toLowerCase(Locale.ENGLISH)).get(Color.RED);
            } catch (Exception e) {
                try {
                    /*Try as a number*/
                    c = Color.decode(code);
                } catch (Exception f) { /*Give up*/

                }
            }
        }
        return c;
    }
}


package util.swing;

import java.lang.reflect.Method;
import java.util.LinkedList;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.DefaultStyledDocument.ElementSpec;
import javax.swing.text.Element;

/**
 * A fast buffered builder for creating DefaultStyledDocuments
 * @author i30817
 */
public class BufferedStyledDocumentBuilder {

    private final long MEMORY_CAPACITY_CHARS;
    private int bufferedChars;
    private final LinkedList<ElementSpec> textList = new LinkedList<ElementSpec>();
    private final char[] par = {'\n'};
    private final char[] space = {' '};
    private final DefaultStyledDocument doc;
    private final Append appendFunctor;

    private interface Append {

        void append();
    }

    private final class AppendStyledDocument implements Append {

        private final Method bulkInsert;
        private final Object[] bulkInsertArgs;

        public AppendStyledDocument() {
            try {
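                Class<?>[] args = {int.class, ElementSpec[].class};
                bulkInsertArgs = new Object[2];
                bulkInsert = DefaultStyledDocument.class.getDeclaredMethod("insert", args);
                //naughty: insert(int, ElementSpec[]) is protected, so force access.
                //On Java 9+ this may need --add-opens java.desktop/javax.swing.text=ALL-UNNAMED
                bulkInsert.setAccessible(true);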
            } catch (Throwable ex) {
                throw new RuntimeException(ex);
            }
        }

        public void append() {
            ElementSpec[] arr = new ElementSpec[textList.size()];
            bulkInsertArgs[0] = doc.getLength();
            bulkInsertArgs[1] = textList.toArray(arr);
            try {
                bulkInsert.invoke(doc, bulkInsertArgs);
            } catch (Exception ex) {
                //Should never happen but whatever
                throw new RuntimeException(ex);
            }
        }
    }

    private final class AppendMyStyledDocument implements Append {

        private final Method bulkInsert;
        private final Object[] bulkInsertArgs;

        public AppendMyStyledDocument() {
            try {
                Class<?>[] args = {int.class, ElementSpec[].class, int.class};
                bulkInsertArgs = new Object[3];
                bulkInsert = Class.forName("ui.documents.MyStyledDocument").getMethod("insert", args);
            } catch (Throwable ex) {
                throw new RuntimeException(ex);
            }
        }

        public void append() {
            ElementSpec[] arr = new ElementSpec[textList.size()];
            bulkInsertArgs[0] = doc.getLength();
            bulkInsertArgs[1] = textList.toArray(arr);
            bulkInsertArgs[2] = bufferedChars;
            try {
                bulkInsert.invoke(doc, bulkInsertArgs);
            } catch (Exception ex) {
                throw new RuntimeException(ex);
            }
        }
    }

    public BufferedStyledDocumentBuilder(DefaultStyledDocument document) {
        doc = document;

        //non-dependent runtime testing if it is my subclass
        boolean isMyClass = false;
        try {
            isMyClass = Class.forName("ui.documents.MyStyledDocument").isInstance(document);
        } catch (Throwable e) {
            //do nothing...
        }

        if (isMyClass) {
            appendFunctor = new AppendMyStyledDocument();
        } else {
            appendFunctor = new AppendStyledDocument();
        }
        //Convert free memory from bytes to chars (2 bytes each) and use a quarter of that as the buffer limit.
        MEMORY_CAPACITY_CHARS =  (Runtime.getRuntime().freeMemory() / 2L) / 4L;
    }

    public Integer getLength() {
        return Integer.valueOf(doc.getLength() + bufferedChars);
    }

    public DefaultStyledDocument getDocument() {
        return doc;
    }

    public void clear() {
        try {
            doc.remove(0, doc.getLength());
            bufferedChars = 0;
            textList.clear();
        } catch (BadLocationException ex) {
            throw new RuntimeException(ex);
        }
    }

    /**
     * Removes the last inserted \n.
     * No checking is done, so you must know that the last thing appended was a \n from appendEnd.
     */
    public void removeLast() {
        if (textList.isEmpty()) {
            try {
                doc.remove(doc.getLength() - 1, 1);
            } catch (BadLocationException ex) {
                throw new RuntimeException(ex);
            }

        } else {
            textList.removeLast();
            textList.removeLast();
            textList.removeLast();
            bufferedChars = bufferedChars - 1;

        }
    }

    public void appendSpace(AttributeSet currentAttributes) {
        append(space, currentAttributes);
    }

    public void appendEnd(AttributeSet currentAttributes) {
        bufferedChars += 1;
        ElementSpec e = new ElementSpec(currentAttributes.copyAttributes(), ElementSpec.ContentType, par, 0, 1);
        textList.add(e);
        e = new ElementSpec(null, ElementSpec.EndTagType);
        textList.add(e);
        //Every element after the first needs a start tag so the stack-based document tree doesn't go bonkers.
        //Which paragraph element is used is arbitrary; the first one is taken so its attributes object can be reused.
        Element paragraph = doc.getParagraphElement(0);
        AttributeSet pattr = paragraph.getAttributes();
        e = new ElementSpec(pattr, ElementSpec.StartTagType);
        textList.add(e);
    }

    public void append(char[] s, AttributeSet currentAttributes) {
        textList.add(new ElementSpec(currentAttributes.copyAttributes(), ElementSpec.ContentType, s, 0, s.length));
        bufferedChars += s.length;

        if (bufferedChars > MEMORY_CAPACITY_CHARS) {
            commit();
        }
    }

    public void append(char[] s, int len, AttributeSet currentAttributes) {
        textList.add(new ElementSpec(currentAttributes.copyAttributes(), ElementSpec.ContentType, s, 0, len));
        bufferedChars += len;

        if (bufferedChars > MEMORY_CAPACITY_CHARS) {
            commit();
        }
    }

    public void commit() {
        appendFunctor.append();
        bufferedChars = 0;
        textList.clear();
    }
}


Use like this:


    private StyledDocument parseHTML(Reader reader,
            boolean ignoreCharset, String filename)
            throws IOException {

        if (htmEditor == null) {
            htmEditor = new ParserDelegator();
        }
        HTMLCallBack c = getHtmlCallback();
        c.getDocument().putProperty("filepath", filename);
        int fileNameIndex = filename.lastIndexOf('\\');
        if (fileNameIndex != -1) {
            filename = filename.substring(fileNameIndex + 1);
        }
        c.getDocument().putProperty("filename", filename);

        htmEditor.parse(reader, c, ignoreCharset);
        return c.getDocument();
    }
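
If all you need is the "call me when you see a BOLD tag" behaviour from the first post, a much smaller callback on top of the same Swing parser may be enough. The sketch below is an illustration under assumptions rather than a cut-down version of the class above: SwingTagListener and its events method are hypothetical names, and only <b> and <br> are tracked.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: records what the Swing HTML parser reports, in order. */
final class SwingTagListener {

    static List<String> events(String html) throws IOException {
        final List<String> out = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int boldDepth; // > 0 while inside at least one <b>

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.B) {
                    boldDepth++;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.B && boldDepth > 0) {
                    boldDepth--;
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.BR) {
                    out.add("BREAK");
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // This is where a renderer would set the font and call drawString
                out.add((boldDepth > 0 ? "BOLD:" : "PLAIN:") + new String(data));
            }
        };
        // true = ignore any charset declared inside the document
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return out;
    }
}
```

Calling SwingTagListener.events("Hi <b>there</b><br>bye") yields the text pieces and the break in document order. The parser also reports implied tags such as html and body and may adjust whitespace, so it is worth printing the events for your own input before relying on exact strings.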

As Mr_Light said, JLabel seems to do the job perfectly fine.

I just wrote this code, so it’s pretty much untested. It turned out the JLabel must be embedded in a Frame, but I don’t actually show() the Frame, so that will be fine.

HTML support from JLabel in a utility class:


   static JFrame                     htmlHolder;
   static JLabel                     htmlRender;

   private static void initHTML()
   {
      Runnable task = new Runnable()
      {
         @Override
         public void run()
         {
            htmlRender = new JLabel("<html>");
            htmlRender.setVerticalAlignment(SwingConstants.TOP);

            htmlHolder = new JFrame();
            htmlHolder.getContentPane().setLayout(new BorderLayout());
            htmlHolder.getContentPane().add(htmlRender);
         }
      };
      SwingUtilities.invokeLater(task);
   }

   public static void renderHTML(final BufferedImage img, final String html)
   {
      Runnable task = new Runnable()
      {
         @Override
         public void run()
         {
            htmlRender.setText("<html>" + html);
            htmlRender.setPreferredSize(new Dimension(img.getWidth(), img.getHeight()));
            htmlHolder.pack();

            Graphics g = img.getGraphics();
            htmlRender.paint(g);
            g.dispose();
         }
      };

      try
      {
         SwingUtilities.invokeAndWait(task);
      }
      catch (InterruptedException exc)
      {
         throw new IllegalStateException(exc);
      }
      catch (InvocationTargetException exc)
      {
         throw new IllegalStateException(exc);
      }
   }

Quick and dirty usage of utility class:


         initHTML();

         int w = 128;
         int h = 96;

         BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);

         // make transparent, draw a cross
         {
            Graphics2D g2d = img.createGraphics();
            g2d.setComposite(AlphaComposite.getInstance(AlphaComposite.CLEAR, 0.0f));
            g2d.fillRect(0, 0, img.getWidth(), img.getHeight());
            g2d.dispose();

            Graphics g = img.getGraphics();
            g.setColor(Color.CYAN);
            g.drawLine(0, 0, w, h);
            g.drawLine(0, h, w, 0);
            g.dispose();
         }

         // render the HTML on top of it
         String html = "";
         html += "<div style=\"background-color:#8080C0; margin-top:4px\"><center>";
         html += "<font color=#C0C0FF face=arial size=4>Hello <u>there</u>, ";
         html += "I <font color=#666699>am</font> <i>your</i> curious ";
         html += "<div style=\"background-color:#E0E0FF; padding:8px;\">father</div>";
         renderHTML(img, html);

As you can see, it supports basic stylesheets, like background-color, margin, padding, but the ‘advanced’ stuff will simply be ignored.

This is the generated image:

It’s more than enough for a ‘simple HTML parser’ I think!