Character encodings in a complex enterprise environment

First, some up-front warnings: this question has nothing to do with gaming, and more importantly, I’m not directly involved with the project that’s failing (it’s a friend’s project that I’m helping to debug).

Anyway, here it goes:
We have an online application that is in character encoding hell. If we make a request to the server with the é character in a URL query parameter, the browser encodes it as %C3%A9. The query parameter, as presented by Struts and Tomcat, has %C3%A9 decoded as Ã©, which seems to be the Latin1 reading of those two bytes.

This makes no sense to me, and it continues to happen even when we force Tomcat to use UTF-8 URI encoding. To make matters worse, we get anomalous results when writing the query parameter out with a JSP.

For example, if the jsp did something like:


Query parameter: ${query}
Hardcoded: é

then the result, viewing the page with Latin1 character encoding, would be:


Query parameter: Ã©
Hardcoded: é

but when set to view as UTF-8, we get the following (the value of query was still Ã© when stored as a String):


Query parameter: é
Hardcoded: ?

where the ? represents the unknown character symbol.

My JSP skills are not particularly great, and we’re at an utter loss as to what’s causing these problems. So I guess this is a Hail Mary request to see if anyone can help.

As for the last point, if you make sure that your JSPs are actually encoded as UTF-8 (use a decent text editor to check) and the HTML meta tag is also set to UTF-8, you will not have problems with hard-coded characters.
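To illustrate why a hard-coded é can turn into the unknown-character symbol, here is a minimal sketch. It assumes the JSP output is being written as ISO-8859-1 (the usual default when no page encoding is declared), which may or may not match the actual setup. The class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PageViewDemo {
    // Encodes the text the way the server writes it, then decodes the
    // resulting bytes the way the browser is told to read them.
    static String view(String text, Charset written, Charset viewed) {
        return new String(text.getBytes(written), viewed);
    }

    // Formats each char as U+XXXX so the output does not depend on the
    // console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1; // assumed output encoding
        Charset utf8 = StandardCharsets.UTF_8;

        String hardcoded = "\u00E9";      // e-acute typed into the JSP
        String mangled = "\u00C3\u00A9";  // query value after a Latin1 decode

        // A lone 0xE9 byte is not valid UTF-8, so it becomes U+FFFD.
        System.out.println("hardcoded as UTF-8:   " + codePoints(view(hardcoded, latin1, utf8)));
        // Bytes C3 A9 happen to be the UTF-8 form of e-acute.
        System.out.println("mangled as UTF-8:     " + codePoints(view(mangled, latin1, utf8)));
        // Read back as Latin1, both come out exactly as stored.
        System.out.println("hardcoded as Latin1:  " + codePoints(view(hardcoded, latin1, latin1)));
        System.out.println("mangled as Latin1:    " + codePoints(view(mangled, latin1, latin1)));
    }
}
```

Under that assumption, the mis-decoded query value only looks right in UTF-8 view by accident, while the hard-coded character is a single byte that UTF-8 cannot interpret.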

As for the problems with Tomcat and URL encoding, I don’t know. Since Struts magically passes URL parameters to actions, maybe it’s hardcoded to use Latin1 for the conversion?

How are you forcing the Tomcat URI encoding? Using the server.xml file?

// Json

Json, it seems that Struts is just using the default J2EE spec character encoding (ISO-8859-1) when none has been defined. We recently had this same problem in our application. You should be able to force Struts to use a different default character encoding via this method, which simply uses a servlet filter to set the encoding before Struts has a chance to mangle the parameters.
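As a rough illustration of what an ISO-8859-1 default does to the parameter, the sketch below decodes the same percent-encoded value with both charsets using the standard java.net.URLDecoder. It does not go through Struts or Tomcat at all; the class and helper names are hypothetical and only stand in for whatever decoding the container performs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodingDemo {
    // Decodes a percent-encoded value using the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Formats each char as U+XXXX so the output does not depend on the
    // console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "%C3%A9"; // what the browser sends for e-acute

        // Latin1 maps each byte to one character: two chars come out.
        String asLatin1 = decode(raw, "ISO-8859-1");
        // UTF-8 reads the two bytes as a single character.
        String asUtf8 = decode(raw, "UTF-8");

        System.out.println("ISO-8859-1: " + codePoints(asLatin1));
        System.out.println("UTF-8:      " + codePoints(asUtf8));
    }
}
```

If the value arriving in the action is two characters long (U+00C3 followed by U+00A9) rather than one, that points to a Latin1 decode somewhere before the action sees it.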

PS: Also, make sure that whatever you’re inspecting the output with is also set to UTF-8; otherwise you may not actually be looking at the results you think you are.

  • Immutate

Thanks for the replies. To tell Tomcat to use UTF-8, we added the uri-encoding parameter to the server.xml (not sure exactly how it looks).

I can check the encoding of the JSPs once the weekend is over. Unfortunately I can’t use meta tags because the generated JSP content is inserted into another page.

This sounds like it could be our problem; we’ll have to try it out. Do you know how to change Eclipse’s console to display different character sets?

lhkbob, you can set the encoding for the Eclipse console in the settings for your run configuration. When you edit the settings for a run configuration, the console encoding is under the Common tab.

Just to let everyone know, it was the problem with Struts and its assumed character encoding. Thanks for the help, everything seems to be working okay now.