Wednesday, September 23, 2009

Lost in translation: The Encoding Experience

I had written an undelivr folder viewer about a month back. We started testing some special chars through our mailing app recently and found that Programów was becoming Programów. We figured it had something to do with encoding. I checked the cfadmin setting for the default mail encoding and it was UTF-8. That should have been ok. I tried hardcoding charset="UTF-8" in the cfmail tag to see if that would help. Same results.

So then I switched gears to my local dev server and wrote up this quick script.
<cfset str ="Programów"/>
<cfmail from="" to="" subject="1.0" charset="utf-8">

This works perfectly fine in my local dev. When I threw this script on the server it still got mangled. So now we're dealing with an environment issue. I'm running Ubuntu locally and we are running windows on the server. So I started digging into the settings in cfadmin. I took the settings summary from both servers and threw it into Kompare. One of the differences I saw right away was the Java File Encoding. On the server it was set to cp1252 and on my Linux it was UTF-8. After much searching, I was able to track down how to change the JVM's default file encoding here: Java's file.encoding property on Windows platform. After adding -Dfile.encoding=UTF-8 to the jvm's server.args (typically found in jrun4/bin/jvm.conf), I restarted the instance and re-ran the test. Bingo, the special chars were preserved.

There must be an issue with some special chars moving from UTF-8 to cp1252 or vice versa. I just ran a test and this does affect all emails if you are spooling to disk. My assumption is that under the covers, java is writing the file using the default JVM file encoding, not the default coding set for the cfmail tag. This makes the cfmail default setting a little misleading. At any rate, setting the JVM encoding to UTF-8 did the trick.


No comments: