UTF-8 Chinese support issues

09-12-2009, 04:17 AM
I'm trying to run PHP on GAE through Quercus. I've already got a simple "greeting" example working in PHP, but there are some strange problems with Chinese support.

All my PHP files are encoded in UTF-8 without a BOM. Chinese text hard-coded in the PHP files displays fine, but Chinese content stored in the greeting beans does not. The only way I can make it display correctly is to change the getter so it returns the content in &#num; format (HTML numeric character references).
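A minimal Java sketch of the &#num; workaround described above. The class GreetingNcr and its toNcr helper are hypothetical names, not part of Quercus or App Engine. The idea is that the getter hands PHP nothing but ASCII, so narrowing the string to 8 bits loses nothing, and the browser turns the references back into Chinese characters.

```java
// Hypothetical greeting bean whose getter returns HTML numeric character
// references (&#NNNN;) instead of raw non-ASCII characters.
public class GreetingNcr {
    private final String content;

    public GreetingNcr(String content) {
        this.content = content;
    }

    // Getter called from PHP: only ASCII characters leave this method.
    public String getContent() {
        return toNcr(content);
    }

    // Encodes every code point above 127 as &#decimal; and leaves ASCII as-is
    // (note: '&' and '<' are not escaped by this sketch).
    static String toNcr(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        GreetingNcr g = new GreetingNcr("Hi \u4e2d\u6587"); // "Hi" plus two Chinese characters
        System.out.println(g.getContent()); // prints Hi &#20013;&#25991;
    }
}
```

The cost of this approach is that every non-ASCII character grows to several bytes, and the output only makes sense in an HTML context.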

I also tried the newly released Resin 4.0.1, and the problem is the same.

Should I try other solutions? Does anyone know how to configure it so Chinese content displays correctly?

Also, I'm trying to port CodeIgniter to GAE; there is a simple demo here (http://fillanocode.appspot.com/).

Thank you.

Fillano Feng

09-12-2009, 03:20 PM
Normally, PHP runs in 8-bit mode, so the 16-bit Java string in your greeting bean gets chopped down to an 8-bit PHP string (basically forcing ISO-8859-1).
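To see why the Chinese text breaks, here is a small Java model of "chopping" a 16-bit string to 8 bits. It assumes the conversion simply keeps the low byte of each UTF-16 char; EightBitChop and chopToBytes are hypothetical names, and Quercus's real conversion code may differ in detail.

```java
// Hypothetical model of narrowing a 16-bit Java String to an 8-bit string:
// each UTF-16 char keeps only its low byte (an assumption, not Quercus code).
public class EightBitChop {

    static byte[] chopToBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // narrowing cast drops the high byte
        }
        return out;
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters, U+4E2D and U+6587
        byte[] chopped = chopToBytes(chinese);
        // Print the surviving bytes in hex; the high bytes 0x4E and 0x65 are gone.
        System.out.println(String.format("%02X %02X", chopped[0] & 0xFF, chopped[1] & 0xFF)); // prints 2D 87
    }
}
```

ASCII survives this unchanged because its high byte is already zero, which is why only the Chinese content from the beans is affected.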

If you set the php.ini option unicode.semantics="on", Quercus will run in 16-bit string mode. PHP strings will then be 16 bits wide and keep all the information from the Java strings.

09-13-2009, 06:02 AM
Thanks for your reply.

I did find a way that works, but it's not what I want:

All my PHP files are encoded in UTF-8 without a BOM.
The application server is Tomcat, with no further settings for QuercusServlet.
If I store a Chinese string into a Java bean from PHP and read it back later, it works fine.
If the Chinese string comes from Java, I have to make the getter return it as a byte array (I know it then gets converted to a string).
In the browser, the encoding is set to UTF-8.
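The byte-array workaround can be sketched in Java as follows. GreetingBytes and its helpers are hypothetical names, and the model of what happens on the PHP side (the byte[] becomes an 8-bit string with one character per byte, and echo writes those bytes out unchanged) is an assumption, not confirmed Quercus behaviour. Under that assumption the UTF-8 bytes pass through untouched, and a browser set to UTF-8 decodes them correctly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical bean whose getter hands PHP raw UTF-8 bytes instead of a String.
public class GreetingBytes {
    private final String content;

    GreetingBytes(String content) {
        this.content = content;
    }

    // Getter called from PHP: returns the UTF-8 encoding of the content.
    public byte[] getContent() {
        return content.getBytes(StandardCharsets.UTF_8);
    }

    // Assumed model of an 8-bit PHP string: one char (0-255) per byte.
    static String asEightBitString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Assumed model of echo in 8-bit mode: each char written as one byte.
    static byte[] echo(String eightBit) {
        return eightBit.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        GreetingBytes bean = new GreetingBytes(original);
        byte[] sent = echo(asEightBitString(bean.getContent()));
        // The browser, set to UTF-8, decodes the response bytes.
        String shown = new String(sent, StandardCharsets.UTF_8);
        System.out.println(shown.equals(original)); // prints true
    }
}
```

ISO-8859-1 maps every byte value 0-255 to exactly one char and back, so this round trip is lossless for any byte sequence.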

I tried setting unicode.semantics="on" in php.ini, but I didn't see any change.

If I change script-encoding in web.xml to UTF-8, the Chinese strings hard-coded in the PHP files no longer display correctly. So the combination above is the only one that has worked for me.
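One possible explanation for the script-encoding behaviour, sketched in Java: assume the default setting reads the PHP file one character per byte (like ISO-8859-1), while UTF-8 decodes each Chinese character into a single 16-bit char. If output then goes through the same 8-bit narrowing as the bean strings, only the first reading reproduces the original bytes. ScriptEncodingDemo and chop are hypothetical names, and this is a model, not Quercus source code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ScriptEncodingDemo {

    // Assumed 8-bit output stage: keeps only the low byte of each char.
    static byte[] chop(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Bytes of a hard-coded Chinese literal in a UTF-8 (no BOM) .php file.
        byte[] source = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        // Script read one char per byte: every char fits in 8 bits, nothing is lost.
        byte[] viaLatin1 = chop(new String(source, StandardCharsets.ISO_8859_1));

        // Script read as UTF-8: six bytes become two 16-bit chars, then get chopped.
        byte[] viaUtf8 = chop(new String(source, StandardCharsets.UTF_8));

        System.out.println(Arrays.equals(viaLatin1, source)); // prints true
        System.out.println(Arrays.equals(viaUtf8, source));   // prints false
    }
}
```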

I just wonder whether there is a simpler way. If it works on Tomcat, I think it will work on GAE too.