Problem with UNICODE, php.ini & env._isUnicodeSemantics


domdorn
04-26-2010, 03:32 PM
Hi,

I'm trying to troubleshoot why this "unicode hell" is still not working on my system.

I have now set this in my php.ini:

unicode.semantics=on
unicode.http_input_encoding=UTF-8
unicode.output_encoding=UTF-8
unicode.runtime_encoding=UTF-8

As far as I can see with the debugger, the php.ini is parsed correctly:

_env._quercus._iniMap[3] is unicode.semantics=1

So the setting itself was read. However,
_env._isUnicodeSemantics = false instead of true,
which I think might be why I'm having all those issues with Unicode.

Can anyone confirm this?

thanks,
dominik

nam
04-28-2010, 08:24 PM
Just to double-check: how and when are you setting unicode.semantics=on? It's a servlet setting so it must be set in your web.xml (via a php.ini).

And do you have a test case for your Unicode issues?

domdorn
05-01-2010, 05:11 PM
web.xml:
<servlet>
  <servlet-name>Quercus Servlet</servlet-name>
  <servlet-class>com.caucho.quercus.servlet.QuercusServlet</servlet-class>

  <!--
    Specifies the encoding Quercus should use to read in PHP scripts.
  -->
  <init-param>
    <param-name>script-encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>

  <init-param>
    <param-name>ini-file</param-name>
    <param-value>
      /WEB-INF/php.ini
    </param-value>
  </init-param>

  <!--
    Location of the license to enable php to java compilation.
  -->
  <init-param>
    <param-name>license-directory</param-name>
    <param-value>WEB-INF/licenses</param-value>
  </init-param>
</servlet>


php.ini:
unicode.semantics=on
unicode.http_input_encoding=UTF-8
unicode.output_encoding=UTF-8
unicode.runtime_encoding=UTF-8


I created a small module to inspect the Env myself with the debugger,
basically just this:

public void inspect_object(Env env, Object o)
{
    System.out.println("test"); // breakpoint set here to inspect env
}


I'm not quite sure how to create a repeatable unit test here. Any suggestions?
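
One possible starting point, sketched under assumptions rather than taken from Quercus itself: a repeatable test needs a value that differs observably between byte semantics and Unicode semantics. The length of a single non-ASCII character is such a value, since it is 1 when counted in code points and 2 when counted in UTF-8 bytes. The class and method names below are hypothetical and the code does not call any Quercus API; it only shows the two numbers that a PHP script calling strlen() on the same string would presumably report, depending on which mode is active.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Quercus: shows the two lengths a
// Unicode-semantics test could distinguish between.
public class UnicodeLengthCheck {

    // Length as seen when strings are sequences of Unicode code points.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Length as seen when strings are sequences of UTF-8 bytes.
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String sample = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println("code points: " + codePointLength(sample));
        System.out.println("UTF-8 bytes: " + utf8ByteLength(sample));
    }
}
```

A PHP page that echoes the length of the same one-character string could then be requested through the servlet and compared against these two values, which would make the check independent of the debugger.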

nam
05-04-2010, 11:32 PM
I have a fix for this, but just recently, the PHP team discontinued PHP6, so we do want to rethink how to add Unicode to Quercus. For starters, since we don't have to follow PHP anymore with respect to Unicode, we can rip out nonsense things we had to do to match PHP6. Quercus originally started out as being Unicode compatible, so hopefully it wouldn't be too much work to get it compatible again (with a unicode.semantics-like flag).

http://news.php.net/php.internals/47120

sblommers
05-10-2010, 03:55 PM
In my (recent) experience I found out that using UTF8 with Quercus doesn't work yet. We had issues with special entities that were ok in Drupal (running on QuercusServlet+Jetty) but not ok using a CLI and service-like approach.

I would recommend using UTF-8 everywhere, and converting from ISO-8859-1 to UTF-8 in your application wherever that is necessary.

Here is some ugly code I just wrote to get Chinese, Hebrew and Japanese characters working in UTF-8 (we don't actually use those languages, but now we support them).


// TODO: check if newer version of Quercus solves the UTF-8 problem
// Re-decodes a String whose UTF-8 bytes were wrongly read as ISO-8859-1.
// Needs: import java.nio.charset.StandardCharsets; (Java 7 or later)
protected Object ISO_8859_1_to_UTF_8(Object input) {
    if (input instanceof String) {
        // Recover the original bytes, then decode them as UTF-8.
        byte[] raw = ((String) input).getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }
    // null and non-String values are passed through unchanged.
    return input;
}
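
To illustrate what the conversion above repairs, here is a minimal standalone sketch. It assumes the fault is that UTF-8 bytes were decoded as ISO-8859-1 somewhere upstream; the class name and the misdecode helper are made up for the demonstration and are not part of Quercus.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces the mis-decoding and then undoes it.
public class MojibakeRoundTrip {

    // Simulates the assumed fault: UTF-8 bytes interpreted as ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Same idea as the conversion method above: recover the bytes, decode as UTF-8.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e"; // three Japanese characters
        String broken = misdecode(original);    // one char per UTF-8 byte
        String fixed = repair(broken);

        System.out.println("broken length: " + broken.length());
        System.out.println("repaired equals original: " + original.equals(fixed));
    }
}
```

Note that the repair is only safe on strings that really were mis-decoded. Applying it to a string that is already correct and contains characters outside ISO-8859-1 replaces those characters with '?', so it should be called at exactly one well-defined point in the application.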


This might not answer your question but maybe it helps you a bit.

Best regards,
Sebastiaan