09-24-2013, 02:01 PM
tobia
How to handle UTF-8 data without unicode semantics?

I've been reading Quercus's documentation and studying its sources, as well as reading past threads, trying to understand how it handles string encoding, but I feel I'm still missing something.

I need to call some modules (chiefly PDO), passing non-Latin-1 Unicode characters in the query text (and possibly in prepared statement parameters) and receiving those characters back in the resulting records. I'd like to do so using PHP string variables encoded in UTF-8, such as those that come from json_decode() and go into json_encode(). I'd also like to do this without enabling unicode semantics, if at all possible.

The PDO classes internally handle java.lang.String values, both for the query text and for varchar return values; PHP string variables (ConstStringValue) are represented as a java.lang.String as well, so they could theoretically contain any Unicode character. But I'm not sure how many encoding/decoding steps they go through along the chain.
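To make the concern concrete, here is a minimal Java sketch (not Quercus code) of the kind of conversion that may be involved. It assumes, without confirmation from the Quercus sources, that without unicode semantics each char of the underlying java.lang.String holds one UTF-8 byte (0-255) rather than a full Unicode character. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of Quercus.
class Utf8Roundtrip {
    // Unicode String -> "byte string": one char (0..255) per UTF-8 byte.
    static String toByteString(String unicode) {
        byte[] utf8 = unicode.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // "Byte string" -> Unicode String: reinterpret the chars as UTF-8 bytes.
    static String toUnicodeString(String byteString) {
        byte[] raw = byteString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e"; // three non-Latin-1 characters
        String asBytes = toByteString(original);
        System.out.println(asBytes.length());                           // 9
        System.out.println(toUnicodeString(asBytes).equals(original));  // true
    }
}
```

If that assumption holds, a query string would need the second conversion before it reaches the JDBC driver, and each returned varchar value would need the first one before it reaches json_encode().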

I should mention that enabling unicode semantics makes everything work beautifully out of the box, because the PHP variables are no longer being encoded/decoded in strange places; they are just Unicode strings and are passed around as such.

My question is: how can I get this to work without unicode semantics, since enabling them breaks other, unrelated things?

$db = new PDO('java:comp/env/jdbc/something');

$qry = "select * from table where x = ".$db->quote(json_decode(...));
// Now $qry contains unicode characters encoded in UTF-8.

// How do I encode/decode $qry, so that it can be passed to the JDBC
// driver as a Java String with those same Unicode characters?
$cur = $db->query($qry);

$row = $cur->fetch(PDO::FETCH_NUM);
// How do I encode/decode the $row values so that they come out
// encoded in UTF-8 and can be passed, for example, to json_encode()?