Caucho Forums  

#1
11-07-2012, 02:17 PM
Jordi (Member)
Quercus Encoding

Greetings,

I've tried many combinations to get accented characters working properly under Quercus, but every combination fails in some respect (some things work, others don't).

Has anyone got a single working setup running fully in UTF-8?

Currently I have MySQL in UTF-8, script-encoding in resin-web.xml set to ISO-8859-15, and unicode.semantics OFF. About 70% of things work, but the rest show garbled characters.
I have also tried other combinations (MySQL set to ISO-8859-15, the O.S. encoding, unicode.semantics ON with UTF-8), but I couldn't find a solution that works as a whole.

Any ideas/notes would be of great help.

Thanks,
Jordi
#2
11-08-2012, 08:04 AM
Jordi (Member)

Sorry, I forgot to mention my setup: I'm using Resin Pro 4.0.32+ on Linux with UTF-8, and I'm trying to get Drupal Commons 6.29 to work.
#3
11-12-2012, 02:19 PM
Jordi (Member)

Greetings,

I'll post my updates on using Drupal Commons under Quercus here. The goal throughout is a consistent experience with latin1 characters on a platform (Drupal) that was built to run in UTF-8.

My best working solution so far is the following:

Code:
Operating System (assuming GNU/Linux base)
No changes (modern distros are set by default to UTF-8)
Code:
MySQL (/etc/my.cnf)
No changes. (Tried, by the way, the following:)
[client]
#default-character-set = latin1 # utf8
[mysqld]
#default-character-set=latin1
#default-character-set = utf8
#skip-character-set-client-handshake
#character-set-server = latin1
#collation-server = latin1_spanish_ci
#init_connect='SET collation_connection = utf8_general_ci; SET NAMES utf8;'
#init_connect='SET collation_connection = latin1_spanish_ci; SET NAMES latin1;'
Code:
Resin (webapp/../WEB-INF/resin-web.xml)
<script-encoding>iso-8859-15</script-encoding>
<php-ini>
  <!--
  <unicode.semantics>on</unicode.semantics>
  <unicode.runtime_encoding>iso-8859-15</unicode.runtime_encoding>
  <unicode.output_encoding>utf-8</unicode.output_encoding>
  <unicode.http_input_encoding>utf-8</unicode.http_input_encoding>
  -->
</php-ini>
With this configuration (I also tried the suggestions in this thread) accented characters (needed for Spain) work almost everywhere. The only thing that doesn't work is the following:
AJAX/AHAH posts (e.g. activity streams, comments, ...) that contain accented characters when using Firefox. All other browsers work without problems by default.

Any ideas?

Thanks,
Jordi

Last edited by Jordi; 11-12-2012 at 02:35 PM.
#4
11-14-2012, 07:14 PM
nam (Administrator)

Hi Jordi,

If unicode works in PHP5, then it should also work in Quercus. You shouldn't need to use unicode.semantics=on because that turns on PHP6 unicode features that may break non-PHP6 apps.

Anyway, it's very likely you're running into database encoding issues. Quercus uses the MySQL JDBC driver, and that driver does not allow us to use NO ENCODING, which is what we need for PHP. We use ISO-8859-1 instead, and while that works for most combinations of MySQL client/connection/server encodings, some combinations obviously don't.
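
To illustrate why ISO-8859-1 is the usual stand-in for "no encoding" on the Java side: it maps every byte value from 0x00 to 0xFF to exactly one char, so arbitrary PHP bytes survive a trip through a Java String, while UTF-8 decoding gives no such guarantee. A minimal sketch follows; the class and method names are made up for illustration and this is not Quercus code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only (hypothetical names, not part of Quercus).
class Latin1Transparency {
    // Decode bytes to a String and encode them back using ISO-8859-1.
    static byte[] viaLatin1(byte[] raw) {
        String s = new String(raw, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Same round trip, but through UTF-8.
    static byte[] viaUtf8(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xE9 is 'é' in latin1, but on its own it is not valid UTF-8.
        byte[] raw = { 'c', 'a', 'f', (byte) 0xE9 };
        System.out.println(Arrays.equals(raw, viaLatin1(raw))); // true: bytes preserved
        System.out.println(Arrays.equals(raw, viaUtf8(raw)));   // false: 0xE9 was replaced
    }
}
```

The round trip only stays lossless as long as nothing between the two conversions re-encodes the String, which is where the server-side settings come in.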

We'll need to add options for users to manually set the encodings.
#5
11-15-2012, 10:37 AM
Jordi (Member)

OK Nam,

So, do you think that changing the JDBC driver encoding to UTF-8 and adding <character-encoding>utf-8</character-encoding> to resin.xml will do the trick?

I guess the class that specifies ISO-8859-1 for the JDBC driver is com.caucho.quercus.lib.db.Mysqli, and the encoding is passed from com.caucho.quercus.lib.db.JdbcDriverContext _setDefaultEncoding, right?

My only doubt so far is that there are lots of other places where ISO-8859-1 is hard-coded in resin.jar, and I don't know whether they could be a problem (for instance: QuercusContext/QuercusServlet and script-encoding, Env, HtmlModule, StringModule, QuercusEngine, ....)

Best,
Jordi
#6
11-15-2012, 10:53 AM
Jordi (Member)

Hello again,

just tried the above, but to no avail

Code:
JdbcDriverContext.java
... _setDefaultEncoding = "UTF8";

(log == finer)
...
InjectManager[web-app:production/webapp/default/ROOT] add bean SingletonBean[DataSource, {@Named('jdbc:mysql://localhost/commons?jdbcCompliantTruncation=false&characterEncoding=UTF8-0'), @Default()}, name=jdbc:mysql://localhost/commons?jdbcCompliantTruncation=false&characterEncoding=UTF8-0]
[12-11-15 11:50:06.259] {resin-port-8080-40} create: ManagedPoolItem[jdbc:mysql://localhost/commons?jdbcCompliantTruncation=false&characterEncoding=UTF8-0,0,ManagedConnectionImpl](active:0, total:0)
...
Code:
resin.xml
<character-encoding>utf-8</character-encoding>

resin-web.xml
<string-encoding>UTF-8</string-encoding>
The result: completely garbled output.
Jordi
#7
03-12-2013, 01:43 AM
nam (Administrator)

First of all, I would like to apologize to all the Quercus users out there for this issue. We had completely misunderstood what was going on between the MySQL server and the JDBC driver. We had (wrongly) concluded that the driver couldn't handle "binary" queries and that we had to write our own MySQL driver.

While we did some initial work on our own driver, the project stalled and collected dust over the years. Nevertheless, the encoding problem is something that we had to fix, and I finally got around to taking another stab at it:

1. The first issue for Quercus is that the MySQL JDBC driver only accepts queries that are of the Java String type. This means that the JDBC driver will:
convert the String to the session's character_set_connection encoding
This is a problem because PHP strings are binary (they are just bytes and can be in an arbitrary encoding) and we don't want the driver to do any character set conversions. Thankfully, I discovered that we can prevent the JDBC driver from doing any encoding by escaping non-ASCII characters with a backslash ('\'). It's a hack, but it works!
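
One possible reading of that escaping step, as a rough sketch only: treat each query byte as one char and put a backslash in front of every value above 0x7F. The class and method names below are hypothetical, and the actual change in the Quercus source may well differ in its details (for example, in how it handles bytes outside string literals).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the idea; not the actual Quercus implementation.
class BinaryQueryEscaper {
    // Map each byte to one char and prefix non-ASCII values with a backslash.
    static String escapeNonAscii(byte[] query) {
        StringBuilder sb = new StringBuilder(query.length + 16);
        for (byte b : query) {
            int ch = b & 0xFF;          // unsigned byte value, 0..255
            if (ch > 0x7F) {
                sb.append('\\');        // escape marker before a non-ASCII byte
            }
            sb.append((char) ch);       // same value the ISO-8859-1 decoder would give
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The PHP side could hand us UTF-8 bytes: 'é' becomes 0xC3 0xA9.
        byte[] query = "SELECT 'caf\u00e9'".getBytes(StandardCharsets.UTF_8);
        String escaped = escapeNonAscii(query);
        System.out.println(query.length + " bytes -> " + escaped.length() + " chars");
    }
}
```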

2. The other issue is that, by default, the JDBC driver converts query results using the Java character set encoders/decoders. This means that the JDBC driver will:
decode the result set bytes using the columns' encodings, and substitute malformed bytes with a '?' placeholder
This is a problem because PHP applications sometimes set the wrong encoding for a table or use a VARCHAR instead of BINARY. And when the JDBC driver tries to decode bytes that don't map to any character in Java, the resulting String will contain '?' placeholders instead.
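
The substitution itself is easy to reproduce with plain Java charsets, without any database involved. The sketch below assumes a column declared as UTF-8 that actually holds latin1 bytes; the names are invented for the example. Java's decoder emits U+FFFD for the malformed byte, and that character in turn becomes a literal '?' as soon as the String is written out in a single-byte encoding.

```java
import java.nio.charset.StandardCharsets;

// Illustration only (hypothetical names); uses only standard JDK charset behavior.
class ReplacementDemo {
    // What a client-side decoder does with the raw column bytes.
    static String decodeAsUtf8(byte[] columnBytes) {
        return new String(columnBytes, StandardCharsets.UTF_8);
    }

    // What happens when the decoded String is written out as latin1.
    static byte[] encodeAsLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "niño" stored as latin1 bytes: 0xF1 is 'ñ' there, but malformed as UTF-8 here.
        byte[] stored = { 'n', 'i', (byte) 0xF1, 'o' };

        String decoded = decodeAsUtf8(stored);
        System.out.println(decoded.charAt(2) == '\uFFFD');   // true: replacement character

        byte[] written = encodeAsLatin1(decoded);
        System.out.println((char) written[2]);               // ?
    }
}
```

Once the replacement has happened the original byte is gone, which is why the fix has to prevent the decoding step rather than repair its output.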

The key to the solution is the character_set_results setting. By default, it's set to NULL, which means that the client is responsible for decoding the query results, thereby alleviating server load. To match PHP's behavior, we need to set it to ISO8859_1 to tell the server to do all the encoding on the server-side instead of leaving it up to the client. And if the user wants to change that encoding, they can still do so by sending a "SET NAMES" query.
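
For anyone who wants to experiment with this setting from plain JDBC, here is a small sketch. It is only an assumption about how one might apply the setting by hand, not how Quercus does it internally: the helper names are invented, and it uses MySQL's own charset name (latin1) in the SET statement rather than the Java-style name ISO8859_1.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; not part of Quercus or the MySQL driver.
class ResultCharset {
    // Build the statement, accepting only simple charset identifiers.
    static String setResultsSql(String mysqlCharset) {
        if (!mysqlCharset.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unexpected charset name: " + mysqlCharset);
        }
        return "SET character_set_results = " + mysqlCharset;
    }

    // Apply the setting to an already opened connection.
    static void apply(Connection conn, String mysqlCharset) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(setResultsSql(mysqlCharset));
        }
    }

    public static void main(String[] args) {
        // Prints the statement only; opening a real connection is left out on purpose.
        System.out.println(setResultsSql("latin1"));
    }
}
```

A later "SET NAMES utf8" sent by the application changes character_set_results again (together with the client and connection character sets), which matches the override described above.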

With these two issues resolved, MySQL encodings in Quercus now behave the same as in PHP. You don't need to mess with unicode.semantics anymore. Things just work, as they should. You can try the fixes by checking out our subversion trunk or by waiting for our upcoming release (4.0.36).
#8
03-13-2013, 11:58 AM
Jordi (Member)

Absolutely great news!!

Thanks a lot Nam!

One question: is it already committed to SVN? I checked out the current revision (10008) and I want to be sure the changes are included.

Thanks again.
Best,
Jordi
#9
03-13-2013, 01:31 PM
nam (Administrator)

r10008 should have the fix. The fix was in commit r9989, dated Feb 25, 2013.
#10
03-15-2013, 01:03 PM
Jordi (Member)

Hi Nam,

I cannot get r10008 to work. Resin starts but then fails with status 0 and no log output whatsoever. I checked out again today, but it is still the same revision.

Is there anything I can do to try the latest encoding fixes? I need to try out Drupal with UTF-8.

Best,
Jordi