
Jetty + Quercus + PHP + MySQL + WordPress: not able to store UTF-8 properly (reading works)


humble
02-18-2010, 05:55 AM
Thanks for providing a Java implementation of PHP and releasing it as open source!

I have a strange scenario.

My Setup

Jetty 7
Quercus 4.0.3
PHP
MySQL 5.0.88
WordPress 2.9.2

It works very well if I don't specify the JDBC entry in web.xml, i.e. when the MySQL connection is made by the PHP script itself instead of through the container's JDBC data source. I can read and write non-English UTF-8 content; it is updated, retrieved, and displayed properly.

But when I change web.xml to use JDBC and configure the JDBC URL as

jdbc:mysql://<host>:3306/<db>?useUnicode=true&amp;characterEncoding=UTF-8&amp;characterSetResults=UTF-8


I can read UTF-8 content, but when I modify and submit it (a WordPress post), it becomes unreadable gibberish.

web.xml

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE web-app
PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
"http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">

<web-app
xmlns="http://caucho.com/ns/resin" xmlns:resin="urn:java:com.caucho.resin">
<description>Caucho Technology's PHP Implementation</description>

<servlet>
<servlet-name>Quercus Servlet</servlet-name>
<servlet-class>com.caucho.quercus.servlet.QuercusServlet</servlet-class>

<!--
Specifies the encoding Quercus should use to read in PHP scripts.
-->
<init-param>
<param-name>script-encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>

<!--
Tells Quercus to use the following JDBC database and to ignore the
arguments of mysql_connect().
-->
<init-param>
<param-name>database</param-name>
<param-value>jdbc/mysql</param-value>
</init-param>

<!--
Specifies the php.ini file to load.
-->
<init-param>
<param-name>ini-file</param-name>
<param-value>WEB-INF/php.ini</param-value>
</init-param>

<!--
Location of the license to enable php to java compilation.
-->
<init-param>
<param-name>license-directory</param-name>
<param-value>WEB-INF/licenses</param-value>
</init-param>
</servlet>

<servlet-mapping>
<servlet-name>Quercus Servlet</servlet-name>
<url-pattern>*.php</url-pattern>
</servlet-mapping>

<welcome-file-list>
<welcome-file>index.php</welcome-file>
</welcome-file-list>
<resin:Dispatch>
<resin:IfFileExists/>
</resin:Dispatch>

<resin:Forward regexp="^" target='/index.php'/>
</web-app>


php.ini

unicode.output_encoding=utf-8
unicode.runtime_encoding=utf-8
unicode-semantics=on


I am sure that routing the connection through JDBC will improve performance, since we can use connection pooling to share connections instead of creating as many connections as needed.

Please do the needful.

nam
02-21-2010, 03:24 AM
For PHP applications, you should be using ISO-8859-1 instead of UTF-8.

PHP doesn't enforce any encoding on the data sent to and returned by the database. For all intents and purposes, the data might as well be binary (specifically ISO-8859-1). It is then up to the application (WordPress in this situation) to interpret the bytes. But when you specify an encoding other than ISO-8859-1 in the connection string, the JDBC driver interprets the bytes and in turn garbles the data.
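To illustrate nam's point that ISO-8859-1 lets the bytes pass through untouched, here is a minimal Java sketch (plain JDK code, not Quercus or driver internals; the class name Latin1PassThrough is made up for this example). ISO-8859-1 maps each of the 256 byte values to exactly one character, so decoding arbitrary bytes as ISO-8859-1 and encoding them back returns the original bytes, while the same round trip through UTF-8 does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1PassThrough {
    // Decode the bytes with the given charset, then encode the string back with it.
    static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // Every possible byte value, 0x00 through 0xFF.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        // ISO-8859-1 maps each byte to exactly one character, so the round trip is lossless.
        System.out.println("ISO-8859-1 lossless: "
                + Arrays.equals(all, roundTrip(all, StandardCharsets.ISO_8859_1)));
        // Many of these bytes are not valid UTF-8 on their own and are replaced with U+FFFD.
        System.out.println("UTF-8 lossless: "
                + Arrays.equals(all, roundTrip(all, StandardCharsets.UTF_8)));
    }
}
```

Running it should print true for ISO-8859-1 and false for UTF-8, which is why a byte-oriented PHP application expects the connection to behave like ISO-8859-1.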

humble
02-24-2010, 02:10 PM
I tried Drupal today. Existing UTF-8 content didn't display correctly, but if I create new UTF-8 content through Drupal, it displays properly.

If I look at the tables with phpMyAdmin, the existing content displays as UTF-8, but the content created through Quercus is garbled there, although it is readable through Drupal.

Is this because of the MySQL JDBC driver? I don't understand where the difference is.

Through PHP: if I create UTF-8 content, it is stored as UTF-8, which is viewable through phpMyAdmin and through the Drupal UI as well.

Through Quercus: if I create UTF-8 content, it is stored in a format other than UTF-8 (ISO-8859-1), which is not readable through phpMyAdmin but is readable through the Drupal UI.
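One possible explanation for this pattern, offered as an assumption rather than something confirmed in this thread, is a symmetric double encoding: on the way in, the UTF-8 bytes produced by PHP are treated as ISO-8859-1 characters and those characters are stored as UTF-8; on the way out, the same steps run in reverse. The Java sketch below models both paths with hypothetical store() and load() helpers (not Quercus or Connector/J APIs).

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingModel {
    // Hypothetical write path: the application's UTF-8 bytes are read as ISO-8859-1
    // characters, and those characters are then stored in the database as UTF-8.
    static byte[] store(byte[] appBytes) {
        String asLatin1 = new String(appBytes, StandardCharsets.ISO_8859_1);
        return asLatin1.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical read path: the stored UTF-8 is decoded, then handed back to the
    // application as ISO-8859-1 bytes, which exactly undoes store().
    static byte[] load(byte[] dbBytes) {
        String decoded = new String(dbBytes, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe"; // German word with u-umlaut and sharp s
        byte[] fromApp = text.getBytes(StandardCharsets.UTF_8);
        byte[] inDb = store(fromApp);

        // What the application sees after reading back through the same layer.
        System.out.println(new String(load(inDb), StandardCharsets.UTF_8).equals(text));
        // What a client decoding the stored bytes directly as UTF-8 sees.
        System.out.println(new String(inDb, StandardCharsets.UTF_8).equals(text));
    }
}
```

Under this model Drupal, reading through the same layer, gets its original bytes back (the first line prints true), while phpMyAdmin decodes the stored bytes directly as UTF-8 and shows mojibake (the second line prints false).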

humble
02-26-2010, 01:21 PM
Or let me ask it this way: is there any way to convert the MySQL data to binary, or to some encoding that will be readable with Quercus/JDBC?

Thanks

nam
02-26-2010, 11:07 PM
Can you run the following query and send me the results?

SHOW VARIABLES LIKE '%char%'

ching
02-27-2010, 10:26 AM
I have a similar problem with my setup:

MediaWiki 1.15.1
PostgreSQL 8.4 (zh_CN.UTF8)
Resin 4.0.3



1. I post "中文" to a page; its UTF-8 hex string is [e4 b8 ad e6 96 87].

2. MediaWiki creates the page correctly the first time, showing '中文' [e4 b8 ad e6 96 87].

3. Before the text is written to PostgreSQL, I log it, and it is also correct: '中文' [e4 b8 ad e6 96 87].

4. After the data is written, I query the database directly and get a garbled string with the hex string [c3 a4 c2 b8 c2 ad c3 a6 c2 96 c2 87]. I found that this is exactly the UTF-8 encoding of new String("中文".getBytes("UTF8"), "ISO-8859-1").

5. Then I purge the page in MediaWiki, and it shows the same garbled text [c3 a4 c2 b8 c2 ad c3 a6 c2 96 c2 87].

6. I'm confused. I manually change the database value to '中文' [e4 b8 ad e6 96 87], and after purging I get the right result again.

7. I suspected the JDBC drivers, so I tested a case within the same webapp:

<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8" import="java.sql.*,java.util.*,javax.sql.*,javax.naming.*"%>
<%
/* raw jdbc also works ok
Class.forName("org.postgresql.Driver");
String url = "jdbc:postgresql://127.0.0.1:5432/mediawiki";
Properties props = new Properties();
props.setProperty("user","user");
props.setProperty("password","password");
Connection conn = DriverManager.getConnection(url, props);
*/
// Look up the same data source (jdbc/mediawiki) that Quercus is configured to use.
DataSource ds = (DataSource)new InitialContext().lookup("java:comp/env/jdbc/mediawiki");
Connection conn = ds.getConnection();

// Write the test string through a PreparedStatement; the driver handles the encoding.
String newText = "中文";
PreparedStatement pst = conn.prepareStatement("UPDATE pagecontent SET old_text=? WHERE old_id = 25");
pst.setString(1, newText);
int n = pst.executeUpdate();
pst.close();

// Read the same row back to check the round trip.
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery("SELECT old_text FROM pagecontent WHERE old_id = 25");
if (rs.next()) {
out.print(rs.getString(1) + "<br>");
}
rs.close();
st.close();

conn.close();
%>

The data is saved and output correctly ('中文' [e4 b8 ad e6 96 87]), so the JDBC settings and driver seem fine. It's likely that something in Quercus's database pooling layer is causing the problem.

8. I still couldn't find the cause, so I installed a real PHP environment, and it works perfectly, without any gibberish.

9. Have I misconfigured Quercus? Please correct me. My resin-web.xml:

<web-app xmlns="http://caucho.com/ns/resin">
<database jndi-name='jdbc/mediawiki'>
<driver type="org.postgresql.Driver">
<url>jdbc:postgresql://127.0.0.1:5432/mediawiki</url>
<user>username</user>
<password>password</password>
</driver>
</database>

<servlet-mapping url-pattern="*.php"
servlet-class="com.caucho.quercus.servlet.QuercusServlet">
<init>
<database>jdbc/mediawiki</database>
<compile>true</compile>
<script-encoding>iso-8859-1</script-encoding>
<php-ini>
<!-- doesn't work either with or without these settings
<unicode.output_encoding>utf-8</unicode.output_encoding>
<unicode.runtime_encoding>utf-8</unicode.runtime_encoding>
<unicode-semantics>on</unicode-semantics>
-->
<sendmail_from>root@localhost</sendmail_from>
<smtp_username>username</smtp_username>
<smtp_password>password</smtp_password>
</php-ini>
</init>
</servlet-mapping>

<welcome-file-list>index.php</welcome-file-list>
</web-app>



Thanks very much for any help!
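Ching's step 4 can be checked with a few lines of plain Java: encoding the two characters as UTF-8, misreading those bytes as ISO-8859-1, and encoding the result as UTF-8 again reproduces the hex string seen in the database. The class and method names are hypothetical, and the characters are written as Unicode escapes so the source compiles regardless of the platform's default encoding.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodedHex {
    // Format bytes as space-separated lowercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // UTF-8 bytes misread as ISO-8859-1 characters, then written out as UTF-8 again.
    static byte[] doubleEncode(String s) {
        String misread = new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // the two characters posted in step 1
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8))); // e4 b8 ad e6 96 87
        System.out.println(hex(doubleEncode(original)));                  // c3 a4 c2 b8 c2 ad c3 a6 c2 96 c2 87
    }
}
```

Each original byte from 0x80 to 0xFF becomes a two-byte UTF-8 sequence starting with c2 or c3, which is why the stored value is exactly twice as long.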

wiesiek
03-09-2010, 08:42 PM
I would also love to know how to use UTF-8 with database connections under Quercus. It used to be sort of OK in version 3.2.1: I just had to use utf8_encode/utf8_decode. With version 4.0.*, this approach does not work; no matter what I do, I always get garbled data... Is this a known problem with the new releases? By the way, I am using the H2 Java SQL database (and no, it is not a problem with H2 or its JDBC driver).

Wiesiek
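The 3.2.1 workaround relied on PHP's utf8_encode(), which converts ISO-8859-1 to UTF-8, and utf8_decode(), which converts UTF-8 to ISO-8859-1 and replaces characters that ISO-8859-1 cannot represent with '?'. As a rough byte-level illustration, here is a Java sketch of the same two conversions; the names are hypothetical and this is not how Quercus implements these functions.

```java
import java.nio.charset.StandardCharsets;

public class Utf8EncodeDecode {
    // Like PHP's utf8_encode(): treat the input bytes as ISO-8859-1 and return UTF-8 bytes.
    static byte[] utf8Encode(byte[] latin1) {
        return new String(latin1, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.UTF_8);
    }

    // Like PHP's utf8_decode(): decode UTF-8 and return ISO-8859-1 bytes;
    // characters that ISO-8859-1 cannot represent become '?'.
    static byte[] utf8Decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "cafe" with an acute e, as ISO-8859-1 bytes.
        byte[] latin1 = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xe9};
        System.out.println(utf8Encode(latin1).length); // 5: 0xe9 becomes the two bytes c3 a9

        // The euro sign (U+20AC) is not in ISO-8859-1, so it comes back as '?'.
        byte[] decoded = utf8Decode("\u20ac5".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decoded, StandardCharsets.ISO_8859_1)); // ?5
    }
}
```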

dicr
05-10-2011, 09:07 PM
Yes, the problem really exists: Quercus can't work with UTF-8 databases!
So many users in the world store their data in UTF-8 that this problem must be solved. I used to run Resin 3.x, and it worked fine. Currently none of my web applications work, and no character-encoding setting helps :mad:

dicr
05-10-2011, 11:01 PM
I'm using the query:
SET NAMES 'utf8'
in my PHP code, but nothing changes. It seems that Quercus filters out all "SET NAMES" queries, because even incorrect ones like "SET NAMES 'XXX'" don't throw any exception.

Then I tried the query:
/*!40101 SET NAMES 'utf8' */
and it was applied! The data is now returned as 2-byte characters instead of plain "??????", but in the wrong encoding.

All we need is to tell the MySQL driver to use ?characterEncoding=UTF8, but I think Quercus filters this out too...

Does anyone know a workaround other than patching and recompiling Quercus?

We want some way to output UTF-8 data from MySQL to HTML without any internal re-encoding to ISO-8859-1.

dicr
05-10-2011, 11:14 PM
Data returned by MySQL from a UTF-8 database to an ISO-8859-1 client is not just binary data in an "unknown" encoding!!!! This data is corrupted by MySQL.
Each 2-byte character is converted to a 1-byte character, and we always get "????????".

For example, the string "тест test" contains 9 characters and takes 13 bytes in UTF-8. When the client encoding is "iso-8859-1", MySQL returns the 9-byte (not 13-byte) string "???? test" with 9 characters. So the UTF-8 data is corrupted by MySQL and can't be used in any way by an ISO-8859-1 client. It is not just 13 bytes of binary data that PHP could output without any conversion to a UTF-8 HTML page; it is corrupted data.

So using iso-8859-1 as the characterEncoding in the MySQL driver is not acceptable for multi-byte encodings!!! Please let us specify the driver's characterEncoding instead of using a hard-coded value!!!
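dicr's byte counts can be reproduced in Java. This sketch uses the JDK's ISO-8859-1 encoder, which, like the MySQL conversion described above, substitutes '?' for characters it cannot represent; it is an analogy, not MySQL's own conversion code, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class LossyLatin1 {
    // Encode to ISO-8859-1 and decode again; unmappable characters become '?'.
    static String throughLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "\u0442\u0435\u0441\u0442 test"; // four Cyrillic letters, a space, "test"
        System.out.println(s.length());                                      // 9
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);       // 13
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length);  // 9
        System.out.println(throughLatin1(s));                                // ???? test
    }
}
```

Once the Cyrillic letters have been replaced by '?', the original bytes are gone, so no later re-encoding can recover them.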

dicr
05-10-2011, 11:27 PM
I found that we can read UTF-8 data by adding this parameter to the driver URL:
&characterSetResult=utf8, which is not overridden by Quercus!

dicr
05-11-2011, 12:33 AM
I found a working configuration for Resin 4.0.16:

resin-web.xml

<script-encoding>utf-8</script-encoding>
<php-ini>
<unicode.semantics>on</unicode.semantics>
<unicode.runtime_encoding>iso-8859-1</unicode.runtime_encoding>
<unicode.output_encoding>utf-8</unicode.output_encoding>
<unicode.http_input_encoding>utf-8</unicode.http_input_encoding>
</php-ini>



<url>jdbc:mysql://host:3306/db?useUnicode=true&amp;characterSetResult=utf8</url>


php

mysql_query("/*!40101 set names 'utf8' */", $this->link);


This works both for displaying and for entering new UTF-8 data.

andreluiz1111
05-15-2011, 08:44 PM
Guys, I have a similar problem, except that:
the PostgreSQL database uses the LATIN1 encoding;
Java and PHP are configured to use the ISO-8859-1 encoding.

I must have some setting wrong.

When running the project in a PHP environment using Java Quercus, I have the following problems:
1. A PHP project runs normally in XAMPP, accessing a PostgreSQL database with LATIN1 encoding via ODBC.
2. The Quercus PHP project has been set up in NetBeans 6.9.1.
3. The Java project accesses the same database as the PHP project and displays the contents of the fields correctly (with Portuguese accents).
4. When the JSP page calls the PHP page, it runs normally, but the database contents it displays are not recognized as LATIN1 (Portuguese accents and characters are wrong).
5. It is as if Java's JDBC access recognizes the PostgreSQL database's LATIN1 encoding, but when the PHP page accesses the same LATIN1 database, the configuration is somehow incompatible.
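A common cause of the symptom in steps 4 and 5, assumed here rather than confirmed in the thread, is that single-byte LATIN1 data gets decoded as UTF-8 somewhere along the way. This Java sketch (hypothetical class name) shows what that mismatch does to a Portuguese word with accents: the correct charset restores the word, while UTF-8 turns the accented bytes into U+FFFD replacement characters.

```java
import java.nio.charset.StandardCharsets;

public class Latin1AsUtf8 {
    // Decode LATIN1 (ISO-8859-1) bytes with the wrong charset, UTF-8.
    static String misdecode(byte[] latin1Bytes) {
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "a\u00e7\u00e3o"; // Portuguese word for "action", with cedilla and tilde
        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1); // one byte per character

        // Correct decoding: the original word comes back.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1).equals(word)); // true
        // Wrong decoding: the accented bytes are invalid UTF-8 and become U+FFFD.
        System.out.println(misdecode(latin1).indexOf('\uFFFD') >= 0); // true
    }
}
```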


META-INF/context.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<Context antiJARLocking="true" path="/siaco">
<!-- Context path="/project" -->
<Resource auth="Container" driverClassName="org.postgresql.Driver" maxActive="4" maxIdle="2" maxWait="5000" name="jdbc/postgresql" password="xxx" type="javax.sql.DataSource" url="jdbc:postgresql://localhost:5432/projeto02" username="postgres"/>
</Context>


web.xml
<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">
<session-config>
<session-timeout>
30
</session-timeout>
</session-config>
<!-- Quercus Servlet -->
<servlet>
<servlet-name>quercusServlet</servlet-name>
<servlet-class>com.caucho.quercus.servlet.QuercusServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>quercusServlet</servlet-name>
<url-pattern>*.php</url-pattern>
</servlet-mapping>
<context-param>
<param-name>ini-file</param-name>
<param-value>WEB-INF/php5.ini</param-value>
</context-param>
<context-param>
<param-name>script-encoding</param-name>
<param-value>ISO-8859-1</param-value>
</context-param>

<!-- <filter>
<filter-name>Project Filter</filter-name>
<filter-class>br.projeto.filtro.projetoFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>Project Filter</filter-name>
<url-pattern>/project</url-pattern>
</filter-mapping> -->
<welcome-file-list>
<welcome-file>index.jsp</welcome-file>
</welcome-file-list>
</web-app>

resin-web.xml
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://caucho.com/ns/resin">
<servlet>
<servlet-name>quercusServlet</servlet-name>
<servlet-class>com.caucho.quercus.servlet.QuercusServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>quercusServlet</servlet-name>
<url-pattern>*.php</url-pattern>
<init>
<script-encoding>ISO-8859-1</script-encoding>
</init>
<php-ini>
<unicode.output_encoding>iso-8859-1</unicode.output_encoding>
<unicode.runtime_encoding>iso-8859-1</unicode.runtime_encoding>
</php-ini>
<init>
<ini-file>WEB-INF/php.ini</ini-file>
<compile>true</compile>
</init>
</servlet-mapping>
</web-app>