Cannot store Euro-sign into LOB String property with Hibernate/PostgreSQL

Problem description


I am having trouble writing special characters such as the Euro sign (€) into LOB String properties, and reading them back, in PostgreSQL 8.4 with Hibernate 3.6.10.

As far as I know, PostgreSQL provides two distinct ways to store large character objects in a table column. They can be stored either directly in that column, or indirectly in a separate system table called pg_largeobject. In the latter case, the column holds a reference (OID) to the row in pg_largeobject.

The default behaviour in Hibernate 3.6.10 is the indirect OID approach. However, it is possible to add an extra annotation @org.hibernate.annotations.Type(type="org.hibernate.type.TextType") to the Lob property to get the direct storage behaviour.

Both approaches work fine, except when I want to work with special characters such as the Euro sign (€). In that case the direct storage mechanism keeps working, but the indirect storage mechanism breaks.

To demonstrate this, I created a test entity with two @Lob properties. One uses indirect storage and the other uses direct storage:

@Basic
@Lob
@Column(name = "CLOB_VALUE_INDIRECT_STORAGE", length = 2147483647)
public String getClobValueIndirectStorage()

and

@Basic
@Lob
@org.hibernate.annotations.Type(type="org.hibernate.type.TextType")
@Column(name = "CLOB_VALUE_DIRECT_STORAGE", length = 2147483647)
public String getClobValueDirectStorage()

If I create an entity, populate both properties with the Euro sign and persist it to the database, a SELECT on the table returns the following:

 id | clob_value_direct_storage | clob_value_indirect_storage
----+---------------------------+----------------------------
  6 | €                         | 910579                     

If I then query the table pg_largeobject I see:

  loid  | pageno | data
--------+--------+------
 910579 |      0 | \254

The 'data' column of pg_largeobject is of type bytea, which means that the information is stored as raw bytes. The expression '\254' is the octal escape for one single byte (decimal 172, hex 0xAC). Read as ISO-8859-1 (Latin-1), or as Unicode code point U+00AC, that byte is the character '¬'. This is exactly the value that I get back when I load the entity from the database.

The Euro sign in UTF-8 consists of 3 bytes, so I would have expected the 'data' column to have 3 bytes and not 1.
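The byte arithmetic can be checked in plain Java. The following is a minimal sketch, with illustrative class and method names that are not from the original post. It prints the UTF-8 encoding of the Euro sign (three bytes) and shows that the single byte 0xAC, decoded one-to-one as ISO-8859-1, is U+00AC ('¬'):

```java
import java.nio.charset.StandardCharsets;

public class EuroBytes {

    // Formats each byte as two lowercase hex digits, separated by spaces.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // The correct encoding: UTF-8 needs three bytes for U+20AC.
    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes raw bytes one-to-one as Latin-1, where byte 0xAC maps to U+00AC.
    public static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(hex(utf8("\u20AC"))); // e2 82 ac
        char c = asLatin1(new byte[] {(byte) 0xAC}).charAt(0);
        System.out.println(String.format("U+%04X", (int) c)); // U+00AC
    }
}
```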

This does not only occur for the Euro sign, but for many special characters. Is this a problem in Hibernate? Or the JDBC driver? Is there a way I can tweak this behaviour?

Thanks in advance,
Kind regards,
Franck de Bruijn

Solution

After a lot of digging around in the source code of Hibernate and the PostgreSQL JDBC driver I managed to find the root cause of the problem. In the end the write() method of the BlobOutputStream (provided by the JDBC driver) is invoked to write the contents of the Clob into the database. This method looks like this:

public void write(int b) throws java.io.IOException
{
    checkClosed();
    try
    {
        if (bpos >= bsize)
        {
            lo.write(buf);
            bpos = 0;
        }
        buf[bpos++] = (byte)b;  // the cast keeps only the low 8 bits of b
    }
    catch (SQLException se)
    {
        throw new IOException(se.toString());
    }
}

This method takes an 'int' (32 bits/4 bytes) as its argument and casts it to a 'byte' (8 bits/1 byte), which discards the upper 3 bytes of information. Java Strings are UTF-16 encoded, so each char is a 16-bit (2-byte) code unit. The Euro sign is U+20AC, which is the int value 8364. After the cast, only the low byte 0xAC (decimal 172) remains. In octal that is 254, which explains the '\254' stored in pg_largeobject.
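The truncation can be reproduced without a database. In the sketch below, a ByteArrayOutputStream stands in for the driver's BlobOutputStream. That substitution is an assumption made only for illustration. OutputStream.write(int) is specified to write only the eight low-order bits of its argument, so passing a Java char to it directly keeps one byte. Encoding through an OutputStreamWriter with UTF-8 keeps all three bytes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class TruncationDemo {

    // Mimics the faulty path: each UTF-16 char is handed to write(int),
    // which keeps only its low-order 8 bits.
    public static byte[] writeCharsAsInts(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            out.write(s.charAt(i));
        }
        return out.toByteArray();
    }

    // The correct path: encode the chars to UTF-8 before they reach the byte stream.
    public static byte[] writeCharsAsUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        char euro = '\u20AC';
        System.out.println((int) euro);                                // 8364
        System.out.println((byte) euro);                               // -84 (signed byte)
        System.out.println((byte) euro & 0xFF);                        // 172
        System.out.println(Integer.toOctalString((byte) euro & 0xFF)); // 254
        System.out.println(writeCharsAsInts("\u20AC").length);         // 1
        System.out.println(writeCharsAsUtf8("\u20AC").length);         // 3
    }
}
```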

I am not sure what the best resolution to this problem is. IMHO the JDBC driver should be responsible for encoding/decoding the Java UTF-16 characters to whatever encoding the database needs. However, I do not see any way in the JDBC driver code to alter this behaviour (and I do not want to write and maintain my own JDBC driver code).

Therefore, I extended Hibernate with a custom ClobType that converts the UTF-16 characters to UTF-8 before writing them to the database, and back again when retrieving the Clob.
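The following is a rough illustration of the idea behind that workaround. It is not the author's ClobType. The core step is to encode the String to UTF-8 bytes before they reach the large-object stream, and to decode those bytes again on the way back. The class and method names below are hypothetical. The plain OutputStream/InputStream parameters stand in for whatever streams Hibernate and the JDBC driver actually provide, and wiring this into a Hibernate type is not shown:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8LobCodec {

    // Hypothetical write side: encode to UTF-8 first, so every value passed on
    // is a genuine byte rather than a truncated UTF-16 char.
    public static void writeUtf8(String value, OutputStream lobOut) throws IOException {
        lobOut.write(value.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical read side: decode the stored bytes back into a Java String.
    public static String readUtf8(InputStream lobIn) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(lobIn, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Round trip through in-memory streams standing in for the large object.
    public static String roundTrip(String value) {
        try {
            ByteArrayOutputStream stored = new ByteArrayOutputStream();
            writeUtf8(value, stored);
            return readUtf8(new ByteArrayInputStream(stored.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "Price: 10 \u20AC";
        System.out.println(roundTrip(original).equals(original)); // true
    }
}
```

Because every value handed to the byte stream is already an encoded byte in the range 0 to 255, a write(int) that keeps only the low 8 bits no longer loses information.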

The solution is too large to simply paste into this answer. If you are interested, drop me a line and I will send it to you.

Cheers, Franck
