Automatic character encoding handling in Perl / DBI / DBD::ODBC

Question

I'm using Perl with DBI / DBD::ODBC to retrieve data from an SQL Server database, and have some issues with character encoding.

The database has a default collation of SQL_Latin1_General_CP1_CI_AS, so data in varchar columns is encoded in Microsoft's version of Latin-1, AKA windows-1252.

There doesn't seem to be a way to handle this transparently in DBI/DBD::ODBC. The data I get back is still encoded as windows-1252; for instance, €, “ and ” come back as the bytes 0x80, 0x93 and 0x94. When I write those to a UTF-8 encoded XML file without decoding them first, they are written as the Unicode characters U+0080, U+0093 and U+0094 instead of U+20AC, U+201C and U+201D, which is obviously not correct.
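The mismatch described above can be reproduced without a database. The following is a minimal sketch using only the core Encode module; the byte string stands in for a varchar value as the driver returns it, and the variable names and the hex_list helper are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Format each character (or byte) of a string as hex, for display.
sub hex_list {
    my ($fmt, $str) = @_;
    return join ' ', map { sprintf $fmt, ord($_) } split //, $str;
}

# The three bytes from the question, as they arrive undecoded.
my $raw = "\x80\x93\x94";

# Without decoding, Perl treats each byte as the code point of the same
# value (U+0080, U+0093, U+0094), and those are what get UTF-8 encoded.
my $wrong = encode('UTF-8', $raw);

# Decoding as windows-1252 (Encode's name: cp1252) first yields the
# intended characters, which then encode to the expected UTF-8 bytes.
my $chars = decode('cp1252', $raw);
my $right = encode('UTF-8', $chars);

print 'undecoded, as UTF-8: ', hex_list('%02X', $wrong), "\n";
print 'decoded characters:  ', hex_list('U+%04X', $chars), "\n";
print 'decoded, as UTF-8:   ', hex_list('%02X', $right), "\n";
```

The first line of output shows two-byte sequences starting with C2, which is the incorrect result seen in the XML file; the last line shows the three-byte sequences for the euro sign and the curly quotes.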

My current workaround is to call $val = Encode::decode('windows-1252', $val) on every column after every fetch. This works, but hardly seems like the proper way to do this.
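One way to keep that workaround in a single place is a small helper that decodes a whole row at once. This is only a sketch: decode_row is a hypothetical name, the literal array stands in for a row returned by a fetch, and it assumes every non-NULL column holds windows-1252 text.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return a decoded copy of a fetched row.
# NULL columns arrive as undef and are passed through unchanged.
# A copy is returned because DBI reuses the same array for each fetch.
sub decode_row {
    my ($row, $encoding) = @_;
    $encoding = 'cp1252' unless defined $encoding;
    return [ map { defined($_) ? decode($encoding, $_) : undef } @$row ];
}

# Stand-in for a row as returned by $sth->fetchrow_arrayref.
my $row     = [ "\x93quoted\x94", undef, "10 \x80" ];
my $decoded = decode_row($row);

# Show the code points of each decoded column.
for my $val (@$decoded) {
    print defined($val)
        ? join(' ', map { sprintf 'U+%04X', ord($_) } split //, $val)
        : 'NULL';
    print "\n";
}

# With a real statement handle the loop would look roughly like:
#   while (my $r = $sth->fetchrow_arrayref) {
#       my $decoded = decode_row($r);
#       ...
#   }
```

This does not remove the need to decode; it only moves the call out of the per-column code.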

Isn't there a way to tell DBI or DBD::ODBC to do this conversion for me?

I'm using ActivePerl (5.12.2 Build 1202), with DBI (1.616) and DBD::ODBC (1.29) provided by ActivePerl and updated with ppm; running on the same server that hosts the database (SQL Server 2008 R2).

My connection string is:

dbi:ODBC:Driver={SQL Server Native Client 10.0};Server=localhost;Database=$DB_NAME;Trusted_Connection=yes;

Thanks.

Recommended answer

DBD::ODBC (and the ODBC API) does not know the character set of the underlying column, so DBD::ODBC cannot do anything with the 8-bit data returned; it can only return it as is, and you need to know what it is and decode it. If you bind the columns as SQL_WCHAR/SQL_WVARCHAR, the driver/SQL Server should translate the characters to UCS-2 and DBD::ODBC should see the columns as SQL_WCHAR/SQL_WVARCHAR. When DBD::ODBC is built in Unicode mode, SQL_WCHAR columns are treated as UCS-2, decoded and re-encoded as UTF-8, and Perl should see them as Unicode characters.

You need to set SQL_WCHAR as the bind type after bind_columns as bind types are not sticky like parameter types.

If you want to continue reading your varchar data, which is windows-1252, as bytes, then currently you have no choice but to decode it. I'm not in a rush to add something to DBD::ODBC to do this for you, since this is the first time anyone has mentioned it to me. You might want to look at DBI callbacks, as decoding the returned data might be more easily done in those (say, on the fetch method).

You might also want to investigate the "Perform Translation for character data" setting in newer SQL Server ODBC Drivers although I have little experience with it myself.
