Encoding SQL_Latin1_General_CP1_CI_AS into UTF-8


Problem description

I'm generating an XML file with PHP using DomDocument and I need to handle Asian characters. I'm pulling data from the MSSQL2008 server using the pdo_mssql driver and I apply utf8_encode() to the XML attribute values. Everything works fine as long as there are no special characters.

The server is MS SQL Server 2008 SP3

The database, table and column collation are all SQL_Latin1_General_CP1_CI_AS

I'm using PHP 5.2.17

Here's my PDO object:

$pdo = new PDO("mssql:host=MyServer,1433;dbname=MyDatabase", "user123", "password123");

My query is a basic SELECT.

I know storing special characters in SQL_Latin1_General_CP1_CI_AS columns isn't great, but ideally I'd like to make it work without changing the collation, because other non-PHP programs already use that column and it works fine for them. In SQL Server Management Studio I can see the Asian characters correctly.

Considering all the details above, how should I process the data?

Solution

I found how to solve it, so hopefully this will be helpful to someone.

First, the data in my SQL_Latin1_General_CP1_CI_AS column behaved like a strange mix of CP-1252 and a 2-byte Unicode encoding. The basic characters come back as CP-1252, which is why calling utf8_encode() was all I had to do for them and everything worked. The Asian and other non-Latin characters are stored on 2 bytes each (UCS-2LE, as the final conversion step shows, rather than UTF-8), and the PHP pdo_mssql driver does not seem to cope with them: it appears to do a CAST to varchar (instead of nvarchar), so every 2-byte character becomes a question mark ('?').
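The single-byte half of that explanation can be sketched as follows. utf8_encode() treats its input as ISO-8859-1, which matches CP-1252 except in the 0x80-0x9F range, so an explicit CP-1252 conversion with iconv() is slightly more precise for characters such as the euro sign. This is only an illustration under the assumption that the driver returns varchar data as CP-1252 bytes; the function name cp1252ToUtf8 is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): converts a
// single-byte CP-1252 string to UTF-8.
function cp1252ToUtf8($text)
{
    // iconv() returns false when the conversion fails.
    return iconv('CP1252', 'UTF-8', $text);
}

// Sample input assumed for illustration: "café" with 0xE9 for the accented e.
echo cp1252ToUtf8("caf\xE9"), "\n";
```

Characters that have already been replaced by '?' cannot be recovered this way, which is why the 2-byte characters need the binary approach described next.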

I fixed it by casting the column to binary in the query and then rebuilding the text in PHP:

SELECT CAST(MY_COLUMN AS VARBINARY(MAX)) FROM MY_TABLE;

In PHP:

// $bin holds the VARBINARY value fetched by the query above

// Binary to hexadecimal
$hex = bin2hex($bin);

// Then from hexadecimal back to a byte string, one byte per pair of hex digits
$str = "";
for ($i = 0; $i < strlen($hex) - 1; $i += 2)
{
    $str .= chr(hexdec($hex[$i] . $hex[$i + 1]));
}

// Then from UCS-2LE (the format of the column data in the DB) to UTF-8
$str = iconv('UCS-2LE', 'UTF-8', $str);
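The conversion step can also be wrapped in a small helper. The following is a minimal sketch rather than the original code: the function name ucs2leToUtf8 is hypothetical, and it assumes the value fetched for the VARBINARY column is already a raw byte string. Under that assumption the hexadecimal round trip reproduces the same bytes, so the sketch passes them straight to iconv().

```php
<?php
// Hypothetical helper (not from the original answer): converts the raw bytes
// of a VARBINARY-cast column from UCS-2LE to UTF-8.
function ucs2leToUtf8($bin)
{
    $converted = iconv('UCS-2LE', 'UTF-8', $bin);
    // iconv() returns false when the conversion fails; map that to null.
    if ($converted === false) {
        return null;
    }
    return $converted;
}

// Sample input assumed for illustration: U+4E2D U+6587 in UCS-2LE byte order.
$bin = "\x2D\x4E\x87\x65";
echo ucs2leToUtf8($bin), "\n";
```

In real use, $bin would be the column value returned by the SELECT with CAST(MY_COLUMN AS VARBINARY(MAX)), and the result can be passed to DomDocument without calling utf8_encode() on it again.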
