Using pyodbc on Linux to insert unicode or UTF-8 chars in an nvarchar MSSQL field

Problem description

I am using Ubuntu 9.04.

I have installed the following package versions:

unixodbc and unixodbc-dev: 2.2.11-16build3
tdsodbc: 0.82-4
libsybdb5: 0.82-4
freetds-common and freetds-dev: 0.82-4

I have configured /etc/unixodbc.ini like this:

[FreeTDS]
Description             = TDS driver (Sybase/MS SQL)
Driver          = /usr/lib/odbc/libtdsodbc.so
Setup           = /usr/lib/odbc/libtdsS.so
CPTimeout               = 
CPReuse         = 
UsageCount              = 2

I have configured /etc/freetds/freetds.conf like this:

[global]
    tds version = 8.0
    client charset = UTF-8

I have grabbed pyodbc revision 31e2fae4adbf1b2af1726e5668a3414cf46b454f from http://github.com/mkleehammer/pyodbc and installed it using "python setup.py install".

I have a Windows machine with Microsoft SQL Server 2000 installed on my local network, up and listening on the local IP address 10.32.42.69. I have an empty database created with the name "Common". I have the user "sa" with password "secret" with full privileges.

I am using the following Python code to set up the connection:

import pyodbc
odbcstring = "SERVER=10.32.42.69;UID=sa;PWD=secret;DATABASE=Common;DRIVER=FreeTDS"
con = pyodbc.connect(odbcstring)
cur = con.cursor()
cur.execute('''
CREATE TABLE testing (
    id INTEGER NOT NULL IDENTITY(1,1), 
    name NVARCHAR(200) NULL, 
    PRIMARY KEY (id)
)
    ''')
con.commit()

Everything WORKS up to this point. I have used SQL Server's Enterprise Manager on the server and the new table is there. Now I want to insert some data into the table.

cur = con.cursor()
cur.execute('INSERT INTO testing (name) VALUES (?)', (u'something',))

That fails!! Here's the error I get:

pyodbc.Error: ('HY004', '[HY004] [FreeTDS][SQL Server]Invalid data type 
(0) (SQLBindParameter)')

Since my client is configured to use UTF-8, I thought I could solve this by encoding the data to UTF-8. That works, but then I get back strange data:

cur = con.cursor()
cur.execute('DELETE FROM testing')
cur.execute('INSERT INTO testing (name) VALUES (?)', (u'somé string'.encode('utf-8'),))
con.commit()
# fetching data back
cur = con.cursor()
cur.execute('SELECT name FROM testing')
data = cur.fetchone()
print type(data[0]), data[0]

That gives no error, but the data returned is not the same data sent! I get:

<type 'unicode'> somé string

That is, pyodbc won't accept a unicode object directly, but it returns unicode objects back to me! And the encoding is being mixed up!

Now for the question:

I want code that inserts unicode data into an NVARCHAR and/or NTEXT field. When I query it back, I want the same data I inserted.

That can be done by configuring the system differently, or by using a wrapper function able to convert the data correctly to/from unicode when inserting or retrieving.
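
For the wrapper idea, here is a minimal sketch of what such a helper could look like, assuming the UTF-8 client charset configured in freetds.conf above. The helper names (execute_unicode, fetchone_unicode) are invented for illustration, and whether the bytes round-trip cleanly still depends on the FreeTDS issues discussed in the answer below:

# -*- coding: utf-8 -*-
# Minimal sketch (Python 2): encode unicode parameters to UTF-8 before
# executing, and decode byte strings coming back into unicode objects.
# The helper names are hypothetical, not part of pyodbc.
def _encode_params(params):
    return tuple(p.encode('utf-8') if isinstance(p, unicode) else p
                 for p in params)

def _decode_value(value):
    return value.decode('utf-8') if isinstance(value, str) else value

def execute_unicode(cursor, sql, params=()):
    cursor.execute(sql, _encode_params(params))
    return cursor

def fetchone_unicode(cursor):
    row = cursor.fetchone()
    return None if row is None else [_decode_value(v) for v in row]

# usage, reusing the connection from above
cur = con.cursor()
execute_unicode(cur, 'INSERT INTO testing (name) VALUES (?)', (u'somé string',))
con.commit()
execute_unicode(cur, 'SELECT name FROM testing')
print fetchone_unicode(cur)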

I am not asking for that much, am I?

Recommended answer

I can remember having this kind of stupid problem when using ODBC drivers, even if that time it was a Java + Oracle combination.

The core thing is that the ODBC driver apparently encodes the query string when sending it to the DB. Even if the field is Unicode, and even if you provide Unicode, in some cases it does not seem to matter.

You need to ensure that what is sent by the driver has the same encoding as your database (not only the server, but also the database). Otherwise, of course, you get funky characters, because either the client or the server is mixing things up when encoding or decoding. Do you have any idea of the charset (code page, as MS likes to call it) that your server is using as a default for decoding data?
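
If you are unsure, you can ask SQL Server itself. As a sketch, reusing the connection from the question (SERVERPROPERTY, DATABASEPROPERTYEX and COLLATIONPROPERTY are built-in T-SQL functions):

# Sketch: ask SQL Server which collation (and hence which code page) it uses
# by default, at the server level and for the "Common" database.
cur = con.cursor()
cur.execute("""
    SELECT CONVERT(VARCHAR(128), SERVERPROPERTY('Collation')),
           CONVERT(VARCHAR(128), DATABASEPROPERTYEX('Common', 'Collation')),
           COLLATIONPROPERTY(CONVERT(NVARCHAR(128), SERVERPROPERTY('Collation')), 'CodePage')
""")
server_collation, db_collation, code_page = cur.fetchone()
print "server collation:     ", server_collation
print "database collation:   ", db_collation
print "non-unicode code page:", code_page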

See that MS page for example. For Unicode fields, collation is used only to define the sort order in the column, not to specify how the data is stored.

If you store your data as Unicode, there is a unique way to represent it; that's the purpose of Unicode: no need to define a charset that is compatible with all the languages that you are going to use :)

The question here is "what happens when I give the server data that is not Unicode?". For example:

  • When I send a UTF-8 string to the server, how does it understand it?
  • When I send a UTF-16 string to the server, how does it understand it?
  • When I send a Latin1 string to the server, how does it understand it?

From the server's perspective, all three of these strings are just a stream of bytes. The server cannot guess the encoding in which you encoded them. Which means that you will run into trouble if your ODBC client ends up sending bytestrings (encoded strings) to the server instead of sending unicode data: if you do so, the server will use a predefined encoding (that was my question: which encoding will the server use? Since it is not guessing, it must be a parameter value), and if the string had been encoded using a different encoding, dzing, the data will get corrupted.

It's exactly like doing this in Python:

uni = u'Hey my name is André'
in_utf8 = uni.encode('utf-8')
# send the utf-8 data to server
# send(in_utf8)

# on server side
# server receives it. But server is Japanese.
# So the server treats the data with the National charset, shift-jis:
some_string = in_utf8 # some_string = receive()    
decoded = some_string.decode('sjis')

Just try it. It's fun. The decoded string is supposed to be "Hey my name is André", but comes out as "Hey my name is Andrテゥ". The é gets replaced by the Japanese テゥ.

Hence my suggestion: you need to ensure that pyodbc is able to send the data directly as Unicode. If pyodbc fails to do this, you will get unexpected results.
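
One way to check what pyodbc/FreeTDS actually puts on the wire is FreeTDS's own protocol dump: the TDSDUMP environment variable, when set before the connection is opened, makes FreeTDS write a trace of every packet. A rough sketch (the log path is arbitrary):

# -*- coding: utf-8 -*-
import os
# TDSDUMP is a FreeTDS environment variable; when set before the connection is
# created, libtdsodbc writes a protocol trace to the given file.
os.environ['TDSDUMP'] = '/tmp/freetds.log'

import pyodbc
odbcstring = "SERVER=10.32.42.69;UID=sa;PWD=secret;DATABASE=Common;DRIVER=FreeTDS"
con = pyodbc.connect(odbcstring)
cur = con.cursor()
cur.execute('INSERT INTO testing (name) VALUES (?)', (u'somé string'.encode('utf-8'),))
con.commit()
# Inspect /tmp/freetds.log to see how the parameter was actually sent.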

And I described the problem in the client-to-server direction. But the same sort of issue can arise when communicating back from the server to the client. If the client cannot understand Unicode data, you'll likely run into trouble.

Actually, FreeTDS takes care of things for you and translates all the data to UCS2 unicode. (Source)

  • Server <-> FreeTDS: UCS2 data
  • FreeTDS <-> pyodbc: encoded strings, encoded in UTF-8 (from /etc/freetds/freetds.conf)

So I would expect your application to work correctly if you pass UTF-8 data to pyodbc. In fact, as this django-pyodbc ticket states, django-pyodbc communicates with pyodbc in UTF-8, so you should be fine.
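
Following that advice, a small round-trip check could look like this (a sketch, assuming client charset = UTF-8 in freetds.conf and the testing table created in the question):

# -*- coding: utf-8 -*-
# Sketch: insert a UTF-8 encoded string and verify that the same text comes
# back, reusing the existing connection "con" from the question.
original = u'somé string'

cur = con.cursor()
cur.execute('DELETE FROM testing')
cur.execute('INSERT INTO testing (name) VALUES (?)', (original.encode('utf-8'),))
con.commit()

cur.execute('SELECT name FROM testing')
fetched = cur.fetchone()[0]
if isinstance(fetched, str):      # pyodbc may hand back bytes or unicode here
    fetched = fetched.decode('utf-8')
print "round trip ok?", fetched == original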

However, cramm0 says that FreeTDS 0.82 is not completely bug-free, and that there are significant differences between 0.82 and the official patched 0.82 version that can be found here. You should probably try the patched FreeTDS.

EDITED: removed old data that was relevant only to the Easysoft commercial ODBC driver, not to FreeTDS. Sorry.
