Handling unicode data


Problem description

Hi all,

I'm starting to learn Python but am having some difficulties with how
it handles the encoding of data I'm reading from a database. I'm using
pymssql to access data stored in a SQL Server database, and the
following is the script I'm using for testing purposes.

-----------------------------------------------------------------------------
import pymssql

# Connect to the local SQL Server instance.
mssqlConnection = pymssql.connect(host='localhost', user='sa',
                                  password='password', database='TestDB')
cur = mssqlConnection.cursor()
query = "Select ID, Term from TestTable where ID > 200 and ID < 300;"
cur.execute(query)

# Walk the result set one row at a time, printing each term.
row = cur.fetchone()
results = []
while row is not None:
    term = row[1]
    print type(row[1])
    print term
    results.append(term)
    row = cur.fetchone()
cur.close()
mssqlConnection.close()
print results
-----------------------------------------------------------------------------

In the console output, for a record where I expected to see "França",
I'm getting the following:

"<type 'str'>" - when I print the type (print type(row[1]))
"Fran+a" - when I print the "term" variable (print term)
"Fran\xd8a" - when I print all the query results (print results)

The values in the "Term" column of "TestTable" are stored as Unicode (the
column's datatype is nvarchar), yet the Python data type of the values
I'm reading is not unicode.
It all seems to be an encoding issue, but I can't see what I'm doing
wrong.
Any thoughts?

thanks in advance,
Filipe

Solution

Filipe wrote:

> In the console output, for a record where I expected to see "França",
> I'm getting the following:
>
> "<type 'str'>" - when I print the type (print type(row[1]))
> "Fran+a" - when I print the "term" variable (print term)
> "Fran\xd8a" - when I print all the query results (print results)
>
> The values in the "Term" column of "TestTable" are stored as Unicode (the
> column's datatype is nvarchar), yet the Python data type of the values
> I'm reading is not unicode.
> It all seems to be an encoding issue, but I can't see what I'm doing
> wrong.



Looks like the DB-API driver returns 8-bit ISO-8859-1 strings instead of Unicode
strings. There might be some configuration option for this; see

In the worst case, you could do something like

    def unicodify(value):
        if isinstance(value, str):
            value = unicode(value, "iso-8859-1")
        return value

    term = unicodify(row[1])

but it's definitely better if you can get the DB-API driver to do the right thing.

</F>
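
For readers on Python 3, where `str` is already Unicode text and raw 8-bit data is `bytes`, the same idea as the `unicodify` helper above might be sketched as follows. The `encoding` parameter and its ISO-8859-1 default are assumptions carried over from the reply, and the sample byte string is illustrative only; it is not taken from the database in question.

```python
def unicodify(value, encoding="iso-8859-1"):
    """Return text for bytes input; pass other values through unchanged."""
    if isinstance(value, bytes):
        # Decode 8-bit data into a Unicode string using the assumed encoding.
        return value.decode(encoding)
    return value


# Illustrative sample: 0xE7 is "ç" in ISO-8859-1.
term = unicodify(b"Fran\xe7a")

# Non-bytes values (numbers, already-decoded text) come back untouched.
number = unicodify(42)
```

If the driver's real encoding differs from the assumed one, decoding still succeeds with ISO-8859-1 (every byte maps to some character) but yields the wrong characters, so the encoding should be confirmed rather than guessed.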


Fredrik Lundh wrote:

> Looks like the DB-API driver returns 8-bit ISO-8859-1 strings instead of Unicode
> strings. There might be some configuration option for this; see

Where did you want to point the OP here?

> In the worst case, you could do something like
>
>     def unicodify(value):
>         if isinstance(value, str):
>             value = unicode(value, "iso-8859-1")
>         return value
>
>     term = unicodify(row[1])
>
> but it's definitely better if you can get the DB-API driver to do the right thing.



It seems pymssql does not support such a thing.

Also, it appears that DB-Library (the API used by pymssql) always
returns CP_ACP characters (unless ANSI-to-OEM conversion is enabled);
so the "right" encoding to use is "mbcs".

Note that Microsoft plans to abandon DB-Library, so it might be
best to switch to a different module for SQL Server access.

Regards,
Martin
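
A rough sketch of what decoding with the ANSI code page could look like in Python 3. The `mbcs` codec is only available on Windows builds of Python, so the fallback to `locale.getpreferredencoding` on other platforms is an assumption added here for illustration. The helper names `ansi_codec` and `decode_ansi` are hypothetical and are not part of pymssql or DB-Library.

```python
import locale
import sys


def ansi_codec():
    """Pick a codec name for 'ANSI' data: mbcs on Windows, locale default elsewhere."""
    if sys.platform == "win32":
        # "mbcs" maps to the system ANSI code page (CP_ACP) on Windows.
        return "mbcs"
    # Assumption: off Windows, fall back to the locale's preferred encoding.
    return locale.getpreferredencoding(False)


def decode_ansi(raw):
    """Decode driver-supplied bytes, substituting U+FFFD for undecodable bytes."""
    return raw.decode(ansi_codec(), errors="replace")


sample = decode_ansi(b"plain ascii")
```

Because the result depends on the machine's code page, the same bytes can decode differently on two systems; pinning an explicit code page name is safer when the server's encoding is known.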


Hi Fredrik,

Thanks for the reply.
Instead of:

    term = row[1]

I tried:

    term = unicode(row[1], "iso-8859-1")

but the following error was returned when printing "term":

    Traceback (most recent call last):
      File "test.py", line 11, in ?
        print term
      File "c:\Program Files\Python24\lib\encodings\cp437.py", line 18, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode character u'\xd8' in position 31: character maps to <undefined>

Is it possible that some unicode strings are not printable to the console?
It's odd, because I can manually type in the console the same string
I'm trying to print.
I also tried other encodings besides iso-8859-1, but got the same
error.

Do you think this has something to do with the DB-API driver? I don't
even know where to start if I have to change something in there :|

Cheers,
Filipe
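
The traceback above points at the output step rather than the driver: the decoded string contains u'\xd8', which the console's code page 437 cannot represent. One way to keep printing from raising, sketched here in Python 3 and assuming it is acceptable to show "?" for unencodable characters, is to encode with `errors="replace"`. The helper name `console_safe` is hypothetical.

```python
def console_safe(text, encoding="cp437"):
    """Replace characters the target console encoding cannot represent."""
    # Unencodable characters become "?" instead of raising UnicodeEncodeError.
    raw = text.encode(encoding, errors="replace")
    # Decode again so the caller gets text that is safe to print.
    return raw.decode(encoding)


# U+00D8 is not available in code page 437, so it is replaced.
lossy = console_safe("Fran\u00d8a")

# U+00E7 ("ç") is available in code page 437, so it survives the round trip.
intact = console_safe("Fran\u00e7a")
```

This only hides the symptom. If the expected character is "ç" but the decoded string holds a different one, the encoding used for decoding is likely still wrong.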

