Conversion of Unicode to ASCII NIGHTMARE


Problem description




Hi

I am reading from an Oracle database using cx_Oracle. I am writing to a
SQLite database using apsw.

The Oracle database is returning utf-8 characters for European item
names, i.e. special characters from an ASCII perspective.

I get the following error:

SQLiteCur.execute(sql, row)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdc in position 12: ordinal not in range(128)



I have googled for several days now and still can't get it to encode to
ASCII.

I encode the SQL as follows:

sql = "insert into %s values %s" % (SQLiteTable, paramstr)
sql.encode('ascii', 'ignore')

I then code each of the row values returned from Oracle like this:

row = map(encodestr, row)
SQLiteCur.execute(sql, row)

where encodestr is as follows:

def encodestr(item):
    if isinstance(item, types.StringTypes):
        return unicodedata.normalize('NFKD', unicode(item, 'utf-8',
            'ignore')).encode('ASCII', 'ignore')
    else:
        return item

I have tried a thousand similar functions to the above,
permutations of the above from various Google searches. But I still get
the above exception on the line:

SQLiteCur.execute(sql, row)

and the exception is related to the data in one field.

In the end I resorted to using Oracle's convert function in the SQL
statement, but I would like to understand why this is happening and why
it's so hard to convert the string in Python. I have read many
complaints about this from other people, some of whom have written
custom stripping routines. I haven't tried a custom routine yet, because I
think it should be possible in Python.

Thanks,

Solution

ChaosKCW wrote:

> I am reading from an Oracle database using cx_Oracle. I am writing to a
> SQLite database using apsw.
>
> The Oracle database is returning utf-8 characters for European item
> names, i.e. special characters from an ASCII perspective.

And does cx_Oracle return those as Unicode objects or as plain strings
containing UTF-8 byte sequences? It's very important to distinguish
between these two cases, and I don't have any experience with cx_Oracle
to be able to give advice here.
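
To illustrate the distinction being asked about, here is a minimal sketch. It is written for Python 3, where `str` plays the role of the Unicode object and `bytes` the role of the plain string holding UTF-8 byte sequences; the thread itself concerns Python 2. The helper `describe` and the sample name are made up for illustration, and nothing here claims to show what cx_Oracle actually returns.

```python
# Python 3 sketch: `str` corresponds to Python 2's `unicode`,
# `bytes` corresponds to Python 2's plain `str`.

def describe(value):
    """Return a short label saying what kind of string `value` is (hypothetical helper)."""
    if isinstance(value, str):
        return "text"
    if isinstance(value, bytes):
        return "bytes"
    return "other"

name_text = "M\u00fcller"                 # decoded text, 6 characters
name_bytes = name_text.encode("utf-8")    # the same name as UTF-8 bytes, 7 bytes

print(describe(name_text), len(name_text))
print(describe(name_bytes), len(name_bytes))
```

The two values print the same name on a UTF-8 terminal, but they differ in type and in length, which is why checking what the driver hands back comes before any conversion attempt.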
> I get the following error:
>
> SQLiteCur.execute(sql, row)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xdc in position 12: ordinal not in range(128)


It looks like you may have Unicode objects that you're presenting to
sqlite. In any case, with earlier versions of pysqlite that I've used,
you need to connect with a special unicode_results parameter, although
later versions should work with Unicode objects without special
configuration. See here for a thread (in which I seem to have
participated, coincidentally):

http://mail.python.org/pipermail/pyt...ne/107526.html
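
As a rough illustration of passing Unicode text straight to SQLite, the sketch below uses the standard-library `sqlite3` module (the descendant of pysqlite) under Python 3, not apsw, which the original poster used. The table name `items` and the sample value are assumptions made for the example.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")

# Hypothetical item name containing non-ASCII characters.
original = "\u00dcbergr\u00f6\u00dfe"

# The "?" placeholder lets the driver bind the text value itself;
# no manual encoding to ASCII or UTF-8 is done here.
conn.execute("INSERT INTO items VALUES (?)", (original,))

stored = conn.execute("SELECT name FROM items").fetchone()[0]
print(stored == original)
conn.close()
```

With this driver the value round-trips unchanged as text, so stripping it down to ASCII is not needed for storage.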
> I have googled for several days now and still can't get it to encode to
> ASCII.



This is a tough thing to find out - whilst previous searches did
uncover some discussions about it, I just tried and failed to find the
enlightening documents - and I certainly didn't see many references to
it on the official pysqlite site.

Paul


ChaosKCW wrote:

> I am reading from an Oracle database using cx_Oracle. I am writing to a
> SQLite database using apsw.
>
> The Oracle database is returning utf-8 characters for European item
> names, i.e. special characters from an ASCII perspective.

I'm not sure that you are using those terms correctly. From your description
below, it seems that your data is being returned from the Oracle database as
unicode strings rather than regular strings containing UTF-8 encoded data. These
European characters are not "special characters from an ASCII perspective"; they
simply aren't characters in the ASCII character set at all.
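
The error message itself can be reproduced in a few lines. This is a sketch under assumptions: it is Python 3, where the decode has to be requested explicitly, whereas in Python 2 the same failure appears when a byte string and a unicode string are mixed and the interpreter decodes with the ASCII codec implicitly. The sample bytes are invented, and treating them as Latin-1 is an assumption, since the thread does not establish which encoding the data was in.

```python
# Hypothetical byte string containing 0xdc, the byte named in the traceback.
raw = b"GR\xdcN"

try:
    raw.decode("ascii")          # ASCII only covers bytes 0x00-0x7f
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# Decoding succeeds once a codec that defines 0xdc is named.
# Latin-1 is an assumption here, chosen only for illustration.
decoded = raw.decode("latin-1")
print(decoded == "GR\u00dcN")
```

The point is that the failure is a decode problem: some step is turning bytes into text with the ASCII codec, and no amount of encoding to ASCII afterwards can undo that.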
> I get the following error:
>
> SQLiteCur.execute(sql, row)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xdc in position 12: ordinal not in range(128)
>
> I have googled for several days now and still can't get it to encode to
> ASCII.



Don't. You can't. Those characters don't exist in the ASCII character set.
SQLite 3.0 deals with UTF-8 encoded SQL statements, though.

http://www.sqlite.org/version3.html

> I encode the SQL as follows:
>
> sql = "insert into %s values %s" % (SQLiteTable, paramstr)
> sql.encode('ascii', 'ignore')



The .encode() method returns a new value; it does not change an object in place.

sql = sql.encode('utf-8')
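
A short sketch of that point, written for Python 3 with a made-up SQL string: strings are immutable, so calling `.encode()` without keeping the result has no visible effect.

```python
sql = "insert into items values (?)"   # hypothetical statement

sql.encode("utf-8")                    # result is discarded; sql is untouched
still_text = isinstance(sql, str)

encoded = sql.encode("utf-8")          # keep the return value to use it
print(still_text, type(encoded).__name__)
```

Only the second call has any effect on later code, because only there is the returned value bound to a name.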

--
Robert Kern
ro*********@gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco


> Don't. You can't. Those characters don't exist in the ASCII character set.
> SQLite 3.0 deals with UTF-8 encoded SQL statements, though.



That is not entirely correct - one can, if losing information is OK. The OP's
code normalizes the decoded UTF-8 to NFKD, so an umlaut like ä is transformed
into a two-character sequence basically saying "a with two dots on top". With
'ignore' specified as a parameter to the encoder, this should result in
the letter a.
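
The lossy conversion described here can be sketched as follows. This is a Python 3 illustration, not the original poster's code; the function name `to_ascii_lossy` and the sample words are invented for the example.

```python
import unicodedata

def to_ascii_lossy(text):
    """Strip text down to ASCII, silently dropping whatever cannot be represented."""
    # NFKD splits "a with diaeresis" into "a" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # 'ignore' drops the combining mark and any other non-ASCII character.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii_lossy("\u00e4"))        # a
print(to_ascii_lossy("M\u00fcller"))   # Muller
print(to_ascii_lossy("Stra\u00dfe"))   # Strae
```

The last line shows the cost: a character with no decomposition, such as ß, disappears entirely, so the result can be misleading rather than merely simplified.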
Regards,

Diez

