使用python从sqlite db读取unicode [英] Reading unicode from sqlite db using python

查看:66
本文介绍了使用python从sqlite db读取unicode的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

必须检索以 unicode(在数据库中)存储的数据并将其转换为不同的形式.

The data stored in unicode (in database) has to be retrieved and convert into a different form.

以下代码片段

def convert(content):
    content = content.replace("ஜௌ", "n\[s");
    return content;

mydatabase = "database.db"
connection = sqlite3.connect(mydatabase)
cursor = connection.cursor()
query = ''' select unicode_data from table1'''
cursor.execute(query)
for row in cursor.fetchone():
    print convert(row)

在转换方法中产生以下错误消息.

yields the following error message in convert method.

exceptions.UnicodeDecodeError: 'ascii' 编解码器无法解码字节 0xe0位置 0:序号不在范围内(128)

exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

如果数据库内容为"ஜௌஜௌஜௌ",则输出应为"n\[sn\[sn\[s"

If the database content is "ஜௌஜௌஜௌ", the output should be "n\[sn\[sn\[s"

文档建议在创建 unicode 字符串时使用 ignore 或 replace 来避免错误.

The documentation suggests to use ignore or replace to avoid the error, when creating the unicode string.

当迭代改变如下:

for row in cursor.fetchone():
    print convert(unicode(row, errors='replace'))

它回来了

exceptions.TypeError: 不支持解码 Unicode

exceptions.TypeError: decoding Unicode is not supported

通知该行已经是一个 unicode.

which informs that row is already a unicode.

对此的任何了解都非常感谢.提前致谢.

Any light on this to make it work is highly appreciated. Thanks in advance.

推荐答案

content = content.replace("ஜௌ", "n\[s");

建议您的意思是:

content = content.replace(u'ஜௌ', ur'n\[s');

或者在文件编码不确定的情况下为了安全:

or for safety where the encoding of your file is uncertain:

content = content.replace(u'\u0B9C\u0BCC', ur'n\[s');

您拥有的内容已经是 Unicode,因此您应该对其进行 Unicode 字符串替换.没有 u"ஜௌ" 是一个字节串,它代表那些依赖于源文件字符集的某些编码中的字符.(仅在最明确的情况下,即 ASCII 字符,字节字符串才能与 Unicode 字符串顺利配合使用.)

The content you have is already Unicode, so you should do Unicode string replacements on it. "ஜௌ" without the u is a string of bytes that represents those characters in some encoding dependent on your source file charset. (Byte strings work smoothly together with Unicode strings only in the most unambiguous cases, which is for ASCII characters.)

(r 字符串意味着不必担心包含裸反斜杠.)

(The r-string means not having to worry about including bare backslashes.)

这篇关于使用python从sqlite db读取unicode的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆