adodbapi / string encoding problem


Problem description





Hi,

I read a web page via urllib2. The result of the 'read' call is of type
'str'. This string can be written to disk via
file('out.html','w').write(html). Then I write the string into a Memo field
in an Access database, using adodbapi. If I read the text back I get a
unicode string, which cannot be written to disk via file(...) due to encoding
problems. How do I have to decode the unicode string to get my original data
back?

regards,
Achim

Recommended answer








You have to *EN*-code Unicode into a string in the same way the string
had been *DE*-coded to Unicode originally, in order to be sure to get
the same string back; specifically, you have to use the same *codec*
(which stands for COder-DECoder). I don't know what codec adodbapi is
using. (Python's normal default codec is ASCII, which is the "minimum
common denominator" of just about every encoding around -- if adodbapi
hadn't surreptitiously inserted a different codec, it would be impossible
for anything to have been decoded that might cause problems in encoding it
back ;-).

Alex
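
To make the codec point concrete, here is a minimal sketch of the round trip. The sample bytes and the choice of ISO-8859-1 are assumptions for illustration only; nothing here claims to show what adodbapi or win32com actually does internally.

```python
# Bytes as they might have arrived from the web page (sample data, assumed Latin-1).
original = b"Gr\xfc\xdfe"

# DE-code: bytes -> Unicode text. The codec name is an assumption.
text = original.decode("iso-8859-1")

# EN-code with the same codec: the original bytes come back unchanged.
round_tripped = text.encode("iso-8859-1")

# A different codec yields different bytes for the non-ASCII characters.
as_utf8 = text.encode("utf-8")

# The ASCII codec cannot represent these characters at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The last case is the kind of failure to expect when a unicode string containing non-ASCII characters is written through a plain file object in Python 2, because the implicit conversion falls back to the ASCII default.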









You have to know the encoding of the original file.

Assuming (1) you had Western European characters including the euro sign,
(2) they were correctly translated into unicode, and (3) you want them back
that way:

>>> s = u"äöüÄÖÜ".encode("iso-8859-15")
>>> s
'\xe4\xf6\xfc\xc4\xd6\xdc'
>>> print s
äöüÄÖÜ
>>> type(s)
<type 'str'>







Or, more generally:

unicodeFromAccess.encode(targetEncoding)

Peter
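
As a rough sketch of how that general form might be used to get the page back onto disk: the variable `unicode_from_access` below is a hypothetical stand-in for the value read from the Memo field, and ISO-8859-15 is only an assumed target encoding. The real one has to match whatever the original page used.

```python
import os
import tempfile

# Hypothetical stand-in for the unicode value read back from the Memo field.
unicode_from_access = u"<html><body>Gr\u00fc\u00dfe, 5 \u20ac</body></html>"

# Assumption: the page was originally encoded as ISO-8859-15.
target_encoding = "iso-8859-15"

# Encode explicitly instead of relying on the implicit ASCII default.
data = unicode_from_access.encode(target_encoding)

# Write the bytes in binary mode so nothing is re-encoded on the way out.
path = os.path.join(tempfile.mkdtemp(), "out.html")
with open(path, "wb") as f:
    f.write(data)

# Read the file back to confirm the bytes on disk match what was encoded.
with open(path, "rb") as f:
    written = f.read()
```

Encoding before the write keeps the choice of codec visible in the code, which makes a mismatch with the page's real encoding easier to spot.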


"Alex Martelli" <al***@aleax.it> wrote in message
news:0Z**********************@news1.tin.it...
You have to *EN*-code Unicode into a string in the same way the string
had been *DE*-coded to Unicode originally, in order to be sure to get
the same string back; specifically, you have to use the same *codec*





[...]

Thanks Alex,

I understand that, but looking at the adodbapi code I could not find any
call to encode/decode. The conversion seems to happen somewhere in win32com.
I don't know if you will ever get your data back once it's converted to
Variant. ;-)

Achim

