Converting Unicode to ASCII


Problem description

Hello list members,

I am using the encoding function to convert Unicode to ASCII. At one point
this code was working just fine; however, it has now broken.

I am reading a text file that is in Unicode (I am unsure of which
flavour or bit depth). As I read the file in one line at a time
(readlines()), it converts each line to ASCII. Simple enough. At the same
time I am compressing to bz2 with the bz2 module, but that works just fine.
The code and the error reported appear below. I am unsure what to do.

I assume that, because it is reporting that the ordinal is not in range, this
has something to do with the character width of what I am reading?

Peter W.

def encode_file(file_path, encode_type, compress='N'):
    """
    Changes encoding of file
    """
    new_encode = encode_type
    old_file_path = file_path + '.old'
    new_file_path = file_path
    os.rename(file_path, old_file_path)
    file_in = file(old_file_path, 'r')

    if compress == 'Y' or compress == 'y':
        bz_file_path = file_path + '.bz2'
        bz_file_out = bz2.BZ2File(bz_file_path, 'w')
        for line in file_in.readlines():
            bz_file_out.write(line.encode(new_encode))
        bz_file_out.close()

    else:
        file_out = file(file_path, 'w')
        for line in file_in.readlines():
            file_out.write(line.encode(new_encode))
        file_out.close()

    file_in.close()
    os.remove(old_file_path)

ERROR Reported:

Parsing
X:\GenomeQuebec_repository\microarray\HIS\M15K\Step_1_repository\HISH0224.txt
Traceback (most recent call last):
  File "C:\Program Files\ActiveState Komodo 2.5\callkomodo\kdb.py", line 433, in _do_start
    self.kdb.run(code_ob, locals, locals)
  File "C:\Python23\lib\bdb.py", line 350, in run
    exec cmd in globals, locals
  File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py", line 158, in ?
    main()
  File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py", line 75, in main
    encode_file(fileToProcess, options.encode, 'Y')
  File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py", line 144, in encode_file
    bz_file_out.write(line.encode(new_encode))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

Recommended answer

"Peter Wilkinson" <pw********@videotron.ca> wrote in message
news:ma**************************************@python.org...

I've encountered this problem before, and I've come up with a fix that
works but is probably not the best:

def is_ord(strng):
    new_text = ''
    for i in strng:
        if ord(i) > 127:
            new_text = new_text + ''
        else:
            new_text = new_text + i
    return new_text

# Then just:

text_from_file = is_ord(text_from_file)

Tom
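
For comparison, here is a minimal sketch of the same idea in present-day Python 3, where text is already Unicode (`str`). The function name `strip_non_ascii` is made up for this illustration and does not come from the thread. Like `is_ord`, it silently discards information.

```python
def strip_non_ascii(text: str) -> str:
    """Return text with every character above code point 127 removed."""
    # Hypothetical helper: keeps only characters in the ASCII range 0..127.
    return "".join(ch for ch in text if ord(ch) <= 127)


if __name__ == "__main__":
    sample = "na\u00efve caf\u00e9"  # contains two non-ASCII letters
    print(strip_non_ascii(sample))   # prints: nave caf
    # The built-in "ignore" error handler drops the same characters,
    # but returns bytes rather than str:
    print(sample.encode("ascii", "ignore"))  # prints: b'nave caf'
```

If dropping characters is not acceptable, the "replace" error handler of `str.encode` at least leaves a visible `?` where something was lost.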


Thanks Tom B.,

I will try that for now ....

It would be good to find out _why_ this happens in the first place. I will
keep searching on this for a few days.

Peter W.

At 02:04 PM 8/6/2004, Tom B. wrote:


Peter Wilkinson <pw********@videotron.ca> writes:
> It would be good to find out _why_ this happens in the first place. I
> will keep searching on this for a few days.

Most likely because you have characters in that file that are not in the
ASCII character set. ASCII is, after all, only a very small subset of
Unicode. E.g.

>>> u"ä".encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)

If it's OK to lose information, you could use the error argument to
.encode like

>>> u"ä".encode("ascii", "ignore")
''

or

>>> u"ä".encode("ascii", "replace")
'?'

Bernhard

--
Intevation GmbH http://intevation.de/
Skencil http://sketch.sourceforge.net/
Thuban http://thuban.intevation.org/
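
A note on the "why", offered as a sketch and not as a confirmed diagnosis. The traceback reports a UnicodeDecodeError even though the code only calls .encode(). In Python 2, calling .encode() on a byte string, which is what file(...).readlines() returns, first decodes it implicitly with the default 'ascii' codec, and that implicit step is what fails on byte 0xff. A 0xff at position 0 is consistent with a UTF-16 byte order mark, but the thread never establishes the file's actual encoding. The Python 3 example below therefore assumes UTF-16 input. The function name `transcode_to_ascii` and its parameters are hypothetical and are not taken from the original script.

```python
import bz2


def transcode_to_ascii(src_path, dst_path, src_encoding="utf-16",
                       errors="replace", compress=False):
    """Decode src_path explicitly, then write it re-encoded as ASCII.

    src_encoding is an ASSUMPTION about the input file (UTF-16 here).
    errors is passed to str.encode: "strict", "ignore" or "replace".
    """
    # Write through bz2 when compression is requested, else a plain file.
    opener = bz2.open if compress else open
    # newline="" leaves the original line endings untranslated on read.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            opener(dst_path, "wb") as dst:
        for line in src:
            # line is already Unicode text, so only an encode step is needed.
            dst.write(line.encode("ascii", errors))
```

Because the file is decoded with a named codec before anything is encoded, a character outside ASCII is handled by the chosen error handler instead of triggering an implicit ASCII decode. If the input turns out not to be UTF-16, `src_encoding` would need to be changed accordingly.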

