Character reading from file in Python

Question

In a text file, there is a string "I don't like this".

However, when I read it into a string, it becomes "I don\xe2\x80\x98t like this". I understand that \u2018 is the Unicode representation of "'". I read the file with:

f1 = open(file1, "r")
text = f1.read()

Now, is it possible to read the file in such a way that the resulting string is "I don't like this" instead of "I don\xe2\x80\x98t like this"?

Second edit: I have seen some people use a mapping to solve this problem, but is there really no built-in conversion that does this kind of ANSI to Unicode (and vice versa) conversion?

Recommended answer

Ref: http://docs.python.org/howto/unicode

Reading Unicode from a file is therefore simple:

import codecs

# codecs.open() decodes each line from UTF-8 as it is read
f = codecs.open('unicode.rst', encoding='utf-8')
for line in f:
    print(repr(line))
f.close()
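
For readers on Python 3, here is a minimal sketch of the same idea, assuming the file is UTF-8 encoded. The built-in open() accepts an encoding argument, so the three bytes \xe2\x80\x98 come back as the single character U+2018. The file name sample.txt and the temporary directory are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample file: the question's sentence with U+2018 as the quote mark.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"I don\xe2\x80\x98t like this")

# Reading as raw bytes shows the three UTF-8 bytes the question saw.
with open(path, "rb") as f:
    raw = f.read()

# Reading in text mode with an explicit encoding decodes them into one character.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(repr(raw))
print(repr(text))
```

The decoded string is two items shorter than the byte string, because the three bytes collapse into one character.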

It's also possible to open files in update mode, allowing both reading and writing:

f = codecs.open('test', encoding='utf-8', mode='w+')
f.write(u'\u4500 blah blah blah\n')
f.seek(0)                        # rewind before reading back
print(repr(f.readline()[:1]))    # first character only: U+4500
f.close()

EDIT: I'm assuming that your intended goal is just to be able to read the file properly into a string in Python. If you're trying to convert from Unicode to an ASCII string, then there's really no direct way to do so, since the Unicode characters won't necessarily exist in ASCII.
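
To illustrate why there is no direct conversion, here is a small Python 3 sketch (an illustration, not part of the original answer). U+2018 has no ASCII code point, so a strict encode raises an error, and the standard error handlers can only drop or replace the character.

```python
text = "I don\u2018t like this"

# A strict ASCII encode fails because U+2018 is outside the ASCII range.
try:
    text.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The standard error handlers lose information instead of converting.
ignored = text.encode("ascii", "ignore")    # drops the character
replaced = text.encode("ascii", "replace")  # substitutes "?"

print(strict_ok, ignored, replaced)
```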

If you're trying to convert to an ASCII string, try one of the following:


  1. Replace the specific Unicode characters with ASCII equivalents, if you only need to handle a few special cases such as this particular example.

  2. Use the unicodedata module's normalize() function and the string's encode() method to convert as best you can to the closest ASCII equivalent (Ref https://web.archive.org/web/20090228203858/http://techxplorer.com/2006/07/18/converting-unicode-to-ascii-using-python):

>>> import unicodedata
>>> teststr
u'I don\xe2\x80\x98t like this'
>>> unicodedata.normalize('NFKD', teststr).encode('ascii', 'ignore')
'I donat like this'
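
Replacing specific characters, as in the first option above, can be sketched as follows in Python 3. The table here is a hypothetical, hand-picked set of punctuation replacements, not a standard list; extend it for whichever characters your data contains. The helper name to_ascii is also made up for this example. It falls back on the unicodedata.normalize() approach for anything the table does not cover.

```python
import unicodedata

# Hypothetical hand-written table: curly quotes and dashes to ASCII look-alikes.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
})

def to_ascii(text):
    """Apply the explicit replacements, then strip whatever is still non-ASCII."""
    mapped = text.translate(REPLACEMENTS)
    decomposed = unicodedata.normalize("NFKD", mapped)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("I don\u2018t like this"))
print(to_ascii("caf\u00e9"))
```

The explicit table matters because U+2018 has no ASCII decomposition, so normalize() followed by encode('ascii', 'ignore') would drop it. Note also that the teststr in the session above holds the three UTF-8 bytes as separate code points (U+00E2, U+0080, U+0098) rather than U+2018 itself, which is why its output reads 'donat': the "â" decomposes to "a" and the two control characters are ignored.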
