Reading UTF-8 Encoded Files and Text Files in Python3


Question

Ok, so python3 and unicode. I know that all python3 strings are actually unicode strings and that all python3 code is stored as utf-8. But how does python3 read text files? Does it assume that they are encoded in utf-8? Do I need to call decode('utf-8') when reading a text file? What about pandas read_csv() and to_csv()?

Recommended answer

Python's built-in function open() has an optional parameter, encoding. The documentation describes it as follows:

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
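
To make the quoted behaviour concrete, here is a minimal sketch rather than a definitive recipe. It assumes a scratch file named sample.txt in a temporary directory and a short sample string; both are invented for illustration. It shows that a file opened in text mode already yields str, so decode('utf-8') is only needed when the file is opened in binary mode.

```python
import locale
import os
import tempfile

# Sample text with non-ASCII characters (illustrative only).
text = "naïve café, 你好"

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write with an explicit encoding instead of relying on the platform default.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Text mode: open() decodes the bytes and returns str, so no decode() call is needed.
with open(path, encoding="utf-8") as f:
    from_text_mode = f.read()

# Binary mode: open() returns raw bytes, which have to be decoded by hand.
with open(path, "rb") as f:
    raw = f.read()
from_binary_mode = raw.decode("utf-8")

# The encoding that the quoted documentation names as the default when
# `encoding` is omitted; its value depends on the platform and locale.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

Passing encoding="utf-8" explicitly on both the write and the read keeps the result independent of whatever the platform default happens to be.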

Analogous parameters can be found in pandas:

  • pandas.read_csv(): encoding: str, default None. Encoding to use for UTF when reading/writing (ex. ‘utf-8’).
  • Series.to_csv(): encoding: string, optional. A string representing the encoding to use if the contents are non-ascii, for python versions prior to 3.
  • DataFrame.to_csv(): encoding: string, optional. A string representing the encoding to use in the output file, defaults to ‘ascii’ on Python 2 and ‘utf-8’ on Python 3.
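
The pandas parameters listed above can be exercised with a short round trip. This is a sketch under stated assumptions: it requires pandas to be installed, and the file name cities.csv, the column names, and the data are all made up for the example.

```python
import os
import tempfile

import pandas as pd

# Illustrative data containing non-ASCII characters.
df = pd.DataFrame(
    {
        "city": ["München", "São Paulo"],
        "population": [1500000, 12300000],
    }
)

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "cities.csv")

# Write the CSV with an explicit encoding.
df.to_csv(path, index=False, encoding="utf-8")

# Read it back, telling pandas which encoding the file uses.
round_trip = pd.read_csv(path, encoding="utf-8")

# Inspect the raw bytes to confirm the file really is UTF-8 encoded.
with open(path, "rb") as f:
    raw = f.read()

print(round_trip)
```

Stating the encoding on both calls is redundant when the defaults already agree, but it documents the intent and guards against files produced elsewhere with a different encoding.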
