How to read a utf-8 encoded text file using Python

Views: 78 Published: 2021/9/15 19:40:34 python encoding utf-8

Question

I need to analyse a text file in Tamil (UTF-8 encoded). I'm using the nltk package for Python in IDLE. When I try to read the text file, I get the error below. How do I avoid this?

corpus = open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt').read()

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    corpus = open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt').read()
  File "C:\Users\Customer\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 33: character maps to <undefined>

Recommended answer

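Since you are using Python 3, just add the encoding parameter to open(). Without it, open() falls back to the platform's default encoding, which the traceback shows is cp1252 on this machine: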

corpus = open(
    r"C:\Users\Customer\Desktop\DISSERTATION\ettuthokai.txt", encoding="utf-8"
).read()
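
As a self-contained illustration of the same fix, here is a minimal sketch. It does not use the asker's file: the sample string and the temporary file are assumptions made only so the example can run anywhere. It writes a short Tamil word as UTF-8, shows that decoding it as cp1252 fails the way the traceback does, and then reads it back with encoding="utf-8" inside a with block so the file is closed afterwards.

```python
import os
import tempfile

# Hypothetical sample text standing in for the real corpus.
sample = "தமிழ்"

# Write the sample to a temporary UTF-8 file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Decoding the UTF-8 bytes as cp1252 reproduces the kind of error
    # shown in the question.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Passing encoding="utf-8" decodes the file correctly.
    with open(path, encoding="utf-8") as f:
        corpus = f.read()
finally:
    # Clean up the temporary file.
    os.remove(path)
```

The with statement is a stylistic choice rather than part of the fix; the encoding argument is what resolves the UnicodeDecodeError.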
