Character detection in a text file in Python using the Universal Encoding Detector (chardet)

Views: 249 | Published: 2016/11/19 13:39:36 | Tags: python, character-encoding


Question

I am trying to use the Universal Encoding Detector (chardet) in Python to detect the most probable character encoding of a text file ('infile') and use that encoding in further processing.

While chardet is designed primarily for detecting the character encoding of webpages, I have found an example of it being used on individual text files.

However, I cannot work out how to tell the script to assign the most likely character encoding to the variable 'charenc' (which is used several times throughout the script).

My code, based on a combination of the aforementioned example and chardet's own documentation, is as follows:

import chardet    
rawdata=open(infile,"r").read()
chardet.detect(rawdata)

Character detection is necessary because the script goes on to run the following (as well as several similar uses):

inF=open(infile,"rb")
s=unicode(inF.read(),charenc)
inF.close()

Any help would be greatly appreciated.

Recommended answer

chardet.detect returns a dictionary that provides the encoding as the value associated with the key 'encoding'. So you can do this:

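import chardet

# Read the file as raw bytes; chardet.detect() works on undecoded data
with open(infile, "rb") as f:
    rawdata = f.read()
result = chardet.detect(rawdata)  # dict that includes 'encoding' and 'confidence'
charenc = result['encoding']      # may be None if nothing could be detected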
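To connect the answer back to the decoding step in the question, here is a minimal sketch of how the detected encoding might be used. It assumes Python 3, where the `unicode` builtin no longer exists and `bytes.decode` plays the same role. The helper names `decode_bytes` and `read_text`, and the `detect` and `fallback` parameters, are hypothetical and are not part of chardet; `fallback` is only an illustrative choice for the case where the detector reports `None`.

```python
def decode_bytes(rawdata, detect=None, fallback="utf-8"):
    """Decode raw bytes using a detected encoding; return (text, encoding).

    `detect` is any callable that takes bytes and returns a dict with an
    'encoding' key. It defaults to chardet.detect. `fallback` (an
    assumption of this sketch) is used when the detector returns None.
    """
    if detect is None:
        import chardet  # third-party package: pip install chardet
        detect = chardet.detect

    charenc = detect(rawdata)["encoding"] or fallback
    return rawdata.decode(charenc), charenc


def read_text(path, detect=None, fallback="utf-8"):
    """Read a file as bytes and decode it with the detected encoding."""
    with open(path, "rb") as f:  # binary mode: the detector needs raw bytes
        rawdata = f.read()
    return decode_bytes(rawdata, detect=detect, fallback=fallback)
```

With this in place, `s, charenc = read_text(infile)` would stand in for the open/unicode/close sequence from the question. Detection is a statistical guess, so decoding can still fail or produce wrong characters for short or ambiguous input; checking the 'confidence' value in the result is one way to decide whether to trust it.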
