Python open CSV file with supposedly mixed encodings?

Published: 2016/11/19 17:28:12 · Tags: python csv encoding utf-8 character-encoding


Question

I'm trying to read a CSV text file (UTF-8 without BOM, according to Notepad++) using Python. However, there seems to be a problem with the encoding:
print(open(path, encoding="utf-8").read())

  Codec can't decode byte 0x8f
This little character seems to be the problem: ● (full string: "●• อีเปียขี้บ่น ت •●"), though I'm sure there are more.
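Here is a quick look at the bytes involved. ● is U+25CF BLACK CIRCLE, and in UTF-8 it is the three bytes E2 97 8F. Those bytes decode cleanly as UTF-8. So if the error really names byte 0x8f, one plausible explanation is that some step decoded the data with a legacy code page instead. This is an assumption, not something the question confirms. The sketch uses cp1252 (the usual default for `open()` without `encoding=` on Western Windows systems), which leaves 0x8f undefined:

```python
# Hypothetical check, assuming the byte in the error message is 0x8f
# and assuming cp1252 as the "wrong" codec.
ch = "●"  # U+25CF BLACK CIRCLE, the character named in the question
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x97\x8f': the last of the three bytes is 0x8f

# The same bytes are valid UTF-8 and round-trip to the original character.
print(utf8_bytes.decode("utf-8") == ch)  # True

# Decoding them with cp1252 fails, because cp1252 has no character for 0x8f.
bad_offset = None
try:
    utf8_bytes.decode("cp1252")
except UnicodeDecodeError as err:
    bad_offset = err.start  # index of the byte the decoder rejected
    print(err)  # 'charmap' codec can't decode byte 0x8f in position 2: ...
print("failing offset:", bad_offset)
```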

If I try UTF-16 instead, I get this message:
# also tried with encode
print(open(path, encoding="utf-16").read().encode('utf-8'))

  Illegal UTF-16 surrogate
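Decoding UTF-8 bytes as UTF-16 cannot work. UTF-16 consumes two bytes per code unit, so ordinary UTF-8 text turns into unrelated code points, and some byte pairs land in the surrogate range. The minimal sketch below uses a made-up three-character string containing ت, which also appears in the question. It uses the explicit `utf-16-le` codec to stay deterministic; the question's plain `"utf-16"` without a BOM falls back to the platform's native byte order:

```python
# "x", then the Arabic letter ت (U+062A), then "!", encoded as UTF-8.
data = "xت!".encode("utf-8")
print(data)  # b'x\xd8\xaa!'

# Read as little-endian UTF-16, the first pair (0x78, 0xD8) becomes the code
# unit 0xD878, a high surrogate. A high surrogate must be followed by a low
# surrogate (0xDC00-0xDFFF), but the next unit is 0x21AA, so decoding fails.
reason = None
try:
    data.decode("utf-16-le")
except UnicodeDecodeError as err:
    reason = err.reason
print(reason)  # illegal UTF-16 surrogate
```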
Even when I try opening it with an automatic codec finder, I receive the error:
def csv_unireader(f, encoding="utf-8"):
    for row in csv.reader(codecs.iterencode(codecs.iterdecode(f, encoding), "utf-8")):
        yield [e.decode("utf-8") for e in row]
What am I overlooking? The file certainly contains Twitter texts with a lot of different characters. But surely just reading and printing a file can't be such a difficult task in Python?

Edit:

Just tried using the code from this answer: http://stackoverflow.com/a/14786752/45311
import csv

with open('source.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
This at least prints some rows to the screen, but after a few rows it throws an error:

  cp850.py, line 19, in encode
  return codecs.charmap_encode(input,self.errors,encoding_map)[0]
  UnicodeEncodeError: 'charmap' codec can't encode characters in position 62-63:
  character maps to <undefined>
It seems to be using CP850 automatically, which is yet another encoding... I can't make sense of all this.
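The traceback ends in `cp850.py`'s `encode`, which suggests the failure happens when `print()` encodes text for output, not when the file is read. This sketch assumes the script writes to a Windows console using code page 850, which the question does not state. Under that assumption, characters such as Thai or Arabic letters simply have no CP850 representation. The block reproduces the encode error and shows one possible workaround, a hypothetical helper `console_safe()` that substitutes `?` for anything the console encoding cannot represent:

```python
text = "●• อีเปียขี้บ่น ت •●"  # sample string from the question

# Reproduce the error from the traceback: CP850 has no code for these characters.
try:
    text.encode("cp850")
except UnicodeEncodeError as err:
    print(err.reason)  # character maps to <undefined>


def console_safe(s: str, encoding: str = "cp850") -> str:
    """Hypothetical helper: replace characters the target encoding can't represent with '?'."""
    return s.encode(encoding, errors="replace").decode(encoding)


# Prints the string with every unrepresentable character shown as '?'.
print(console_safe(text))
```

Another option is to change the encoding `print()` uses, for example with the `PYTHONIOENCODING` environment variable or `sys.stdout.reconfigure(encoding="utf-8")` on Python 3.7+. That avoids the exception, but whether the characters then display correctly depends on the console.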
Solution
What version of Python are you using?
If you're on 2.x, try putting this import at the beginning of your script:
from __future__ import unicode_literals
Then try:
print(open(path).read().encode('utf-8'))
There is also a great tool for charset detection: chardet.
I hope it helps.
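chardet is a third-party package. You may only want to test the Notepad++ claim that the file is plain UTF-8 rather than "mixed". In that case, the standard library alone can locate the first byte that isn't valid UTF-8. A minimal sketch follows; `first_invalid_utf8` is a hypothetical helper name:

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, offending_bytes) for the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the bytes the decoder rejected.
        return err.start, data[err.start:err.end]
    return None


print(first_invalid_utf8("●• ت".encode("utf-8")))  # None: valid UTF-8
print(first_invalid_utf8(b"ab\x8fcd"))              # (2, b'\x8f'): stray continuation byte
```

To check the real file, read it in binary mode, e.g. `first_invalid_utf8(open('source.csv', 'rb').read())`. A result of `None` means the whole file is valid UTF-8, and the remaining errors come from somewhere else.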
