'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte
Question
I am trying to read and print the following file: txt.tsv (https://www.sec.gov/files/dera/data/financial-statement-and-notes-data-sets/2017q3_notes.zip)
According to the SEC, the data set is provided in a single encoding, as follows:
Tab Delimited Value (.txt): utf-8, tab-delimited, \n-terminated lines, with the first line containing the field names in lowercase.
My current code:
import csv

with open('txt.tsv') as tsvfile:
    reader = csv.DictReader(tsvfile, dialect='excel-tab')
    for row in reader:
        print(row)
Every attempt ends with the following error message:
'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte
I am a bit lost. Can anyone help me? Many thanks in advance.
Recommended answer
Despite the SEC's description, the file is encoded as 'windows-1252', not UTF-8. Pass the encoding explicitly when opening it:
open('txt.tsv', encoding='windows-1252')
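For completeness, here is a minimal sketch of the question's loop with that encoding applied. The helper name `read_tsv_rows` is hypothetical and not part of the original answer. The short byte string at the end is made-up sample data, used only to show why UTF-8 rejects 0xa0 (in UTF-8 it can only appear as a continuation byte) while windows-1252 decodes it as a no-break space.

```python
import csv


def read_tsv_rows(path, encoding="windows-1252"):
    """Yield each row of a tab-separated file as a dict keyed by the header line.

    `read_tsv_rows` is a hypothetical helper name; the encoding default
    follows the answer above.
    """
    # newline="" lets the csv module handle line endings itself,
    # as the csv documentation recommends.
    with open(path, encoding=encoding, newline="") as tsvfile:
        reader = csv.DictReader(tsvfile, dialect="excel-tab")
        yield from reader


# Made-up sample bytes: 0xa0 is not a valid UTF-8 start byte,
# but it is a no-break space (U+00A0) in windows-1252.
sample = b"Net\xa0income"
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc)
print(repr(sample.decode("windows-1252")))
```

Usage would look like `for row in read_tsv_rows('txt.tsv'): print(row)`. Note that Python's windows-1252 codec leaves a few byte values undefined (0x81, 0x8d, 0x8f, 0x90, 0x9d), so if the file happens to contain one of those, a decode error is still possible; passing `errors='replace'` to `open` is one way to get past that, at the cost of losing the affected characters.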