'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte


Problem description

I am trying to read and print the following file: txt.tsv (https://www.sec.gov/files/dera/data/financial-statement-and-notes-data-sets/2017q3_notes.zip)

According to the SEC, the data set is provided in a single encoding, as follows:

Tab Delimited Value (.txt): utf-8, tab-delimited, \n-terminated lines, with the first line containing the field names in lowercase.

My current code:

import csv

with open('txt.tsv') as tsvfile:
    reader = csv.DictReader(tsvfile, dialect='excel-tab')
    for row in reader:
        print(row)

Every attempt ends with the following error message:

'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte

I am a bit lost. Can anyone help me? Many thanks in advance.

Recommended answer

Despite the SEC's description, the file is encoded as 'windows-1252', not UTF-8. Use:

open('txt.tsv', encoding='windows-1252')
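
For completeness, here is a minimal sketch of the question's loop with that encoding applied. The helper name read_tsv_rows is hypothetical, and the default encoding is an assumption taken from this answer rather than something the code detects; newline='' is added because the csv module documentation recommends it when opening files for csv readers.

```python
import csv


def read_tsv_rows(path, encoding="windows-1252"):
    """Yield each row of a tab-delimited file as a dict keyed by the header line.

    The default encoding is an assumption based on the answer above;
    pass a different one if the file turns out to use something else.
    """
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding=encoding, newline="") as tsvfile:
        reader = csv.DictReader(tsvfile, dialect="excel-tab")
        # Rows are yielded one at a time, so a large file is not
        # loaded into memory all at once.
        yield from reader


if __name__ == "__main__":
    for row in read_tsv_rows("txt.tsv"):
        print(row)
```

In windows-1252 the byte 0xa0 is a non-breaking space, which is why it decodes cleanly here, while in UTF-8 it can only appear as a continuation byte and is therefore rejected as a start byte.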
