如何解决''UnicodeDecodeError：'charmap'编解码器无法解码29815位置的字节0x9d：字符映射到< undefined>''？ [英] How to fix ''UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 29815: character maps to <undefined>''?

查看：125 发布时间：2020/10/19 19:39:46 python unicode file-io sqlite decode

本文介绍了如何解决''UnicodeDecodeError：'charmap'编解码器无法解码29815位置的字节0x9d：字符映射到< undefined>''？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

目前，我正在尝试通过Spyder IDE / GUI获得一个Python 3程序，以对充满信息的文本文件进行一些操作。但是，当尝试读取文件时，出现以下错误：

At the moment, I am trying to get a Python 3 program to do some manipulations with a text file filled with information, through the Spyder IDE/GUI. However, when trying to read the file I get the following error:

  File "<ipython-input-13-d81e1333b8cd>", line 77, in <module>
    parser(f)

  File "<ipython-input-13-d81e1333b8cd>", line 18, in parser
    data = infile.read()

  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 29815: character maps to <undefined>

程序代码如下：

import os

os.getcwd()

import glob
import re
import sqlite3
import csv

def parser(file):

    # Open a TXT file. Store all articles in a list. Each article is an item
    # of the list. Split articles based on the location of such string as
    # 'Document PRN0000020080617e46h00461'

    articles = []
    with open(file, 'r') as infile:
        data = infile.read()
    start = re.search(r'\n HD\n', data).start()
    for m in re.finditer(r'Document [a-zA-Z0-9]{25}\n', data):
        end = m.end()
        a = data[start:end].strip()
        a = '\n   ' + a
        articles.append(a)
        start = end

    # In each article, find all used Intelligence Indexing field codes. Extract
    # content of each used field code, and write to a CSV file.

    # All field codes (order matters)
    fields = ['HD', 'CR', 'WC', 'PD', 'ET', 'SN', 'SC', 'ED', 'PG', 'LA', 'CY', 'LP',
              'TD', 'CT', 'RF', 'CO', 'IN', 'NS', 'RE', 'IPC', 'IPD', 'PUB', 'AN']

    for a in articles:
        used = [f for f in fields if re.search(r'\n   ' + f + r'\n', a)]
        unused = [[i, f] for i, f in enumerate(fields) if not re.search(r'\n   ' + f + r'\n', a)]
        fields_pos = []
        for f in used:
            f_m = re.search(r'\n   ' + f + r'\n', a)
            f_pos = [f, f_m.start(), f_m.end()]
            fields_pos.append(f_pos)
        obs = []
        n = len(used)
        for i in range(0, n):
            used_f = fields_pos[i][0]
            start = fields_pos[i][2]
            if i < n - 1:
                end = fields_pos[i + 1][1]
            else:
                end = len(a)
            content = a[start:end].strip()
            obs.append(content)
        for f in unused:
            obs.insert(f[0], '')
        obs.insert(0, file.split('/')[-1].split('.')[0])  # insert Company ID, e.g., GVKEY
        # print(obs)
        cur.execute('''INSERT INTO articles
                       (id, hd, cr, wc, pd, et, sn, sc, ed, pg, la, cy, lp, td, ct, rf,
                       co, ina, ns, re, ipc, ipd, pub, an)
                       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
                       ?, ?, ?, ?, ?, ?, ?, ?)''', obs)

# Write to SQLITE
conn = sqlite3.connect('factiva.db')
with conn:
    cur = conn.cursor()
    cur.execute('DROP TABLE IF EXISTS articles')
    # Mirror all field codes except changing 'IN' to 'INC' because it is an invalid name
    cur.execute('''CREATE TABLE articles
                   (nid integer primary key, id text, hd text, cr text, wc text, pd text,
                   et text, sn text, sc text, ed text, pg text, la text, cy text, lp text,
                   td text, ct text, rf text, co text, ina text, ns text, re text, ipc text,
                   ipd text, pub text, an text)''')
    for f in glob.glob('*.txt'):
        print(f)
        parser(f)

# Write to CSV to feed Stata
with open('factiva.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    with conn:
        cur = conn.cursor()
        cur.execute('SELECT * FROM articles WHERE hd IS NOT NULL')
        colname = [desc[0] for desc in cur.description]
        writer.writerow(colname)
        for obs in cur.fetchall():
            writer.writerow(obs)

如何解决''UnicodeDecodeError：'charmap'编解码器无法解码29815位置的字节0x9d：字符映射到< undefined>''？ [英] How to fix ''UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 29815: character maps to <undefined>''?

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

如何解决''UnicodeDecodeError：'charmap'编解码器无法解码29815位置的字节0x9d：字符映射到&lt; undefined&gt;''？ [英] How to fix &#39;&#39;UnicodeDecodeError: &#39;charmap&#39; codec can&#39;t decode byte 0x9d in position 29815: character maps to &lt;undefined&gt;&#39;&#39;?

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

如何解决''UnicodeDecodeError：'charmap'编解码器无法解码29815位置的字节0x9d：字符映射到< undefined>''？ [英] How to fix ''UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 29815: character maps to <undefined>''?

登录关闭