Python:Traceback codecs.charmap_decode(input,self.errors,decode_table)[0] [英] Python: Traceback codecs.charmap_decode(input,self.errors,decoding_table)[0]

查看:901
本文介绍了Python:Traceback codecs.charmap_decode(input,self.errors,decode_table)[0]的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以下是示例代码,其目的是合并来自给定文件夹的文本文件和它的子文件夹。我偶尔得到追溯,所以不知道在哪里看。还需要一些帮助来增强代码以防止空白行被合并&在合并/主文件中不显示行。在合并文件之前可能是个好主意,应该执行一些清理,或者只是在合并过程中忽略空白行。

文件夹中的文本文件不超过1000行,但汇总主文件可以非常容易地跨越10000行以上。

  import os 
root ='C:\\\\\\\\'
files = [(path,f)for文件路径,_,file_list在os.walk(root)for f in file_list]
out_file = open('C:\\\\\\\\\\\\\\\\\''''' )
为路径,f_name在文件中:
in_file = open('%s /%s'%(path,f_name),'r')

# / path / to / file(space)file_contents
in_file中的行:
out_file.write('%s /%s%s'%(path,f_name,line))
in_file .close()

在每个文件后面输入新行
out_file.write('\\\
')

打开('master.txt', 'r')为f:
lines = f.readlines()
with open('master.txt','w')as f:
f.write(。join (L为L行,如果L.strip()))



Traceback(最近一次调用最后一次):
在< module>文件的第9行中输入C:\Dropbox\Python\master.py对于in_file中的行:
文件C:\PYTHON32\LIB\encodings\cp1252.py,第23行,解码返回codecs.charmap_decode(input,self.errors,decode_table)[0]
UnicodeDecodeError:'charmap'编解码器无法解码位置972中的字节0x81:字符映射到< undefined>


解决方案

错误是由于Python 3打开文件一个默认的编码,不匹配的内容。

如果你只是在复制文件内容,你最好使用 shutil.copyfileobj() function 以二进制模式打开文件。这样,你完全避免了编码问题(当然,只要所有的源文件都是相同的编码,那么你就不会得到一个带有混合编码的目标文件):

 导入shutil 
导入os.path

打开('C:\\Dropbox\ \Python\\\\master.txt','wb')作为输出:
为路径,f_name在文件中:
为开放(os.path.join(path,f_name),'rb ')作为输入:
shutil.copyfileobj(输入,输出)
output.write(b'\\\
')#在文件之间插入额外的换行符

为了使用上下文管理器,我已经清理了一些代码(所以你的文件在完成时自动关闭)并使用 os



如果您需要逐行处理您的输入,您需要告诉Python是什么编码,所以它可以解码文件内容到python字符串对象:

  open(path,mode,encoding ='UTF8')

请注意,这需要您知道文件使用什么编码。



阅读 Python Unicode HOWTO ,如果你还有关于python 3,文件和编码的问题。 / p>

Following is sample code, aim is just to merges text files from give folder and it's sub folder. i am getting Traceback occasionally so not sure where to look. also need some help to enhance the code to prevent blank line being merge & to display no lines in merged/master file. Probably it's good idea to before merging file, some cleanup should performed or just to ignores blank line during merging process.

Text file in folder is not more then 1000 lines but aggregate master file could cross 10000+ lines very easily.

import os
root = 'C:\\Dropbox\\ans7i\\'
files = [(path,f) for path,_,file_list in os.walk(root) for f in file_list]
out_file = open('C:\\Dropbox\\Python\\master.txt','w')
for path,f_name in files:
    in_file = open('%s/%s'%(path,f_name), 'r')

    # write out root/path/to/file (space) file_contents
    for line in in_file:
        out_file.write('%s/%s %s'%(path,f_name,line))
    in_file.close()

    # enter new line after each file
    out_file.write('\n')

with open('master.txt', 'r') as f:
  lines = f.readlines()
with open('master.txt', 'w') as f:
  f.write("".join(L for L in lines if L.strip())) 



Traceback (most recent call last):
  File "C:\Dropbox\Python\master.py", line 9, in <module> for line in in_file:
  File "C:\PYTHON32\LIB\encodings\cp1252.py", line  23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0]  
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 972: character maps to <undefined>  

解决方案

The error is thrown because Python 3 opens your files with a default encoding that doesn't match the contents.

If all you are doing is copying file contents, you'd be better off using the shutil.copyfileobj() function together with opening the files in binary mode. That way you avoid encoding issues altogether (as long as all your source files are the same encoding of course, so you don't end up with a target file with mixed encodings):

import shutil
import os.path

with open('C:\\Dropbox\\Python\\master.txt','wb') as output:
    for path, f_name in files:
        with open(os.path.join(path, f_name), 'rb') as input:
            shutil.copyfileobj(input, output)
        output.write(b'\n') # insert extra newline between files

I've cleaned up the code a little to use context managers (so your files get closed automatically when done) and to use os.path to create the full path for your files.

If you do need to process your input line by line you'll need to tell Python what encoding to expect, so it can decode the file contents to python string objects:

open(path, mode, encoding='UTF8')

Note that this requires you to know up front what encoding the files use.

Read up on the Python Unicode HOWTO if you have further questions about python 3, files and encodings.

这篇关于Python:Traceback codecs.charmap_decode(input,self.errors,decode_table)[0]的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆