Parsing inside a directory problem Python 2.7 vs. 3.2

Problem description

I am trying to do some basic file parsing within a directory in Python 3. This code works perfectly in Python 2.7, but I cannot figure out what the problem is in Python 3.2.

import sys, os, re

filelist = os.listdir('/Users/sbrown/Desktop/Test') 
os.chdir('/Users/sbrown/Desktop/Test') 
for file in filelist:
    infile = open(file, mode='r') 
    filestring = infile.read() 
    infile.close() 
    pattern = re.compile('exit') 
    filestring = pattern.sub('so long', filestring) 
    outfile = open(file, mode='w') 
    outfile.write(filestring)
    outfile.close 
exit

This is the error that is thrown back:

Traceback (most recent call last):
  File "/Users/bunsen/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)

The files I am parsing are all text files. I tried specifying utf-8 as the encoding in the method arguments, but that didn't work. Any ideas? Thanks in advance!

If I specify the encoding as utf-8, here is the error that is thrown:

Traceback (most recent call last):
  File "/Users/sbrown/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)

Solution

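You are not specifying an encoding when you open your files. In Python 3 you need to, because a text mode file decodes the bytes it reads and returns Unicode strings. When no encoding is given, open() falls back to the locale's preferred encoding, which in your traceback is ASCII.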
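To check which encoding Python picks when none is passed, a minimal sketch like the one below may help. The helper name default_text_encoding is made up for illustration; locale.getpreferredencoding is the standard library call that open() consults in text mode.

```python
import locale


def default_text_encoding():
    """Return the encoding open() falls back to when none is given.

    Hypothetical helper name, used only for this illustration.
    """
    # In Python 3, text mode open() uses locale.getpreferredencoding(False)
    # unless an explicit encoding argument is passed.
    return locale.getpreferredencoding(False)


print(default_text_encoding())
```

The result depends on the environment the script runs in, so the same script can behave differently in a terminal and when launched from an editor or a scheduled job.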

You tried UTF-8 and that didn't work, so evidently that isn't the encoding used. Only you know what encoding the files are in, but I'm guessing it's cp1252, as 0x80 is that code page's character for €, so failing on 0x80 is common when you have European Windows users. :-)
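As a quick illustration of that guess, the sketch below decodes the single byte from the traceback with three codecs. The helper try_decode is a hypothetical name; it only wraps bytes.decode and reports failure as None.

```python
raw = b"\x80"  # the byte reported in the traceback


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


for name in ("ascii", "utf-8", "cp1252"):
    # ascii() escapes non-ASCII characters, so printing works on any terminal.
    print(name, ascii(try_decode(raw, name)))
```

Only cp1252 accepts the byte here, decoding it to the euro sign (U+20AC). That is consistent with the guess but does not prove it, since other single-byte encodings also assign a character to 0x80.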

To stay compatible with both Python 2.7 and Python 3, I recommend you use the io library to open files. In Python 3 the built-in open() is io.open, and the module is available in Python 2.6 and later as well:

import io
infile = io.open(filelist[0], mode='rt', encoding='cp1252')
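Applied to the whole loop from the question, this could look roughly like the sketch below. The function name replace_in_directory and its parameters are hypothetical, and cp1252 is still only the guess made above, so substitute the real encoding once it is known.

```python
import io
import os
import re


def replace_in_directory(directory, pattern, replacement, encoding="cp1252"):
    """Rewrite every regular file in `directory`, substituting `pattern`.

    Hypothetical helper; the default encoding is an assumption.
    """
    regex = re.compile(pattern)
    for name in os.listdir(directory):
        # Build a full path instead of calling os.chdir().
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # newline="" disables newline translation, so line endings survive.
        with io.open(path, mode="rt", encoding=encoding, newline="") as infile:
            text = infile.read()
        with io.open(path, mode="wt", encoding=encoding, newline="") as outfile:
            outfile.write(regex.sub(replacement, text))


# Example call with the values from the question:
# replace_in_directory('/Users/sbrown/Desktop/Test', 'exit', 'so long')
```

The with blocks close each file even if an error occurs. Note that outfile.close without parentheses, as written in the question, only references the method and never calls it.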
