Again: UnicodeEncodeError: ascii codec can't encode

Views: 133 · Posted: 2020/9/7 20:15:13 · Tags: python, python-2.7, ascii, codec, elementtree

Problem description

I have a folder of XML files that I would like to parse. I need to get text out of the elements of these files. They will be collected and printed to a CSV file where the elements are listed in columns.

I can actually do this right now for some of my files. That is, for many of my XML files, the process goes fine, and I get the output I want. The code that does this is:

import os, re, csv, string, operator
import xml.etree.cElementTree as ET
import codecs
def parseEO(doc):
    #getting the basic structure
    tree = ET.ElementTree(file=doc)
    root = tree.getroot()
    agencycodes = []
    rins = []
    titles =[]
    elements = [agencycodes, rins, titles]
    #pulling in the text from the fields
    for elem in tree.iter():
        if elem.tag == "AGENCY_CODE":
            agencycodes.append(int(elem.text))
        elif elem.tag == "RIN":
            rins.append(elem.text)
        elif elem.tag == "TITLE":
            titles.append(elem.text)
    with open('parsetest.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(zip(*elements))


parseEO('EO_file.xml')

However, on some versions of the input file, I get the infamous error:

'ascii' codec can't encode character u'\x97' in position 32: ordinal not in range(128)

The full traceback is:

    ---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-15-28d095d44f02> in <module>()
----> 1 execfile(r'/parsingtest.py') # PYTHON-MODE

/Users/ian/Desktop/parsingtest.py in <module>()
     91         writer.writerows(zip(*elements))
     92 
---> 93 parseEO('/EO_file.xml')
     94 
     95 

/parsingtest.py in parseEO(doc)
     89     with open('parsetest.csv', 'w') as f:
     90         writer = csv.writer(f)
---> 91         writer.writerows(zip(*elements))
     92 
     93 parseEO('/EO_file.xml')
UnicodeEncodeError: 'ascii' codec can't encode character u'\x97' in position 32: ordinal not in range(128)
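
Note that the traceback stops at writer.writerows, not at the parsing step. In Python 2 the csv module writes byte strings, so unicode cells are encoded implicitly with the ascii codec at that point. A minimal sketch of an explicit UTF-8 write follows; the helper name write_rows_utf8, the output file name, and the sample row are assumptions for illustration, not part of the original script.

```python
import csv
import io
import sys


def write_rows_utf8(path, rows):
    """Write rows to a CSV file encoded as UTF-8 (hypothetical helper)."""
    if sys.version_info[0] >= 3:
        # Python 3: csv works with text, so the file object does the encoding.
        with io.open(path, 'w', newline='', encoding='utf-8') as f:
            csv.writer(f).writerows(rows)
    else:
        # Python 2: csv works with bytes, so encode unicode cells explicitly.
        with open(path, 'wb') as f:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow([
                    cell.encode('utf-8') if isinstance(cell, unicode) else cell
                    for cell in row
                ])


# Made-up sample row containing the character from the error message.
rows = [(12, u'0000-AA00', u'A title \x97 with a non-ASCII character')]
write_rows_utf8('parsetest_utf8.csv', rows)
```

In parseEO, the same idea would mean passing zip(*elements) to such a helper instead of writing through a file opened with open('parsetest.csv', 'w').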

I am fairly confident from reading the other threads that the problem is in the codec being used (and, you know, the error is pretty clear on that as well). However, the solutions I have read haven't helped me (emphasized because I understand I am the source of the problem, not the way people have answered in the past).

Several responses (such as: this one and this one and this one) don't deal directly with ElementTree, and I'm not sure how to translate the solutions into what I'm doing.

Other solutions that do deal with ElementTree (such as: this one and this one) are either using a short string (the first link here) or are using the .tostring/.fromstring methods in ElementTree which I do not. (Though, of course, perhaps I should be.)

What I have tried that did not work:

I have attempted to bring in the file via UTF-8 encoding:

infile = codecs.open('/EO_file.xml', encoding="utf-8")
parseEO(infile)

but I think the ElementTree process already understands it to be UTF-8 (which is noted in the first line of all the XML files I have), and so this is not only not correct, but is actually redundantly bad all over again.
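
A small sketch of that suspicion, assuming a made-up one-element document rather than the real EO_file.xml: when the parser is handed raw bytes, it reads the encoding from the XML declaration itself and returns already-decoded text, so decoding the file beforehand adds nothing.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature document; the bytes are UTF-8, as the declaration says.
xml_bytes = (
    u'<?xml version="1.0" encoding="UTF-8"?><TITLE>caf\u00e9</TITLE>'
).encode('utf-8')

# Hand the parser raw bytes: it picks the codec from the declaration.
root = ET.parse(io.BytesIO(xml_bytes)).getroot()

# The element text comes back as decoded text, not as UTF-8 bytes.
print(repr(root.text))
```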

I attempted to declare an encoding process within the loop, replacing:

tree = ET.ElementTree(file=doc)

with

parser = ET.XMLParser(encoding="utf-8")
tree = ET.parse(doc, parser=parser)

in the loop above that does work. This didn't work for me either. The same files that worked before still worked, the same files that created the error still created the error.

There have been a lot of other random attempts, but I won't belabor the point.

So, while I assume the code I have is both inefficient and offensive to good programming style, it does do what I want for several files. I am trying to understand if there is simply an argument I'm missing that I don't know about, if I should somehow pre-process the files (I have not identified where the offending character is, but do know that u'\x97' translates to a control character of some kind), or some other option.
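
For locating and identifying the offending character, a sketch along these lines may help. The helper non_ascii_positions and the sample string are hypothetical. The Windows-1252 reading is only one possible explanation for how a U+0097 ends up in such data, not something the files here confirm.

```python
import unicodedata


def non_ascii_positions(text):
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


# Made-up sample standing in for one TITLE value.
sample = u'Rule \x97 final'
print(non_ascii_positions(sample))

# U+0097 sits in the C1 control block; 'Cc' is the control-character category.
print(unicodedata.category(u'\x97'))

# Read as a single Windows-1252 byte, 0x97 is a dash instead.
as_cp1252 = b'\x97'.decode('cp1252')
print(unicodedata.name(as_cp1252))
```

Running the helper over each collected string before writing the CSV would show which field and position carry the character.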
