How do I map to a dictionary rather than a list?
Question
I have the following function, which does a basic job of mapping an lxml object to a dictionary...
from lxml import etree
tree = etree.parse('file.xml')
root = tree.getroot()
def xml_to_dict(el):
    d = {}
    if el.text:
        print '***write tag as string'
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = el.getchildren()
    if children:
        d[el.tag] = map(xml_to_dict, children)
    return d
v = xml_to_dict(root)
>>>print v
{'root': [{'a': '1'}, {'a': [{'b': '2'}, {'b': '2'}]}, {'aa': '1a'}]}
But I want:
>>>print v
{'root': {'a': ['1', {'b': [2, 2]}], 'aa': '1a'}}
How do I rewrite the function xml_to_dict(el) so that I get the required output?
Here's the XML I'm parsing, for clarity:
<root>
  <a>1</a>
  <a>
    <b>2</b>
    <b>2</b>
  </a>
  <aa>1a</aa>
</root>
Thanks :)
Recommended answer
Well, map() always returns a list (in Python 2; in Python 3 it returns an iterator), so the easy answer is "don't use map()". Instead, build a dictionary like you already are, by looping over children and assigning the result of xml_to_dict(child) to the dictionary key you want to use. It looks like you want to use the tag as the key and have the value be a list of items with that tag, so it would become something like:
import collections
from lxml import etree
tree = etree.parse('file.xml')
root = tree.getroot()
def xml_to_dict(el):
    d = {}
    if el.text:
        print '***write tag as string'
        d[el.tag] = el.text
    # Group the converted children by tag: tag -> list of child dicts
    child_dicts = collections.defaultdict(list)
    for child in el.getchildren():
        child_dicts[child.tag].append(xml_to_dict(child))
    if child_dicts:
        d[el.tag] = child_dicts
    return d
xml_to_dict(root)
This leaves the tag entry in the dict as a defaultdict. If you want a normal dict for some reason, use d[el.tag] = dict(child_dicts). Note that, as before, if a tag has both text and children, the text won't appear in the dict. You may want to think about a different layout for your dict to cope with that.
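For that different layout, one option is to keep an element's text under a reserved key next to its child lists. The sketch below assumes Python 3 and uses the standard library's xml.etree.ElementTree instead of lxml (both expose .tag, .text and iteration over child elements). The '#text' key name, the function name xml_to_node and the whitespace stripping are illustrative choices, not part of the original answer.

```python
import collections
import xml.etree.ElementTree as ET

def xml_to_node(el):
    """Convert an element to a dict that keeps both its text and its children.

    The '#text' key is a hypothetical convention chosen for this sketch.
    """
    node = {}
    # Ignore whitespace-only text such as the indentation between tags.
    text = (el.text or "").strip()
    if text:
        node["#text"] = text
    # Group converted children by tag: tag -> list of child nodes.
    children = collections.defaultdict(list)
    for child in el:  # iterating an Element yields its direct children
        children[child.tag].append(xml_to_node(child))
    node.update(children)  # copy the groups into the plain dict
    return node

root = ET.fromstring("<p>intro<b>bold</b><b>more</b></p>")
print({root.tag: xml_to_node(root)})
# → {'p': {'#text': 'intro', 'b': [{'#text': 'bold'}, {'#text': 'more'}]}}
```

Because an XML element name cannot start with '#', the reserved key can never collide with a real child tag. Text that follows a child element (stored in that child's .tail) is still ignored by this sketch.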
Edit:
Code that would produce the output in your rephrased question wouldn't recurse in xml_to_dict, because you only want a dict for the outer element, not for all child tags. So, you'd use something like:
import collections
from lxml import etree
tree = etree.parse('file.xml')
root = tree.getroot()
def xml_to_item(el):
    item = None  # fallback for elements with neither text nor children
    if el.text:
        print '***write tag as string'
        item = el.text
    child_dicts = collections.defaultdict(list)
    for child in el.getchildren():
        child_dicts[child.tag].append(xml_to_item(child))
    # An element with children becomes a dict; otherwise return its text
    return dict(child_dicts) or item
def xml_to_dict(el):
    return {el.tag: xml_to_item(el)}
print xml_to_dict(root)
This still doesn't handle tags with both text and children sanely, and it turns the collections.defaultdict(list) into a normal dict, so the output is (almost) as you expect:
***write tag as string
***write tag as string
***write tag as string
***write tag as string
***write tag as string
***write tag as string
{'root': {'a': ['1', {'b': ['2', '2']}], 'aa': ['1a']}}
(If you really want integers instead of strings for the text data in the b tags, you'll have to explicitly convert them to integers.)
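If you also want integers and the nesting from the question, here is a Python 3 sketch along the lines of the edited answer. It converts numeric text with int() and does not wrap tags that occur only once in a list. Again it uses xml.etree.ElementTree as a stand-in for lxml. The conversion rules are assumptions, since the question doesn't state them: any text that int() accepts becomes an integer, and single occurrences are unwrapped.

```python
import collections
import xml.etree.ElementTree as ET

def convert_text(text):
    """Assumed rule: text that int() accepts becomes an int, else stays a str."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return text

def xml_to_item(el):
    children = collections.defaultdict(list)
    for child in el:  # direct child elements
        children[child.tag].append(xml_to_item(child))
    if children:
        # Unwrap tags that occur only once so they map to a value, not a list.
        return {tag: items[0] if len(items) == 1 else items
                for tag, items in children.items()}
    return convert_text(el.text or "")

def xml_to_dict(el):
    return {el.tag: xml_to_item(el)}

XML = """<root>
  <a>1</a>
  <a>
    <b>2</b>
    <b>2</b>
  </a>
  <aa>1a</aa>
</root>"""

print(xml_to_dict(ET.fromstring(XML)))
# → {'root': {'a': [1, {'b': [2, 2]}], 'aa': '1a'}}
```

This also turns the first a's '1' into 1, which differs slightly from the requested output. The unwrapping is also a trade-off: a tag's value changes type (scalar or list) depending on how many times it occurs, which can make downstream code more fragile than always using lists.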