Generating nested lists from XML doc

Question

Working in Python, my goal is to parse an XML document I made and build a nested list of lists so that I can access the feeds later and parse them. The XML document resembles the following snippet:

<?xml version="1.0"?>
<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/uk/rss.xml</f>
    </sourceList>
    <sourceList source="reuters">
        <f>http://feeds.reuters.com/reuters/topNews</f>
        <f>http://feeds.reuters.com/news/artsculture</f>
    </sourceList>
</sources>

I would like something like nested lists, where each innermost list holds the content between the <f></f> tags and the list above it is named after the source, e.g. source="reuters" would become reuters. Retrieving the information from the XML document isn't a problem: I'm doing it with ElementTree, using loops and node.get('source') etc. The problem is generating lists with the desired names and with the different lengths that the different sources require. I have tried appending, but I'm unsure how to append to a list named after a retrieved source name. Would a dictionary be better? What is the best practice in this situation, and how might I make it work? If more information is required, just post a comment and I'll be sure to add it.

Recommended answer

From your description, a dictionary whose keys are the source names and whose values are the lists of feed URLs might do the trick.

Here is one way to construct such a beast:

from lxml import etree
from pprint import pprint

news_sources = {
    source.attrib['source'] : [feed.text for feed in source.xpath('./f')]
    for source in etree.parse('x.xml').xpath('/sources/sourceList')}

pprint(news_sources)

Another sample, without lxml or xpath:

import xml.etree.ElementTree as ET
from pprint import pprint

news_sources = {
    source.attrib['source'] : [feed.text for feed in source]
    for source in ET.parse('x.xml').getroot()}

pprint(news_sources)
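
The two snippets above read from a file named x.xml. To try the idea without that file, here is a minimal sketch that inlines the question's sample as a string. SAMPLE_XML and parse_sources are illustrative names, not part of the original answer. It uses findall('sourceList') and findall('f') so that only those elements are collected, then shows a lookup by source name and how to turn the dictionary back into the nested list of lists the question describes.

```python
import xml.etree.ElementTree as ET

# The question's sample document, inlined (hypothetical constant) so the sketch runs without x.xml.
SAMPLE_XML = """<?xml version="1.0"?>
<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/uk/rss.xml</f>
    </sourceList>
    <sourceList source="reuters">
        <f>http://feeds.reuters.com/reuters/topNews</f>
        <f>http://feeds.reuters.com/news/artsculture</f>
    </sourceList>
</sources>
"""


def parse_sources(xml_text):
    """Map each sourceList's 'source' attribute to the list of its <f> feed URLs."""
    root = ET.fromstring(xml_text)
    return {
        source.get('source'): [feed.text for feed in source.findall('f')]
        for source in root.findall('sourceList')
    }


news_sources = parse_sources(SAMPLE_XML)

# Look up one source's feeds by name.
print(news_sources['reuters'])

# The nested list-of-lists shape from the question, if it is still wanted.
nested = [[name, feeds] for name, feeds in news_sources.items()]
print(nested[0])
```

Because dictionaries keep insertion order in Python 3.7 and later, nested lists the sources in document order. For later use, something like `for name, feeds in news_sources.items():` lets you loop over each source and hand its URLs to a feed parser.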

Finally, if you are allergic to list comprehensions:

import xml.etree.ElementTree as ET
from pprint import pprint

tree = ET.parse('x.xml')  # avoid shadowing the xml package name
root = tree.getroot()
news_sources = {}
for source_list in root:
    source_name = source_list.attrib['source']
    news_sources[source_name] = []
    for feed in source_list:
        feed_url = feed.text
        news_sources[source_name].append(feed_url)

pprint(news_sources)
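
One behaviour to be aware of: all three versions above replace an earlier entry if two sourceList elements share the same source name. If that can happen in your file, a collections.defaultdict(list) that extends rather than replaces is one option. The sketch below assumes that merging is the behaviour you want; the sample string and the helper name collect_feeds are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input in which "bbc" appears twice, to show the merging.
SAMPLE_XML = """<sources>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
    </sourceList>
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
    </sourceList>
</sources>"""


def collect_feeds(xml_text):
    """Group <f> URLs by source name, merging repeated sourceList entries."""
    news_sources = defaultdict(list)
    for source_list in ET.fromstring(xml_text).findall('sourceList'):
        name = source_list.get('source')
        # extend() keeps the feeds already collected for an earlier entry with this name.
        news_sources[name].extend(feed.text for feed in source_list.findall('f'))
    # Convert to a plain dict so that looking up a missing name raises KeyError again.
    return dict(news_sources)


feeds = collect_feeds(SAMPLE_XML)
print(feeds['bbc'])
```

Returning a plain dict is a design choice: a defaultdict would silently create an empty list for any misspelled source name you look up later.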
