Generating nested lists from XML doc

Question

Working in Python, my goal is to parse an XML doc I made and create a nested list of lists so that I can access them later and parse the feeds. The XML doc resembles the following snippet:

<?xml version="1.0"?>
<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/uk/rss.xml</f>
    </sourceList>
    <sourceList source="reuters">
        <f>http://feeds.reuters.com/reuters/topNews</f>
        <f>http://feeds.reuters.com/news/artsculture</f>
    </sourceList>
</sources>
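To try the snippet without creating a file, it can be parsed from a string with `ET.fromstring`. The sketch below uses a shortened copy of the XML (the `SAMPLE` name is an assumption for illustration) and the same `node.get('source')` style of loop the question mentions. Note that the XML declaration needs matching quotes and a closing `?>`, otherwise the parser rejects the document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the snippet above, held in a string (hypothetical name SAMPLE).
SAMPLE = """<?xml version="1.0"?>
<sources>
    <!--Source List by Institution-->
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
    </sourceList>
</sources>"""

root = ET.fromstring(SAMPLE)  # returns the <sources> element

for node in root.findall("sourceList"):
    # node.get() returns the attribute value, or None if it is absent
    name = node.get("source")
    feeds = [f.text for f in node.findall("f")]
    print(name, feeds)
```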

I would like to have something like nested lists, where the innermost list would be the content between the <f></f> tags and the list above that one would be named after the sources, e.g. source="reuters" would be reuters. Retrieving the info from the XML doc isn't a problem; I'm doing it with ElementTree, using loops that call node.get('source') etc. The problem is generating the lists with the desired names and the different lengths required by the different sources. I have tried appending, but I'm unsure how to append to a list using the names retrieved. Would a dictionary be better? What would be the best practice in this situation, and how might I make this work? If any more info is required, just post a comment and I'll be sure to add it.
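For comparison, the literal nested list the question describes can be built by appending one `[name, feeds]` pair per `sourceList`, where `feeds` is a fresh inner list of whatever length that source needs. This is a minimal sketch assuming the XML is available as a string (shortened here); the `nested` and `reuters_feeds` names are only for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML (an assumption for illustration).
SAMPLE = """<sources>
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="reuters">
        <f>http://feeds.reuters.com/reuters/topNews</f>
        <f>http://feeds.reuters.com/news/artsculture</f>
    </sourceList>
</sources>"""

nested = []  # outer list: one [name, feeds] entry per source
for node in ET.fromstring(SAMPLE).findall("sourceList"):
    feeds = []  # inner list; its length depends on the source
    for f in node.findall("f"):
        feeds.append(f.text)
    nested.append([node.get("source"), feeds])

# Finding one source by name means scanning the whole outer list.
reuters_feeds = next(feeds for name, feeds in nested if name == "reuters")
```

The last line shows the weak spot of this layout: every lookup by name is a linear scan, which a dictionary keyed by source name avoids.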

Answer

From your description, a dictionary with keys according to the source name and values according to the feed lists might do the trick.
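Concretely, with a dictionary the source name becomes the key, so each feed list can have its own length and be fetched directly by name. The literal below is hand-written from the question's data, just to show the access patterns; it is not parsed from the file.

```python
# Hand-written dict mirroring the question's XML (illustrative data only).
news_sources = {
    "cbc": ["http://rss.cbc.ca/lineup/topstories.xml"],
    "bbc": [
        "http://feeds.bbci.co.uk/news/rss.xml",
        "http://feeds.bbci.co.uk/news/world/rss.xml",
        "http://feeds.bbci.co.uk/news/uk/rss.xml",
    ],
    "reuters": [
        "http://feeds.reuters.com/reuters/topNews",
        "http://feeds.reuters.com/news/artsculture",
    ],
}

# Direct lookup by source name, no scanning needed.
bbc_feeds = news_sources["bbc"]

# .get() returns a default instead of raising KeyError for an unknown source.
missing = news_sources.get("ap", [])

# Iterate over every source and its feeds, e.g. to fetch them later.
for name, feeds in news_sources.items():
    print(name, len(feeds))
```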

Here is one way to construct such a beast:

from lxml import etree
from pprint import pprint

# Map each sourceList's "source" attribute to the text of its <f> children.
news_sources = {
    source.attrib['source']: [feed.text for feed in source.xpath('./f')]
    for source in etree.parse('x.xml').xpath('/sources/sourceList')}

pprint(news_sources)

Another sample, without lxml and XPath:

import xml.etree.ElementTree as ET
from pprint import pprint

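# Iterating over the root yields its child elements (the <sourceList> nodes);
# iterating over each of those yields its <f> children.
news_sources = {
    source.attrib['source']: [feed.text for feed in source]
    for source in ET.parse('x.xml').getroot()}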

pprint(news_sources)
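Iterating directly over the root picks up every child element, so an unrelated element without a `source` attribute would make `source.attrib['source']` raise `KeyError`. If the file might contain other elements, the limited XPath support in ElementTree's `findall` can restrict the match to `sourceList` and `f` children, much like the lxml version. This sketch parses a shortened string instead of `x.xml` so it runs on its own; the extra `<meta>` element is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for x.xml, plus an unrelated <meta> element (hypothetical).
SAMPLE = """<sources>
    <meta>not a source</meta>
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
    </sourceList>
</sources>"""

root = ET.fromstring(SAMPLE)

# findall("sourceList") matches only direct <sourceList> children, so <meta> is skipped.
news_sources = {
    source.get("source"): [feed.text for feed in source.findall("f")]
    for source in root.findall("sourceList")
}
```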

Finally, if you are allergic to list comprehensions:

import xml.etree.ElementTree as ET
from pprint import pprint

tree = ET.parse('x.xml')  # not named "xml", which would shadow the xml package
root = tree.getroot()
news_sources = {}
for source_list in root:
    # Start an empty feed list for this source, keyed by its name.
    source_name = source_list.attrib['source']
    news_sources[source_name] = []
    for feed in source_list:
        # Append the URL text of each <f> element to that source's list.
        news_sources[source_name].append(feed.text)

pprint(news_sources)
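The explicit step that initializes each key to an empty list can be folded away with `collections.defaultdict(list)`, which creates an empty list the first time a key is used. It also changes one behavior: if the same source name appeared in two `sourceList` elements (a case the question doesn't mention, so treat it as an assumption), the feeds are merged rather than the second list replacing the first. The duplicated `bbc` entry below is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened stand-in for x.xml; "bbc" appears twice to show merging (hypothetical).
SAMPLE = """<sources>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/rss.xml</f>
    </sourceList>
    <sourceList source="cbc">
        <f>http://rss.cbc.ca/lineup/topstories.xml</f>
    </sourceList>
    <sourceList source="bbc">
        <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
    </sourceList>
</sources>"""

news_sources = defaultdict(list)  # a missing key starts as an empty list
for source_list in ET.fromstring(SAMPLE).findall("sourceList"):
    name = source_list.get("source")
    for feed in source_list.findall("f"):
        news_sources[name].append(feed.text)

news_sources = dict(news_sources)  # plain dict again, so unknown keys raise KeyError
```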
