How do I pick up text values of child nodes when parsing XML with ElementTree?

Views: 51 | Posted: 2021/10/2 18:43:36 | Tags: python-3.x pandas beautifulsoup xml-parsing elementtree

Problem description

I have an XML shopping feed with a bunch of products, see below. If I were to parse this with Beautiful Soup to create a pandas DataFrame, I'd use something like this:

import pandas as pd
from bs4 import BeautifulSoup

def parse_shopping_feed(feed_xml):
    # feed_xml holds the raw feed contents; fetching it is left out here
    #response = requests.get(feed_url)
    soup = BeautifulSoup(feed_xml, "xml")  # the "xml" parser requires lxml
    all_products = []
    for item in soup.find_all("item"):
        new_product = {
            "id": item.id.string,
            "title": item.title.string,
            "description": item.description.string,
            "google_product_category": item.google_product_category.string,
            # product_type is optional; `"product_type" in item` would test
            # item.contents rather than child tag names, so check for None
            "product_type": item.product_type.string if item.product_type is not None else "",
            "link": item.link.string,
            "availability": item.availability.string,
            "price": item.price.string,
            "brand": item.brand.string
        }
        all_products.append(new_product)
    feed_df = pd.DataFrame(all_products)
    return feed_df

Now, Beautiful Soup is too slow for one of these feeds (around 300 MB), so I have started looking at ElementTree, which is supposed to be faster. However, I can't for the life of me figure out how I would recreate this code with ET.

How do I loop through all of the item tags and grab their ID and title, for example?

My current best guess is something like this, but I don't get how to pick up each ID and title.

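import xml.etree.ElementTree as ET

# events=None makes iterparse report only "end" events
with open('shopping_feed.xml', 'rb') as xml_file:
    for event, element in ET.iterparse(xml_file, events=None):
        for child in element:
            print(child)  # prints Element objects, not their text
        element.clear()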

Any suggestions? To be clear, my end goal is the DataFrame, so if there's a way to just convert it directly, that'd be great!

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
    <title>Feed XYZ</title>
    <description></description>
    <link></link>
    <item>
        <g:id>10000005</g:id>
        <title><![CDATA[TEst Item XYZ                                           ]]></title>
        <g:google_product_category>Food and stuff</g:google_product_category>
        <g:product_type><![CDATA[Details &gt; Food and stuff]]></g:product_type>
        <g:adwords_grouping><![CDATA[Food and stuff]]></g:adwords_grouping>
        <link>https://www.abc.se/abc/abc</link>
        <g:image_link>https://www.abc.se/bilder/artiklar/10000005.jpg</g:image_link>
        <g:additional_image_link>https://www.abc.se/bilder/artiklar/zoom/10000005_1.jpg</g:additional_image_link>
        <g:condition>new</g:condition>
        <g:availability>out of stock</g:availability>
        <g:price>155 SEK</g:price>
        <g:buyprice>68.00</g:buyprice>
        <g:brand>ABC</g:brand>
        <g:gtin>8003299920846</g:gtin>
        <g:mpn>ABC01 AZ</g:mpn> 
        <g:weight>0 g</g:weight> 
        <g:item_group_id>10000008r</g:item_group_id>
        <g:color>Blue</g:color>
// hundreds of thousands of products
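
One possible way to approach the question, offered as a minimal sketch rather than a definitive answer: ElementTree exposes namespaced tags such as g:id under the name {namespace-uri}id, so each child's text can be read with findtext once the "end" event for its item has been reported. The names iter_products, FIELDS, G and SAMPLE below are hypothetical and introduced only for illustration; the namespace URI is copied from the xmlns:g declaration in the feed, the column list follows the Beautiful Soup version in the question, and stripping whitespace from the values is an assumption about what is wanted.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:g declaration in the feed.
G = "{http://base.google.com/ns/1.0}"

# Output column -> tag name as ElementTree reports it.
# title, description and link carry no prefix in the sample; the rest use g:.
FIELDS = {
    "id": G + "id",
    "title": "title",
    "description": "description",
    "google_product_category": G + "google_product_category",
    "product_type": G + "product_type",
    "link": "link",
    "availability": G + "availability",
    "price": G + "price",
    "brand": G + "brand",
}


def iter_products(source):
    """Yield one dict per <item>; source is a filename or a binary file object."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == "item":
            # findtext returns the default when the child tag is absent
            yield {
                column: element.findtext(tag, default="").strip()
                for column, tag in FIELDS.items()
            }
            # Clear only after the item has been read, so its children
            # are still present when findtext runs.
            element.clear()


# Small stand-in for the real feed, shortened from the sample in the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
    <title>Feed XYZ</title>
    <item>
        <g:id>10000005</g:id>
        <title><![CDATA[TEst Item XYZ   ]]></title>
        <link>https://www.abc.se/abc/abc</link>
        <g:price>155 SEK</g:price>
        <g:brand>ABC</g:brand>
    </item>
</channel>
</rss>"""

rows = list(iter_products(io.BytesIO(SAMPLE)))
print(rows[0]["id"], rows[0]["title"])
```

With pandas installed, pd.DataFrame(list(iter_products("shopping_feed.xml"))) should then give the DataFrame. Note that the loop in the question calls element.clear() on every element, so the children of an item are already emptied by the time the item's own "end" event arrives; clearing only at the item level avoids that.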
