Extracting tag information with beautifulsoup and python

Views: 272 Published: 2016/8/5 19:14:44 Tags: python xml parsing beautifulsoup

Question

Say I have some XML like:

<item name=bread weight="5" edible="yes">
<body> some blah </body>
<item>

<item name=eggs weight="5" edible="yes">
<body> some blah </body>
<item>

<item name=meat weight="5" edible="yes">
<body> some blah </body>
<item>

I want to store the name of each item in a list using Beautiful Soup.

Here is my attempt so far:

names = list()

for c in soup.findAll("item"):
    # get name from the tag
    names.append(name_from_tag)  # placeholder: the name I got from the tag

This method has worked perfectly for extracting text between tags.

I've tried copying the methods used for extracting links <a href="www.blah.com"> but it doesn't seem to work.

How would I store the name information in a list? (other lists contain the body text so for associativity reasons the indexes have to be consistent).

Many thanks.

Recommended answer

Use dict(item.attrs).get('name') to get the name.

You are having issues because the second <item> in each block is supposed to be a closing tag (</item>) but is written as an opening tag, so you get 6 matches rather than 3. If you have any control over the text, please use closing tags to avoid this.
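
To make the match count concrete, here is a minimal sketch that parses one item both ways. It assumes the bs4 package and Python's built-in "html.parser" backend, neither of which is specified in the question; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# One item from the question, first with the stray opening tag, then closed properly.
broken = '<item name="bread"><body> some blah </body><item>'
fixed = '<item name="bread"><body> some blah </body></item>'

# Assumption: the "html.parser" backend; other parsers may repair the markup differently.
broken_items = BeautifulSoup(broken, "html.parser").find_all("item")
fixed_items = BeautifulSoup(fixed, "html.parser").find_all("item")

# Every <item> start tag becomes its own element, so the broken text yields two.
broken_names = [item.get("name") for item in broken_items]
fixed_names = [item.get("name") for item in fixed_items]

print(len(broken_items), broken_names)  # 2 ['bread', None]
print(len(fixed_items), fixed_names)    # 1 ['bread']
```

The extra element has no name attribute, which is why a None check is needed when the closing tags cannot be fixed at the source.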

Here is the full snippet working as intended:

names = list()

for item in soup.findAll('item'):
    name = dict(item.attrs).get('name')
    if name is not None:
        names.append(name)
