How to get an attribute value using BeautifulSoup and Python?

Question

I'm failing miserably to get an attribute value using BeautifulSoup and Python. Here is how the XML is structured:

...
</total>
<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
    ...
    <stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>
</tag>
<suite>
...

What I'm trying to get is the pass value, but for the life of me I just can't understand how to do it. I checked the BeautifulSoup documentation, and it seems that I should be using something like stat['pass'], but that doesn't seem to work.

Here is my code:

from bs4 import BeautifulSoup as soup  # assumed import: the code calls BeautifulSoup as `soup`

with open('../results/output.xml') as raw_resuls:
    results = soup(raw_resuls, 'lxml')
for stat in results.find_all('tag'):
    print(stat['pass'])

If I do results.stat['pass'] it returns a value that is within another tag, way up in the XML blob.
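This behaviour is expected: attribute access such as results.stat is BeautifulSoup shorthand for results.find("stat"), which returns the first stat element anywhere in the document. The sketch below reproduces it with a small hypothetical document in which a stat inside total (assumed here, since the question elides that part of the XML) comes before the one inside tag. It uses Python's built-in "html.parser" so it runs without lxml; the 'lxml' parser used in the question should behave the same for this input.

```python
from bs4 import BeautifulSoup

# Hypothetical XML: a <stat> inside <total> appears before the one inside <tag>.
XML = """
<total><stat fail="0" pass="42">All Tests</stat></total>
<tag><stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat></tag>
"""

results = BeautifulSoup(XML, "html.parser")

# results.stat is shorthand for results.find("stat"):
# the first <stat> anywhere in the document, here the one inside <total>.
first_stat = results.stat
print(first_stat["pass"])  # 42

# Narrowing the search to <tag> first reaches the intended element.
tag_stat = results.find("tag").find("stat")
print(tag_stat["pass"])  # 1
```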

If I print the stat variable I get the following:

<stat fail="0" pass="1">TR=787878 Sandbox=3000614</stat>
...
<stat fail="0" pass="1">TR=888888 Sandbox=3000610</stat>

This seems to be OK.

I'm pretty sure that I'm missing something or doing something wrong. Where should I be looking? Am I taking the wrong approach?

Any advice or guidance will be greatly appreciated! Thanks

Recommended answer

Please consider this approach:

from bs4 import BeautifulSoup

with open('test.xml') as raw_results:
    results = BeautifulSoup(raw_results, 'lxml')

# Find every <tag>, then every <stat> inside it, and read its pass attribute.
for element in results.find_all("tag"):
    for stat in element.find_all("stat"):
        print(stat['pass'])

The problem with your solution is that pass is an attribute of stat, not of tag, and tag is the element your loop iterates over and looks it up on.
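To see why the original loop fails, here is a small sketch (parsed with the built-in "html.parser" so it runs without lxml): the tag element carries no attributes at all, so tag['pass'] raises KeyError, while the attribute is found on the nested stat. Tag.get() is shown as the non-raising alternative; it returns None, or a default you pass in, when the attribute is absent.

```python
from bs4 import BeautifulSoup

XML = """<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
</tag>"""

soup = BeautifulSoup(XML, "html.parser")
tag = soup.find("tag")

# The <tag> element itself has no attributes, so subscripting fails.
print(tag.attrs)  # {}
try:
    tag["pass"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'pass'

# .get() returns None (or a supplied default) instead of raising.
print(tag.get("pass"))  # None

# The attribute lives on the nested <stat> element.
print(tag.find("stat")["pass"])  # 1
```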

This solution finds every tag element, searches each one for its stat elements, and reads the pass attribute from each stat.
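The same two-level search can also be written as a single CSS selector with Tag.select(). This is a swapped-in alternative, not the answer's method: "tag > stat" matches every stat that is a direct child of a tag. It assumes BeautifulSoup 4.7 or later, where select() is backed by the soupsieve package, and uses "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

XML = """<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
    <stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>
</tag>"""

soup = BeautifulSoup(XML, "html.parser")

# "tag > stat" matches each <stat> that is a direct child of a <tag>,
# replacing the two nested find_all() loops with one selector.
pass_values = [stat["pass"] for stat in soup.select("tag > stat")]
print(pass_values)  # ['1', '1', '1']
```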

Take the following XML file:

<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
    <stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>
</tag>

For this file, the script above produces the following output:

1
1
1
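Note that the printed 1s are strings: BeautifulSoup returns attribute values such as pass as str, not int. A minimal sketch of converting them before doing arithmetic, for example counting passes across the three stat elements above (parsed with "html.parser" so it runs without lxml):

```python
from bs4 import BeautifulSoup

XML = """<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
    <stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>
</tag>"""

soup = BeautifulSoup(XML, "html.parser")

# Attribute values come back as strings, not numbers.
raw = [stat["pass"] for stat in soup.find_all("stat")]
print(raw)  # ['1', '1', '1']

# Convert explicitly before doing arithmetic.
passed = sum(int(value) for value in raw)
print(passed)  # 3
```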

Update

Since some details still seemed unclear (see comments), consider this complete solution that uses BeautifulSoup to get everything you want. Storing dictionaries in a list may not be ideal if you run into performance issues. But since you seem to be having some trouble with Python and BeautifulSoup, I tried to keep this example as simple as possible: every relevant piece of information is accessed by name rather than by index.

from bs4 import BeautifulSoup

# Parses a string of the form 'TR=abc123 Sandbox=abc123' and returns a dictionary
# of the form {'TR': 'abc123', 'Sandbox': 'abc123'}.
def parseTestID(testid):
    parts = testid.split()  # e.g. ['TR=abc123', 'Sandbox=abc123']
    return {'TR': parts[0].split("=")[1], 'Sandbox': parts[1].split("=")[1]}

# Parses the XML content of 'rawdata' and stores the pass value, TR-ID and Sandbox-ID
# of each <stat> in a dictionary of the form {'Pass': pass_value, 'TR': TR-ID,
# 'Sandbox': Sandbox-ID}. Returns a list of these dictionaries.
def getTestState(rawdata):
    # initialize the parser
    soup = BeautifulSoup(rawdata, 'lxml')
    parsedData = []

    # search for every <tag> element
    for tag in soup.find_all("tag"):
        # search each <tag> for its <stat> elements
        for stat in tag.find_all("stat"):
            # parse the text once instead of twice
            testID = parseTestID(stat.string)
            # store everything in a dictionary and append it to the list
            parsedData.append({'Pass': stat['pass'],
                               'TR': testID['TR'],
                               'Sandbox': testID['Sandbox']})

    return parsedData

You can then use the functions above to do whatever you want with the data, for example simply print it:

# open the file
with open('test.xml') as raw_results:
    # get the list of parsed data
    data = getTestState(raw_results)

# print the parsed data, tab-separated
for element in data:
    print("TR = {0}\tSandbox = {1}\tPass = {2}".format(element['TR'], element['Sandbox'], element['Pass']))

The output looks like this:

TR = 111111 Sandbox = 3000613   Pass = 1
TR = 121212 Sandbox = 3000618   Pass = 1
TR = 222222 Sandbox = 3000612   Pass = 1
TR = 232323 Sandbox = 3000618   Pass = 1
TR = 333333 Sandbox = 3000605   Pass = 1
TR = 343434 Sandbox = ZZZZZZ    Pass = 1
TR = 444444 Sandbox = 3000604   Pass = 1
TR = 454545 Sandbox = 3000608   Pass = 1
TR = 545454 Sandbox = XXXXXX    Pass = 1
TR = 555555 Sandbox = 3000617   Pass = 1
TR = 565656 Sandbox = 3000615   Pass = 1
TR = 626262 Sandbox = 3000602   Pass = 1
TR = 666666 Sandbox = 3000616   Pass = 1
TR = 676767 Sandbox = 3000599   Pass = 1
TR = 737373 Sandbox = 3000603   Pass = 1
TR = 777777 Sandbox = 3000611   Pass = 1
TR = 787878 Sandbox = 3000614   Pass = 1
TR = 828282 Sandbox = 3000600   Pass = 1
TR = 888888 Sandbox = 3000610   Pass = 1
TR = 999999 Sandbox = 3000617   Pass = 1

Let's summarize the core elements used here:

Finding XML tags: To find XML tags, use soup.find("tag"), which returns the first matching tag, or soup.find_all("tag"), which finds all matching tags and returns them in a list. You can then access the individual tags by iterating over the list.
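A minimal sketch of the difference, including what each call returns when nothing matches: find() gives None, while find_all() gives an empty ResultSet (a list subclass), so a loop over it simply does nothing. The input is a trimmed version of the question's XML, parsed with "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

XML = """<tag>
    <stat pass="1">TR=111111 Sandbox=3000613</stat>
    <stat pass="1">TR=121212 Sandbox=3000618</stat>
</tag>"""
soup = BeautifulSoup(XML, "html.parser")

first = soup.find("stat")            # first match only (a Tag)
every = soup.find_all("stat")        # all matches (a list-like ResultSet)
missing = soup.find("suite")         # no match: find() returns None
none_found = soup.find_all("suite")  # no match: find_all() returns an empty ResultSet

print(first.string)     # TR=111111 Sandbox=3000613
print(len(every))       # 2
print(missing)          # None
print(len(none_found))  # 0
```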

Finding nested tags: To find nested tags, call find() or find_all() again on each result of the first find_all().
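One detail worth knowing when searching from a result: find_all() looks through all descendants, not just direct children. Passing recursive=False restricts it to direct children. The group wrapper below is hypothetical and only illustrates the difference; the snippet uses "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical XML: one <stat> sits directly in <tag>, another inside a <group>.
XML = """<tag>
    <stat pass="1">TR=111111 Sandbox=3000613</stat>
    <group><stat pass="0">TR=222222 Sandbox=3000612</stat></group>
</tag>"""

soup = BeautifulSoup(XML, "html.parser")
tag = soup.find("tag")

# Searching from tag looks through all of its descendants...
all_stats = tag.find_all("stat")
# ...while recursive=False restricts the search to direct children.
direct_stats = tag.find_all("stat", recursive=False)

print([s["pass"] for s in all_stats])     # ['1', '0']
print([s["pass"] for s in direct_stats])  # ['1']
```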

Accessing the content of a tag: To access the content of a tag, use its .string attribute. For example, if tag = <tag>I love Soup!</tag>, then tag.string == "I love Soup!".
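.string works well for the stat elements because each one contains a single piece of text. If a tag has several children (text mixed with other tags), .string is None, and get_text() is the usual way to collect all of the text. The note element below is a hypothetical example; the snippet uses "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<stat pass="1">TR=111111 Sandbox=3000613</stat>'
    '<note>I <b>love</b> Soup!</note>',  # hypothetical mixed-content tag
    "html.parser",
)

stat = soup.find("stat")
note = soup.find("note")

# A tag with a single text child: .string returns that text.
print(stat.string)      # TR=111111 Sandbox=3000613
# A tag with several children: .string is None, get_text() joins all the text.
print(note.string)      # None
print(note.get_text())  # I love Soup!
```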

Finding attribute values: To get the value of an attribute, use subscript notation. For example, if tag = <tag color="red">I love Soup!</tag>, then tag['color'] == "red".
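Besides subscript notation, a Tag offers a few other ways to read attributes: get() returns None or a default instead of raising KeyError, attrs exposes all attributes as a dictionary, and has_attr() checks for presence. A short sketch on the example tag above, parsed with "html.parser" so it runs without lxml:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<tag color="red">I love Soup!</tag>', "html.parser")
tag = soup.find("tag")

print(tag["color"])            # red  (raises KeyError if the attribute is absent)
print(tag.get("size"))         # None (no KeyError for a missing attribute)
print(tag.get("size", "n/a"))  # n/a
print(tag.attrs)               # {'color': 'red'}
print(tag.has_attr("color"))   # True
```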

To parse strings of the form "TR=abc123 Sandbox=abc123", I used ordinary Python string splitting. You can read more about it here: How can I split and parse a string in Python?
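As a slightly more general variant of that splitting, here is a sketch that turns any whitespace-separated run of key=value pairs into a dictionary. The function name parse_key_values is hypothetical. It uses str.split() with no argument, so runs of spaces are tolerated, and str.partition("=") to separate each key from its value; tokens without "=" are skipped.

```python
def parse_key_values(text):
    """Turn 'TR=111111 Sandbox=3000613' into {'TR': '111111', 'Sandbox': '3000613'}."""
    result = {}
    for pair in text.split():                  # split on any run of whitespace
        key, sep, value = pair.partition("=")  # 'TR=111111' -> ('TR', '=', '111111')
        if sep:                                # skip tokens that contain no '='
            result[key] = value
    return result


print(parse_key_values("TR=111111 Sandbox=3000613"))
# {'TR': '111111', 'Sandbox': '3000613'}
```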
