Navigating to second string text using BeautifulSoup

Question

Here is the HTML, saved as sample.html.

<html> 
    <body> 
    <div class ="ecopyramid"> 
        <ul id ="producers"> 
            <li class ="producerlist"> 
                <div class ="name">A1</div> 
                <div class ="number">100000</div> 
            </li> 
            <li class ="producerlist"> 
                <div class ="name">B1</div> 
                <div class ="number">100000</div> 
            </li> 
        </ul> 
        <ul id ="primaryconsumers"> 
            <li class ="primaryconsumerlist"> 
                <div class ="name">A2</div> 
                <div class ="number">1000</div> 
            </li> 
            <li class ="primaryconsumerlist"> 
                <div class ="name">B2</div> 
                <div class ="number">2000</div> 
            </li> 
        </ul> 
        <ul id ="secondaryconsumers"> 
            <li class ="secondaryconsumerlist"> 
                <div class ="name">A3</div> 
                <div class ="number">100</div> 
            </li>

            <li class ="secondaryconsumerlist"> 
                <div class ="name">B3</div> 
                <div class ="number">98</div>
            </li> 
        </ul> 
        <ul id ="tertiaryconsumers"> 
            <li class ="tertiaryconsumerlist"> 
                <div class ="name">A4</div> 
                <div class ="number">80</div> 
            </li> 
            <li class ="tertiaryconsumerlist"> 
                <div class ="name">B4</div> 
                <div class ="number">50</div> 
            </li> 
        </ul> 
    </div> 
    </body> 
</html>

Here is the code that navigates through the sample.html above:

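from bs4 import BeautifulSoup

# Parse the saved file with the lxml parser (requires the lxml package).
with open("sample.html", "r") as sample_pyramid:
    soup = BeautifulSoup(sample_pyramid, "lxml")

# Locate the <ul> by id, then follow the first <li> and its first <div>.
soup_object = soup.find("ul", id="secondaryconsumers")
print(soup_object.li.div.string)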

In this code I first locate the parent of the text "A3" by the tag "ul" and the id "secondaryconsumers". In the print statement I then narrow it down with the ".li.div.string" suffix, which outputs the desired text "A3". My questions are as follows:

1) How do I call/print the text "B3" in this example?

2) How do I call/print the text "98" (below "B3") in this example?

I have tried many things with no success. I am able to call the first text object through navigation, but not the second text object under the same tags.

Any ideas?

Recommended answer

You can use CSS selectors to get the names and numbers:

names = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.name')
numbers = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.number')

print([name.text for name in names])
print([number.text for number in numbers])

Prints:

['A3', 'B3']
['100', '98']
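
The question asks for "B3" and "98" specifically. Attribute-style navigation such as `.li` only ever returns the first matching child, which is why the original code stops at "A3". One way to reach the second item is to collect all `li` elements with `find_all` and index into the result. The following is a minimal sketch, not part of the original answer: it inlines a trimmed copy of the `secondaryconsumers` list instead of reading sample.html, and it uses the standard-library `html.parser` rather than `lxml` so that it runs without extra dependencies. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the secondaryconsumers list from sample.html, inlined
# here (an assumption of this sketch) so it runs without the file.
html = """
<ul id="secondaryconsumers">
    <li class="secondaryconsumerlist">
        <div class="name">A3</div>
        <div class="number">100</div>
    </li>
    <li class="secondaryconsumerlist">
        <div class="name">B3</div>
        <div class="number">98</div>
    </li>
</ul>
"""

# "html.parser" ships with Python; "lxml" should behave the same for this input.
soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", id="secondaryconsumers")

# find_all returns every matching <li>; index 1 is the second one.
second_li = ul.find_all("li")[1]
second_name = second_li.find("div", class_="name").string
second_number = second_li.find("div", class_="number").string

print(second_name)    # B3
print(second_number)  # 98
```

The same indexing applies to the lists returned by `soup.select(...)`: the second element of `names` and of `numbers` holds the "B3" and "98" tags respectively.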


Example code for the follow-up question in the comments:

from bs4 import BeautifulSoup


data = """
<div class="span9">
    <table class="result-data table" border="0">
        <tbody>
        <tr class="result-item highlighting">
            <td class="result-category" scope="row">Name:</td>
            <td class="result-value-bold" colspan="4" itemprop="item">
                Robin Hood
            </td>
        </tr>
        </tbody>
    </table>
</div>
"""

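# Name the parser explicitly to avoid the "no parser was explicitly specified" warning.
soup = BeautifulSoup(data, "html.parser")
print(soup.find('td', class_="result-value-bold").get_text(strip=True))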

This prints Robin Hood.

Alternatively, first find the parent table and tr:

table = soup.find('table', class_='result-data')
tr = table.find('tr', class_='result-item')
print(tr.find('td', class_="result-value-bold").get_text(strip=True))
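
The table, row, and cell lookup can also be written as a single CSS selector, in the same spirit as the selectors used earlier in this answer. This is a sketch that assumes a Beautiful Soup version providing `select_one` (4.4.0 or later); the HTML is a trimmed copy of the follow-up snippet, and the names `cell` and `value` are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the follow-up HTML snippet.
data = """
<table class="result-data table" border="0">
    <tbody>
    <tr class="result-item highlighting">
        <td class="result-category" scope="row">Name:</td>
        <td class="result-value-bold" colspan="4" itemprop="item">
            Robin Hood
        </td>
    </tr>
    </tbody>
</table>
"""

soup = BeautifulSoup(data, "html.parser")

# select_one returns the first element matching the selector, or None.
cell = soup.select_one("table.result-data tr.result-item td.result-value-bold")

# Guard against a missing cell instead of failing on None.get_text().
value = cell.get_text(strip=True) if cell is not None else None
print(value)  # Robin Hood
```

Checking for `None` is useful when the page layout may vary, since chained `find` calls raise `AttributeError` as soon as one step finds nothing.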
