Search for text inside a tag using BeautifulSoup and return the text in the tag after it


Problem description


I'm trying to parse the following HTML code in Python using Beautiful Soup. I would like to search for text inside a tag, for example "Color", and return the text of the next tag, "Slate, mykonos", and then do the same for the other tags, so that for a given text category I can return its corresponding information.

However, I'm finding it very difficult to find the right code to do this.

<h2>Details</h2>
<div class="section-inner">
    <div class="_UCu">
        <h3 class="_mEu">General</h3>
        <div class="_JDu">
            <span class="_IDu">Color</span>
            <span class="_KDu">Slate, mykonos</span>
        </div>
    </div>
    <div class="_UCu">
        <h3 class="_mEu">Carrying Case</h3>
        <div class="_JDu">
            <span class="_IDu">Type</span>
            <span class="_KDu">Protective cover</span>
        </div>
        <div class="_JDu">
            <span class="_IDu">Recommended Use</span>
            <span class="_KDu">For cell phone</span>
        </div>
        <div class="_JDu">
            <span class="_IDu">Protection</span>
            <span class="_KDu">Impact protection</span>
        </div>
        <div class="_JDu">
            <span class="_IDu">Cover Type</span>
            <span class="_KDu">Back cover</span>
        </div>
        <div class="_JDu">
            <span class="_IDu">Features</span>
            <span class="_KDu">Camera lens cutout, hard shell, rubberized, port cut-outs, raised edges</span>
        </div>
    </div>
</div>

I use the following code to retrieve my div tags:

soup.find_all("div", "_JDu")

Once I have retrieved the tags, I can navigate inside them, but I can't find the code that lets me match the text in one tag and return the text of the tag that follows it.

Any help would be really appreciated, as I'm new to Python and have hit a dead end.

Solution

You can define a function to return the value for the key you enter:

def get_txt(soup, key):
    key_tag = soup.find('span', text=key).parent
    return key_tag.find_all('span')[1].text

color = get_txt(soup, 'Color')
print('Color: ' + color)
features = get_txt(soup, 'Features')
print('Features: ' + features)

Output:

Color: Slate, mykonos
Features: Camera lens cutout, hard shell, rubberized, port cut-outs, raised edges
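
The snippet above assumes a soup object already exists. A minimal, self-contained sketch of how it might be built is shown here, assuming the HTML is available as a string (the html variable and its shortened markup are illustrative, and "html.parser" is just one possible parser choice). It uses string=, which is the name Beautiful Soup 4.4.0 and later give to the text= argument:

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (illustrative only).
html = """
<div class="_UCu">
    <h3 class="_mEu">General</h3>
    <div class="_JDu">
        <span class="_IDu">Color</span>
        <span class="_KDu">Slate, mykonos</span>
    </div>
</div>
"""

# "html.parser" ships with Python, so no extra parser needs to be installed.
soup = BeautifulSoup(html, "html.parser")


def get_txt(soup, key):
    # Find the label <span>, step up to its parent <div>,
    # then take the text of the second <span> inside that <div>.
    key_tag = soup.find("span", string=key).parent
    return key_tag.find_all("span")[1].text


print("Color: " + get_txt(soup, "Color"))
```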

I hope this is what you are looking for.

Explanation:

soup.find('span', text=key) returns the first <span> tag whose text is exactly equal to key.

.parent returns the parent tag of the current <span> tag.

Example:

When key='Color', soup.find('span', text=key).parent will return

<div class="_JDu">
    <span class="_IDu">Color</span>
    <span class="_KDu">Slate, mykonos</span>
</div>

We store this in key_tag. The only thing left is to get the text of the second <span>, which is what key_tag.find_all('span')[1].text does.
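
One thing to watch for: find returns None when no <span> matches, so get_txt raises an AttributeError for an unknown key. The following is a hedged sketch of two possible variations, not part of the original answer. get_value and all_values are hypothetical helper names; the first guards against a missing key and uses find_next_sibling instead of going through the parent, the second collects every label/value pair from the _JDu divs into a dict:

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (illustrative only).
html = """
<div class="_JDu">
    <span class="_IDu">Color</span>
    <span class="_KDu">Slate, mykonos</span>
</div>
<div class="_JDu">
    <span class="_IDu">Type</span>
    <span class="_KDu">Protective cover</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def get_value(soup, key, default=None):
    # Hypothetical helper: return the text of the <span> that follows
    # the label <span>, or `default` when the label is not found.
    label = soup.find("span", string=key)
    if label is None:
        return default
    value = label.find_next_sibling("span")
    return value.get_text(strip=True) if value is not None else default


def all_values(soup):
    # Hypothetical helper: map each label to its value for every
    # <div class="_JDu"> that contains at least two <span> tags.
    result = {}
    for row in soup.find_all("div", "_JDu"):
        spans = row.find_all("span")
        if len(spans) >= 2:
            result[spans[0].get_text(strip=True)] = spans[1].get_text(strip=True)
    return result


print(get_value(soup, "Color"))
print(all_values(soup))
```

Building the dict once is convenient when several categories are needed, since it avoids searching the whole document again for every key.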
