Search for text inside a tag using BeautifulSoup and return the text in the tag after it
I'm trying to parse the following HTML code in Python using Beautiful Soup. I would like to be able to search for text inside a tag, for example "Color", and return the text of the next tag, "Slate, mykonos", and do the same for the other tags, so that for a given text category I can return its corresponding information.
However, I'm finding it very difficult to find the right code to do this.
<h2>Details</h2>
<div class="section-inner">
  <div class="_UCu">
    <h3 class="_mEu">General</h3>
    <div class="_JDu">
      <span class="_IDu">Color</span>
      <span class="_KDu">Slate, mykonos</span>
    </div>
  </div>
  <div class="_UCu">
    <h3 class="_mEu">Carrying Case</h3>
    <div class="_JDu">
      <span class="_IDu">Type</span>
      <span class="_KDu">Protective cover</span>
    </div>
    <div class="_JDu">
      <span class="_IDu">Recommended Use</span>
      <span class="_KDu">For cell phone</span>
    </div>
    <div class="_JDu">
      <span class="_IDu">Protection</span>
      <span class="_KDu">Impact protection</span>
    </div>
    <div class="_JDu">
      <span class="_IDu">Cover Type</span>
      <span class="_KDu">Back cover</span>
    </div>
    <div class="_JDu">
      <span class="_IDu">Features</span>
      <span class="_KDu">Camera lens cutout, hard shell, rubberized, port cut-outs, raised edges</span>
    </div>
  </div>
I use the following code to retrieve my div tags:
soup.find_all("div", "_JDu")
Once I have retrieved the tag I can navigate inside it but I can't find the right code that will enable me to find the text inside one tag and return the text in the tag after it.
Any help would be really appreciated, as I'm new to Python and have hit a dead end.
You can define a function to return the value for the key you enter:
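def get_txt(soup, key):
    # Find the <span> whose text equals key, then step up to its parent <div>.
    key_tag = soup.find('span', text=key).parent
    # The second <span> inside that <div> holds the value.
    return key_tag.find_all('span')[1].text

color = get_txt(soup, 'Color')
print('Color: ' + color)
features = get_txt(soup, 'Features')
print('Features: ' + features)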
Output:
Color: Slate, mykonos
Features: Camera lens cutout, hard shell, rubberized, port cut-outs, raised edges
I hope this is what you are looking for.
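If you want to try this end to end, here is a minimal self-contained sketch. It assumes a shortened copy of the HTML from the question, the built-in "html.parser" backend, and a hypothetical helper name get_txt_safe (not part of Beautiful Soup). It uses string= (the newer name for the text= argument, available since Beautiful Soup 4.4.0) and returns None instead of raising when the key is not found.

```python
from bs4 import BeautifulSoup

# Shortened excerpt of the HTML from the question (assumption: same structure).
HTML = """
<div class="_UCu">
  <div class="_JDu">
    <span class="_IDu">Color</span>
    <span class="_KDu">Slate, mykonos</span>
  </div>
  <div class="_JDu">
    <span class="_IDu">Features</span>
    <span class="_KDu">Camera lens cutout, hard shell</span>
  </div>
</div>
"""


def get_txt_safe(soup, key):
    """Return the text of the value span paired with `key`, or None."""
    # Look up the label <span> whose text is exactly `key`.
    label = soup.find("span", string=key)
    if label is None:
        return None
    spans = label.parent.find_all("span")
    # spans[0] is the label itself; spans[1] is the value next to it.
    return spans[1].get_text(strip=True) if len(spans) > 1 else None


soup = BeautifulSoup(HTML, "html.parser")
print(get_txt_safe(soup, "Color"))   # Slate, mykonos
print(get_txt_safe(soup, "Weight"))  # None
```

The None check matters because soup.find returns None when nothing matches, and calling .parent on None would raise an AttributeError.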
Explanation:
soup.find('span', text=key) returns the <span> tag whose text equals key.
.parent returns the parent tag of that <span> tag.
Example:
When key='Color', soup.find('span', text=key).parent will return:
<div class="_JDu">
  <span class="_IDu">Color</span>
  <span class="_KDu">Slate, mykonos</span>
</div>
Now we've stored this in key_tag. The only thing left is getting the text of the second <span>, which is what the line key_tag.find_all('span')[1].text does.
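Since the question asks for "the tag after" the label, another option is to move sideways with find_next_sibling rather than going up to the parent and indexing. The sketch below is only one possible way to do it: it reuses the find_all("div", "_JDu") call from the question and collects every label/value pair into a dict. The function name collect_details is made up for illustration, and the class names _IDu and _KDu are taken from the sample HTML, so they may differ on the real page.

```python
from bs4 import BeautifulSoup

# Shortened excerpt of the HTML from the question (assumption: same structure).
HTML = """
<div class="section-inner">
  <div class="_UCu">
    <h3 class="_mEu">General</h3>
    <div class="_JDu">
      <span class="_IDu">Color</span>
      <span class="_KDu">Slate, mykonos</span>
    </div>
  </div>
  <div class="_UCu">
    <h3 class="_mEu">Carrying Case</h3>
    <div class="_JDu">
      <span class="_IDu">Type</span>
      <span class="_KDu">Protective cover</span>
    </div>
    <div class="_JDu">
      <span class="_IDu">Protection</span>
      <span class="_KDu">Impact protection</span>
    </div>
  </div>
</div>
"""


def collect_details(soup):
    """Map each label text to the text of the <span> that follows it."""
    details = {}
    # Same lookup as in the question: every <div> with class "_JDu".
    for row in soup.find_all("div", "_JDu"):
        label = row.find("span", "_IDu")
        if label is None:
            continue  # row without a label span; skip it
        # The value is the next <span> sibling after the label.
        value = label.find_next_sibling("span")
        if value is not None:
            details[label.get_text(strip=True)] = value.get_text(strip=True)
    return details


soup = BeautifulSoup(HTML, "html.parser")
details = collect_details(soup)
print(details["Color"])       # Slate, mykonos
print(details.get("Weight"))  # None
```

Building the dict once means each later lookup is a plain dictionary access, and details.get(key) gives you None for categories that are not on the page.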