Using Beautiful Soup to get data from a non-class section
Question
I am still very much a novice, learning Python and Beautiful Soup. I have gotten hung up on how to get text from a piece of HTML that has no class attribute.
This is the snippet of HTML I'm working with:
<section class="userbody">
<script type="text/javascript"></script>
<figure class="iw">
<div id="ci">
<img id="iwi" title="image 2" alt="" src="http://images.craigslist.org/00C0C_daJm4U9yU5B_600x450.jpg" style="min-width: inherit; min-height: 450px;"></img>
</div>
<div id="thumbs"></div>
</figure>
<div class="mapAndAttrs">
<div class="mapbox">
<div id="map" class="leaflet-container leaflet-fade-anim" data-longitude="-84.072447" data-latitude="33.908534" tabindex="0">
<div class="leaflet-map-pane" style="transform: translate(0px, 0px);"></div>
<div class="leaflet-control-container">
<div class="leaflet-top leaflet-left"></div>
<div class="leaflet-top leaflet-right"></div>
<div class="leaflet-bottom leaflet-left"></div>
<div class="leaflet-bottom leaflet-right">
<div class="leaflet-control-attribution leaflet-control"></div>
</div>
</div>
</div>
<div class="mapaddress">
Some Address
</div>
</div>
<div class="attributes"></div>
</div>
<section id="postingbody">
some posting info
<br></br>
more posting info
<br></br>
</section>
<section class="cltags"></section>
<div class="postinginfos"></div>
</section>
I have been able to pull the address information:
for address in soup.findAll("div", { "class" : "mapaddress" }):
    addressText = ''.join(address.findAll(text=True))
It appears findAll() doesn't work for tags that don't have a class, which is what I tried in:
for post in soup.findall("section", { "id" : "postingbody" }):
    postText = ''.join(post.findAll(text=True))
How would I grab the text in section id="postingbody"?
Recommended answer
Well, you can do the following, assuming that s is the HTML string:
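from bs4 import BeautifulSoup

# s is the HTML string; naming the parser keeps results consistent across installs
soup = BeautifulSoup(s, "html.parser")
# find() can match on any attribute, so the tag does not need a class
print(soup.find(attrs={'id': 'postingbody'}))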
Output:
<section id="postingbody">
some posting info
<br/>
more posting info
<br/>
</section>
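The question asks for the text rather than the tag itself, so a possible follow-up step is sketched below. It is a minimal illustration, not part of the original answer: the `html` string is a trimmed stand-in for the snippet in the question, and the names `post` and `post_text` are made up for this example.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the HTML in the question (illustrative sample only).
html = """
<section class="userbody">
<div class="mapaddress">
Some Address
</div>
<section id="postingbody">
some posting info
<br></br>
more posting info
<br></br>
</section>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# Look the section up by its id attribute; find() returns the first match or None.
post = soup.find("section", id="postingbody")

# get_text() joins every text node under the tag. strip=True trims each piece
# and skips the whitespace-only ones left around the <br> tags.
post_text = post.get_text(separator="\n", strip=True)
print(post_text)
```

As a side note, the loop in the question calls soup.findall (lowercase "a"), which is not a Beautiful Soup method. bs4 treats an unknown attribute name as a tag-name lookup, so soup.findall most likely evaluates to None and calling it raises a TypeError; that, rather than the missing class, is probably why the attempt failed. Spelling it find_all (or the older findAll) with the same id filter should work.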