BeautifulSoup .get_text() is not specific enough for my HTML parsing
Question
Given the HTML code below I want output just the text of the h1 but not the "Details about ", which is the text of the span (which is encapsulated by the h1).
My current output is:
Details about New Men's Genuine Leather Bifold ID Credit Card Money Holder Wallet Black
What I want:
New Men's Genuine Leather Bifold ID Credit Card Money Holder Wallet Black
Here is the HTML I am working with
<h1 class="it-ttl" itemprop="name" id="itemTitle"><span class="g-hdn">Details about </span>New Men's Genuine Leather Bifold ID Credit Card Money Holder Wallet Black</h1>
Here is my current code:
for line in soup.find_all('h1', attrs={'itemprop': 'name'}):
    print(line.get_text())
Note: I do not want to just truncate the string because I would like this code to have some re-usability. What would be best is some code that crops out any text that is bounded by the span.
Recommended answer
You can use extract() (http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#Removing%20elements) to remove all span tags:
for line in soup.find_all('h1', attrs={'itemprop': 'name'}):
    # Detach every <span> inside the heading so its text is left out.
    for s in line('span'):
        s.extract()
    print(line.get_text())
    # => New Men's Genuine Leather Bifold ID Credit Card Money Holder Wallet Black
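Note that extract() modifies the parsed tree, so the span is gone from soup afterwards. If the tree needs to stay intact, one possible alternative (not part of the original answer) is to collect only the text nodes that sit directly inside the h1. The sketch below assumes the bs4 package (version 4.4 or later, for the string= argument) with the built-in "html.parser"; the helper name own_text and the HTML constant are hypothetical, introduced only for illustration.

```python
from bs4 import BeautifulSoup

# The heading from the question, kept as a constant so the example is self-contained.
HTML = (
    '<h1 class="it-ttl" itemprop="name" id="itemTitle">'
    '<span class="g-hdn">Details about </span>'
    "New Men's Genuine Leather Bifold ID Credit Card Money Holder Wallet Black"
    "</h1>"
)


def own_text(tag):
    """Return only the text directly inside `tag`, skipping text of child tags.

    Hypothetical helper: recursive=False limits the search to direct children,
    and string=True keeps only text nodes, so nothing is removed from the tree.
    """
    return "".join(tag.find_all(string=True, recursive=False)).strip()


soup = BeautifulSoup(HTML, "html.parser")
titles = [own_text(h1) for h1 in soup.find_all("h1", attrs={"itemprop": "name"})]
print(titles[0])
```

Because nothing is extracted, a later call to get_text() on the same h1 still returns the full string including "Details about ". The trade-off is that text inside any other child tag (for example a nested b or a) is skipped too, so this fits best when only the heading's own text is wanted.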