Beautiful Soup - Print a container's text without printing the text of the child elements

Views: 68 Published: 2020/9/20 7:46:28 Tags: python html beautifulsoup

Question

How can I target the text within a container without also getting the text of its child elements? For example, how could I target the text Toshiba Satellite Pro C850-1GR Satellite Pro, 1.8 GHz in the HTML below?

My attempt

short_description = soup.find('div', {'class': 'info-item description product-short-desc c_both'}).text
print(short_description)

HTML

<div id="product-short-summary-wrap">
<b class="tip-anchor tip-anchor-wrap">Short summary description Toshiba Satellite Pro C850-1GR</b>ev
:
<br/>
<div class="tooltip-text">This short summary of the data-sheet.</div>
 Toshiba Satellite Pro C850-1GR Satellite Pro, 1.8 GHz
</div>

Recommended answer

Select the div element that sits just above the text and use nextSibling:

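from bs4 import BeautifulSoup

html = '<div id="product-short-summary-wrap">\
<b class="tip-anchor tip-anchor-wrap">Short summary description Toshiba Satellite Pro C850-1GR</b>ev\
:\
<br/>\
<div class="tooltip-text">This short summary of the data-sheet.</div>\
 Toshiba Satellite Pro C850-1GR Satellite Pro, 1.8 GHz\
</div>'

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")

# Find the tooltip div; the wanted text is the node right after it.
text = soup.find("div", {"class": "tooltip-text"})
# next_sibling is the current bs4 spelling of nextSibling.
print(text.next_sibling.string)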

Output:

Toshiba Satellite Pro C850-1GR Satellite Pro, 1.8 GHz
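As a complementary sketch (not part of the original answer), the container's own text can also be collected directly by asking Beautiful Soup for only the direct child text nodes. This assumes the container is selected by its id `product-short-summary-wrap` from the question's HTML; the variable names `container` and `own_text` are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """<div id="product-short-summary-wrap">
<b class="tip-anchor tip-anchor-wrap">Short summary description Toshiba Satellite Pro C850-1GR</b>ev
:
<br/>
<div class="tooltip-text">This short summary of the data-sheet.</div>
 Toshiba Satellite Pro C850-1GR Satellite Pro, 1.8 GHz
</div>"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", id="product-short-summary-wrap")

# string=True keeps only text nodes; recursive=False limits the search to
# direct children, so text inside <b> and the inner <div> is skipped.
own_text = [
    node.strip()
    for node in container.find_all(string=True, recursive=False)
    if node.strip()  # drop whitespace-only nodes between tags
]

# The last direct text node is the product description in this HTML.
print(own_text[-1])
```

Note that the stray text between the `<b>` and `<br/>` tags is also a direct child of the container, so it appears in `own_text` as well; picking the last entry is a choice that fits this particular markup.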

If the div contains the text This short summary of the data-sheet, you can try this:

from bs4 import BeautifulSoup

html = '<div id="product-short-summary-wrap">\
<b class="tip-anchor tip-anchor-wrap">Short summary description Toshiba Satellite Pro C850-1GR</b>ev\
:\
<br/>\
<div class="tooltip-text">This short summary of the data-sheet.</div>\
 Toshiba Satellite Pro C850-1GR Satellite Pro, 1.8 GHz\
</div>'

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")

text = soup.find("div", {"class": "tooltip-text"})
# Only take the following text node when the div holds the expected tooltip.
if "This short summary of the data-sheet." in text.string:
    print(text.next_sibling.string)

Output:

Toshiba Satellite Pro C850-1GR Satellite Pro, 1.8 GHz
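One caveat with the check above: a tag's `.string` is `None` when the tag has more than one child, so the `in` test would raise a `TypeError` on a tooltip div that contains nested markup. The following is a minimal sketch of a more defensive variant using `get_text()`. The helper name `text_after`, its parameters, and the `<i>` tag added to the sample HTML are assumptions made for illustration and do not come from the original page.

```python
from bs4 import BeautifulSoup, NavigableString


def text_after(soup, marker, css_class="tooltip-text"):
    """Return the stripped text node that follows the first div whose text
    contains `marker`, or None if nothing matches (hypothetical helper)."""
    for div in soup.find_all("div", class_=css_class):
        # get_text() joins all nested text, so it works with child tags too.
        if marker in div.get_text():
            sibling = div.next_sibling
            # The next sibling may be a tag or missing; accept only real text.
            if isinstance(sibling, NavigableString) and sibling.strip():
                return sibling.strip()
    return None


# Sample HTML with nested markup inside the tooltip (an assumed variation).
html = (
    '<div id="product-short-summary-wrap">'
    '<div class="tooltip-text">This <i>short</i> summary of the data-sheet.</div>'
    ' Toshiba Satellite Pro C850-1GR Satellite Pro, 1.8 GHz'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
result = text_after(soup, "summary of the data-sheet")
print(result)
```

Returning `None` instead of raising lets the caller decide how to handle pages where the tooltip is absent.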

I think you posted the wrong HTML in the PasteBin, but I found the site you want to scrape. I'm not sure exactly which page you mean, so here is what I found and did. If you go to this page, you can find the same HTML fragment as in your question. My code to extract the text:

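import urllib.request
from bs4 import BeautifulSoup

url = "http://icecat.biz/p/toshiba/pscbxe-01t01gfr/satellite-pro-notebooks-4051528036589-C8501GR-17411822.html"
# urllib.request replaces the Python 2 urllib2 module.
html = urllib.request.urlopen(url)

soup = BeautifulSoup(html, "html.parser")

# The page has several tooltip divs; check each one for the summary marker.
texts = soup.find_all("div", {"class": "tooltip-text"})
for text in texts:
    # .string is None for divs with nested tags, so skip those.
    if text.string:
        if "This short summary of the" in text.string:
            print(text.next_sibling.string.strip())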

Output:

Toshiba C850-1GR Satellite Pro, 1.8 GHz, Intel Celeron, 1000M, 4 GB, DDR3-SDRAM, 1600 MHz

With a different URL, the output is:

Intel H2312WPFJR, Socket R (2011), Intel, Xeon, 2048 GB, DDR3-SDRAM, 2048 GB

If you need to, you can split the string after you find it.
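A minimal sketch of that splitting step, assuming the fields are separated by commas as in the output shown earlier; the variable names `description` and `fields` are illustrative.

```python
# Description string as printed by the scraping code above.
description = (
    "Toshiba C850-1GR Satellite Pro, 1.8 GHz, Intel Celeron, "
    "1000M, 4 GB, DDR3-SDRAM, 1600 MHz"
)

# Split on commas and trim the surrounding whitespace of each field.
fields = [part.strip() for part in description.split(",")]

print(fields[0])  # product name
print(fields[1])  # clock speed
```

A plain comma split is only safe while no field contains a comma itself, which holds for this sample but is not guaranteed for every product page.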
