How to extract the <span> tag contents using Beautiful Soup?


Problem description

I'm trying to extract the contents of a span tag from the Google Translate website. The content is the translated result, which has id="result_box". When I try to print the contents, I get None.

import requests
from bs4 import BeautifulSoup

# Fetch the page
r = requests.get("https://translate.google.co.in/?rlz=1C1CHZL_enIN729IN729&um=1&ie=UTF-8&hl=en&client=tw-ob#en/fr/good%20morning")

# Parse the response and look up the element holding the translated text
soup = BeautifulSoup(r.content, "lxml")
spanner = soup.find(id="result_box")

result = spanner.text

Solution

Requests doesn't execute JavaScript, so content that the page fills in with a script never appears in the HTML it downloads. You could use Selenium with PhantomJS for headless browsing, like this:

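from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://translate.google.co.in/?rlz=1C1CHZL_enIN729IN729&um=1&ie=UTF-8&hl=en&client=tw-ob#en/fr/good%20morning"

# Note: webdriver.PhantomJS was removed in Selenium 4, so this needs
# an older Selenium release with PhantomJS installed
browser = webdriver.PhantomJS()
browser.get(url)
# page_source holds the HTML after the browser has run the page's scripts
html = browser.page_source
browser.quit()

soup = BeautifulSoup(html, 'lxml')
spanner = soup.find(id="result_box")
result = spanner.text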

This gives our expected result:

>>> result
'Bonjour'
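
The difference between the two attempts can be reproduced offline. The sketch below is only an illustration: STATIC_HTML and RENDERED_HTML are made-up stand-ins for what Requests and a browser would each see (they are not the real Google Translate markup), and extract_text is a hypothetical helper. It uses the built-in "html.parser" so lxml is not required, and it checks for None before reading the text.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the two versions of the page
# (assumed markup for illustration, not real Google Translate HTML).
STATIC_HTML = "<html><body><div id='source'>good morning</div></body></html>"
RENDERED_HTML = (
    "<html><body><div id='source'>good morning</div>"
    "<span id='result_box'><span>Bonjour</span></span></body></html>"
)


def extract_text(html, element_id="result_box"):
    """Return the text of the element with the given id, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    if element is None:
        # find() returns None when nothing matches, so guard before using .text
        return None
    return element.get_text(strip=True)


print(extract_text(STATIC_HTML))    # None: the span is not in the raw HTML
print(extract_text(RENDERED_HTML))  # Bonjour
```

Checking for None this way makes it easier to tell a missing element apart from a parsing mistake when a page builds its content with JavaScript.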
