BeautifulSoup Div Class returns empty
Problem description
I checked similar questions but could not find a solution.
I'm trying to scrape the minutes of extra travel time (46) from the following page: https://www.tomtom.com/en_gb/trafficindex/city/istanbul
I've tried two methods (XPath and finding by class), but both return an empty result.
import requests
from bs4 import BeautifulSoup
from lxml.html import fromstring
page = requests.get("https://www.tomtom.com/en_gb/trafficindex/city/istanbul")
tree = fromstring(page.content)
soup = BeautifulSoup(page.content, 'html.parser')
#print([type(item) for item in list(soup.children)])
html = list(soup.children)[2]
g_data = soup.find_all("div", {"class_": "big.ng-binding"})
congestion = tree.xpath("/html/body/div/div[2]/div[2]/div[2]/section[2]/div/div[2]/div/div[2]/div/div[2]/div[1]/div[1]/text()")
print(congestion)
print(len(g_data))
Am I missing something obvious?
Thank you very much for your help!
Recommended answer
Unfortunately, BeautifulSoup alone is not enough here. The website generates its content with JavaScript, so you will need an additional tool such as Selenium.
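import re
import bs4 as bs
from selenium import webdriver

url = 'https://www.tomtom.com/en_gb/trafficindex/city/istanbul'
# Let a real browser load the page so that its JavaScript runs
driver = webdriver.Firefox()
driver.get(url)
html = driver.page_source
driver.quit()
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = bs.BeautifulSoup(html, 'html.parser')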
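Because the content is rendered by JavaScript, reading page_source immediately after get() may still capture the page before the value has been inserted. A minimal sketch of an explicit wait follows. The helper name fetch_rendered_html, the 15-second timeout, and the selector passed to it are illustrative assumptions, not part of the original answer.

```python
def fetch_rendered_html(url, css_selector, timeout=15):
    """Load url in Firefox, wait until css_selector matches, return the page HTML.

    Hypothetical helper: the name, the default timeout and the selector
    are assumptions made for illustration.
    """
    # Imported inside the function so it can be defined without a browser present
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until at least one matching element is present in the DOM;
        # raises TimeoutException if none appears within `timeout` seconds
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out
        driver.quit()
```

It could be called as fetch_rendered_html(url, "div.text-big"), assuming the rendered page still uses that class name, and the returned HTML can then be passed to BeautifulSoup as before.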
I can see two approaches to extract the extra time:
1. Finding the div with class="text-big ng-binding".
div = soup.find_all('div', attrs={'class' : 'text-big ng-binding'})
result = div[0].text
2. Finding the div containing the Per day text first, then going two divs up.
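# find() returns the first matching text node; find_all() returns a list, which has no find_previous()
div = soup.find(string=re.compile('Per day'))
result = div.find_previous('div').find_previous('div').text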
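Both approaches can be tried offline against a small HTML fragment, which also shows why the filter in the question matched nothing even apart from the JavaScript issue. The markup below is a hypothetical stand-in for the rendered page (the real structure may differ), and the value "46 min" is only a placeholder based on the figure mentioned in the question.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the JavaScript-rendered page
SAMPLE_HTML = """
<div class="stats">
  <div class="text-big ng-binding">46 min</div>
  <div class="label"><span>Per day</span></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The question's filter: inside a dict, "class_" is a literal attribute name,
# and "big.ng-binding" is neither a class on the element nor valid here as a selector
broken = soup.find_all("div", {"class_": "big.ng-binding"})

# class_ works as a keyword argument and matches a single CSS class
by_keyword = soup.find_all("div", class_="text-big")

# select() accepts real CSS selector syntax, where dots chain class names
by_selector = soup.select("div.text-big.ng-binding")

# Approach 2: locate the label text, then step back two divs in document order
label = soup.find(string=re.compile("Per day"))
result = label.find_previous("div").find_previous("div").get_text(strip=True)

print(len(broken), len(by_keyword), len(by_selector))
print(result)
```

Note that find_previous() walks backwards through the document rather than strictly up the tree, so the second approach depends on the order of the elements and is more fragile than a class lookup if the page layout changes.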