BeautifulSoup Div Class returns empty


Problem Description

I checked similar questions, but could not find a solution...

I'm trying to scrape the minutes of extra travel time (46) from the following page: https://www.tomtom.com/en_gb/trafficindex/city/istanbul

I've tried 2 methods (XPath and finding by class), but both give an empty result.

import requests
from bs4 import BeautifulSoup
from lxml.html import fromstring

page = requests.get("https://www.tomtom.com/en_gb/trafficindex/city/istanbul")
tree = fromstring(page.content)

soup = BeautifulSoup(page.content, 'html.parser')

#print([type(item) for item in list(soup.children)])

html = list(soup.children)[2]

g_data = soup.find_all("div", {"class_": "big.ng-binding"})

congestion = tree.xpath("/html/body/div/div[2]/div[2]/div[2]/section[2]/div/div[2]/div/div[2]/div/div[2]/div[1]/div[1]/text()")
print(congestion)
print(len(g_data))
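To see why both lookups come back empty, it helps to run them against static markup. The snippet below is a hypothetical stand-in for what requests actually downloads: an Angular template where the number is still a {{ }} placeholder until JavaScript runs. It also shows a second problem in the code above: in the attrs dictionary the key must be 'class', not 'class_' (the trailing underscore belongs only to the keyword-argument form).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the static HTML the server returns before
# JavaScript renders the real value.
static_html = '<div class="text-big ng-binding">{{ extraTravelTime }}</div>'
soup = BeautifulSoup(static_html, 'html.parser')

# In an attrs dict the key must be 'class'; 'class_' looks for a literal
# attribute named class_, which no tag has.
wrong = soup.find_all('div', {'class_': 'text-big ng-binding'})
right = soup.find_all('div', {'class': 'text-big ng-binding'})

print(len(wrong))      # 0 - wrong attribute name, never matches
print(len(right))      # 1 - matches, but the text is still the placeholder
print(right[0].text)   # {{ extraTravelTime }}
```

So even with the corrected key, the static HTML only contains the unrendered template, never the number 46.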

Am I missing something obvious?

Thanks a lot for the help!

Recommended Answer

Unfortunately, BeautifulSoup alone is not enough to accomplish this. The website uses JavaScript to generate its content, so you will have to use an additional tool such as Selenium.

import bs4 as bs
import re
from selenium import webdriver

url = 'https://www.tomtom.com/en_gb/trafficindex/city/istanbul'

driver = webdriver.Firefox()
driver.get(url)
html = driver.page_source                    # HTML after JavaScript has run
soup = bs.BeautifulSoup(html, 'html.parser')

I can see two approaches to extract the extra time:

1. Finding the div with class="text-big ng-binding"

div = soup.find_all('div', attrs={'class' : 'text-big ng-binding'})
result = div[0].text
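One caveat with this form: class is a multi-valued attribute, and matching against the space-separated string only succeeds when the attribute is written in exactly that order. A CSS selector via select() matches each class independently, which is more robust. A sketch on a hypothetical rendered fragment (the value 46 and the class order are made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for driver.page_source after rendering;
# note the two classes appear in the opposite order.
rendered = '<div class="ng-binding text-big">46</div>'
soup = BeautifulSoup(rendered, 'html.parser')

# Exact-string matching is order-sensitive, so this finds nothing here:
by_string = soup.find_all('div', class_='text-big ng-binding')

# select() matches each class independently of order:
by_css = soup.select('div.text-big.ng-binding')

print(len(by_string))   # 0
print(by_css[0].text)   # 46
```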

2. Finding the div containing the Per day text first, then going two divs up

node = soup.find(text=re.compile('Per day'))
result = node.find_previous('div').find_previous('div').text
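This navigation can be checked on a small self-contained fragment; the markup below is a made-up sketch of the structure the approach assumes (the value div immediately preceding the div holding the Per day label). Note that soup.find() returns a single text node that supports find_previous(), whereas find_all() returns a list and calling find_previous() on it would raise AttributeError:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the assumed page structure.
rendered = """
<div class="text-big ng-binding">46</div>
<div>Per day</div>
"""
soup = BeautifulSoup(rendered, 'html.parser')

# find() returns the matching text node itself ...
node = soup.find(text=re.compile('Per day'))
# ... the first find_previous('div') reaches the div holding the label,
# the second one reaches the div holding the value.
result = node.find_previous('div').find_previous('div').text
print(result)   # 46
```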
