soup.findAll返回空列表 [英] soup.findAll returning empty list

查看:168
本文介绍了soup.findAll返回空列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图刮汤,并在我打电话给findAll时获得一个空碗

I am trying to scrape with soup and am obtaining an empty set when I call findAll

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url='https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/SearchDisplayView?catalogId=10123&langId=44&storeId=10151&krypto=70KutR16JmLgr7Ka%2F385RFXrzDpOkSqx%2FRC3DnlU09%2BYcw0pR5cfIfC0kOlQywiD%2BTEe7ppq8ENXglbpqA8sDUtif1h3ZjrEoQkV29%2B90iqljHi2gm2T%2BDZHH2%2FCNeKB%2BkVglbz%2BNx1bKsSfE5L6SVtckHxg%2FM%2F%2FVieWp8vgaJTan0k1WrPjCrVuDs5WnbRN#langId=44&storeId=10151&catalogId=10123&categoryId=&parent_category_rn=&top_category=&pageSize=60&orderBy=RELEVANCE&searchTerm=milk&beginIndex=0&hideFilters=true&categoryFacetId1='

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html,'html.parser')

containers = page_soup.findAll("div",{"class":"product"}) 
containers

我还从这些文章中得到了空的数据集: findAll返回的html为空

I also got empty datasets from these articles: findAll returning empty for html

BeautifulSoup find_all()不返回任何数据

任何人都可以提供任何帮助吗?

Can anyone offer any help?

推荐答案

页面内容已加载javascript,因此您不能仅使用 BeautifulSoup 进行解析.您必须使用另一个模块,例如 selenium 来模拟javacript执行.

The page content is loaded with javascript, so you can't just use BeautifulSoup to parse it. You have to use another module like selenium to simulate javacript execution.

这是一个例子:

from bs4 import BeautifulSoup as soup
from selenium import webdriver

url='https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/SearchDisplayView?catalogId=10123&langId=44&storeId=10151&krypto=70KutR16JmLgr7Ka%2F385RFXrzDpOkSqx%2FRC3DnlU09%2BYcw0pR5cfIfC0kOlQywiD%2BTEe7ppq8ENXglbpqA8sDUtif1h3ZjrEoQkV29%2B90iqljHi2gm2T%2BDZHH2%2FCNeKB%2BkVglbz%2BNx1bKsSfE5L6SVtckHxg%2FM%2F%2FVieWp8vgaJTan0k1WrPjCrVuDs5WnbRN#langId=44&storeId=10151&catalogId=10123&categoryId=&parent_category_rn=&top_category=&pageSize=60&orderBy=RELEVANCE&searchTerm=milk&beginIndex=0&hideFilters=true&categoryFacetId1='

driver = webdriver.Firefox()
driver.get(url)

page = driver.page_source
page_soup = soup(page,'html.parser')

containers = page_soup.findAll("div",{"class":"product"})
print(containers)
print(len(containers))

输出:

[
<div class="product "> ...
...,
<div class="product hl-product hookLogic highlighted straplineRow" ...    
]

64

这篇关于soup.findAll返回空列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆