Unable to scrape this site. How to scrape data from this site?

Problem description

I am not able to scrape data from this site.

I tried the same code with other sites and it works fine there. On this site it finds no matching elements.

from bs4 import BeautifulSoup
from urllib.request import urlopen

response = urlopen("https://www.daraz.com.np/catalog/?spm=a2a0e.searchlistcategory.search.2.3eac4b8amQJ0zd&q=samsung%20m20&_keyori=ss&from=suggest_normal&sugg=samsung%20m20_1_1")

html = response.read()

parsed_html = BeautifulSoup(html, "html.parser")

containers = parsed_html.find_all("div", {"class" : "c2prKC"})

print(len(containers))

Recommended answer

It looks like the content is rendered by JavaScript after the page loads, so the HTML returned by urlopen does not contain the product elements yet. You can use Selenium to render the page in a real browser and Beautiful Soup to extract the elements.
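To see why the original script reports zero containers, here is a minimal standard-library sketch. The two HTML strings are made up for illustration (they are not the real Daraz markup), and `DivClassCounter` and `count_divs` are hypothetical helper names. It only shows that a parser cannot find elements that a script has not created yet.

```python
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Counts <div> tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # The class attribute may be missing or empty.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_divs(html, class_name):
    counter = DivClassCounter(class_name)
    counter.feed(html)
    return counter.count


# Hypothetical response body: what a plain HTTP fetch might receive.
static_html = """
<html><body>
<div id="root"></div>
<script>/* product cards are built here, in the browser */</script>
</body></html>
"""

# Hypothetical DOM after a browser has executed the script.
rendered_html = """
<html><body>
<div id="root">
  <div class="c2prKC">item 1</div>
  <div class="c2prKC">item 2</div>
</div>
</body></html>
"""

print(count_divs(static_html, "c2prKC"))    # 0: nothing to find before rendering
print(count_divs(rendered_html, "c2prKC"))  # 2: elements exist after rendering
```

A quick check on the real page is to search the raw response for the class name. If it is absent there but visible in the browser's element inspector, the content is most likely rendered client-side.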

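from bs4 import BeautifulSoup
from selenium import webdriver
import time

# Start a real browser so the page's JavaScript actually runs.
driver = webdriver.Chrome()
driver.get("https://www.daraz.com.np/catalog/?spm=a2a0e.searchlistcategory.search.2.3eac4b8amQJ0zd&q=samsung%20m20&_keyori=ss&from=suggest_normal&sugg=samsung%20m20_1_1")
# Give the scripts a few seconds to render the product list.
time.sleep(5)

# page_source now holds the rendered DOM, not just the initial HTML.
html = driver.page_source
# Close the browser once the HTML has been captured.
driver.quit()

parsed_html = BeautifulSoup(html, "html.parser")

# Each product card is a div with class "c2prKC".
containers = parsed_html.find_all("div", {"class" : "c2prKC"})

print(len(containers))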
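A fixed five-second sleep can be too short on a slow connection and wasteful on a fast one. As a hedged alternative, the sketch below uses Selenium's explicit wait to block until the first matching element appears. The function name `fetch_rendered_html`, the 15-second default timeout, and the CSS selector passed in by the caller are assumptions for illustration, and the class name `c2prKC` may change whenever the site is redeployed.

```python
def fetch_rendered_html(url, css_selector, timeout=15):
    """Load url in Chrome and return the page source once css_selector matches.

    Raises selenium.common.exceptions.TimeoutException if nothing matches
    within `timeout` seconds.
    """
    # Imported inside the function so that defining it does not require
    # Selenium or a browser driver to be installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Poll the DOM until at least one matching element is present.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `fetch_rendered_html(url, "div.c2prKC")` with the search URL and passing the returned string to `BeautifulSoup(html, "html.parser")` would replace the `time.sleep(5)` step while keeping the rest of the parsing unchanged.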
