Web scraping with Python not fetching results
Problem description
For a project on text analytics, I am trying to scrape some reviews. I am using Python and Beautiful Soup to do the job. I am not getting any errors, but I am not getting any data either. I am fairly sure I am making a mistake in specifying the div tags. Can someone help? The following is the code I used:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.zomato.com/brewbot")
soup = BeautifulSoup(r.content)
links = soup.find.all("div")
k_data = soup.find_all({"class":"rev-text"})
for item in k_data:
    print item.text
I have changed "class":"rev-text" to "tabindex='0'" and to "class"-"rev.text", included "itemprop"="description", and tried other combinations, but nothing seems to work. Can someone help?
Recommended answer
Reviews are loaded dynamically from the response to a POST request to the social_load_more.php endpoint. Simulate that request in your code, get the HTML with the reviews from the JSON response, and parse it with BeautifulSoup. Complete working code:
import requests
from bs4 import BeautifulSoup
with requests.Session() as session:
    session.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"}
    r = session.get("https://www.zomato.com/brewbot")
    soup = BeautifulSoup(r.content, "html.parser")
    itemid = soup.body["itemid"]

    # get reviews
    r = session.post("https://www.zomato.com/php/social_load_more.php", data={
        "entity_id": itemid,
        "profile_action": "reviews-top",
        "page": "0",
        "limit": "5"
    })
    reviews = r.json()["html"]

    soup = BeautifulSoup(reviews, "html.parser")
    k_data = soup.select("div.rev-text")
    for item in k_data:
        print(item.get_text())
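
The parsing half of the answer (take the "html" field out of the JSON response, then pull out the review divs) can be tried offline. The following is only a minimal sketch: the helper name extract_review_texts and the sample payload are made up for illustration and are not real Zomato data. It uses find_all("div", class_="rev-text"), which for this simple case selects the same elements as select("div.rev-text") and is the keyword form of the class filter the question was attempting.

```python
import json

from bs4 import BeautifulSoup


def extract_review_texts(payload):
    """Return the stripped text of every <div class="rev-text"> found in
    the "html" field of a JSON payload (hypothetical helper)."""
    data = json.loads(payload)
    # Fall back to an empty fragment if the "html" key is missing.
    fragment = data.get("html", "")
    soup = BeautifulSoup(fragment, "html.parser")
    # Filter by tag name and CSS class.
    return [div.get_text().strip() for div in soup.find_all("div", class_="rev-text")]


# Made-up payload that only mimics the shape described in the answer:
# a JSON object whose "html" value is an HTML fragment.
sample_payload = json.dumps({
    "html": (
        '<div class="rev-text"> Great beer, friendly staff. </div>'
        '<div class="other">not a review</div>'
        '<div class="rev-text"> Food was okay. </div>'
    )
})

for text in extract_review_texts(sample_payload):
    print(text)
```

With a real response, the string passed in would be r.text from the POST request shown above, assuming the endpoint still returns JSON with an "html" key.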