Web scraping with Python not fetching results

Problem description

For a text analytics project, I am trying to scrape some reviews. I am using Python and Beautiful Soup for this. I am not getting any errors, but I am not getting any data either. I am sure I am making a mistake in specifying the div tags. Can someone help? This is the code I used:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.zomato.com/brewbot")
soup = BeautifulSoup(r.content)
links = soup.find_all("div")
k_data = soup.find_all({"class":"rev-text"})

for item in k_data:
    print item.text

I have changed "class":"rev-text" to "tabindex='0'" and to "class"-"rev.text", included "itemprop"="description", and tried other combinations, but nothing seems to work. Can someone help?

Recommended answer

Reviews are loaded dynamically from the response to a POST request to the social_load_more.php endpoint. Simulate that request in your code, get the review HTML from the JSON response, and parse it with BeautifulSoup. Complete working code:

import requests
from bs4 import BeautifulSoup


with requests.Session() as session:
    session.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"}
    r = session.get("https://www.zomato.com/brewbot")
    soup = BeautifulSoup(r.content, "html.parser")
    itemid = soup.body["itemid"]

    # get reviews
    r = session.post("https://www.zomato.com/php/social_load_more.php", data={
        "entity_id": itemid,
        "profile_action": "reviews-top",
        "page": "0",
        "limit": "5"
    })
    reviews = r.json()["html"]

    soup = BeautifulSoup(reviews, "html.parser")
    k_data = soup.select("div.rev-text")

    for item in k_data:
        print(item.get_text())
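
To try the parsing step without hitting the site, here is a minimal offline sketch. It assumes only what the code above relies on: that the JSON response has an "html" key holding a fragment with div.rev-text elements. The sample payload, the review texts, and the helper name extract_reviews are made up for illustration and are not taken from the real endpoint.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for the body of the POST response. The real
# payload is not shown above; only its "html" key is assumed here.
sample_response = json.dumps({
    "html": (
        '<div class="res-review">'
        '<div class="rev-text" tabindex="0"> Great beer, friendly staff. </div>'
        '</div>'
        '<div class="res-review">'
        '<div class="rev-text" tabindex="0"> Food was average. </div>'
        '</div>'
    )
})


def extract_reviews(response_text):
    """Return the stripped text of every div.rev-text in the embedded HTML."""
    fragment = json.loads(response_text)["html"]
    soup = BeautifulSoup(fragment, "html.parser")
    return [div.get_text(strip=True) for div in soup.select("div.rev-text")]


reviews = extract_reviews(sample_response)
for review in reviews:
    print(review)

# find_all with the class_ keyword matches the same elements as the
# CSS selector used in extract_reviews.
fragment_soup = BeautifulSoup(json.loads(sample_response)["html"], "html.parser")
same_reviews = [
    div.get_text(strip=True)
    for div in fragment_soup.find_all("div", class_="rev-text")
]
```

If extract_reviews returns an empty list for a real response, the selector may no longer match the markup, so it is worth printing the fragment and checking the class names before changing anything else.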
