Web scraping with Python not fetching results

Views: 109 Published: 2016/8/5 19:08:28 Tags: python html beautifulsoup

Question

For a text analytics project, I am trying to scrape some reviews using Python and Beautiful Soup. I am not getting any errors, but I am not getting any data either. I suspect I am making a mistake in specifying the div tags. Can someone help? This is the code I used:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.zomato.com/brewbot")
soup = BeautifulSoup(r.content)
links = soup.find_all("div")
k_data = soup.find_all({"class":"rev-text"})

for item in k_data:
    print item.text

I have tried changing "class":"rev-text" to "tabindex='0'" and to "class"-"rev.text", adding "itemprop"="description", and other combinations, but nothing seems to work. Can someone help?

Recommended answer

The reviews are loaded dynamically: the page fetches them with a POST request to the social_load_more.php endpoint. Simulate that request in your code, take the HTML containing the reviews from the JSON response, and parse it with BeautifulSoup. Complete working code:

import requests
from bs4 import BeautifulSoup


with requests.Session() as session:
    session.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"}
    r = session.get("https://www.zomato.com/brewbot")
    soup = BeautifulSoup(r.content, "html.parser")
    itemid = soup.body["itemid"]

    # get reviews
    r = session.post("https://www.zomato.com/php/social_load_more.php", data={
        "entity_id": itemid,
        "profile_action": "reviews-top",
        "page": "0",
        "limit": "5"
    })
    reviews = r.json()["html"]

    soup = BeautifulSoup(reviews, "html.parser")
    k_data = soup.select("div.rev-text")

    for item in k_data:
        print(item.get_text())
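
The parsing half of the code above can be tried without touching the network. What follows is only a minimal sketch under stated assumptions: sample_payload is a made-up stand-in for the JSON body the endpoint returns (only the "html" key and the rev-text class come from the answer; the wrapper divs and review texts are invented), and extract_reviews is a hypothetical helper name.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for the JSON body returned by the endpoint.
# Only the "html" key and the "rev-text" class are taken from the answer;
# the surrounding markup and the review texts are invented for illustration.
sample_payload = json.dumps({
    "html": (
        '<div class="res-review">'
        '<div class="rev-text" tabindex="0">Great beer, friendly staff.</div>'
        '</div>'
        '<div class="res-review">'
        '<div class="rev-text" tabindex="0">Food was average.</div>'
        '</div>'
    )
})


def extract_reviews(payload):
    """Return the stripped text of every <div class="rev-text"> in the payload."""
    html = json.loads(payload)["html"]
    soup = BeautifulSoup(html, "html.parser")
    # Tag name goes first, the class goes in the class_ keyword.
    # This selects the same elements as soup.select("div.rev-text").
    divs = soup.find_all("div", class_="rev-text")
    return [div.get_text(strip=True) for div in divs]


reviews = extract_reviews(sample_payload)
for review in reviews:
    print(review)
```

The first positional argument of find_all filters on the tag name, not on attributes, so the class has to be passed through class_ (or attrs={"class": "rev-text"}) as shown here.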
