How to scrape Google Maps for all data using Python

Question

I am trying to scrape the title, phone number, website, address, rating, and number of reviews of a place from Google Maps using Python. For example, I need all of this information for the restaurant Pike's Landing (see the Google Maps URL below).

URL: https://www.google.com/maps?cid=15423079754231040967&hl=en

I can see the HTML when I inspect the page in the browser, but when I fetch the page and parse it with Beautiful Soup, the markup I get back is not what I see in the inspector. On Stack Overflow I found a solution that extracts only the number of reviews, using the following code:

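import re
import requests
from ast import literal_eval

urls = [
'https://www.google.com/maps?cid=15423079754231040967&hl=en',
'https://www.google.com/maps?cid=16168151796978303235&hl=en']

for url in urls:
    # Find each escaped array in the page source that starts with a URL and contains "N review(s)"
    for g in re.findall(r'\[\\"http.*?\d+ reviews?.*?]', requests.get(url).text):
        # Turn the escaped JavaScript array into a Python list literal and parse it
        data = literal_eval(g.replace('null', 'None').replace('\\"', '"'))
        # data[0] is the link (with \uXXXX escapes decoded), data[1] is the review count text
        print(bytes(data[0], 'utf-8').decode('unicode_escape'))
        print(data[1])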
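
To make it clearer what that snippet does, here is a minimal offline sketch of the same regex and literal_eval steps, run on a made-up string instead of a live page. The SAMPLE text, the example.com link and the extract_review_counts name are assumptions for illustration only; the real Google Maps payload may look different and can change at any time.

```python
import re
from ast import literal_eval

# Same pattern as in the snippet above: an escaped array that starts with a
# URL and contains "<digits> review" or "<digits> reviews".
PATTERN = re.compile(r'\[\\"http.*?\d+ reviews?.*?]')

# Hypothetical page fragment (NOT real Google Maps output). It imitates an
# array embedded in a JavaScript string: quotes are escaped as \" and
# characters such as "=" and "'" are written as \\u003d and \\u0027.
SAMPLE = r'xx[\"https://example.com/search?q\\u003dPike\\u0027s+Landing\",\"512 reviews\",null,null]yy'


def extract_review_counts(page_source):
    """Return (link, review_count_text) pairs found in the page source."""
    results = []
    for raw in PATTERN.findall(page_source):
        # null -> None and \" -> " turn the fragment into a Python list literal
        data = literal_eval(raw.replace('null', 'None').replace('\\"', '"'))
        # Decode the remaining \uXXXX escapes in the link
        link = bytes(data[0], 'utf-8').decode('unicode_escape')
        results.append((link, data[1]))
    return results


if __name__ == "__main__":
    for link, count in extract_review_counts(SAMPLE):
        print(link)
        print(count)
```

The approach only works for fields that sit in an array matching the pattern, which is why it returns the review count and nothing else.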

But I need all of the data. I could use the Google Maps API to get it, but fetching the phone number, rating, and reviews through the API is no longer free, so I want to scrape the data from the front end instead.

Please help me.

Recommended answer

I asked the same question a long time ago on Reddit and ended up solving it myself. Have a look at the code below. Note: it was written strictly to extract the details for my own use case, but it should give you the gist of what's going on.

from selenium import webdriver
from selenium.webdriver.common.by import By

# Run Chrome without opening a window
options = webdriver.ChromeOptions()
options.add_argument('--headless')

browser = webdriver.Chrome(options=options)

url = "https://www.google.com/maps/place/Papa+John's+Pizza/@40.7936551,-74.0124687,17z/data=!3m1!4b1!4m5!3m4!1s0x89c2580eaa74451b:0x15d743e4f841e5ed!8m2!3d40.7936551!4d-74.0124687"
# url = "https://www.google.com/maps/place/Lucky+Dhaba/@30.653792,76.8165233,17z/data=!3m1!4b1!4m5!3m4!1s0x390feb3e3de1a031:0x862036ab85567f75!8m2!3d30.653792!4d76.818712"

browser.get(url)

# NOTE: the class names below are the ones used when this answer was written.
# Google Maps changes its markup, so they may no longer match anything.
# find_elements(By.CLASS_NAME, ...) replaces find_elements_by_class_name,
# which was removed in newer Selenium 4 releases.

# review titles / username / person who reviews
review_titles = browser.find_elements(By.CLASS_NAME, "section-review-title")
print([a.text for a in review_titles])

# review text / what did they think
review_text = browser.find_elements(By.CLASS_NAME, "section-review-review-content")
print([a.text for a in review_text])

# get the number of stars of the first review, if any review was found
stars = browser.find_elements(By.CLASS_NAME, "section-review-stars")
if stars:
    first_review_stars = stars[0]
    active_stars = first_review_stars.find_elements(By.CLASS_NAME, "section-review-star-active")
    print(f"the stars the first review got was {len(active_stars)}")

# Close the browser when done
browser.quit()
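
The code above prints reviewer names, review texts and star counts separately. If you want one record per review, a small helper along these lines could pair them up. This is only a sketch: Review and build_reviews are hypothetical names, and it assumes you have already pulled the .text values and the star counts into plain Python lists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Review:
    author: str
    text: str
    stars: int


def build_reviews(authors: List[str], texts: List[str], star_counts: List[int]) -> List[Review]:
    """Pair up parallel lists scraped from the page into Review records."""
    # zip() stops at the shortest list, so if one list is missing entries
    # the unmatched tail is dropped instead of raising an error.
    return [
        Review(author.strip(), text.strip(), stars)
        for author, text, stars in zip(authors, texts, star_counts)
    ]


if __name__ == "__main__":
    # Made-up values standing in for what the browser would return
    reviews = build_reviews(["Ann ", "Bob"], ["Great pizza", " Too salty "], [5, 2])
    for review in reviews:
        print(review)
```

With the Selenium code above, the first two lists would be [a.text for a in review_titles] and [a.text for a in review_text]; the star counts would need the active-star lookup repeated for every element in stars rather than only the first.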
