Web Scraping with Selenium Python [Twitter + Instagram]

Views: 201 Published: 2020/5/23 22:25:23 Tags: python pandas twitter web-scraping instagram


Problem description

I am trying to scrape both Instagram and Twitter by geolocation. I can run a search query, but I am having trouble reloading the web page to load more results and storing the fields in a data frame.

I did find a couple of examples for scraping Twitter and Instagram without API keys, but they search by #tag keywords.

I am trying to scrape by geolocation and between past dates. So far I have written the following code in Python 3.x, using the latest versions of all packages in Anaconda.

'''
    Instagram - Components
    "id": "1478232643287060472", 
     "dimensions": {"height": 1080, "width": 1080}, 
     "owner": {"id": "351633262"}, 
     "thumbnail_src": "https://instagram.fdel1-1.fna.fbcdn.net/t51.2885-15/s640x640/sh0.08/e35/17439262_973184322815940_668652714938335232_n.jpg", 
     "is_video": false, 
     "code": "BSDvMHOgw_4", 
     "date": 1490439084, 
     "taken-at=213385402"
     "display_src": "https://instagram.fdel1-1.fna.fbcdn.net/t51.2885-15/e35/17439262_973184322815940_668652714938335232_n.jpg", 
     "caption": "Hakuna jambo zuri kama kumpa Mungu shukrani kwa kila jambo.. \ud83d\ude4f\ud83c\udffe\nIts weekend\n#lifeistooshorttobeunhappy\n#Godisgood \n#happysoul \ud83d\ude00", 
     "comments": {"count": 42}, 
     "likes": {"count": 3813}}, 
'''


import selenium
from selenium import webdriver
#from selenium import selenium
from bs4 import BeautifulSoup
import pandas

#geotags = pd.read_csv("geocodes.csv")
#parmalink = 
query = geocode%3A35.68501%2C139.7514%2C30km%20since:2016-03-01%20until:2016-03-02&f=tweets

twitterURL = 'https://twitter.com/search?q=' + query
#instaURL = "https://www.instagram.com/explore/locations/213385402/"


browser = webdriver.Firefox()
browser.get(twitterURL)
content = browser.page_source

soup = BeautifulSoup(content)
print (soup)

For the Twitter search query, I am getting a syntax error.
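The syntax error most likely comes from the `query` assignment in the code above, where the value is not wrapped in quotes and so is parsed as Python code rather than as a string. A minimal sketch of building the same search URL from a quoted, percent-encoded string follows. The helper name `build_twitter_search_url` is hypothetical, and whether Twitter still accepts this URL format is an assumption.

```python
from urllib.parse import quote


def build_twitter_search_url(lat, lon, radius_km, since, until):
    """Build a geocode search URL (hypothetical helper, not part of any library)."""
    # The query is an ordinary Python string; leaving the quotes off is
    # what makes the interpreter raise a SyntaxError.
    query = f"geocode:{lat},{lon},{radius_km}km since:{since} until:{until}"
    # quote() percent-encodes ':' ',' and spaces so the query is URL-safe.
    return "https://twitter.com/search?q=" + quote(query, safe="") + "&f=tweets"


twitter_url = build_twitter_search_url(35.68501, 139.7514, 30,
                                       "2016-03-01", "2016-03-02")
print(twitter_url)
```

The resulting string can be passed to `browser.get(...)` exactly as `twitterURL` is in the original code.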

For Instagram, I am not getting any error, but I am not able to load more posts and write the results back to a CSV file or data frame.
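One common way to load more posts on an infinite-scroll page is to scroll to the bottom repeatedly until the page height stops growing. The sketch below assumes the page loads content on scroll, which may not hold for Instagram if a login wall or a "load more" button appears. The function name `scroll_until_stable` and its parameters are hypothetical. It only relies on the driver's `execute_script` method, which Selenium WebDriver objects provide.

```python
import time


def scroll_until_stable(driver, max_scrolls=10, pause_seconds=2.0):
    """Scroll to the bottom until the page height stops changing.

    Returns the number of scroll attempts performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        # Jump to the bottom so the page requests the next batch of posts.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # give the page time to load new content
        new_height = driver.execute_script("return document.body.scrollHeight")
        scrolls += 1
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
    return scrolls
```

After calling `scroll_until_stable(browser)`, `browser.page_source` should contain whatever posts were loaded. That HTML can then be parsed with BeautifulSoup, and the extracted fields collected into a list of dictionaries for a pandas DataFrame.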

I am also trying to search by latitude and longitude on both Twitter and Instagram.

I have a list of geo coordinates in a CSV file. I can use that as input, or I can write the search query by hand.
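As a rough illustration of using the CSV as input, the sketch below turns each coordinate row into a `geocode:` search term. The column names `lat` and `lon`, the default 30 km radius, and the helper name `geocode_queries` are all assumptions, since the layout of the CSV is not shown. The standard `csv` module is used to keep the sketch dependency-free, though `pandas.read_csv` would work as well.

```python
import csv
import io


def geocode_queries(csv_file, radius_km=30):
    """Return one 'geocode:lat,lon,radius' search term per CSV row.

    Assumes the file has header columns named 'lat' and 'lon'.
    """
    reader = csv.DictReader(csv_file)
    return [f"geocode:{row['lat']},{row['lon']},{radius_km}km" for row in reader]


# In-memory stand-in for a coordinates file; the values are sample data.
sample = io.StringIO("lat,lon\n35.68501,139.7514\n40.7128,-74.0060\n")
queries = geocode_queries(sample)
print(queries)
```

With a real file, the same function could be called as `geocode_queries(open("geocodes.csv", newline=""))`, and each term combined with `since:` and `until:` dates before building the search URL.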

Any way to complete the scraping by location would be appreciated.

Thanks for the help!
