How to build a Python crawler for websites using oauth2
Question
I'm new to web programming. I want to build a crawler for crawling the social graph in Foursquare with Python. I've got a "manually" controlled crawler by using the apiv2 library. The main method is like:
import apiv2  # third-party Foursquare API wrapper

def main():
    CODE = "******"
    url = "https://foursquare.com/oauth2/authenticate?client_id=****&response_type=code&redirect_uri=****"
    key = "***"
    secret = "****"
    re_uri = "***"
    auth = apiv2.FSAuthenticator(key, secret, re_uri)
    auth.set_token(CODE)
    finder = apiv2.UserFinder(auth)
    # do some requests using the finder
    finder.finde(ANY_USER_ID).mayorships()
    # bla bla bla
The problem is that at present I have to type the URL into my browser, pick the CODE out of the redirect URL, update the CODE in my program, and run it again. I think there might be some way to fold this CODE-retrieval step into my program and make it automatic.
Any instruction or sample code is appreciated.
Answer
You should check out the python-oauth2 module. It seems to be the most stable thing out there.
In particular, this blog post has a really good rundown on how to do OAuth easily with Python. The example code uses the Foursquare API, so I would check that out first.
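For reference, the code-for-token exchange in OAuth2 is just a single HTTP request against Foursquare's documented access_token endpoint. A minimal sketch (the helper names are mine, and the credential values are placeholders you would fill in):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

TOKEN_URL = "https://foursquare.com/oauth2/access_token"

def build_token_url(client_id, client_secret, redirect_uri, code):
    """Build the URL that exchanges an authorization code for an access token."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "authorization_code",
        "redirect_uri": redirect_uri,
        "code": code,
    }
    return TOKEN_URL + "?" + urlencode(params)

def fetch_access_token(client_id, client_secret, redirect_uri, code):
    """Perform the exchange over the network and return the access token."""
    with urlopen(build_token_url(client_id, client_secret, redirect_uri, code)) as resp:
        return json.loads(resp.read().decode())["access_token"]
```

Once you hold the access token, every subsequent API call is an ordinary HTTPS request with that token attached.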
I recently had to get oauth working with Dropbox, and wrote this module containing the necessary steps to do oauth exchange.
For my system, the simplest thing I could think of was to pickle the OAuth client. My blog package just deserialized the pickled client and requested endpoints with the following function:
get = lambda x: client.request(x, 'GET')[1]
Just make sure your workers have this client object and you should be good to go :-)
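As for automating the manual copy-paste of CODE: one common trick, sketched below under the assumption that your registered redirect_uri points at localhost, is to open the authorize URL in a browser and catch the redirect with a throwaway local HTTP server (`extract_code` and `wait_for_code` are names I made up for this sketch):

```python
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def extract_code(path):
    """Pull the ?code=... parameter out of the redirect request path."""
    query = parse_qs(urlparse(path).query)
    return query.get("code", [None])[0]

def wait_for_code(auth_url, port=8080):
    """Open the browser on the authorize URL and block until the provider
    redirects back to http://localhost:<port>/ carrying the code."""
    captured = {}

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            captured["code"] = extract_code(self.path)
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"You can close this tab now.")

    webbrowser.open(auth_url)
    with HTTPServer(("localhost", port), Handler) as server:
        server.handle_request()  # serve exactly one request, then return
    return captured["code"]
```

With that, the crawler can call `wait_for_code(url)` once at startup, feed the result straight into `auth.set_token(...)`, and never touch the browser address bar again.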