How to build a Python crawler for websites using oauth2

Problem description

I'm new to web programming. I want to build a crawler in Python that crawls the social graph in Foursquare. Using the apiv2 library, I've got a "manually" controlled crawler working. The main method looks like:

import apiv2

def main():
    # CODE is copied by hand from the redirect URL (values masked).
    CODE = "******"
    url = "https://foursquare.com/oauth2/authenticate?client_id=****&response_type=code&redirect_uri=****"
    key = "***"
    secret = "****"
    re_uri = "***"

    auth = apiv2.FSAuthenticator(key, secret, re_uri)
    auth.set_token(CODE)
    finder = apiv2.UserFinder(auth)

    # Do some requests using the finder
    finder.finde(ANY_USER_ID).mayorships()
    # ... more requests ...

The problem is that, at present, I have to type the URL into my browser, pick up the CODE from the redirect URL, update the CODE in my program, and run it again. I think there might be some way to fold the CODE-fetching step into my program and make the whole thing automatic.

Any instruction or sample code is appreciated.

Recommended answer

You should check out the python-oauth2 module. It seems to be the most stable thing out there.

In particular, this blog post has a really good rundown of how to do OAuth easily with Python. The example code uses the Foursquare API, so I would check that out first.
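For reference, the basic request pattern in python-oauth2 (which, despite its name, does OAuth 1.0a request signing) looks roughly like the sketch below; the credentials and endpoint URL are placeholders, not values from the blog post:

    import oauth2 as oauth

    # Placeholder credentials from your app registration and a completed
    # authorization exchange.
    consumer = oauth.Consumer(key="CONSUMER_KEY", secret="CONSUMER_SECRET")
    token = oauth.Token(key="ACCESS_TOKEN", secret="ACCESS_SECRET")
    client = oauth.Client(consumer, token)

    # client.request() signs the request and returns (response, content).
    resp, content = client.request("https://api.example.com/endpoint", "GET")
    print(resp["status"], content)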

I recently had to get OAuth working with Dropbox, and wrote this module containing the necessary steps to do the OAuth exchange.
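The exchange itself is the standard three-legged OAuth 1.0 dance. A sketch of it with python-oauth2 follows; this is not the linked module itself, and the Dropbox v1 endpoint URLs are assumptions from that era:

    import urllib.parse
    import oauth2 as oauth

    REQUEST_TOKEN_URL = "https://api.dropbox.com/1/oauth/request_token"
    AUTHORIZE_URL = "https://www.dropbox.com/1/oauth/authorize"
    ACCESS_TOKEN_URL = "https://api.dropbox.com/1/oauth/access_token"

    consumer = oauth.Consumer(key="APP_KEY", secret="APP_SECRET")
    client = oauth.Client(consumer)

    # Step 1: obtain an unauthorized request token.
    resp, content = client.request(REQUEST_TOKEN_URL, "POST")
    request_token = dict(urllib.parse.parse_qsl(content.decode()))

    # Step 2: the user approves the app in a browser.
    print("Visit: %s?oauth_token=%s" % (AUTHORIZE_URL, request_token["oauth_token"]))
    input("Press Enter after authorizing...")

    # Step 3: trade the approved request token for an access token.
    token = oauth.Token(request_token["oauth_token"],
                        request_token["oauth_token_secret"])
    client = oauth.Client(consumer, token)
    resp, content = client.request(ACCESS_TOKEN_URL, "POST")
    access_token = dict(urllib.parse.parse_qsl(content.decode()))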

For my system, the simplest thing I could think of was to pickle the OAuth client. My blog package just deserialized the pickled client and requested endpoints with the following function:

# client.request() returns a (response, content) tuple; keep just the body
get = lambda x: client.request(x, 'GET')[1]

Just make sure your workers have this client object and you should be good to go :-)
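That round trip might look like the sketch below; the file name and credentials are invented for illustration, and it assumes the client object pickles cleanly:

    import pickle
    import oauth2 as oauth

    # One-time setup: build the authorized client and freeze it to disk.
    consumer = oauth.Consumer(key="APP_KEY", secret="APP_SECRET")
    token = oauth.Token(key="ACCESS_TOKEN", secret="ACCESS_SECRET")
    with open("oauth_client.pkl", "wb") as f:
        pickle.dump(oauth.Client(consumer, token), f)

    # In each worker: thaw the client and hit endpoints with it.
    with open("oauth_client.pkl", "rb") as f:
        client = pickle.load(f)

    get = lambda x: client.request(x, 'GET')[1]
    print(get("https://api.example.com/some/endpoint"))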
