Getting more than 100 search results with PRAW?


Problem description

I'm using the following code to obtain reddit search results with PRAW 4.4.0:

def search_subreddit(reddit, subreddit):
    # Newest posts first, restricted to the past year.
    params = {'sort': 'new', 'time_filter': 'year'}
    return reddit.subreddit(subreddit).search('', **params)

I'd like to scrape an indefinite number of posts from the subreddit, covering a period of up to a year. Reddit's search functionality (and, correspondingly, its API) supports this through the 'after' parameter. However, the search function above doesn't accept 'after' as a parameter. Is there a way to use PRAW's .search() to obtain more than 100 search results?

Recommended answer

Yes. Passing limit=None raises the cap to 1000 results, but it does not guarantee any time frame, and there is no way to get more than those 1000 from a single query. However, you can use cloudsearch syntax, which is described in detail in the reddit wiki at https://www.reddit.com/wiki/search#wiki_cloudsearch_syntax and is a fairly powerful search enhancer.

To back this up with some code, the use case in the question can be handled like this:

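import datetime

def search_last_year(reddit, subreddit):
    # Cloudsearch syntax enables the timestamp:START..END range filter.
    params = {'sort': 'new', 'limit': None, 'syntax': 'cloudsearch'}
    time_now = datetime.datetime.now()
    year_ago = time_now - datetime.timedelta(days=365)
    # Both bounds are Unix timestamps in whole seconds.
    query = 'timestamp:{0}..{1}'.format(
        int(year_ago.timestamp()), int(time_now.timestamp()))
    return reddit.subreddit(subreddit).search(query, **params)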

Each query is still limited to 1000 results, but because the time frame is explicit you can run several queries over different time frames. For example, grab 1000 submissions, read created_utc from the oldest one, and use that time as a bound of the timestamp range in the next query, so the results continue from the point where the previous query stopped. Because the results are sorted newest first, that bound is the end of the range (the second value).
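The repeated-query idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not PRAW's own API: iter_windowed, timestamp_query, praw_search_page and page_limit are hypothetical names, and the sketch assumes the cloudsearch timestamp syntax is available on the reddit endpoint being queried. The search itself is passed in as a callable so the paging logic stays independent of the network.

```python
def timestamp_query(start, end):
    """Build a cloudsearch range query from two Unix timestamps."""
    return 'timestamp:{0}..{1}'.format(int(start), int(end))


def iter_windowed(search_page, start, end, page_limit=1000):
    """Yield items created between start and end, newest first.

    search_page(start, end) is a hypothetical callable that returns at
    most page_limit items, sorted newest first, each exposing .id and
    .created_utc (as PRAW submissions do).
    """
    seen = set()          # ids already yielded, to drop boundary duplicates
    upper = int(end)
    while upper >= start:
        page = list(search_page(start, upper))
        fresh = [item for item in page if item.id not in seen]
        for item in fresh:
            seen.add(item.id)
            yield item
        # A short page means the window is exhausted; a page with nothing
        # new means the window can no longer be narrowed.
        if len(page) < page_limit or not fresh:
            break
        # Continue from the oldest item seen so far (inclusive bound).
        upper = int(min(item.created_utc for item in page))


def praw_search_page(reddit, subreddit_name):
    """Return a search_page callable backed by a PRAW Reddit instance."""
    def search_page(start, end):
        return reddit.subreddit(subreddit_name).search(
            timestamp_query(start, end),
            sort='new', syntax='cloudsearch', limit=None)
    return search_page
```

With a configured reddit object, usage would look like iter_windowed(praw_search_page(reddit, 'python'), year_ago, now), where year_ago and now are Unix timestamps. The upper bound is kept inclusive and duplicates are filtered by id, so submissions that share the boundary second are not lost.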
