PRAW 6: Get all submissions of a subreddit


Question

I'm trying to iterate over the submissions of a certain subreddit, from newest to oldest, using PRAW. I used to do it like this:

subreddit = reddit.subreddit('LandscapePhotography')
for submission in subreddit.submissions(None, time.time()):
    print("Submission Title: {}".format(submission.title))

However, when I try this now, I get the following error:

AttributeError: 'Subreddit' object has no attribute 'submissions'

From looking at the docs, I can't figure out how to do this. The best I can do is:

for submission in subreddit.new(limit=None):
    print("Submission Title: {}".format(submission.title))

However, this is limited to the first 1000 submissions only.

Is there a way to do this with all submissions, not just the first 1000?

Recommended answer

Unfortunately, Reddit removed this function from their API.

Check out the PRAW changelog. One of the changes in version 6.0.0 is:

Removed

  • Subreddit.submissions as the API endpoint backing the method is no more. See https://www.reddit.com/r/changelog/comments/7tus5f/update_to_search_api/.

The linked post says that Reddit is disabling Cloudsearch for all users:

Starting March 15, 2018 we’ll begin to gradually move API users over to the new search system. By end of March we expect to have moved everyone off and finally turn down the old system.

PRAW's Subreddit.submissions() used Cloudsearch to search for posts between the given timestamps. Since Cloudsearch has been removed and the search that replaced it doesn't support timestamp search, it is no longer possible to perform a timestamp-based search with PRAW or any other Reddit API client. This includes trying to get all posts from a subreddit.

For more information, see this thread from /r/redditdev posted by the maintainer of PRAW.

Since Reddit limits all listings to ~1000 entries, it is currently impossible to get all posts in a subreddit using their API. However, third-party datasets with APIs exist, such as pushshift.io. As /u/kungming2 said on Reddit:

You can use Pushshift.io to still return data from defined time periods by using their API:

https://api.pushshift.io/reddit/submission/search/?after=1334426439&before=1339696839&sort_type=score&sort=desc&subreddit=translator

This, for example, allows you to parse submissions to r/translator between 2012-04-14 and 2012-06-14.
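To go beyond a single request, the same endpoint can be queried repeatedly while moving the `before` timestamp backwards. The following is a minimal sketch, not a definitive client. It assumes the endpoint is still reachable, that it answers with a JSON object containing a `data` list whose items carry a `created_utc` field, and that it accepts `size` and `sort_type=created_utc` alongside the parameters from the URL above. The helper names (`build_search_url`, `fetch_page`, `iter_submissions`) are hypothetical and not part of PRAW or Pushshift.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the example URL in the answer.
BASE_URL = "https://api.pushshift.io/reddit/submission/search/"


def build_search_url(subreddit, after, before, size=100):
    """Build the URL for one page of submissions, newest first.

    `size` and `sort_type=created_utc` are assumed to be accepted
    by the endpoint; adjust them if the API rejects the request.
    """
    params = {
        "subreddit": subreddit,
        "after": int(after),    # epoch seconds, lower bound
        "before": int(before),  # epoch seconds, upper bound
        "sort_type": "created_utc",
        "sort": "desc",
        "size": size,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_page(url):
    """Download one page (assumed response shape: {"data": [...]})."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["data"]


def iter_submissions(subreddit, after, before, fetch=fetch_page):
    """Yield submissions from newest to oldest, page by page."""
    while True:
        page = fetch(build_search_url(subreddit, after, before))
        if not page:
            return
        yield from page
        # Continue below the oldest submission seen so far.
        oldest = min(item["created_utc"] for item in page)
        if oldest >= before:
            return  # no progress; stop instead of looping forever
        before = oldest
```

A call such as `for post in iter_submissions("translator", 1334426439, 1339696839): print(post["title"])` would then walk the whole period from the example. If the API treats `before` as a strict upper bound, submissions that share the exact timestamp of a page boundary may be skipped, and a real script should also pause between requests to respect the service's rate limits.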
