How do I check if a Reddit post contains only an image and nothing else?


Problem description


Background: I'm currently making a Reddit bot using the praw library with Python 3.7. One of the things my bot needs to do is check the latest posts on some subreddit to see if they contain just an image and nothing else.

Given that there are different types of posts on Reddit (posts that are just an uploaded image and normal text posts with an image in them), I first decided to differentiate between these two possibilities. As far as I'm aware, praw doesn't provide any functionality to get the type of Reddit post.

To handle posts which are just images and nothing else, I just check the URL of the returned praw submission with a specific regex:

^http(s)?://i\.redd\.it/\w+\.(png|gif|jpg|jpeg)$
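A minimal sketch of that URL check in Python, assuming `url` is the string taken from `submission.url`. The helper name `is_direct_image_url` is hypothetical; the pattern is the one quoted above.

```python
import re

# The pattern from the question, compiled once. \w+ matches the image ID and
# the extension group allows only the four listed formats.
IMAGE_URL_RE = re.compile(r"^http(s)?://i\.redd\.it/\w+\.(png|gif|jpg|jpeg)$")


def is_direct_image_url(url: str) -> bool:
    """Return True if url points directly at an image file on i.redd.it."""
    return IMAGE_URL_RE.match(url) is not None
```

For example, `is_direct_image_url("https://i.redd.it/abc123.png")` is true, while an `.mp4` link or an image on another host is rejected.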

If the URL matches, I just download the image. This works. On the other hand, for text posts that happen to contain just an image, I check the selftext property, which is something like this for posts that contain just an image and nothing else:

&#x200B;\n\nhttps://i.redd.it/xxxxxxxxxx.png

Using the regex above (with the beginning and end markers removed), I can extract the URL and make sure only one is there through re.findall. However, how can I make sure that there is absolutely no text at all in the post (except whitespace and that weird escape sequence &#x200B;, whose purpose I don't understand)?
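One way to approach the "nothing else" part is to delete the single matched URL and check that only whitespace is left. `&#x200B;` is the HTML character reference for U+200B ZERO WIDTH SPACE, an invisible character; Python's `str.strip()` does not treat it as whitespace, so it has to be removed explicitly. This is a sketch under the assumptions that the `\n\n` in the example above are real newlines and that the URL appears bare rather than inside a Markdown link; the helper name `selftext_is_single_image` is hypothetical.

```python
import html
import re

# Same pattern as in the question, but unanchored and with non-capturing
# groups, so findall/sub work on the whole URL rather than on the groups.
IMAGE_URL_RE = re.compile(r"https?://i\.redd\.it/\w+\.(?:png|gif|jpg|jpeg)")

# U+200B ZERO WIDTH SPACE; "\u200b".isspace() is False in Python.
ZWSP = "\u200b"


def selftext_is_single_image(selftext: str) -> bool:
    """Return True if selftext holds exactly one i.redd.it image URL and
    nothing else except whitespace and zero-width spaces."""
    # Turn HTML character references such as "&#x200B;" into real characters.
    text = html.unescape(selftext)
    if len(IMAGE_URL_RE.findall(text)) != 1:
        return False
    # Delete the URL and every zero-width space; the rest must be whitespace.
    remainder = IMAGE_URL_RE.sub("", text).replace(ZWSP, "")
    return remainder.strip() == ""
```

With the example `selftext` from the question, the function returns `True`; adding any visible text, or a second image URL, makes it return `False`.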

Solution

Regarding "As far as I'm aware, praw doesn't provide any functionality to get the type of Reddit post":

PRAW loads attributes dynamically from Reddit's response. To see what's available on any given object, see the PRAW documentation section Determine Available Attributes of an Object. For a Submission, it recommends the following snippet:

import pprint

# assume you have a Reddit instance bound to variable `reddit`
submission = reddit.submission(id='39zje0')
print(submission.title) # to make it non-lazy
pprint.pprint(vars(submission))

This will print out a dict of the available attributes. Using this, you will discover the attributes .is_self and .is_reddit_media_domain. The first will tell you (as a boolean) whether or not a post is a self post, and the second will tell you (also as a boolean) whether a post is "reddit media," which also includes videos. Rather than matching the URL to a regex, just check that .is_reddit_media_domain is true and .domain == 'i.redd.it'.

For example:

In [5]: reddit.submission('anr0l2').is_self
Out[5]: True

In [6]: reddit.submission('anspgf').domain == 'i.redd.it'
Out[6]: True

In [7]: reddit.submission('antg2x').domain == 'i.redd.it'
Out[7]: False
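A small helper that bundles these checks might look like the following sketch. The name `is_image_post` is hypothetical, and the extra `not submission.is_self` test is a defensive addition rather than part of the advice above. The `SimpleNamespace` objects are offline stand-ins for real `praw.models.Submission` instances, with made-up attribute values.

```python
from types import SimpleNamespace


def is_image_post(submission) -> bool:
    """Return True for link posts whose target is an image on i.redd.it.

    Expects a praw.models.Submission, but any object exposing is_self,
    is_reddit_media_domain and domain works.
    """
    return (
        not submission.is_self
        and submission.is_reddit_media_domain
        and submission.domain == "i.redd.it"
    )


# Stand-ins for submissions, for offline testing only (values are invented).
image_post = SimpleNamespace(is_self=False, is_reddit_media_domain=True, domain="i.redd.it")
video_post = SimpleNamespace(is_self=False, is_reddit_media_domain=True, domain="v.redd.it")
text_post = SimpleNamespace(is_self=True, is_reddit_media_domain=False, domain="self.example")
```

With a live `reddit` instance, the same function could be applied to each submission yielded by `reddit.subreddit(name).new(limit=10)`. Self posts still need their `selftext` inspected separately.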


Regarding "how can I make sure that there is absolutely no text at all in the image":

What do you mean by "no text in the image"? What does it mean to you for an image to contain text?

