Using regular expressions in the Twitter API


Question

I am using the Tweepy library in Python to search for tweets. I am wondering whether I can use a regular expression to search tweets.

I am using the following code:

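import tweepy

# `api` is assumed to be an authenticated tweepy.API instance.
query = 'ARNOLD OR SYLVESTER'  # Twitter's OR operator is written in uppercase
for tweet in tweepy.Cursor(api.search,
                            q=query,
                            count=100,
                            result_type="recent",
                            include_entities=True,
                            lang="en").items():
    print(tweet.text)  # placeholder body; the original loop body was not shown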

For instance, can I search for all tweets that use 'ARNOLD' or 'SYLVESTER' (all capitals, as a single word) and ignore all other tweets?

Currently I fetch every tweet that contains Arnold or Sylvester and then check whether all the characters are uppercase. I am wondering whether this can be done in the API search itself.

Thanks.

Recommended answer

Unfortunately, Twitter doesn't support searching tweets with regular expressions, which means you do have to post-process. There isn't actually any official documentation from Twitter to that effect, but everyone who uses the Twitter search API post-processes their tweets with regex (including me). Since there isn't a stated official position, I've tried just about every flavor of regex in search queries, with no luck. Per the Twitter search API documentation, queries must be:

A UTF-8, URL-encoded search query of 1,000 characters maximum, including operators. Queries may additionally be limited by complexity.

All queries are UTF-8 and are evidently searched as such. It would be nice if there were a regex parameter we could specify in the API search call, but there isn't.

The likely reason is the additional processing cost that running a regex search across all tweets would impose on Twitter itself.
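
Since the filtering has to happen after the search, here is a minimal sketch of what that post-processing step might look like for the question's case. It assumes the tweet texts have already been collected into a list of strings (for example from each result's `text` attribute); the names `UPPERCASE_NAME`, `keep_uppercase_mentions`, and `sample_texts` are hypothetical and used only for illustration.

```python
import re

# Case-sensitive, whole-word pattern: "ARNOLD" matches,
# while "Arnold", "arnold" and "ARNOLDS" do not.
UPPERCASE_NAME = re.compile(r"\b(?:ARNOLD|SYLVESTER)\b")


def keep_uppercase_mentions(texts):
    """Return only the texts that contain ARNOLD or SYLVESTER in all capitals."""
    return [text for text in texts if UPPERCASE_NAME.search(text)]


# Hypothetical sample texts standing in for the text of tweets
# returned by the search call.
sample_texts = [
    "ARNOLD is back",
    "I met Arnold yesterday",
    "SYLVESTER and arnold",
    "ARNOLDS gym",
]

filtered = keep_uppercase_mentions(sample_texts)
for text in filtered:
    print(text)
```

The pattern omits `re.IGNORECASE` on purpose, so the match is case-sensitive, and the `\b` word boundaries reject longer words that merely start with one of the names. The search query itself can stay broad (`ARNOLD OR SYLVESTER`), with this filter applied to whatever comes back.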
