Python regular expression for retweets

Views: 18 Published: 2021/9/11 18:36:50 Tags: python regex twitter


Question

I'm working on a regex that will extract retweet keywords and user names from tweets. Here's an example, with a rather terrible regex that does the job:

import re
tweet = 'foobar RT@one, @two: @three barfoo'
m = re.search(r'(RT|retweet|from|via)\b\W*@(\w+)\b\W*@(\w+)\b\W*@(\w+)\b\W*', tweet)
m.groups()
# ('RT', 'one', 'two', 'three')

What I'd like is to condense the repeated \b\W*@(\w+)\b\W* patterns and allow a variable number of them, so that if @four were added after @three, it would also be extracted. I've tried many permutations of repeating this with a +, without success.

I'd also like this to work for something like

tweet = 'foobar RT@one, RT @two: RT @three barfoo'

which can be achieved with re.finditer if the patterns don't overlap. (I have a version where the patterns do overlap, so only the first RT gets picked up.)

Any help is greatly appreciated. Thanks.

Recommended answer

Try:

(RT|retweet|from|via)(?:\b\W*@(\w+))+

Enclosing the \b\W*@(\w+) in (?:...) lets you group the terms for repetition without capturing the aggregate.
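As a quick check of what this pattern returns, here is a minimal sketch that runs it against the sample tweet from the question. The variable names are illustrative only. One thing it makes visible: in Python's re module, a capturing group inside a repeated group keeps only its last match, so the whole run of handles is matched but only the final one is captured.

```python
import re

# Sample tweet from the question.
tweet = 'foobar RT@one, @two: @three barfoo'

# The non-capturing group (?:...) is repeated; the inner (\w+) still captures.
pattern = re.compile(r'(RT|retweet|from|via)(?:\b\W*@(\w+))+')

m = pattern.search(tweet)

# The full match spans every handle in the run...
matched_text = m.group(0)   # 'RT@one, @two: @three'

# ...but the repeated capture group retains only its final iteration.
keyword, last_handle = m.groups()   # ('RT', 'three')

print(matched_text)
print(keyword, last_handle)
```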

I'm not sure I follow the second part of your question, but I think you may be looking for something involving a construct like:

(?:(?!RT|@).)

which will match any single character that isn't an "@" or the start of "RT", again without capturing it.
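To see how that construct behaves, here is a small sketch that repeats it with * and applies it to the start of a string. The name filler and the sample strings are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Repeat the construct: consume characters one at a time, stopping just
# before an "@" or the letters "RT". Nothing is captured.
filler = re.compile(r'(?:(?!RT|@).)*')

# Stops in front of "RT".
before_rt = filler.match('foobar RT@one').group(0)

# Stops in front of "@".
before_at = filler.match('hello @two').group(0)

print(repr(before_rt))   # 'foobar '
print(repr(before_at))   # 'hello '
```

Note that the lookahead tests for the letters "RT" anywhere, so it would also stop inside a word that happens to contain them.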

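If you need every handle rather than only the last one (in Python's re module a repeated capturing group keeps only its final match), how about capturing the whole run of handles: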

(RT|retweet|from|via)((?:\b\W*@\w+)+)

and then post-processing with

re.split(r'@(\w+)', m.groups()[1])

to get the individual handles?
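The two-step idea can be sketched as below. This is one possible arrangement, not necessarily what the answerer had in mind. It makes two assumptions of its own: it uses re.findall for the second step instead of re.split (re.split with a capturing group also returns the text between the handles, which would then need filtering), and it wraps the search in re.finditer so that the second tweet format from the question is handled as well. The function name extract_retweets is hypothetical.

```python
import re

# Group 1: the retweet keyword. Group 2: the whole run of "@handle" terms.
RETWEET = re.compile(r'(RT|retweet|from|via)((?:\b\W*@\w+)+)')


def extract_retweets(tweet):
    """Return a list of (keyword, [handles]) pairs, one per retweet marker."""
    results = []
    for m in RETWEET.finditer(tweet):
        # Second pass: pull each individual handle out of the captured run.
        handles = re.findall(r'@(\w+)', m.group(2))
        results.append((m.group(1), handles))
    return results


first = extract_retweets('foobar RT@one, @two: @three barfoo')
second = extract_retweets('foobar RT@one, RT @two: RT @three barfoo')

print(first)    # [('RT', ['one', 'two', 'three'])]
print(second)   # [('RT', ['one']), ('RT', ['two']), ('RT', ['three'])]
```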
