Parsing a tweet to extract hashtags into an array

Views: 26 | Published: 2021/11/18 3:23:21 | Tags: python, arrays


Question

I am having a heck of a time taking the information in a tweet including hashtags, and pulling each hashtag into an array using Python. I am embarrassed to even put what I have been trying thus far.

For example, "I love #stackoverflow because #people are very #helpful!"

This should pull the 3 hashtags into an array.

Solution

A simple regex should do the job:

>>> import re
>>> s = "I love #stackoverflow because #people are very #helpful!"
>>> re.findall(r"#(\w+)", s)
['stackoverflow', 'people', 'helpful']

Note, though, that as suggested in other answers, this may also match things that are not hashtags, such as the fragment (hash location) in a URL:

>>> re.findall(r"#(\w+)", "http://example.org/#comments")
['comments']

So another simple solution would be the following (removes duplicates as a bonus):

>>> def extract_hash_tags(s):
...     # Keep only whitespace-separated words that start with '#', dropping the '#'
...     return {part[1:] for part in s.split() if part.startswith('#')}
...
>>> extract_hash_tags("#test http://example.org/#comments #test")
{'test'}
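One caveat with the split-based `extract_hash_tags` above: it keeps any punctuation attached to the word, so `#helpful!` comes back as `helpful!`. A possible middle ground, sketched here as an assumption rather than as part of the original answer, is to keep the regex but require that the `#` sits at the start of the string or right after whitespace. The names `HASHTAG_RE` and `extract_hashtags` are hypothetical, chosen only for this illustration.

```python
import re

# Hypothetical helper, not part of the original answer.
# (?<!\S) only allows a '#' at the start of the string or after whitespace,
# so the fragment in "http://example.org/#comments" is skipped.
HASHTAG_RE = re.compile(r"(?<!\S)#(\w+)")


def extract_hashtags(text):
    """Return hashtags in order of first appearance, without duplicates."""
    tags = HASHTAG_RE.findall(text)
    # dict keys keep insertion order (Python 3.7+), so this removes
    # duplicates while preserving the order in which tags appeared.
    return list(dict.fromkeys(tags))


print(extract_hashtags("I love #stackoverflow because #people are very #helpful!"))
print(extract_hashtags("#test http://example.org/#comments #test"))
```

Unlike the set-based version, this returns a list, which is closer to the "array" the question asks for and keeps the tags in tweet order. Whether `\w+` is the right definition of a hashtag body depends on the use case; this sketch simply reuses the pattern from the answer.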
