Parsing a tweet to extract hashtags into an array in Python

Views: 138 | Published: 2016/5/31 19:22:39 | Tags: python, arrays

Question

I am having a heck of a time taking the text of a tweet, hashtags included, and pulling each hashtag into an array using Python. I am embarrassed to even show what I have tried so far.

For example: "I love #stackoverflow because #people are very #helpful!"

This should pull the 3 hashtags into an array.

Recommended answer

A simple regex should do the job:

>>> import re
>>> s = "I love #stackoverflow because #people are very #helpful!"
>>> re.findall(r"#(\w+)", s)
['stackoverflow', 'people', 'helpful']

Note, though, that as other answers point out, this may also match things that are not hashtags, such as the fragment (hash location) in a URL:

>>> re.findall(r"#(\w+)", "http://example.org/#comments")
['comments']
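One way to keep the regex approach while avoiding the URL case is to require that the `#` appear at the start of the text or right after whitespace. The following is a minimal sketch of that idea, not part of the original answer; the names `HASHTAG_RE` and `find_hashtags` are made up for illustration.

```python
import re

# Hypothetical pattern: (?<!\S) is a negative lookbehind that rejects a '#'
# preceded by any non-whitespace character (such as the '/' in a URL).
HASHTAG_RE = re.compile(r"(?<!\S)#(\w+)")

def find_hashtags(text):
    """Return hashtags in order of appearance, without the leading '#'."""
    return HASHTAG_RE.findall(text)

print(find_hashtags("I love #stackoverflow because #people are very #helpful!"))
# ['stackoverflow', 'people', 'helpful']
print(find_hashtags("http://example.org/#comments"))
# []
```

Because `\w` stops at punctuation, the trailing `!` in `#helpful!` is not included in the result.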

So another simple solution would be the following (which removes duplicates as a bonus):

>>> def extract_hash_tags(s):
...    return set(part[1:] for part in s.split() if part.startswith('#'))
...
>>> extract_hash_tags("#test http://example.org/#comments #test")
{'test'}
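Two caveats apply to the whitespace-split version: a token such as `#helpful!` keeps its trailing punctuation (`part[1:]` gives `helpful!`), and a set does not preserve the order in which the tags appeared. A possible variant, sketched here as an assumption rather than as the original answer's code, splits on whitespace, keeps only the leading word characters of each token, and deduplicates while preserving order. The function name `extract_hash_tags_ordered` is hypothetical.

```python
import re

def extract_hash_tags_ordered(s):
    """Return unique hashtags in first-seen order, without the leading '#'."""
    tags = []
    for part in s.split():
        # re.match anchors at the start of the token, so 'http://...#x' is skipped,
        # and \w+ stops before trailing punctuation such as '!'.
        m = re.match(r"#(\w+)", part)
        if m:
            tags.append(m.group(1))
    # dict.fromkeys drops duplicates while keeping insertion order (Python 3.7+).
    return list(dict.fromkeys(tags))

print(extract_hash_tags_ordered("I love #stackoverflow because #people are very #helpful! #people"))
# ['stackoverflow', 'people', 'helpful']
```

Returning a list rather than a set also matches the question's request for an array.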
