Function to count hashtags

Problem description

I'm trying to write a function that counts the hashtags in a list of posts and returns the counts.

Example input:

["Hey, im in the #pool",
 "beautiful #city",
 "#city is nice",
 "Have a nice #weekend",
 "#weekend <3",
 "Nice #"]

Output:

{"pool" : 1, "city" : 2, "weekend" : 2}

However, a lone # with nothing after it should not count as a hashtag. The same goes for characters in front of the #: something like "%#" must not count as a hashtag. A hashtag consists only of the characters a-z, A-Z and 0-9; any other character ends the hashtag.

My current code:

def analyze(posts):
    tag = {}
    for sentence in posts:
        words = sentence.split(' ')
        for word in words:
            if word.startswith('#'):
                if word[1:] in tag.keys():
                    tag[word[1:]] += 1
                else:
                    tag[word[1:]] = 1
    return(tag)


posts = ["Hey, im in the #pool",
         "beautiful #city",
         "#city is nice",
         "Have a nice #weekend",
         "#weekend <3",
         "Nice #"]
print(analyze(posts))


Recommended answer

In one pass, using a case-insensitive regex search and a collections.Counter object:

from collections import Counter
import re

lst = ["Hey, im in the #pool", "beautiful #city", "#city is nice",
       "Have a nice #weekend", "#weekend <3", "Nice #"]

hash_counts = Counter(re.findall(r'#([a-z0-9]+)', ' '.join(lst), re.I))
print(dict(hash_counts))

Output:

{'pool': 1, 'city': 2, 'weekend': 2}
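
Note that the pattern above also matches a # that directly follows another character, as in "%#tag". Here is a minimal sketch that enforces that rule too, assuming "nothing before the hashtag" means the # must start the post or follow whitespace. The names HASHTAG and count_hashtags are illustrative and not part of the original answer, and the extra "100%#off" post is added only to exercise the rule.

```python
import re
from collections import Counter

# Assumption: '#' only starts a hashtag at the beginning of a post or after
# whitespace; (?<!\S) rejects any non-whitespace character right before it.
# The character class is spelled out so only ASCII letters and digits match.
HASHTAG = re.compile(r'(?<!\S)#([A-Za-z0-9]+)')


def count_hashtags(posts):
    """Return a dict mapping each hashtag (without the '#') to its count."""
    counts = Counter()
    for post in posts:
        # findall returns only the captured group, i.e. the tag text
        counts.update(HASHTAG.findall(post))
    return dict(counts)


posts = ["Hey, im in the #pool", "beautiful #city", "#city is nice",
         "Have a nice #weekend", "#weekend <3", "Nice #", "100%#off"]
print(count_hashtags(posts))  # {'pool': 1, 'city': 2, 'weekend': 2}
```

If a # after punctuation such as "(#tag)" should count after all, the lookbehind would have to be loosened; the question does not say which characters are acceptable before the #.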
