How to extract only the text of hashtags using tweepy?

Views: 45 | Posted: 2020/5/2 6:01:55 | Tags: python, list, pandas, dictionary, tweepy

Problem description

I want to extract hashtags for my sentiment analysis project; however, I'm getting a list of dictionaries containing all the hashtags along with their indices in the tweet. I only want the text.

import time

import pandas as pd
import tweepy

data = tweepy.Cursor(api.search, q, since=a[i], until=b[i]).items()
# Keep each Status object's raw JSON dict so fields can be read as tweet['id'] etc.
tweet_data = [status._json for status in data]
tweets = pd.DataFrame()
tweets['Tweet_ID'] = [tweet['id'] for tweet in tweet_data]
tweets['Tweet'] = [tweet['text'].encode('utf-8') for tweet in tweet_data]
tweets['Date'] = [time.strftime('%Y-%m-%d %H:%M:%S', time.strptime(tweet['created_at'], '%a %b %d %H:%M:%S +0000 %Y')) for tweet in tweet_data]
tweets['User'] = [tweet['user']['screen_name'] for tweet in tweet_data]
tweets['Follower_count'] = [tweet['user']['followers_count'] for tweet in tweet_data]
# Each cell holds a list of dicts such as {'indices': [53, 65], 'text': 'Predictions'}
tweets['Hashtags'] = [tweet['entities']['hashtags'] for tweet in tweet_data]
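If only the hashtag words are needed, the `Hashtags` column can hold them from the start instead of the full entity dicts. A minimal sketch, assuming each element of `tweet_data` is a plain dict in the Twitter v1.1 tweet JSON layout (for example a tweepy `Status._json`); the two sample tweets are hypothetical and trimmed to the fields used:

```python
import pandas as pd

# Hypothetical sample data in the shape of v1.1 tweet JSON (only the fields used here)
tweet_data = [
    {"id": 1, "entities": {"hashtags": [
        {"indices": [53, 65], "text": "Predictions"},
        {"indices": [67, 76], "text": "FreeTips"},
    ]}},
    {"id": 2, "entities": {"hashtags": []}},  # a tweet without hashtags
]

tweets = pd.DataFrame({
    "Tweet_ID": [tweet["id"] for tweet in tweet_data],
    # Keep only the 'text' of each hashtag entity, dropping the 'indices'
    "Hashtags": [[tag["text"] for tag in tweet["entities"]["hashtags"]]
                 for tweet in tweet_data],
})
print(tweets)
```

Tweets without hashtags simply get an empty list, which keeps the column shape uniform.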

Current output:

df=pd.DataFrame({'Hashtags' : [{u'indices': [53, 65], u'text': u'Predictions'}, {u'indices': [67, 76], u'text': u'FreeTips'}, {u'indices': [78, 89], u'text': u'SoccerTips'}, {u'indices': [90, 103], u'text': u'FootballTips'}, {u'indices': [104, 110], u'text': u'Goals'}]})

Expected output:

df=pd.DataFrame({'Hashtags' :["u'Predictions'", "u'SoccerTips'", "u'FootballTips'", "u'Goals'"]})
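Because each row of the current output holds a single hashtag dict, mapping every cell to its `'text'` value gives a text-only column. A sketch that rebuilds the question's DataFrame (with the `u''` prefixes dropped, as on Python 3) and uses `Series.map` with a lambda:

```python
import pandas as pd

# The "current output" from the question: one hashtag entity dict per row
df = pd.DataFrame({"Hashtags": [
    {"indices": [53, 65], "text": "Predictions"},
    {"indices": [67, 76], "text": "FreeTips"},
    {"indices": [78, 89], "text": "SoccerTips"},
    {"indices": [90, 103], "text": "FootballTips"},
    {"indices": [104, 110], "text": "Goals"},
]})

# Replace each dict with the value stored under its 'text' key
df["Hashtags"] = df["Hashtags"].map(lambda tag: tag["text"])
print(df["Hashtags"].tolist())
# ['Predictions', 'FreeTips', 'SoccerTips', 'FootballTips', 'Goals']
```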

I've tried several methods to flatten/reduce/access a nested dictionary containing a list of dictionaries. Please help.

As @MSeifert suggested, I tried his method. It generated the following error:

dt=tweet.entities.hashtags
pd.io.json.json_normalize(dt, 'hashtags')
pd.io.json.json_normalize(dt, 'hashtags')['text'].tolist()

Traceback (most recent call last):
  File "<ipython-input-166-be11241611d6>", line 1, in <module>
    dt=tweet.entities.hashtags
AttributeError: 'dict' object has no attribute 'entities'
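The traceback indicates that `tweet` is a plain `dict` (the parsed JSON), so nested fields need key lookups such as `tweet['entities']['hashtags']` rather than attribute access. Under that assumption, the `json_normalize` idea can be sketched with `record_path` pointing at the hashtag list. This uses the top-level `pd.json_normalize` available in pandas 1.0 and later (older releases expose it as `pandas.io.json.json_normalize`); the tweet dict is made up:

```python
import pandas as pd

# Hypothetical tweet in v1.1 JSON form, reduced to the entities field
tweet = {"entities": {"hashtags": [
    {"indices": [53, 65], "text": "Predictions"},
    {"indices": [67, 76], "text": "FreeTips"},
]}}

# record_path walks tweet['entities']['hashtags'] and makes one row per hashtag dict
hashtags = pd.json_normalize(tweet, record_path=["entities", "hashtags"])
print(hashtags["text"].tolist())  # ['Predictions', 'FreeTips']
```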

I also tried this:

dx = tweets['Hashtags']
for key, value in dx.items():
    print key, value

It raised the following error:

Traceback (most recent call last):
  File "<ipython-input-167-d66c278ec072>", line 2, in <module>
    for key, value in dx.items():
  File "C:\ANACONDA\lib\site-packages\pandas\core\generic.py", line 2740, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'items'
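Current pandas versions do provide `Series.items()`, which yields `(index, value)` pairs; the error above likely comes from the older pandas on Python 2 used in that environment. A sketch of the intended loop, assuming each cell of the column is a list of hashtag dicts (the sample data is hypothetical):

```python
import pandas as pd

# Hypothetical Hashtags column: one list of hashtag dicts per tweet
dx = pd.Series([
    [{"indices": [78, 89], "text": "SoccerTips"},
     {"indices": [90, 103], "text": "FootballTips"}],
    [],
])

# Series.items() yields (index label, cell value) pairs
row_tags = {}
for key, value in dx.items():
    row_tags[key] = [tag["text"] for tag in value]
    print(key, row_tags[key])
```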

UPDATE:

I'm now able to access the text part of the nested hashtag dictionaries:

tweets['Hashtags'][1][1]['text']
Out[209]: u'INDvPAK'

Now I want to write a loop that collects all the hashtags in each row.

Here's the solution:

After a lot of troubleshooting and trying various methods, I finally figured out how to unpack the nested dictionaries. It is a fairly simple loop. I noticed that the hashtag text can be accessed with:

tweets['Hashtags'][1][1]['text']
Out[209]: u'INDvPAK'

This was a valuable insight: I learned that I don't need to write u'text' as the key; plain 'text' works.
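Why the plain key works: in Python 2 an ASCII-only `unicode` string and the matching `str` compare and hash equal, so `d['text']` finds the `u'text'` key, and on Python 3 the `u` prefix is accepted but changes nothing. Illustrated on Python 3 with one of the question's hashtag dicts:

```python
# One hashtag entity from the question, written with Python 2 style u'' prefixes
tag = {u"indices": [53, 65], u"text": u"Predictions"}

# On Python 3 both literals are ordinary str values and compare equal
print(u"text" == "text")  # True
print(tag["text"])  # Predictions
```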

ht = []
for s in range(len(tweets['Hashtags'])):
    hasht = []
    # Each cell is a list of hashtag dicts; keep only their 'text' values
    for t in range(len(tweets['Hashtags'][s])):
        hasht.append(tweets['Hashtags'][s][t]['text'])
    ht.append(hasht)
# list() is needed on Python 3, where zip() returns a lazy iterator
tweets['HT'] = list(zip(ht))

This is a simple nested for loop: the outer loop walks over the rows of the Hashtags column (each row holds the list found under entities['hashtags'] for one tweet), and the inner loop walks over the hashtag dictionaries in that list, each shaped like { "indices" : [...], "text" : ... }, appending each one's "text" value.
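The same two-level iteration can be expressed without index bookkeeping: `Series.apply` plays the role of the outer loop and a list comprehension the inner one. A sketch assuming the `Hashtags` column holds one list of hashtag dicts per tweet; the sample rows, including the `indices` values, are made up:

```python
import pandas as pd

# Hypothetical frame mirroring the question's Hashtags column
tweets = pd.DataFrame({"Hashtags": [
    [{"indices": [78, 89], "text": "SoccerTips"},
     {"indices": [90, 103], "text": "FootballTips"}],
    [{"indices": [10, 18], "text": "INDvPAK"}],
]})

# Outer loop -> Series.apply over rows; inner loop -> comprehension over hashtag dicts
tweets["HT"] = tweets["Hashtags"].apply(lambda tags: [tag["text"] for tag in tags])
print(tweets["HT"].tolist())  # [['SoccerTips', 'FootballTips'], ['INDvPAK']]
```

This stores plain lists in `HT`, so no `zip()` wrapper is involved.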

Finally, I used zip() on the collected lists, which wraps each row's list of hashtags in a one-element tuple per row/user, for example:

([u'SoccerTips', u'FootballTips'],)
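The tuple comes from calling `zip()` with a single iterable: each element is paired with nothing else and so ends up alone in a 1-tuple. On Python 3 `zip()` also returns a lazy iterator, so it has to be materialized with `list()` before being stored. A plain-Python check with hypothetical per-row lists:

```python
# Hypothetical per-row hashtag lists, as built by the loop
ht = [["SoccerTips", "FootballTips"], ["INDvPAK"]]

wrapped = list(zip(ht))
print(wrapped)  # [(['SoccerTips', 'FootballTips'],), (['INDvPAK'],)]

# Each element is a 1-tuple; index 0 recovers the original list
print(wrapped[0][0])  # ['SoccerTips', 'FootballTips']
```

Assigning `ht` directly (`tweets['HT'] = ht`) would store the lists without the tuple wrapper.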

This is easy to split.
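One way to do that split in pandas is `Series.explode` (pandas 0.25 and later), which gives every list element its own row while repeating the original index. A sketch, assuming an `HT` column of 1-tuples like the one shown above, so each tuple is unwrapped first; the values are hypothetical:

```python
import pandas as pd

# Hypothetical HT column as produced by list(zip(ht)): one 1-tuple per row
tweets = pd.DataFrame({"HT": [(["SoccerTips", "FootballTips"],), (["INDvPAK"],)]})

# Unwrap the 1-tuples, then give every hashtag its own row (index repeats per tweet)
flat = tweets["HT"].map(lambda wrapped: wrapped[0]).explode()
print(flat.tolist())  # ['SoccerTips', 'FootballTips', 'INDvPAK']
```

The repeated index makes it easy to join each hashtag back to its tweet's other columns.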
