Extracting users from twitter status in R


Question

I am trying to find out how often a specific user has tweeted to or mentioned another user. With the twitteR package I can retrieve the tweets for a given user; however, if a tweet mentions several users, only the first one appears in the replyToUID field. So the first column of my data frame contains the tweets, for example:

"@user1 @user2 have you read what @user3 wrote?"

and I would like to extract the usernames into a list like this:

  • user1
  • user2
  • user3

with the users from the next tweet being added below. If someone knows how to do the extraction (I can deal with the loops myself) or can point me in the right direction, it would be much appreciated.

Optionally, for the really helpful: if you have an idea how to aggregate the list so that in the end (after n tweets have been processed), instead of

  • user1
  • user2
  • user3
  • user1
  • user3
  • user4

the list (or table) reads like this (counting how often each user has been mentioned)

  • user1, 2
  • user2, 1
  • user3, 2
  • user4, 1

it would be even more appreciated.

Thanks, Elias

Recommended answer

I'm not sure what the rules are for a valid Twitter user name, but assuming only alphanumeric characters are allowed, you can do it with a simple regular expression:

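x <- "@user1 @user2 have you read what @user3 wrote?"

# Split each status on spaces, then keep the tokens that contain
# "@" followed by an alphanumeric character
users <- function(x){
  xx <- strsplit(x, " ")
  lapply(xx, function(tokens) tokens[grepl("@[[:alnum:]]", tokens)])
}

users(x)
# [[1]]
# [1] "@user1" "@user2" "@user3"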

In addition, this solution assumes that all words are separated by spaces, i.e. it won't work for user names followed by punctuation marks. You'll have to extend this answer to cope with that scenario.
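
One possible way to handle trailing punctuation, and to get the per-user counts asked for in the question, is to match the mentions directly instead of splitting on spaces. The following is only a sketch under stated assumptions: the helper name extract_mentions and the second example tweet are made up for illustration, and the pattern assumes that user names consist of letters, digits and underscores. It would also pick up the part after "@" in an e-mail address, so real data may need a stricter pattern.

```r
# Hypothetical helper: pull out every "@name" with a regular expression,
# so punctuation directly after a user name is not included in the match.
# Assumption: user names contain only letters, digits and underscores.
extract_mentions <- function(tweets) {
  matches <- regmatches(tweets, gregexpr("@[[:alnum:]_]+", tweets))
  # Drop the leading "@" from every match
  lapply(matches, function(m) sub("^@", "", m))
}

# Example input; the second tweet is invented to show punctuation handling
tweets <- c(
  "@user1 @user2 have you read what @user3 wrote?",
  "Thanks @user1, @user3! cc @user4."
)

# One character vector of user names per tweet
mentions <- extract_mentions(tweets)

# Flatten the per-tweet vectors and count how often each user is mentioned
counts <- table(unlist(mentions))
print(counts)
```

For these two tweets the table reports user1 and user3 twice each and user2 and user4 once each. Because gregexpr and regmatches are vectorised over the input, the whole tweet column of a data frame can be passed in at once, so no explicit loop is needed.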
