Excluding Twitter handles while using twitteR

Problem description

While analyzing tweets on #flipkart using the twitteR package in R, I find that most of the tweets are news about offers, posted by about 2-3 handles. This does not help evaluate the overall sentiment about Flipkart. Can I exclude these 2-3 handles while extracting the tweets? I need customer responses, not news about offers. Thanks

Recommended answer

This is just a hint, not a full solution (which I don't think is feasible). However, it is far too long for a comment.

Take a look at the Twitter API docs on how to search, in particular the paragraph on query operators. If you prepend a - to a term, you exclude it from your query.

In twitteR, this translates simply into your search query, as follows:

# The "-" prefix excludes tweets containing the term "pricetrak"
searchTwitter("#flipkart -pricetrak", n=10)
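To exclude several handles or terms at once, each exclusion can be appended to the base query with its own "-" prefix. This is a minimal sketch: build_exclusion_query() is a hypothetical helper and "dealsalert" is a made-up handle. It only builds the query string, which you would then pass to searchTwitter().

```r
# Hypothetical helper: prepend "-" to each term and append it to the base query
build_exclusion_query <- function(base, exclude) {
  if (length(exclude) == 0) {
    return(base)  # nothing to exclude
  }
  paste(c(base, paste0("-", exclude)), collapse = " ")
}

# "dealsalert" is a made-up handle used only for illustration
query <- build_exclusion_query("#flipkart", c("pricetrak", "dealsalert"))
print(query)
# [1] "#flipkart -pricetrak -dealsalert"

# With twitteR authenticated, the query would then be used as:
# tweets <- searchTwitter(query, n = 10)
```

The explicit empty-vector check matters because paste0("-", character(0)) returns "-" in R, which would leave a stray "-" at the end of the query.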

You can try to exclude some of your terms, but it is not going to be an easy task.

Besides, you should not do e.g. #flipkart -@flipkart, since most customers' comments seem to be addressed to the user @flipkart, and you would lose them. (The terms of the search query are interpreted as either users or content of the tweet.)

As a final note, your search query is limited to 500 characters.
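Every exclusion adds characters, so a long list of excluded handles can run into that limit. A minimal check, assuming the 500-character limit stated above. The query_fits_limit() helper and the generated handle names are illustrative only.

```r
# Hypothetical helper: TRUE if the query fits within the character limit
query_fits_limit <- function(query, limit = 500) {
  nchar(query) <= limit
}

# 60 made-up handles ("handle01" ... "handle60") to exclude
handles <- sprintf("handle%02d", 1:60)
long_query <- paste(c("#flipkart", paste0("-", handles)), collapse = " ")

print(nchar(long_query))                          # [1] 609
print(query_fits_limit(long_query))               # [1] FALSE
print(query_fits_limit("#flipkart -pricetrak"))   # [1] TRUE
```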

Hope it helps somehow.

Update

As per the comments, I propose some other easy actions you could take. But I'm afraid there is no "magic bullet": you should play with the data and run lots of trials. It is also important to note that the twitteR library, although very useful, is a bit trickier to use for more "advanced" stuff. In my experience every Twitter library is a bit like that, and sometimes you end up needing to access the Twitter REST API directly. In a way this makes sense: libraries make common tasks such as fetching tweets really straightforward, but are not necessarily easy to use for other tasks.

  1. Skip tweets from certain users or from verified users

This is a toy example of how to access the user who sent a tweet:

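# Assumes library(twitteR) is loaded and OAuth has been set up
tweets <- searchTwitter("#flipkart -pricetrak", n=10)

for (tweet in tweets) {
  # Screen name of the account that posted the tweet
  screenName <- as.data.frame(tweet)$screenName
  print(screenName)
  # Fetch the full user object to read its verified flag
  tuser <- getUser(screenName)
  verified <- as.data.frame(tuser)$verified
  print(verified)
}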

This way you could filter out, for example, the tweets from @flipkart, or from a list of users that you know are not customers. You could also assume that customers are usually not verified users (see here for more details on verified accounts) and simply filter out the tweets from verified accounts.
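The same filtering can also be applied to a whole data frame at once rather than tweet by tweet. This sketch makes several assumptions. The tweets are taken to have been converted with twListToDF(), which is mocked here so the code runs offline. The verification flags are assumed to have been looked up beforehand. The handle names are hypothetical.

```r
# Mock of the data frame twListToDF(tweets) would return (only two columns kept)
tweets_df <- data.frame(
  screenName = c("flipkart", "dealbot", "alice", "bob"),
  text = c("Big sale today!", "50% off phones", "My order is late", "Great service"),
  stringsAsFactors = FALSE
)

# Assumed results of a prior verified lookup, one entry per user
verified <- c(flipkart = TRUE, dealbot = FALSE, alice = FALSE, bob = FALSE)
verified_users <- names(verified)[verified]

# Hypothetical list of handles known not to be customers
non_customers <- c("flipkart", "dealbot")

# Keep tweets that are neither from a blocklisted handle nor from a verified
# user; users missing from `verified` are treated as not verified
keep <- !(tweets_df$screenName %in% non_customers) &
  !(tweets_df$screenName %in% verified_users)
customers_df <- tweets_df[keep, ]
print(customers_df$screenName)
# [1] "alice" "bob"
```

Using %in% rather than indexing `verified` by name avoids NA values for users whose flag was never looked up.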

  2. Skip tweets with links in the text

It would be unusual (though of course not impossible) for a customer to include a link in a tweet. You could filter such tweets in a similar way:

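for (tweet in tweets) {
  text <- as.data.frame(tweet)$text
  print(text)
  # Splitting on "https://" yields a single piece when the text has no link,
  # and more than one piece when a link appears in the text
  print(length(strsplit(text, "https://")[[1]]))
}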

(Of course, if the length of the strsplit result is 1, there are no links in the text of the tweet.)

This way you will also filter out some tweets that actually come from real customers, but I gather it would be an easy way to filter out most of the tweets that include an offer or deal (all of them include a link).
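The link check can also be vectorized over all tweet texts at once. The sketch below keeps the strsplit() test described above. As an alternative that is not part of the original answer, it also shows a grepl() regex that matches http:// links as well. The tweet texts are made up.

```r
# Made-up tweet texts for illustration
texts <- c("Flat 40% off on phones https://t.co/abc123",
           "Still waiting for my refund from @flipkart",
           "Deal of the day: http://example.com/deal")

# strsplit() test: more than one piece means an "https://" link is present
has_link_split <- lengths(strsplit(texts, "https://", fixed = TRUE)) > 1

# Alternative (an assumption, not from the answer): also catches "http://"
has_link_regex <- grepl("https?://", texts)

# Keep only the texts without links, i.e. the likely customer tweets
customer_texts <- texts[!has_link_regex]
print(customer_texts)
# [1] "Still waiting for my refund from @flipkart"
```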

Hope this is useful.

Update 2

After the comments, here is an improved version of the code:

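# Assumes `tweets` comes from searchTwitter() and twitteR is authenticated
data <- NULL  # tweets from non-verified users (likely customers)
ads <- NULL   # tweets from verified users (likely offers/news)
for (tweet in tweets) {
  tweet_df <- as.data.frame(tweet)
  screenName <- tweet_df$screenName
  tuser <- getUser(screenName)
  verified <- as.data.frame(tuser)$verified
  print(verified)
  # as.logical() handles both logical and "TRUE"/"FALSE" values; isTRUE() treats NA as FALSE
  if (isTRUE(as.logical(verified))) {
    ads <- rbind(ads, tweet_df)
  } else {
    data <- rbind(data, tweet_df)
  }
}
# sep = "," writes a real CSV; row.names = FALSE avoids an extra index column
if (!is.null(ads)) {
  write.table(ads, file = "ads.csv", sep = ",", append = TRUE,
              row.names = FALSE, col.names = FALSE)
}
if (!is.null(data)) {
  write.table(data, file = "data.csv", sep = ",", append = TRUE,
              row.names = FALSE, col.names = FALSE)
}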
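The loop calls getUser() once per tweet, so a user with many tweets is looked up many times. One variation is to look up each distinct user once and then split the data frame with logical indexing. In this sketch, lookup_verified() is a hypothetical offline stand-in for the getUser() lookup, the tweets are mocked, and the output goes to a temporary file.

```r
# Hypothetical stand-in for as.data.frame(getUser(name))$verified
lookup_verified <- function(name) name %in% c("flipkart", "dealsite")

# Mocked tweets (in practice: twListToDF(tweets))
tweets_df <- data.frame(
  screenName = c("flipkart", "alice", "flipkart", "bob", "dealsite"),
  text = c("Sale!", "Late delivery", "New offers", "Nice app", "Deal"),
  stringsAsFactors = FALSE
)

# Look up each distinct screen name once, then map results back to tweets
users <- unique(tweets_df$screenName)
verified_by_user <- vapply(users, lookup_verified, logical(1))
is_ad <- verified_by_user[tweets_df$screenName]

ads  <- tweets_df[is_ad, ]
data <- tweets_df[!is_ad, ]

# Write the likely customer tweets as comma-separated values
out <- tempfile(fileext = ".csv")
write.table(data, file = out, sep = ",", row.names = FALSE, col.names = FALSE)
```

Here vapply() names its result by screen name, which is what allows `verified_by_user[tweets_df$screenName]` to expand the per-user flags back to one flag per tweet.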
