searchTwitter() in twitteR package for R (2.15.2) - high number of duplicate tweets

Published: 2021/9/11 18:46:00. Tags: r, twitter


Problem description

I am trying to build a data frame of Twitter usernames associated with a keyword by pulling from the Twitter REST API. But queries using searchTwitter() on many search terms (e.g. #rstats), even for large samples like n = 1000, return a high proportion (>90%) of duplicate tweets.

A specific example:

tweets <- searchTwitter("#rstats", n = 1000)
tweets.df <- do.call("rbind", lapply(tweets, as.data.frame))

df.undup <- df[duplicated(tweets.df) == FALSE,]
dim(df.undup)

I'm wondering whether this is caused by pagination limits when the search term is relatively scarce?

Recommended answer

First of all, should the 3rd line in your code be df.undup <- tweets.df[duplicated(tweets.df) == FALSE,] ?

I guess you're getting fewer than 1000 tweets when you run the above code (I got 604, and the result of dim(df.undup) is 604 10). So the problem, I guess, is not that there are duplicates, but that fewer than 1000 tweets are returned.

If you look at the created date, the oldest tweets are from 14th March (a week ago). The Twitter API usually doesn't allow access to tweets more than 7-9 days old. I guess that's why you're getting fewer tweets.

To check, see if dim(tweets.df) and dim(df.undup) return the same thing.
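
The check described above can be tried without calling the Twitter API at all. The following is only a minimal sketch: the small data frame is a hypothetical stand-in for tweets.df (the values are invented for illustration, and the column names id, screenName and created are assumed to match what as.data.frame() gives for a twitteR status). It compares row counts before and after de-duplication and reports the oldest creation date.

```r
# Hypothetical stand-in for tweets.df; a real one would come from
# searchTwitter() followed by the do.call("rbind", ...) step.
tweets.df <- data.frame(
  id = c("1", "2", "2", "3"),
  screenName = c("alice", "bob", "bob", "carol"),
  created = as.POSIXct(c("2013-03-14 10:00:00", "2013-03-15 11:00:00",
                         "2013-03-15 11:00:00", "2013-03-20 09:30:00"),
                       tz = "UTC"),
  stringsAsFactors = FALSE
)

# Keep only rows that are not exact duplicates of an earlier row.
df.undup <- tweets.df[!duplicated(tweets.df), ]

# If these two counts match, the result set contained no duplicates.
n.total <- nrow(tweets.df)
n.unique <- nrow(df.undup)
n.dup <- n.total - n.unique

# Oldest tweet in the result set; shows how far back the search reached.
oldest <- min(tweets.df$created)

cat("rows:", n.total, "unique:", n.unique, "duplicates:", n.dup, "\n")
print(oldest)
```

With real search results, a count well below the requested n together with an oldest date about a week back would point to the search window rather than to duplicates.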
