Obtaining Twitter screen names from a Twitter list


Question

I am keen to get a list of usernames and full names from a specific Twitter list using R. I could not find a function for this in any package, but this code works:

library(XML)
library(httr)

# Members page of the Twitter list
url.name <- "https://twitter.com/TwitterUK/lists/premier-league-players/members"
url.get <- GET(url.name)
url.content <- content(url.get, as = "text")
pagehtml <- htmlParse(url.content)

# Extract screen names and full names from the page markup
screenNames <- xpathSApply(pagehtml, '//*/span[@class="username js-action-profile-name"]', xmlValue)
realName <- xpathSApply(pagehtml, '//*/strong[@class="fullname js-action-profile-name"]', xmlValue)

However, it only returns the first 20 values (presumably what appears on screen), whilst the list is much longer.

If there is an rvest solution, that would also be welcome.

Cheers

Recommended answer

If you want to work with R and Twitter, take a look at the twitteR package. It doesn't have a function to retrieve the information you want, but we can use its internal functions to handle OAuth and then send the correct API call ourselves. The advantage of calling the API is that you don't depend on parsing an HTML page; you are using the interface that is intended for developers.

The code below assumes you have already authenticated using setup_twitter_oauth(). Tutorials on this are easy to find, since it is one of the basics of the package. Once authenticated, load the packages we need:

library(rjson)
library(httr)
# library(twitteR) Should have been loaded already of course

Now, to make the API call, we'll use POST. The URL takes a slug parameter, which is the Twitter list name, and an owner_screen_name parameter, which is the Twitter account that owns the list. We'll use the internal function twitteR:::get_oauth_sig() to authenticate the call.

twlist <- "premier-league-players"
twowner <- "TwitterUK"
# count = 5000 is the number of names per result page,
# which in this case keeps everything on one page.
api.url <- paste0("https://api.twitter.com/1.1/lists/members.json?slug=",
                  twlist, "&owner_screen_name=", twowner, "&count=5000")
response <- POST(api.url, config(token = twitteR:::get_oauth_sig()))
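If a list ever has more members than fit on a single page, the results have to be collected page by page. The Twitter API pages this endpoint with cursors: the first request uses cursor -1, each response carries a next_cursor_str, and a value of "0" means there are no more pages. Below is a minimal sketch of that loop, not part of the original answer. The names fetch_page and collect_members are hypothetical, and fetch_page is a mock that returns canned pages; in real use it would presumably wrap the authenticated call and fromJSON(), with the cursor appended to the URL as "&cursor=...".

```r
# Hypothetical stand-in for the authenticated API call: returns one parsed page.
# The canned data below only imitates the shape of a real response.
fetch_page <- function(cursor) {
  pages <- list(
    "-1" = list(users = list(list(name = "A", screen_name = "a"),
                             list(name = "B", screen_name = "b")),
                next_cursor_str = "42"),
    "42" = list(users = list(list(name = "C", screen_name = "c")),
                next_cursor_str = "0")
  )
  pages[[cursor]]
}

# Follow cursors until the API signals the last page, with a safety limit.
collect_members <- function(fetch, max_pages = 100) {
  users <- list()
  cursor <- "-1"                        # cursor value for the first page
  for (i in seq_len(max_pages)) {
    page <- fetch(cursor)
    users <- c(users, page$users)       # append this page's members
    cursor <- page$next_cursor_str
    if (identical(cursor, "0")) break   # "0" marks the end of the results
  }
  users
}

all.users <- collect_members(fetch_page)
```

The cursor is kept as a string here because cursor values can be too large to store exactly as R numerics.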

This returns a JSON response, which we can read using fromJSON:

response.list <- fromJSON(content(response, as = "text", encoding = "UTF-8"))

Now we have a list in which each element of response.list$users holds the Twitter data of one list member. To extract their names and screen names:

users.names <- sapply(response.list$users, function(i) i$name)
users.screennames <- sapply(response.list$users, function(i) i$screen_name)

These give, respectively:

> head(users.names)
[1] "Peter Crouch"         "barry bannan"         "Jose Leonardo Ulloa "
    "Paul McShane"         "nacho monreal"        "James Ward-Prowse"
> head(users.screennames)
[1] "petercrouch"   "bazzabannan25" "Ciclone1923"   "pmacca15"
    "_nachomonreal" "Prowsey16"

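Since the question asks for usernames and full names together, it may be convenient to hold both in one data frame. The following is a small sketch, not part of the original answer, that uses a mock list shaped like response.list$users (two of the members shown above) so that it runs without authentication. The variable names users and members are assumptions made for illustration.

```r
# Mock of the parsed response, shaped like response.list$users in the answer
users <- list(
  list(name = "Peter Crouch", screen_name = "petercrouch"),
  list(name = "barry bannan", screen_name = "bazzabannan25")
)

# vapply() checks that every extracted field is a single character string
members <- data.frame(
  name        = vapply(users, function(u) u$name, character(1)),
  screen_name = vapply(users, function(u) u$screen_name, character(1)),
  stringsAsFactors = FALSE
)
```

Using vapply() instead of sapply() makes the extraction fail loudly if a member is missing a field, rather than silently returning a list.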
The best part of this approach is that it opens up pretty much the entire Twitter API from R through already authenticated requests. You can inspect the response list and its sublists to see all the information available for each query.
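To reuse the same pattern for other endpoints, the request URL can be assembled by a small helper instead of a hand-written paste0() call. This is only a sketch under assumptions: build_api_url is a hypothetical name, it is not part of twitteR or httr, and it uses base R's URLencode() to escape parameter values.

```r
# Hypothetical helper: builds a Twitter API 1.1 URL from an endpoint and parameters
build_api_url <- function(endpoint, params) {
  base <- "https://api.twitter.com/1.1/"
  # Percent-encode each value so special characters cannot break the query string
  values <- vapply(params,
                   function(p) utils::URLencode(as.character(p), reserved = TRUE),
                   character(1))
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0(base, endpoint, ".json?", query)
}

# Rebuilds the URL used in the answer above
url.example <- build_api_url("lists/members",
                             list(slug = "premier-league-players",
                                  owner_screen_name = "TwitterUK",
                                  count = 5000))
```

The resulting string can be passed to the authenticated call in place of api.url.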
