How to convert searchTwitter results (from library(twitteR)) into a data.frame?


Question

I'm saving Twitter search results into a database (SQL Server) and get an error when I pull the search results from twitteR.

If I execute:

library(twitteR)
puppy <- as.data.frame(searchTwitter("puppy", session=getCurlHandle(),num=100))

I get an error of:

Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class structure("status", package = "twitteR") into a data.frame

This matters because I'm using RODBC's sqlSave to add the results to a table, and sqlSave needs a data.frame. At least, that's the error message I got:

Error in sqlSave(localSQLServer, puppy, tablename = "puppy_staging",  : 
  should be a data frame

So does anyone have any suggestions on how to coerce the list to a data.frame or how I can load the list through RODBC?

My final goal is to have a table that mirrors the structure of values returned by searchTwitter. Here is an example of what I am trying to retrieve and load:

library(twitteR)
puppy <- searchTwitter("puppy", session=getCurlHandle(),num=2)
str(puppy)

List of 2
 $ :Formal class 'status' [package "twitteR"] with 10 slots
  .. ..@ text        : chr "beautifull and  kc reg Beagle Mix for rehomes: This little puppy is looking for a new loving family wh... http://bit.ly/9stN7V "| __truncated__
  .. ..@ favorited   : logi FALSE
  .. ..@ replyToSN   : chr(0) 
  .. ..@ created     : chr "Wed, 16 Jun 2010 19:04:03 +0000"
  .. ..@ truncated   : logi FALSE
  .. ..@ replyToSID  : num(0) 
  .. ..@ id          : num 1.63e+10
  .. ..@ replyToUID  : num(0) 
  .. ..@ statusSource: chr "&lt;a href=&quot;http://twitterfeed.com&quot; rel=&quot;nofollow&quot;&gt;twitterfeed&lt;/a&gt;"
  .. ..@ screenName  : chr "puppy_ads"
 $ :Formal class 'status' [package "twitteR"] with 10 slots
  .. ..@ text        : chr "the cutest puppy followed me on my walk, my grandma won't let me keep it. taking it to the pound sadface"
  .. ..@ favorited   : logi FALSE
  .. ..@ replyToSN   : chr(0) 
  .. ..@ created     : chr "Wed, 16 Jun 2010 19:04:01 +0000"
  .. ..@ truncated   : logi FALSE
  .. ..@ replyToSID  : num(0) 
  .. ..@ id          : num 1.63e+10
  .. ..@ replyToUID  : num(0) 
  .. ..@ statusSource: chr "&lt;a href=&quot;http://blackberry.com/twitter&quot; rel=&quot;nofollow&quot;&gt;Twitter for BlackBerry®&lt;/a&gt;"
  .. ..@ screenName  : chr "iamsweaters"

So I think the data.frame of puppy should have column names like:

- text
- favorited
- replyToSN
- created
- truncated
- replyToSID
- id
- replyToUID
- statusSource
- screenName
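One way to get exactly those columns without plyr is to read every slot generically with slotNames() and slot() from R's methods package, turning zero-length slots such as replyToSN (chr(0)) into NA. Without that step, data.frame() refuses to combine a length-0 column with length-1 columns. This is a minimal sketch that assumes status is an S4 class with ordinary slots, as in the str() output above. So that it runs on its own, it defines a hypothetical stand-in class from those slot names and types. The helper names status_to_row() and statuses_to_df() are made up for illustration, and the tweet values are placeholders.

```r
library(methods)

# Hypothetical stand-in for twitteR's "status" class, built from the slot
# names and types in str(puppy) above; not the package's own definition.
setClass("status", slots = c(
  text = "character", favorited = "logical", replyToSN = "character",
  created = "character", truncated = "logical", replyToSID = "numeric",
  id = "numeric", replyToUID = "numeric", statusSource = "character",
  screenName = "character"))

# Hypothetical helper: one status -> one-row data.frame, one column per slot.
status_to_row <- function(s) {
  nms <- slotNames(s)
  vals <- lapply(nms, function(nm) {
    v <- slot(s, nm)
    # Zero-length slots (e.g. replyToSN = chr(0)) become NA; indexing with
    # NA_integer_ keeps the slot's type (character, numeric, logical).
    if (length(v) == 0) v[NA_integer_] else v
  })
  names(vals) <- nms
  as.data.frame(vals, stringsAsFactors = FALSE)
}

# Hypothetical helper: list of statuses -> one data.frame, one row each.
statuses_to_df <- function(statuses) {
  do.call(rbind, lapply(statuses, status_to_row))
}

# Placeholder values, not real search results.
puppy <- list(
  new("status", text = "example tweet one", favorited = FALSE,
      created = "Wed, 16 Jun 2010 19:04:03 +0000", truncated = FALSE,
      id = 1, statusSource = "web", screenName = "puppy_ads"),
  new("status", text = "example tweet two", favorited = FALSE,
      created = "Wed, 16 Jun 2010 19:04:01 +0000", truncated = FALSE,
      id = 2, statusSource = "web", screenName = "iamsweaters")
)

df <- statuses_to_df(puppy)
str(df)
```

The result is an ordinary data.frame, which is what sqlSave checks for. With twitteR loaded you would drop the setClass() call and pass the list returned by searchTwitter() instead; that step is untested here.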

Solution

Try this (ldply() comes from the plyr package):

library(plyr)
ldply(searchTwitter("#rstats", n=100), text)

searchTwitter() returns a list of S4 objects of class status, so you need to either use one of twitteR's helper functions or work with the slots directly. You can see the slots by using unclass(), for instance:

unclass(searchTwitter("#rstats", n=100)[[1]])
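Besides unclass(), the methods package that ships with R can list and read S4 slots directly: slotNames() gives the slot names, and @ or slot() reads a single slot. This minimal sketch uses a hypothetical two-slot stand-in class rather than twitteR's real status class, so it runs without a Twitter connection.

```r
library(methods)

# Hypothetical two-slot stand-in for twitteR's "status" class; the real class
# has the ten slots shown by str(puppy) in the question.
setClass("status", slots = c(text = "character", favorited = "logical"))

s <- new("status", text = "an example tweet", favorited = FALSE)

isS4(s)                  # TRUE: an S4 object, so as.data.frame() cannot coerce it
slotNames(s)             # "text" "favorited"
s@text                   # "an example tweet" (direct slot access with @)
slot(s, "favorited")     # FALSE (same access, slot name given as a string)
```

slot() is handy when the slot name is held in a variable, for example when looping over slotNames(s).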

You can access these slots through the corresponding accessor functions, as I did above with text (from the twitteR help: ?statusSource):

- text: Returns the text of the status
- favorited: Returns the favorited information for the status
- replyToSN: Returns the replyToSN slot for this status
- created: Retrieves the creation time of this status
- truncated: Returns the truncated information for this status
- replyToSID: Returns the replyToSID slot for this status
- id: Returns the id of this status
- replyToUID: Returns the replyToUID slot for this status
- statusSource: Returns the status source for this status

As far as I know, you have to specify each of these fields yourself in the output. Here's an example using two of the fields:

> head(ldply(searchTwitter("#rstats", n=100), 
        function(x) data.frame(text=text(x), favorited=favorited(x))))
                                                                                                                                          text
1                                                     @statalgo how does that actually work? does it share mem between #rstats and postgresql?
2                                   @jaredlander Have you looked at PL/R? You can call #rstats from PostgreSQL: http://www.joeconway.com/plr/.
3   @CMastication I was hoping for a cool way to keep data in a DB and run the normal #rstats off that. Maybe a translator from R to SQL code.
4                     The distribution of online data usage: AT&amp;T has recently announced it will no longer http://goo.gl/fb/eTywd #rstat
5 @jaredlander not that I know of. Closest is sqldf package which allows #rstats and sqlite to share mem so transferring from DB to df is fast
6 @CMastication Can #rstats run on data in a DB?Not loading it in2 a dataframe or running SQL cmds but treating the DB as if it wr a dataframe
  favorited
1     FALSE
2     FALSE
3     FALSE
4     FALSE
5     FALSE
6     FALSE

You could turn this into a function if you intend to do it frequently.
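Here is a minimal base-R sketch of such a function. It assumes you pass one extractor function per output column and mirrors the ldply() call above column by column, without plyr. tweets_to_df() is a hypothetical name, and the status class is again a stand-in with only three of the real slots. With twitteR loaded, the extractors could presumably be its accessors, for example list(text = text, favorited = favorited), though that is untested against the package.

```r
library(methods)

# Hypothetical stand-in for twitteR's "status" class (three of its ten slots).
setClass("status", slots = c(text = "character", favorited = "logical",
                             replyToSN = "character"))

# Hypothetical helper: one column per named extractor, one row per status.
# Each extractor is assumed to return a value of length 0 or 1.
tweets_to_df <- function(statuses, extractors) {
  cols <- lapply(extractors, function(get_field) {
    vals <- lapply(statuses, function(s) {
      v <- get_field(s)
      if (length(v) == 0) NA else v   # empty slots such as replyToSN -> NA
    })
    unlist(vals, use.names = FALSE)
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Placeholder tweets, not real search results.
tweets <- list(
  new("status", text = "first example tweet", favorited = FALSE),
  new("status", text = "second example tweet", favorited = TRUE,
      replyToSN = "someone")
)

df <- tweets_to_df(tweets, list(
  text      = function(x) x@text,
  favorited = function(x) x@favorited,
  replyToSN = function(x) x@replyToSN
))
print(df)
```

Building each column with unlist() keeps the column types (character, logical) instead of forcing everything through a per-row data.frame. A column whose values are all empty ends up logical NA, which you may want to cast before calling sqlSave.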
