R varied length vector or list in variable

Question

I am using R to prepare some data for a D3 visualization. The visualization was created using the following structure (this is a single row from a .csv file that is subsequently converted to JSON in JavaScript).

Joe.Schmoe, joe.schmoe@email.com, Sao Paulo, ["Community01", "Community02", "Community03"], 
["workgroup01","workgroup02"]

That is a single row. The headers are:

Person, Email, Location, Communities, Workgroups

You'll notice that the Communities and Workgroups columns contain lists. Furthermore, these lists will vary in length depending on which Communities and Workgroups each individual is associated with. I recognize that this is probably not best practice with regard to data "tidiness," but it is what this viz is expecting.

So ... in R (which I'm learning), I'm finding it impossible to recreate this structure because, when I try to populate the "communities" or "workgroups" variables, R seems to expect that each variable will be of equal length.

The code that I have reads from a data.frame which is a list of the members of a particular community, and adds the name of that community to a column in a master data.frame of all employees. I'm indexing by email address because it is unique. So this particular loop looks at each individual email address in a data.frame called "commTD" and finds it in a master data.frame called "testr." If it finds it, it looks at the communities variable and either replaces an NA value with the name of the community (in this case "Technical Design"), or, if the vector already exists, appends Technical Design to it:

for (i in commTD$email) {
    if (i %in% testr$email) {
        tmpList <- testr[which(testr$email == i), 'communities']

        if (is.na(tmpList)) {
            tmpList <- list(c("Technical Design"))
        } else {
            tmpList <- append(tmpList[[1]][1], 'Technical Design')
        }

        testr[which(testr$email == i), 'communities'] <- list(tmpList)
    }
}

This works fine for the initial replacement, but if I append a new community to the list, and then try to pass it back into the testr data.frame, I get an error:

Error in `[<-.data.frame`(`*tmp*`, which(testr$email == i), "communities", 
: replacement has 2 rows, data has 1

You'll note that I'm trying to create a list of vectors, which is just one way I've tried to figure this out. I thought maybe I could force R to see the list as a single object, even though it contains multiple items -- or in this case a vector of multiple items.

Is this just impossible in R, to have varied length vectors or lists as a single variable in a data frame?

Recommended answer

Data frames are by definition a list of vectors of equal length, so when you ask whether this is possible with ordinary (atomic) columns in an object of class data.frame: no, it's not.

You could either use another type of object, such as a data.table, as suggested, or think of your desired output as a list of unequal vectors to pass to your js.
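One caveat worth noting, as an aside rather than part of the original answer: a base R data frame column can itself be a list, in which case each row's cell holds one vector of any length. The following is only a minimal sketch. It assumes a small stand-in for the asker's `testr` frame, and `add_community` is a hypothetical helper, not an existing function:

```r
# Hypothetical stand-in for the asker's master data frame (rows are made up).
testr <- data.frame(
  email = c("joe.schmoe@email.com", "joe.bloe@email.com"),
  stringsAsFactors = FALSE
)

# A list column: one list element per row, each starting out empty (NULL).
testr$communities <- vector("list", nrow(testr))

# Hypothetical helper: append a community to one person's cell.
# [[ ]] targets a single list element, so the vector can grow freely.
add_community <- function(df, email, community) {
  idx <- match(email, df$email)
  if (!is.na(idx)) {
    df$communities[[idx]] <- c(df$communities[[idx]], community)
  }
  df
}

testr <- add_community(testr, "joe.schmoe@email.com", "Technical Design")
testr <- add_community(testr, "joe.schmoe@email.com", "Community02")

# Number of communities stored in each row's cell.
lengths(testr$communities)
```

The key difference from the loop in the question is the `[[ ]]` assignment on the column itself. Assigning through `testr[row, 'communities'] <- ...` treats the replacement as rows, which is where the "replacement has 2 rows" error comes from.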

The object would look like:

dataList <- list(name = c("Joe.Schmoe", "Joe.Bloe"),
                 email = c("joe.schmoe@email.com", "joe.bloe@email.com"),
                 location = c("Sao Paulo", "London"),
                 Communities = list(c("Community01", "Community02", "Community03"), 
                                  c("Community02", "Community05", "Community03")
                 ),
                 Workgroups = list(c("workgroup01","workgroup02"), 
                                   c("workgroup01","workgroup03"))
                )

Then access each field as you would with a data frame, for output to your js:

dataList$name
dataList$Communities
etc...
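Because every field has one entry per person, a person's full record can be reassembled by taking the same position from each field. A small sketch, assuming a trimmed copy of the list above; `get_person` is a hypothetical helper introduced only for illustration:

```r
# Same shape as the dataList above, trimmed to two fields for brevity.
dataList <- list(
  name = c("Joe.Schmoe", "Joe.Bloe"),
  Communities = list(c("Community01", "Community02", "Community03"),
                     c("Community02", "Community05"))
)

# Hypothetical helper: gather the i-th entry of every field into one record.
# [[i]] returns a single string for character vectors and the whole
# community vector for list fields.
get_person <- function(x, i) lapply(x, function(field) field[[i]])

joe <- get_person(dataList, 2)
joe$name
joe$Communities

# How many communities each person belongs to.
lengths(dataList$Communities)
```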

As per Frank's suggestion, if you want to access each entry via its email address, like this:

data_list[["joe.schmoe@email.com"]]

...then build the list with the email addresses as the element names, like so:

data_list = list(`joe.schmoe@email.com`=list(name="Joe",
                                             location="Sao Paulo",
                                             Communities=....),
                 `joe.bloe@email.com`=list(name="Joe", ...))

Then you can avoid the un-R-like style of for() loops and start having fun with the lapply() family of functions, which work on all the entries at once. (See ?lapply for details.)
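As a rough illustration of that style, the loop from the question could be recast along these lines. This is a sketch under assumptions: the entries in `data_list` are made up, and `td_members` is a hypothetical stand-in for `commTD$email`:

```r
# Hypothetical email-keyed list in the shape described above; values are made up.
data_list <- list(
  `joe.schmoe@email.com` = list(name = "Joe", location = "Sao Paulo",
                                Communities = c("Community01", "Community02")),
  `joe.bloe@email.com`   = list(name = "Joe", location = "London",
                                Communities = character(0))
)

# Emails of the members of one community (stand-in for commTD$email).
td_members <- c("joe.schmoe@email.com", "not.in.master@email.com")

# Only touch members that actually exist in the master list.
hits <- intersect(td_members, names(data_list))

# Add the community to each matching person; union() avoids duplicates.
data_list[hits] <- lapply(data_list[hits], function(person) {
  person$Communities <- union(person$Communities, "Technical Design")
  person
})

# Count communities per person; sapply() simplifies to a named vector.
sapply(data_list, function(person) length(person$Communities))
```

Since each person is an ordinary list, the `Communities` vector can be any length, and no NA placeholder is needed for people who belong to no community.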

Hope that helps.
