Convert List of Vectors into Data Frame of Counts

Problem description

I have several character vectors stored in a list, like this:

basket1 <- c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape")
basket2 <- c("Grape", "Grape", "Grape", "Grape")
basket3 <- c("Kiwi", "Apple", "Cantaloupe", "Banana")
basket4 <- c("Strawberry")
basket5 <- c("Grape", "Grape", "Grape")
FruitBasketList <- list(basket1, basket2, basket3, basket4, basket5)

I would like to turn FruitBasketList into a data frame with one row per basket and one column per fruit, where each cell holds the number of times that fruit appears in that basket. The main problem is that there could be thousands of different "fruits" across the vectors, and many of them will appear more than once.


This is the data frame I would like as a result:



Basket  Apple   Orange  Banana  Grape   Kiwi    Cantaloupe  Strawberry
basket1 3       1       1       1       0       0           0
basket2 0       0       0       4       0       0           0
basket3 1       0       1       0       1       1           0
basket4 0       0       0       0       0       0           1
basket5 0       0       0       3       0       0           0

Obviously, this isn't my real data, but I simplified it so anyone would be able to understand it. No, this isn't homework. In the real data a basket can contain a thousand different fruits, and the fruit vectors are not all the same length. There can be tens of thousands of baskets (vectors) as well, and some fruits can be repeated many times in the same basket.

I have been working on this, but I'm sure my solution is terribly over-complicated and very inefficient. So far it combines all the vectors into one and identifies every unique fruit name that is possible. That part worked fine. The part I'm struggling with is creating an empty data frame with those unique names as its columns, then, for each vector, counting each unique fruit and placing the count in the correct column of a new row, with zeros for fruits that don't appear in that particular basket.
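The approach described above (collect every unique fruit name, then count each basket against that shared list and fill in zeros) can be done without hard-coding any fruit names. Below is a minimal base-R sketch of that idea, not the asker's actual code; fruits, countBasket, countMatrix and fruitCounts are illustrative names. match() maps each fruit to a column index and tabulate() counts those indices, so absent fruits come out as 0:

```r
basket1 <- c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape")
basket2 <- c("Grape", "Grape", "Grape", "Grape")
basket3 <- c("Kiwi", "Apple", "Cantaloupe", "Banana")
basket4 <- c("Strawberry")
basket5 <- c("Grape", "Grape", "Grape")
FruitBasketList <- list(basket1, basket2, basket3, basket4, basket5)

# Step 1: every distinct fruit across all baskets, in order of first appearance
fruits <- unique(unlist(FruitBasketList))

# Step 2: count one basket against the shared fruit list.
# match() turns each fruit into its column index; tabulate() counts
# how often each index occurs, returning 0 for fruits not in the basket.
countBasket <- function(basket) {
  tabulate(match(basket, fruits), nbins = length(fruits))
}

# vapply() gives one column per basket; t() makes it one row per basket
countMatrix <- t(vapply(FruitBasketList, countBasket, integer(length(fruits))))
colnames(countMatrix) <- fruits

# Step 3: add a Basket label column in front of the counts
fruitCounts <- data.frame(
  Basket = paste0("basket", seq_along(FruitBasketList)),
  countMatrix,
  check.names = FALSE,
  stringsAsFactors = FALSE
)
fruitCounts
```

Because unique() keeps first-appearance order, the columns come out in the same order as the desired table; wrap the unique() call in sort() if alphabetical columns are preferred.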

The code I'm using to tally up individual vectors looks like this:

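GetUniqueItemCount <- function(rle, value)
{
  # `rle` is a run-length encoding, e.g. rle(sort(basket)).
  # In an unsorted basket a value can occupy several runs, so add up all
  # of its runs; sum(integer(0)) is 0 when the value is absent.
  sum(rle$lengths[rle$values == value])
}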

And the code to call it looks like this:

basketRuns <- rle(sort(basket1))  # run-length encoding of one basket
Apple <- GetUniqueItemCount(basketRuns, "Apple")
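For comparison, here is a small base-R sketch (runs and basketTable are illustrative names). It shows that rle() counts consecutive runs, so on an unsorted basket Apple is split into two runs, while table() counts every distinct value in one call without the fruit names being known in advance:

```r
basket1 <- c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape")

# rle() encodes *consecutive* runs: Apple appears as a run of 1 and a run of 2
runs <- rle(basket1)
appleRuns <- runs$lengths[runs$values == "Apple"]

# Summing the matching runs (or sorting the basket first) gives the total
appleTotal <- sum(appleRuns)

# table() counts every distinct fruit at once; names come from the data
basketTable <- table(basket1)
basketTable[["Apple"]]
```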

As you can see, my current code requires knowing all the possible fruits beforehand, hard-coding the count of each fruit, and assigning it to a specific column known in advance. I realize I am going down the wrong path here, so I would appreciate any advice on getting back on track toward the data frame shown above. Please feel free to offer a completely different approach instead of trying to make mine work, if that would be the best way to solve the problem.

Recommended answer

I would suggest mtabulate from the "qdapTools" package.

# install.packages("qdapTools")  # if the package is not installed yet
library(qdapTools)
mtabulate(FruitBasketList)
#   Apple Banana Cantaloupe Grape Kiwi Orange Strawberry
# 1     3      1          0     1    0      1          0
# 2     0      0          0     4    0      0          0
# 3     1      1          1     0    1      0          0
# 4     0      0          0     0    0      0          1
# 5     0      0          0     3    0      0          0
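If you would rather avoid a package dependency, base R can build the same counts. This is a different technique from mtabulate, sketched here under the assumption that the baskets are labelled basket1, basket2, and so on (long, counts and countsDF are illustrative names). stack() turns the named list into a long data frame, and table() cross-tabulates basket against fruit, filling absent combinations with 0:

```r
basket1 <- c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape")
basket2 <- c("Grape", "Grape", "Grape", "Grape")
basket3 <- c("Kiwi", "Apple", "Cantaloupe", "Banana")
basket4 <- c("Strawberry")
basket5 <- c("Grape", "Grape", "Grape")
FruitBasketList <- list(basket1, basket2, basket3, basket4, basket5)

# Label each basket so the labels become row names (assumed naming scheme)
names(FruitBasketList) <- paste0("basket", seq_along(FruitBasketList))

# stack() gives a long data frame: `values` holds the fruit, `ind` the basket
long <- stack(FruitBasketList)

# Cross-tabulate baskets (rows) against fruits (columns)
counts <- table(long$ind, long$values)

# Convert the contingency table to an ordinary wide data frame
countsDF <- as.data.frame.matrix(counts)
countsDF
```

Here the rows carry the basket names, and table() sorts the fruit columns alphabetically.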

The package's author even shares your avatar. Nifty.