How to split a list of data frames based on its column names?

Question

I have a list of more than 600 data frames that do not all share exactly the same structure (column names, column order, and variable types). I need to identify which of those data frames lack the desired structure and modify them so that I can work with all the data for different purposes (summaries, analyses, etc.).

I am trying to create two lists from the main one based on the desired names and order of the columns. For that I am trying to do the following:

# some random dfs for the example
v1 <- c(1:15)
v2 <- c(20:34)
v3 <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o")
v3b <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o")

df1 <- data.frame(v1, v2, v3)
df2 <- data.frame(v1, v2, v3)
df3 <- data.frame(v1, v2, v3b)

mylist <- list(df1, df2, df3)

names <- colnames(mylist[[1]]) #remember I have over 600 dfs in the original list
listA <- list()
listB <- list()

#I suppose this piece of code should work    
colnames(mylist[[1]]) == names
colnames(mylist[[2]]) == names
colnames(mylist[[3]]) == names

for (k in 1:length(mylist)){
  if(colnames(mylist[[k]]) == names){
    listA[[k]] <- mylist[[k]]
  }else{
    listB[[k]] <- mylist[[k]]
  }
}

Now the problem is that the loop with the conditional statements generates a list with all the data frames and a second empty list. It also generates the following warning:

1: In if (colnames(mylist[[k]]) == names) { : the condition has length > 1 and only the first element will be used

I have read and searched a lot on Stack Overflow to solve this problem, but I feel helpless...

Does anybody know what's wrong with the code? More importantly, is this an appropriate way to split my list of data frames based on the column names, or are there better ones?

Recommended answer

Create groups that you get by matching the names with match(), then use split().

f <- sapply(mylist, function(x) length(na.omit(match(names(x), names))))
listNew <- setNames(split(mylist, f), c("listB", "listA"))

Yielding

> str(listNew)
List of 2
 $ listB:List of 1
  ..$ :'data.frame':    15 obs. of  3 variables:
  .. ..$ v1 : int [1:15] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ v2 : int [1:15] 20 21 22 23 24 25 26 27 28 29 ...
  .. ..$ v3b: Factor w/ 15 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ listA:List of 2
  ..$ :'data.frame':    15 obs. of  3 variables:
  .. ..$ v1: int [1:15] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ v2: int [1:15] 20 21 22 23 24 25 26 27 28 29 ...
  .. ..$ v3: Factor w/ 15 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ :'data.frame':    15 obs. of  3 variables:
  .. ..$ v1: int [1:15] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ v2: int [1:15] 20 21 22 23 24 25 26 27 28 29 ...
  .. ..$ v3: Factor w/ 15 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
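
As for what goes wrong in the original loop: colnames(mylist[[k]]) == names returns one logical value per column, while if() expects a single TRUE or FALSE. Older versions of R used only the first element (v1 == v1, which is TRUE for every data frame, so everything landed in listA) and issued the warning. From R 4.2.0 onward the same condition is an error. The match() approach above also counts matching names without checking their order, and the setNames() call assumes exactly two groups. The following is only a sketch of an alternative, assuming that "desired structure" means the same column names in the same order. The name wanted is introduced here for illustration (to avoid masking base::names) and is not part of the original post.

```r
# Same toy data as in the question
v1 <- 1:15
v2 <- 20:34
v3 <- letters[1:15]
df1 <- data.frame(v1, v2, v3)
df2 <- data.frame(v1, v2, v3)
df3 <- data.frame(v1, v2, v3b = v3)
mylist <- list(df1, df2, df3)

# Reference column names (hypothetical name `wanted`, see note above)
wanted <- colnames(mylist[[1]])

# `==` compares element by element and returns one logical per column,
# so it cannot be used directly as an if() condition
colnames(df3) == wanted            # TRUE TRUE FALSE

# identical() returns a single TRUE/FALSE: same names in the same order
ok <- vapply(mylist, function(d) identical(colnames(d), wanted), logical(1))

listA <- mylist[ok]    # data frames with the desired structure
listB <- mylist[!ok]   # data frames that still need fixing
which(!ok)             # positions of the non-conforming data frames
```

Because ok is a plain logical vector, it scales to the full list of 600+ data frames, and which(!ok) reports which ones to inspect. If column order should not matter, comparing sorted names (for example identical(sort(colnames(d)), sort(wanted))) would be one possible variation.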
