Binding dataframes in list after data cleaning on list

Question

This is a follow-up to my last question (Rbinding large list of dataframes after I did some data cleaning on the list). I've learned more since then, and the former question got messy.

I have 43 xlsx files, which I loaded into a list in R:

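library(readxl)

# Find every .xlsx file below the working directory (pattern is a regular expression)
file.list <- list.files(recursive = TRUE, pattern = "\\.xlsx$")

dat <- lapply(file.list, function(i) {
  x <- read_xlsx(i, sheet = 1, col_names = TRUE)

  # Create column with file name
  x$file <- i
  # Return data
  x
})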

Then I added some column names:

my_names <- c("ID", "UDLIGNNR","BILAGNR", "AKT", "BA",
          "IART", "HTRANS", "DTRANS", "BELOB", "REGD",
          "BOGFD", "VALORD", "UDLIGND", 
          "UÅ", "AFSTEMNGL", "NRBASIS","FIBILAG", "FILE")
dat <- lapply(dat, setNames, my_names)

Then I removed some columns:

dat <- lapply(dat, function(x) { x["UÅ"] <- NULL; x })
dat <- lapply(dat, function(x) { x["FIBILAG"] <- NULL; x })

I don't really need to remove them, but when I tried to merge the dataframes I kept getting errors about the class of these columns. So I just removed them.

Then I changed all my columns to character. I'm kinda new to R, so I'm aware that this code isn't very sexy and that you could probably write a loop or a single function for this. But this is what I did:

dat <- lapply(dat, function(x) { x["ID"] <- as.character(x["ID"]); x })

[I did the same for all columns]

Then I went on to bind the data:

df <- rbindlist(dat)

I've found that the binding method isn't the problem (thank you for your input on that). I've deleted the part about the binding methods.

The problem lies in how I change the column types in the dataframes within the list.

I also tried:

    dat <- lapply(dat, function(x) { x[,"ID"] <- as.character(x[,"ID"]); x })

I added a comma before "ID". This didn't help. I feel like I need to use unlist, but I'm not sure how to use it here.

Recommended answer

I found the solution!

Thank you for your help!

Apparently the problem wasn't in the binding of the data frames in the nested list. The problem was that I was changing the column types the wrong way.
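As a minimal sketch of what "the wrong way" most likely means here (the toy data frame `df` and the names `wrong` and `right` are made up for illustration, not taken from the original files): single-bracket indexing such as `x["ID"]` returns a one-column data frame, which is a list, and `as.character()` on a list deparses the whole column into one string instead of converting it element by element. Since `read_xlsx` returns tibbles, `x[, "ID"]` stays a one-column tibble as well, which would explain why adding the comma did not help.

```r
# Hypothetical toy data standing in for one of the imported sheets
df <- data.frame(ID = c(1, 2, 3), BELOB = c(10.5, 20, 30))

# Single brackets keep the data frame (list) wrapper, so the whole
# column is deparsed into a single string
wrong <- as.character(df["ID"])

# Double brackets extract the underlying vector, so each value is converted
right <- as.character(df[["ID"]])

print(wrong)
print(right)

# Converting every column in one step; df[] keeps the data frame structure
df[] <- lapply(df, as.character)
str(df)
```

Using `x[["ID"]]` (or `x$ID`) on the right-hand side of the original per-column lines should therefore behave as intended, although converting all columns at once is shorter.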

Here's my code, and it works! It's also much faster than the other version!

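library(readxl)
library(data.table)

# Find every .xlsx file below the working directory (pattern is a regular expression)
file.list <- list.files(recursive = TRUE, pattern = "\\.xlsx$")

dat <- lapply(file.list, function(i) {
  x <- read_xlsx(i, sheet = 1, col_names = TRUE)

  # Create column with file name
  x$file <- i
  # Return data
  x
})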

# Setting column names
my_names <- c("ID", "UDLIGNNR","BILAGNR", "AKT", "BA",
          "IART", "HTRANS", "DTRANS", "BELOB", "REGD",
          "BOGFD", "VALORD", "UDLIGND", 
          "UÅ", "AFSTEMNGL", "NRBASIS","FIBILAG", "FILE")

dat <- lapply(dat, setNames, my_names)

# Removing problematic columns
dat <- lapply(dat, function(x) { x["UÅ"] <- NULL; x })
dat <- lapply(dat, function(x) { x["FIBILAG"] <- NULL; x })


# Converting all 16 remaining columns to character
dat2 <- lapply(dat, function(df) setDT(df)[, (1:16) := lapply(.SD, as.character), .SDcols = 1:16])

# Merging
df <- rbindlist(dat2)
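
The conversion and binding steps above can be tried on a small scale without any Excel files. The following is only a sketch under stated assumptions: the two toy data frames are invented to mimic sheets whose column classes differ, and the columns are addressed by name rather than by position `1:16`, which is a stylistic substitution and not the author's exact code. It needs the data.table package.

```r
library(data.table)

# Hypothetical stand-ins for two imported sheets with mismatched classes
dat <- list(
  data.frame(ID = 1:2, BELOB = c(10.5, 20)),
  data.frame(ID = c("3", "4"), BELOB = c("30", "n/a"))
)

# Convert every column of every table to character
dat2 <- lapply(dat, function(df) {
  dt <- as.data.table(df)   # copy, so the original list is left untouched
  cols <- names(dt)
  dt[, (cols) := lapply(.SD, as.character), .SDcols = cols]
  dt
})

# Stack the tables; all columns now share one class
combined <- rbindlist(dat2)
str(combined)
```

Addressing columns by name avoids having to keep the hard-coded `1:16` in sync if columns are added or dropped later.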

Oh, and several people told me to use bind_rows instead (@atomman and @Probel).

And I want to give credit to the person whom I stole the first part from, but I can't remember who it was...
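
A possible alternative, offered as a sketch rather than as part of the original answer: `read_xlsx` has a `col_types` argument, and a single value such as `"text"` is recycled across all columns, so every column arrives as character and no conversion step is needed afterwards. The helper name `read_all_as_text` is hypothetical, and this assumes that all columns are wanted as text.

```r
# Hypothetical helper: read the first sheet of each file with every
# column imported as text, and record the source file name.
# Requires the readxl package when actually called on files.
read_all_as_text <- function(paths) {
  lapply(paths, function(p) {
    x <- readxl::read_xlsx(p, sheet = 1, col_names = TRUE, col_types = "text")
    x$file <- p
    x
  })
}

# Usage, assuming the files sit below the working directory:
# file.list <- list.files(recursive = TRUE, pattern = "\\.xlsx$")
# dat <- read_all_as_text(file.list)
```

Because the classes already match after import, the resulting list could then be passed to `rbindlist` or `bind_rows` after renaming the columns.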
