Importing multiple .csv files with variable column types into R

Question

How can I properly build an lapply call that reads all the .csv files in one directory, loads every column as a string, and then binds them into one data frame?

Per this, I have a way to load all the .csv files and bind them into one data frame. Unfortunately, the bind gets hung up because the same column is type-cast differently from file to file, which gives me this error:

Error: Can not automatically convert from character to integer in column

I have tried supplementing the code with the column-type arguments so that everything is kept as character, but now I am stuck on getting my lapply 'loop' to reference the current file on each iteration.

srvy1 <- structure(list(RESPONSE_ID = 584580L, QUESTION_ID = 328L, SURVEY_ID = 2324L,
                        AFF_ID_INV_RESP = 5L),
                   .Names = c("RESPONSE_ID", "QUESTION_ID", "SURVEY_ID", "AFF_ID_INV_RESP"),
                   class = "data.frame", row.names = c(NA, -1L))

srvy2 <- structure(list(RESPONSE_ID = 584580L, QUESTION_ID = 328L, SURVEY_ID = 2324L,
                        AFF_ID_INV_RESP = "bovine"),
                   .Names = c("RESPONSE_ID", "QUESTION_ID", "SURVEY_ID", "AFF_ID_INV_RESP"),
                   class = "data.frame", row.names = c(NA, -1L))

files = list.files(pattern="*.csv")
tbl = lapply(files, read_csv(files, col_types = cols(.default = col_character()))) %>% bind_rows
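To see where the error comes from, here is a minimal sketch, assuming only that dplyr is installed. It rebuilds the two sample frames with data.frame() and binds them: AFF_ID_INV_RESP is integer in srvy1 and character in srvy2, so bind_rows() refuses to combine them. Converting every column to character first (the same idea as reading everything with col_character()) lets the bind go through. The helper name as_chr is hypothetical.

```r
# Assumes the dplyr package is installed.
library(dplyr)

# Same shape as srvy1 / srvy2 above: AFF_ID_INV_RESP is integer in one
# frame and character in the other.
srvy1 <- data.frame(RESPONSE_ID = 584580L, QUESTION_ID = 328L,
                    SURVEY_ID = 2324L, AFF_ID_INV_RESP = 5L)
srvy2 <- data.frame(RESPONSE_ID = 584580L, QUESTION_ID = 328L,
                    SURVEY_ID = 2324L, AFF_ID_INV_RESP = "bovine",
                    stringsAsFactors = FALSE)

# Binding as-is fails on the type mismatch; capture the error instead of stopping.
bind_result <- tryCatch(bind_rows(srvy1, srvy2), error = function(e) e)
inherits(bind_result, "error")

# Hypothetical helper: turn every column into character, keeping a data frame.
as_chr <- function(df) {
  df[] <- lapply(df, as.character)
  df
}

# With all columns character, the types agree and the bind succeeds.
combined <- bind_rows(as_chr(srvy1), as_chr(srvy2))
str(combined)
```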

Is there an easy fix for this that keeps me in the tidyverse, or must I drop down a level and write the for loop explicitly myself, per this?
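For comparison, the explicit for loop mentioned in the question might look like the sketch below. It assumes readr and dplyr are installed, and writes two hypothetical sample files (srvy1.csv, srvy2.csv) to a temporary directory so it runs on its own; in practice you would point list.files() at your own directory. Note the pattern "\\.csv$": list.files() treats pattern as a regular expression, not a shell glob, so this matches names ending in .csv.

```r
# Assumes the readr and dplyr packages are installed.
library(readr)
library(dplyr)

# Hypothetical sample data: AFF_ID_INV_RESP looks numeric in one file and
# textual in the other.
csv_dir <- file.path(tempdir(), "srvy_loop")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("RESPONSE_ID,AFF_ID_INV_RESP", "584580,5"),
           file.path(csv_dir, "srvy1.csv"))
writeLines(c("RESPONSE_ID,AFF_ID_INV_RESP", "584581,bovine"),
           file.path(csv_dir, "srvy2.csv"))

# Full paths to every file whose name ends in ".csv".
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

pieces <- vector("list", length(files))  # one slot per file
for (i in seq_along(files)) {
  # Read every column as character so the types agree across files.
  pieces[[i]] <- read_csv(files[i], col_types = cols(.default = col_character()))
}
tbl <- bind_rows(pieces)  # stack the list of data frames
```

The loop works, but the bookkeeping (preallocating a list, indexing by i) is exactly what lapply() and purrr's map functions handle for you.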

Recommended answer

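lapply takes the form lapply(X, FUN, ...), where ... holds additional arguments that are passed on to FUN. You are filling in the arguments inside FUN instead: read_csv(files, ...) is a call, not a function, so lapply ends up with the result of that call rather than something it can apply to each file. Pass read_csv itself and put its extra arguments after it: lapply(files, read_csv, col_types = cols(.default = "c")) (here "c" is readr's shorthand for col_character()).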
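A self-contained sketch of the corrected call, assuming readr and dplyr are installed. It first shows the ... pass-through in miniature with base R's round(), then applies the same pattern to read_csv on two hypothetical sample files written to a temporary directory (the directory and file names are illustrative).

```r
# Assumes the readr and dplyr packages are installed.
library(readr)
library(dplyr)  # also re-exports the %>% pipe

# The `...` mechanism in miniature: lapply() calls round(x, digits = 1)
# for each element, forwarding `digits` untouched.
rounded <- lapply(c(1.234, 5.678), round, digits = 1)

# Hypothetical sample files with a column whose type differs between files.
csv_dir <- file.path(tempdir(), "srvy_lapply")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("RESPONSE_ID,AFF_ID_INV_RESP", "584580,5"),
           file.path(csv_dir, "srvy1.csv"))
writeLines(c("RESPONSE_ID,AFF_ID_INV_RESP", "584581,bovine"),
           file.path(csv_dir, "srvy2.csv"))

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# read_csv is passed un-called as FUN; col_types travels through `...`,
# so lapply() runs read_csv(files[[i]], col_types = ...) for each file.
tbl <- lapply(files, read_csv, col_types = cols(.default = "c")) %>%
  bind_rows()
```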

If you prefer a tidyverse solution:

library(tidyverse)  # attaches purrr (map_df), readr (read_csv, cols) and %>%
files %>%
  map_df(~read_csv(.x, col_types = cols(.default = "c")))

map_df() applies read_csv() to each file in turn (.x stands for the current file path) and row-binds the results into a single data frame at the end.
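A runnable version of the map_df() pipeline, as a sketch: it assumes readr, purrr and dplyr are installed (map_df() relies on dplyr for the row-binding step) and writes two hypothetical sample files to a temporary directory. It also shows two newer spellings that should produce the same stacked result, each guarded by a version check: map() followed by list_rbind(), which purrr 1.0.0 recommends now that map_df() is superseded, and passing the whole vector of paths straight to read_csv(), which readr 2.0.0 and later support when every file has the same columns. The names tbl_rbind and tbl_vec are illustrative.

```r
# Assumes readr, purrr and dplyr are installed.
library(readr)
library(purrr)

# Hypothetical sample files with a column whose type differs between files.
csv_dir <- file.path(tempdir(), "srvy_map")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("RESPONSE_ID,AFF_ID_INV_RESP", "584580,5"),
           file.path(csv_dir, "srvy1.csv"))
writeLines(c("RESPONSE_ID,AFF_ID_INV_RESP", "584581,bovine"),
           file.path(csv_dir, "srvy2.csv"))

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# ~ read_csv(.x, ...) is purrr's formula shorthand for an anonymous
# function; .x is the current file path.
tbl <- map_df(files, ~ read_csv(.x, col_types = cols(.default = "c")))

# purrr >= 1.0.0: map() then list_rbind() replaces the superseded map_df().
if (packageVersion("purrr") >= "1.0.0") {
  tbl_rbind <- list_rbind(map(files, ~ read_csv(.x, col_types = cols(.default = "c"))))
}

# readr >= 2.0.0: read_csv() accepts a vector of paths and stacks the files,
# provided they share the same columns.
if (packageVersion("readr") >= "2.0.0") {
  tbl_vec <- read_csv(files, col_types = cols(.default = "c"))
}
```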
