Add "filename" column to table as multiple files are read and bound


Question

I have numerous CSV files in multiple directories that I want to read into an R tibble or data.table. I use "list.files()" with the recursive argument set to TRUE to create a list of file names and paths, then use "lapply()" to read in the CSV files, and then "bind_rows()" to stick them all together:

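library(tidyverse)  # provides read_csv(), bind_rows() and %>%

# path and fileptrn hold the root directory and the file name pattern
filenames <- list.files(path, full.names = TRUE, pattern = fileptrn, recursive = TRUE)
tbl <- lapply(filenames, read_csv) %>%
  bind_rows()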

This approach works fine. However, I need to extract a substring from each file name and add it as a column to the final table. I can get the substring I need with "str_extract()" like this:

sites <- str_extract(filenames, "[A-Z]{2}-[A-Za-z0-9]{3}")
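
To make the pattern concrete, here is a minimal sketch of what it picks out. The file paths and site codes below are made up for illustration and are not taken from the actual data:

```r
library(stringr)

# Hypothetical file paths; only the "XX-xxx" site code in each name matters
example_files <- c("data/2019/US-Ha1_fluxes.csv", "data/2020/CA-Qfo_fluxes.csv")

# Two capital letters, a hyphen, then three alphanumeric characters;
# str_extract() returns the first match per string, or NA if there is none
sites <- str_extract(example_files, "[A-Z]{2}-[A-Za-z0-9]{3}")
sites
# "US-Ha1" "CA-Qfo"
```

The result is a character vector with one element per file name, in the same order as the input.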

However, I am stuck on how to add the extracted substring as a column, since lapply() runs read_csv() on each file separately.

Recommended answer

I generally use the following approach, based on dplyr/tidyr:

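# filenames is the vector of file paths returned by list.files()
data = tibble(File = filenames) %>%
    # pull the site code out of each path into a new Site column
    extract(File, "Site", "([A-Z]{2}-[A-Za-z0-9]{3})", remove = FALSE) %>%
    # read each file into a list-column, then expand it into rows
    mutate(Data = lapply(File, read_csv)) %>%
    unnest(Data) %>%
    select(-File)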
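
As an alternative sketch that stays closer to the lapply() call from the question, the site code can also be added inside the function applied to each file. This is not the approach above, just one possible variant. The helper name read_with_site, the temporary directory, and the two demo files are assumptions made up so the example runs on its own:

```r
library(readr)
library(dplyr)
library(stringr)

# Create two small CSV files in a temporary directory (hypothetical names)
tmp <- file.path(tempdir(), "site_demo")
dir.create(tmp, showWarnings = FALSE)
write_csv(tibble(value = 1:2), file.path(tmp, "US-Ha1_2019.csv"))
write_csv(tibble(value = 3:4), file.path(tmp, "CA-Qfo_2019.csv"))

filenames <- list.files(tmp, full.names = TRUE, pattern = "\\.csv$")

# Hypothetical helper: read one file and tag every row with the site code
# extracted from that file's name
read_with_site <- function(f) {
  read_csv(f, col_types = cols()) %>%
    mutate(Site = str_extract(basename(f), "[A-Z]{2}-[A-Za-z0-9]{3}"))
}

# Same read-then-bind pattern as in the question, now with a Site column
tbl <- lapply(filenames, read_with_site) %>%
  bind_rows()
```

Because the file name is available inside the function, each table gets its Site column before the tables are bound, so no separate join step is needed afterwards.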
