How to import files from subdirectories and name them with the subdirectory name in R


Question

I'd like to import files (of different lengths) recursively from sub-directories and put them into one data.frame, having one column with the subdirectory name and one column with the file name (minus the extension):

e.g. folder structure
IsolatedData
  00
    tap-4.out
    cl_pressure.out
  15
    tap-4.out
    cl_pressure.out

So far I have:

setwd("~/Documents/IsolatedData")
l <- list.files(pattern = ".out$",recursive = TRUE)
p <- bind_rows(lapply(1:length(l), function(i) {chars <- strsplit(l[i], "/");
cbind(data.frame(Pressure = read.table(l[i],header = FALSE,skip=2, nrow =length(readLines(l[i])))),
      Angle = chars[[1]][1], Location = chars[[1]][1])}), .id = "id")

But I get an error saying line 43 doesn't have 2 elements.
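A note on that error: a message of the form "line N did not have 2 elements" typically means read.table found a row with fewer fields than it expected. The sketch below shows one way to locate such rows with count.fields and to read the file anyway with fill = TRUE. The file contents are made up for illustration (two header lines followed by whitespace-separated values); the real .out files may be laid out differently.

```r
# Hypothetical file: two header lines, then data in which one row is short.
path <- tempfile(fileext = ".out")
writeLines(
  c("header line 1", "header line 2", "1 0.10", "2 0.20", "3", "4 0.40"),
  path
)

# Number of whitespace-separated fields on each line after the skipped header.
n_fields <- count.fields(path, skip = 2)

# Add 2 to convert positions back to line numbers in the file.
bad_lines <- which(n_fields != 2) + 2

# fill = TRUE pads short rows with NA instead of stopping with an error.
pressure <- read.table(path, header = FALSE, skip = 2, fill = TRUE)
```

Here bad_lines points at the short row, and pressure keeps that row with NA in the missing column. Passing nrow = length(readLines(...)) is not needed, since read.table reads to the end of the file by default.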

Also seen this one using dplyr which looks neat but I can't get it to work: http://www.machinegurning.com/rstats/map_df/

tbl <-
  list.files(recursive=T,pattern=".out$")%>% 
  map_df(~data_frame(x=.x),.id="id")

Solution

Here's a workflow with the map functions from purrr within the tidyverse.

I generated a bunch of csv files to work with to mimic your file structure and some simple data. I threw in 2 lines of junk data at the beginning of each file, since you said you were trying to skip the top 2 lines.

library(tidyverse)

setwd("~/_R/SO/nested")

walk(paste0("folder", 1:3), dir.create)

list.files() %>%
    walk(function(folderpath) {
        map(1:4, function(i) {
            df <- tibble(
                x1 = sample(letters[1:3], 10, replace = T),
                x2 = rnorm(10)
            )
            dummy <- tibble(
                x1 = c("junk line 1", "junk line 2"),
                x2 = c(0)
            )
            bind_rows(dummy, df) %>%
                write_csv(sprintf("%s/file%s.out", folderpath, i))
        })
    })

That gets the following file structure:

├── folder1
|  ├── file1.out
|  ├── file2.out
|  ├── file3.out
|  └── file4.out
├── folder2
|  ├── file1.out
|  ├── file2.out
|  ├── file3.out
|  └── file4.out
└── folder3
   ├── file1.out
   ├── file2.out
   ├── file3.out
   └── file4.out

Then I used list.files(recursive = T) to get a list of the paths to these files, used str_extract to pull the folder name and file name out of each path, read each csv file while skipping the dummy text, and added the folder and file names as columns of the data frame.

Since I did this with map_dfr, I get a single tibble back, with the data frames from each iteration row-bound together.

all_data <- list.files(recursive = TRUE, pattern = "\\.out$") %>%
    map_dfr(function(path) {
        # any characters from beginning of path until /
        foldername <- str_extract(path, "^.+(?=/)")
        # any characters between / and .out at end
        filename <- str_extract(path, "(?<=/).+(?=\\.out$)")

        # skip = 3 to skip over names and first 2 lines
        # could instead use col_names = c("x1", "x2")
        read_csv(path, skip = 3, col_names = F) %>%
            mutate(folder = foldername, file = filename)
    })

head(all_data)
#> # A tibble: 6 x 4
#>   X1        X2 folder  file 
#>   <chr>  <dbl> <chr>   <chr>
#> 1 b      0.858 folder1 file1
#> 2 b      0.544 folder1 file1
#> 3 a     -0.180 folder1 file1
#> 4 b      1.14  folder1 file1
#> 5 b      0.725 folder1 file1
#> 6 c      1.05  folder1 file1

Created on 2018-04-21 by the reprex package (v0.2.0).
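If you would rather avoid the regular expressions, the same folder and file labels can be pulled out with base R path helpers. This is only a sketch under assumptions: it writes a small throwaway copy of the example layout into tempdir() (the folder names, file names, and values are made up), and it uses dirname(), basename(), and tools::file_path_sans_ext() in place of str_extract, with lapply plus rbind standing in for map_dfr.

```r
# Build a small throwaway copy of the example layout (hypothetical data)
# so the sketch runs anywhere.
root <- file.path(tempdir(), "nested_demo")
for (folder in paste0("folder", 1:2)) {
  dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  for (i in 1:2) {
    writeLines(
      c("x1,x2", "junk line 1,0", "junk line 2,0", "a,0.5", "b,-1.2"),
      file.path(root, folder, sprintf("file%s.out", i))
    )
  }
}

# Escape the dot so the pattern only matches a literal ".out" extension.
# Paths come back relative to root, e.g. "folder1/file1.out".
paths <- list.files(root, pattern = "\\.out$", recursive = TRUE)

read_one <- function(path) {
  # skip = 3 drops the header row and the two junk lines.
  df <- read.csv(file.path(root, path), skip = 3, header = FALSE,
                 col.names = c("x1", "x2"))
  df$folder <- dirname(path)                            # subdirectory name
  df$file <- tools::file_path_sans_ext(basename(path))  # name minus extension
  df
}

# Row-bind the per-file data frames into one.
all_data <- do.call(rbind, lapply(paths, read_one))
```

With files nested more than one level deep, dirname() returns the whole relative directory (for example "a/b"), so you may want basename(dirname(path)) if only the innermost folder name is needed.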
