Compute number of files per folder in a complex folder structure?

Question

I have created a simple data.tree by importing a folder structure that contains files.

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/pathr")

library(pathr)
library(data.tree)

folder_structure <- pathr::tree(path = "/Users/username/Downloads/top_level/",
                                use.data.tree = TRUE, include.files = TRUE)

Now I would like to convert the object folder_structure into a data.frame with one row per folder and a column giving the number of files each folder contains. How can I accomplish this?

For example, I have this very simple folder structure:

top_level_folder
    sub_folder_1
        file1.txt
    sub_folder_2
        file2.txt

Answering the question would involve producing output that looks like this:

Folders             Files
top_level_folder    0
sub_folder_1        1
sub_folder_2        1

The first column can be generated simply by calling list.dirs("/Users/username/Downloads/top_level/"), but I don't know how to generate the second column. Note that the second column is non-recursive: files within subfolders are not counted (i.e. top_level_folder contains 0 files, even though its subfolders contain 2 files in total).

If you want to see whether your solution scales, download the Rails codebase from https://github.com/rails/rails/archive/master.zip and try it on Rails' more complex file structure.

Recommended answer

list.dirs() returns a vector containing the starting folder and every subdirectory reachable from it, which takes care of the first column of the data frame.

# Get a vector of all the directories and subdirectories from this folder
dir <- "."
xs <- list.dirs(dir, recursive = TRUE)
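
To see what list.dirs() returns on the toy structure from the question, here is a minimal sketch. The temporary location and the variable names root and dirs are assumptions made for illustration; only the folder and file names come from the question.

```r
# Recreate the question's toy structure in a temporary directory
# (the temporary location is an assumption for illustration)
root <- file.path(tempfile("demo_"), "top_level_folder")
dir.create(file.path(root, "sub_folder_1"), recursive = TRUE)
dir.create(file.path(root, "sub_folder_2"))
invisible(file.create(file.path(root, "sub_folder_1", "file1.txt")))
invisible(file.create(file.path(root, "sub_folder_2", "file2.txt")))

# list.dirs() includes the starting folder itself as well as its subfolders
dirs <- list.dirs(root, recursive = TRUE)

# Show only the last path component of each directory
print(basename(dirs))
```

Note that the starting folder is part of the result, which is why the desired output can contain a row for top_level_folder.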

list.files() can tell us the contents of each of those folders, but its output includes both files and folders. We only want the files. To count them, we filter the output of list.files() with a predicate. file.info() can tell us whether a given path is a directory, so we build our predicate from that.

# Helper to check if something is folder or file
is_dir <- function(x) file.info(x)[["isdir"]]
is_file <- Negate(is_dir)
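
As a quick check of how these predicates behave, the sketch below applies them to one file and one folder. The scratch directory and the names a_file.txt, a_folder, and missing are hypothetical and exist only for this illustration.

```r
# Helpers as defined in the answer
is_dir <- function(x) file.info(x)[["isdir"]]
is_file <- Negate(is_dir)

# Hypothetical scratch directory holding one file and one subfolder
scratch <- tempfile("predicate_demo_")
dir.create(file.path(scratch, "a_folder"), recursive = TRUE)
invisible(file.create(file.path(scratch, "a_file.txt")))

# Both predicates are vectorized over their input
paths <- file.path(scratch, c("a_file.txt", "a_folder"))
dir_flags <- is_dir(paths)    # FALSE for the file, TRUE for the folder
file_flags <- is_file(paths)  # element-wise negation of dir_flags
print(data.frame(path = basename(paths), is_dir = dir_flags, is_file = file_flags))

# A path that does not exist yields NA rather than FALSE
missing_flag <- is_dir(file.path(scratch, "missing"))
print(missing_flag)
```

Because a nonexistent path gives NA, a sum() over the predicate's output would also be NA if a file disappeared between listing and checking. That is unlikely in practice, but worth keeping in mind on folders that change while the script runs.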

Next, we work out how to count the files in a single folder. Summing logical values returns the number of TRUE cases.

# Count the files in a single folder
count_files_in_one_dir <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  sum(is_file(files))
}

For convenience, we wrap that function so it works on a vector of folders.

# Vectorized version of the above
count_files_in_dir <- function(dir) {
  vapply(dir, count_files_in_one_dir, numeric(1), USE.NAMES = FALSE)
}
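
Applied to the toy structure from the question, the vectorized counter can be sketched as follows. The temporary location and the column names Folders and Files (taken from the question's desired output) are the only assumptions; the helpers are compact restatements of the ones above.

```r
# Compact restatements of the helpers from the answer
is_file <- function(x) !file.info(x)[["isdir"]]
count_files_in_one_dir <- function(dir) {
  sum(is_file(list.files(dir, full.names = TRUE)))
}
count_files_in_dir <- function(dir) {
  vapply(dir, count_files_in_one_dir, numeric(1), USE.NAMES = FALSE)
}

# Recreate the question's toy structure (temporary location is an assumption)
root <- file.path(tempfile("count_demo_"), "top_level_folder")
dir.create(file.path(root, "sub_folder_1"), recursive = TRUE)
dir.create(file.path(root, "sub_folder_2"))
invisible(file.create(file.path(root, "sub_folder_1", "file1.txt")))
invisible(file.create(file.path(root, "sub_folder_2", "file2.txt")))

# One row per folder, with a non-recursive file count
dirs <- list.dirs(root)
res <- data.frame(Folders = basename(dirs), Files = count_files_in_dir(dirs))
print(res)
```

The count for top_level_folder is 0 because its two entries are both folders, which matches the non-recursive behaviour the question asks for.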

Now we can count the files. The output below is from running the code in the root of the Rails codebase.

# tibble::data_frame() is deprecated; tibble::tibble() is its replacement
df <- tibble::tibble(
  dir = xs,
  nfiles = count_files_in_dir(xs))

df
#> # A tibble: 688 x 2
#>                                                  dir nfiles
#>                                                <chr>  <dbl>
#>  1                                                 .     11
#>  2                                         ./.github      3
#>  3                                     ./actioncable      7
#>  4                                 ./actioncable/app      0
#>  5                          ./actioncable/app/assets      0
#>  6              ./actioncable/app/assets/javascripts      1
#>  7 ./actioncable/app/assets/javascripts/action_cable      5
#>  8                                 ./actioncable/bin      1
#>  9                                 ./actioncable/lib      1
#> 10                    ./actioncable/lib/action_cable      8
#> # ... with 678 more rows
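
If the per-folder list.files() calls become slow on a large tree, one possible alternative (not part of the answer above) is to list every file once and tabulate by parent directory. The function name count_files_single_pass is hypothetical. This sketch uses all.files = TRUE so that files inside hidden folders are reached, which means hidden files are counted too and the numbers can differ from the approach above.

```r
# Alternative sketch: one recursive listing, then tabulate by parent directory.
# count_files_single_pass is a hypothetical name, not from the original answer.
count_files_single_pass <- function(root) {
  # Normalize so that list.dirs() and dirname() agree on the path spelling
  root <- normalizePath(root, winslash = "/")
  dirs <- list.dirs(root, recursive = TRUE)
  # With recursive = TRUE and include.dirs = FALSE (the default), only files are listed
  files <- list.files(root, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  # Using dirs as factor levels keeps folders that contain no files (count 0)
  counts <- table(factor(dirname(files), levels = dirs))
  data.frame(dir = dirs, nfiles = as.integer(counts), stringsAsFactors = FALSE)
}

# Try it on the question's toy structure (temporary location is an assumption)
root <- file.path(tempfile("single_pass_"), "top_level_folder")
dir.create(file.path(root, "sub_folder_1"), recursive = TRUE)
dir.create(file.path(root, "sub_folder_2"))
invisible(file.create(file.path(root, "sub_folder_1", "file1.txt")))
invisible(file.create(file.path(root, "sub_folder_2", "file2.txt")))

result <- count_files_single_pass(root)
print(transform(result, dir = basename(dir)))
```

Whether this is actually faster depends on the file system and tree shape, so it is worth timing both versions on the Rails codebase before choosing one.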
