Is `ls -f | grep -c .` the fastest way to count files in a directory, when using a POSIX / Unix system (Big Data)?
Question
I used to do `ls path-to-whatever | wc -l`, until I discovered that it actually consumes a huge amount of memory. Then I moved to `find path-to-whatever -name "*" | wc -l`, which seems to consume a much more modest amount of memory, regardless of how many files you have.
Then I learned that `ls` is mostly slow and less memory-efficient because it sorts the results. By using `ls -f | grep -c .`, one gets very fast results; the only problem is filenames that might have line breaks in them. However, that is a very minor problem for most use cases.
Is this the fastest way to count files?
EDIT / Possible Answer: It seems that when it comes to Big Data, some versions of `ls`, `find`, etc. have been reported to hang with more than 8 million files (this still needs to be confirmed, though). To succeed with very large file counts (my guess is more than 2.2 billion), one should use the `getdents64` system call instead of `getdents`, which can be done in most programming languages that support the POSIX standards. Some filesystems might offer faster non-POSIX methods for counting files.
One way would be to use `readdir` and count the entries (in one directory). Below I'm counting regular files using `d_type == DT_REG`, which is only available on some OSs and filesystems (see `man readdir`, NOTES), but you could just comment out that check and count all the directory entries:
#include <stdio.h>
#include <dirent.h>

int main(int argc, char *argv[]) {
    struct dirent *entry;
    DIR *dirp;
    long long c = 0;            // 64-bit counter; must start at zero

    if (argc <= 1)              // require a dir argument
        return 1;
    dirp = opendir(argv[1]);
    if (dirp == NULL) {         // dir not found
        return 2;
    }
    while ((entry = readdir(dirp)) != NULL) {
        if (entry->d_type == DT_REG) // count regular files only
            c++;
        // printf("%s\n", entry->d_name); // for outputting filenames
    }
    printf("%lli\n", c);
    closedir(dirp);
    return 0;
}
Compile and run:
$ gcc code.c
$ ./a.out ~
254
(I need to clean my home dir :)
Edit:
I touched 1,000,000 files into a dir and ran a quick comparison (best user+sys of 5 runs presented):
$ time ls -f | grep -c .
1000005
real 0m1.771s
user 0m0.656s
sys 0m1.244s
$ time ls -f | wc -l
1000005
real 0m1.733s
user 0m0.520s
sys 0m1.248s
$ time ../a.out .
1000003
real 0m0.474s
user 0m0.048s
sys 0m0.424s
Edit 2:
As requested in comments:
$ time ./a.out testdir | wc -l
1000004
real 0m0.567s
user 0m0.124s
sys 0m0.468s