Get the number of lines in a text file using R
Question
Is there a way to get the number of lines in a file without importing it?
So far this is what I am doing:
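# pattern is a regular expression, so match the .dat extension explicitly
myfiles <- list.files(pattern = "\\.dat$")
myfilesContent <- lapply(myfiles, read.delim, header = FALSE, quote = "\"")
test <- vector("list", length(myfiles))  # create the result container before filling it
for (i in seq_along(myfiles)) {
  test[[i]] <- length(myfilesContent[[i]]$V1)
}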
But this is too time-consuming since each file is quite big.
Recommended answer
If you:
- still want to avoid the system call that a system2("wc"… will cause
- are on BSD/Linux or OS X (I didn't test the following on Windows)
- don't mind using a full filename path
- are comfortable using the inline package
then the following should be about as fast as you can get (it's pretty much the "line count" portion of wc in an inline R C function):
library(inline)
wc.code <- "
  uintmax_t linect = 0;
  uintmax_t tlinect = 0;
  int fd, len;
  u_char *p;
  struct statfs fsb;
  static off_t buf_size = SMALL_BUF_SIZE;
  static u_char small_buf[SMALL_BUF_SIZE];
  static u_char *buf = small_buf;

  PROTECT(f = AS_CHARACTER(f));

  if ((fd = open(CHAR(STRING_ELT(f, 0)), O_RDONLY, 0)) >= 0) {
    /* size the read buffer from the file system (f_iosize), else use the small buffer */
    if (fstatfs(fd, &fsb)) {
      fsb.f_iosize = SMALL_BUF_SIZE;
    }
    if (fsb.f_iosize != buf_size) {
      if (buf != small_buf) {
        free(buf);
      }
      if (fsb.f_iosize == SMALL_BUF_SIZE || !(buf = malloc(fsb.f_iosize))) {
        buf = small_buf;
        buf_size = SMALL_BUF_SIZE;
      } else {
        buf_size = fsb.f_iosize;
      }
    }

    /* read the file chunk by chunk and count newline bytes */
    while ((len = read(fd, buf, buf_size))) {
      if (len == -1) {
        break;  /* read error: stop here; the descriptor is closed once below */
      }
      for (p = buf; len--; ++p)
        if (*p == '\\n')
          ++linect;
    }
    tlinect += linect;
    (void)close(fd);
  }

  SEXP result;
  PROTECT(result = NEW_INTEGER(1));
  INTEGER(result)[0] = tlinect;
  UNPROTECT(2);
  return(result);
";
setCMethod("wc",
           signature(f = "character"),
           wc.code,
           includes = c("#include <stdlib.h>",
                        "#include <stdio.h>",
                        "#include <sys/param.h>",
                        "#include <sys/mount.h>",
                        "#include <sys/stat.h>",
                        "#include <ctype.h>",
                        "#include <err.h>",
                        "#include <errno.h>",
                        "#include <fcntl.h>",
                        "#include <locale.h>",
                        "#include <stdint.h>",
                        "#include <string.h>",
                        "#include <unistd.h>",
                        "#include <wchar.h>",
                        "#include <wctype.h>",
                        "#define SMALL_BUF_SIZE (1024 * 8)"),
           language = "C",
           convention = ".Call")
wc("FULLPATHTOFILE")
The inline function would be better as a package, since it has to compile the first time through. But it's here for reference if you really do need "speed". For a 189,955-line file I had lying around, I get (mean values from a bunch of runs):
   user  system elapsed
  0.007   0.003   0.010
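For comparison, the same idea (read the file in fixed-size chunks and count newline bytes) can be sketched in plain R with readBin. This is not the answer's method and will likely be slower than the compiled version, but it needs no compiler and makes no system call. The function name `count_lines_chunked` and the `chunk_size` default are illustrative assumptions, not part of any package.

```r
# A minimal sketch: count newline bytes by reading the file in binary chunks.
# `count_lines_chunked` and `chunk_size` are hypothetical names chosen here.
count_lines_chunked <- function(path, chunk_size = 65536L) {
  con <- file(path, open = "rb")
  on.exit(close(con))            # always release the connection
  newline <- as.raw(10L)         # byte value of the newline character
  n <- 0                         # numeric, so very large counts do not overflow
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break   # end of file
    n <- n + sum(chunk == newline)
  }
  n
}

# Usage: write a temporary three-line file and count its lines
tmp <- tempfile(fileext = ".dat")
writeLines(c("a", "b", "c"), tmp)
count_lines_chunked(tmp)
```

Like wc, this counts newline characters, so a final line without a trailing newline is not counted. To apply it to every file from the question, something like `vapply(myfiles, count_lines_chunked, numeric(1))` should work.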