Counting rows with fread without reading the whole file


Question

I want to use data.table to process a very large file that does not fit in memory. I have thought about reading the file in chunks with a loop, increasing the skip parameter appropriately on each iteration:

fread("myfile.csv", skip=loopindex, nrows=chunksize) 

and then processing each of these chunks.
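A minimal sketch of the loop described above, assuming the file has exactly one header line and that the number of data rows (`n_rows`) is already known. The helper name `process_in_chunks` and its `process` callback are hypothetical and only serve as an illustration.

```r
library(data.table)

# Hypothetical helper: apply `process` to successive chunks of a CSV file.
# Assumes one header line and a known count of data rows (`n_rows`).
process_in_chunks <- function(path, n_rows, chunksize, process) {
  # nrows = 0 reads only the header, which gives the column names.
  col_names <- names(fread(path, nrows = 0))
  # Offsets (in data rows) at which each chunk starts.
  offsets <- if (n_rows > 0) seq(0, n_rows - 1, by = chunksize) else numeric(0)
  for (offset in offsets) {
    chunk <- fread(path,
                   skip = offset + 1,   # +1 skips the header line
                   nrows = chunksize,
                   header = FALSE,
                   col.names = col_names)
    process(chunk)
  }
  invisible(length(offsets))            # number of chunks processed
}
```

Column types are detected separately for each chunk in this sketch; passing `colClasses` to `fread` is one way to keep them consistent across chunks.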

To do this properly, I need to know the total number of rows without reading the whole file.

What is the proper (or fastest) way to do that?

I can only think of reading just the first column, but maybe there is a special command or trick, or an automatic way to detect the end of the file.

Recommended answer

1) count.fields I am not sure whether count.fields reads the whole file into R at once. Try it to see if it works.

length(count.fields("myfile.csv", sep = ","))

If the file has a header, subtract one from the result above.
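As a small worked example of the header adjustment, the following sketch builds a four-line CSV in a temporary file (illustrative data only, not from the original question) and counts its data rows.

```r
# Build a tiny CSV in a temporary file (illustrative data only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,a", "2,b", "3,c"), tmp)

# count.fields() returns one field count per line,
# so the length of its result is the number of lines.
n_lines <- length(count.fields(tmp, sep = ","))

# The file has a header line, so subtract one to get the data rows.
n_rows <- n_lines - 1
```

Note that count.fields skips blank lines by default (`blank.lines.skip = TRUE`), so the result can be smaller than the raw line count if the file contains empty lines.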

2) sqldf Another possibility is:

library(sqldf)
read.csv.sql("myfile.csv", sep = ",", sql = "select count(*) from file")

You may need other arguments as well, depending on whether there is a header and so on. Note that this does not read the file into R at all, only into SQLite.

3) wc Use the system command wc, which is available on Unix-like systems; on Windows it has to be installed separately (see the Rtools note further down).

system("wc -l myfile.csv", intern = TRUE)

or, to get the number of lines in the file directly as a number:

read.table(pipe("wc -l myfile.csv"))[[1]]

# On Windows, shell() can be used instead of pipe():
read.table(text = shell("wc -l myfile.csv", intern = TRUE))[[1]]

Again, if there is a header, subtract one.
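The wc approach and the header adjustment can be wrapped in a small helper. This is only a sketch: the function name `count_rows_wc` is hypothetical, and it assumes that `wc` is on the PATH, as is typical on Linux and macOS.

```r
# Hypothetical helper: count data rows using the external `wc` utility.
# Assumes `wc` is on the PATH (typical on Linux and macOS).
count_rows_wc <- function(path, header = TRUE) {
  out <- system2("wc", c("-l", shQuote(path)), stdout = TRUE)
  # Output looks like "<count> <path>", possibly with leading spaces.
  n_lines <- as.numeric(strsplit(trimws(out), "[[:space:]]+")[[1]][1])
  if (header) n_lines - 1 else n_lines
}
```

Keep in mind that `wc -l` counts newline characters, so a file whose last line has no trailing newline is reported one line short.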

If you are on Windows, make sure that Rtools is installed and use this (adjusting the path to match your Rtools installation):

read.table(pipe("C:\\Rtools\\bin\\wc -l myfile.csv"))[[1]]

Alternatively, on Windows without Rtools, try this:

read.table(pipe('find /v /c "" myfile.csv'))[[3]]

See also: How to count the number of lines in a text file and store the value in a variable using a batch script?
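If no external tool is available, a base R fallback is also possible. The sketch below is not part of the original answer; the function name `count_lines` and the default block size are assumptions. It still reads the whole file from disk, but it holds at most `chunk_lines` lines in memory at any time.

```r
# Hypothetical helper: count lines by reading the file in blocks of lines.
count_lines <- function(path, chunk_lines = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0  # numeric, so very large counts do not overflow an integer
  repeat {
    # Each call continues from where the previous one stopped.
    n <- length(readLines(con, n = chunk_lines, warn = FALSE))
    if (n == 0L) break
    total <- total + n
  }
  total
}
```

As with the other approaches, subtract one from the result if the file has a header line.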
