skip and autostart in fread

Question

I am using the following code to read a file with the data.table library:

fread(myfile, header=FALSE, sep=",", skip=100, colClasses=c("character","numeric","NULL","numeric"))

But I get the following error:

The supplied 'sep' was not found on line 80. To read the file as a single character column set sep='\n'.

It says sep was not found on line 80, but I set skip=100, so fread should ignore the first 100 lines.

UPDATE: skip=101 works, but then it skips the first line of data.

I am using version 1.9.2 of the data.table package with R 3.0.2 (64-bit) on Windows 7.

Answer

We don't know the version number you're using, but I can make a guess in this case.

Try setting autostart=101.

Note this from ?fread:


一旦在 autostart ,确定列数。然后从 autostart 向后搜索该文件,直到找到没有该列数的行。因此,找到第一数据行,并且自动跳过任何人类可读横幅。此功能对于加载一组可能并非都具有一致大小的横幅的文件特别有用。通过设置 autostart = skip + 1 并关闭向上搜索步骤,设置 skip> 0 p>

Once the separator is found on line autostart, the number of columns is determined. Then the file is searched backwards from autostart until a row is found that doesn't have that number of columns. Thus, the first data row is found and any human readable banners are automatically skipped. This feature can be particularly useful for loading a set of files which may not all have consistently sized banners. Setting skip>0 overrides this feature by setting autostart=skip+1 and turning off the search upwards step.
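
The backward search described above can be sketched in a few lines of base R. This is an illustrative re-statement of the documented behaviour, not data.table's actual C code: `count_fields()` and `find_first_data_row()` are hypothetical helper names, and splitting on a fixed `sep` while ignoring quoting is a simplifying assumption. The demo assumes a file with a 100-line banner followed by comma-separated data, roughly the situation in the question.

```r
# Illustrative sketch of the autostart procedure described in ?fread.
# Hypothetical helpers; NOT data.table's implementation. Quoting is ignored.

# Number of sep-delimited fields on each line (an empty line counts as 0).
count_fields <- function(lines, sep = ",") {
  vapply(lines, function(l) {
    if (!nzchar(l)) return(0L)
    length(strsplit(l, sep, fixed = TRUE)[[1]])
  }, integer(1), USE.NAMES = FALSE)
}

# Start at line `autostart`, take its field count, then walk upwards while the
# line above has the same field count. Returns the top line of that run.
find_first_data_row <- function(lines, autostart = 30L, sep = ",") {
  stopifnot(autostart >= 1L, autostart <= length(lines))
  n_fields <- count_fields(lines, sep)
  target <- n_fields[autostart]
  if (target < 2L) stop("sep '", sep, "' not found on line ", autostart)
  row <- autostart
  while (row > 1L && n_fields[row - 1L] == target) row <- row - 1L
  row
}

# A 100-line human-readable banner (no commas) followed by delimited data.
banner     <- sprintf("Banner line %d of the preamble", 1:100)
table_rows <- c("id,name,score", "1,a,3.5", "2,b,4.0", "3,c,2.5")
lines      <- c(banner, table_rows)

# A start at line 30 falls inside the banner, where there is no sep to find.
try(find_first_data_row(lines, autostart = 30L))

# Starting on any delimited line walks back to line 101.
find_first_data_row(lines, autostart = 101L)  # returns 101
find_first_data_row(lines, autostart = 103L)  # returns 101
```

The row found this way may be a column-name row; in fread that is decided separately through `header=`.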

On the skip argument, ?fread says:

If -1 (default) use the procedure described below starting on line autostart to find the first data row. skip>=0 means ignore autostart and take line skip+1 as the first data row (or column names according to header="auto"|TRUE|FALSE as usual). skip="string" searches for "string" in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata).
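
The `skip >= 0` and `skip = "string"` modes both boil down to "which line does reading start on?". The sketch below is a hypothetical base-R helper, `first_row_for_skip()`, that only computes that line number. It is not part of data.table, it assumes the search string is matched as a fixed substring, and it leaves out the `skip = -1` autostart search. It also reproduces the off-by-one from the question's UPDATE.

```r
# Hypothetical helper (not part of data.table): which line of `lines` would be
# read first under the documented skip >= 0 and skip = "string" modes?
first_row_for_skip <- function(lines, skip) {
  if (is.character(skip)) {
    # Assumption: the search string is matched as a fixed substring.
    hit <- grep(skip, lines, fixed = TRUE)[1]
    if (is.na(hit)) stop("skip string not found: ", skip)
    return(hit)
  }
  stopifnot(skip >= 0)
  skip + 1  # skip = N: ignore N lines, start on line N + 1
}

lines <- c(sprintf("Banner line %d", 1:100),   # lines 1-100: banner
           "id,name,score",                    # line 101: column names
           "1,a,3.5", "2,b,4.0")               # lines 102-103: data

first_row_for_skip(lines, 100)        # returns 101
first_row_for_skip(lines, 101)        # returns 102: line 101 is lost
first_row_for_skip(lines, "id,name")  # returns 101, no line counting needed
```

In fread itself the string form would be written as `fread(myfile, skip = "id,name")`, assuming the file has a column-name row containing that text. Recent data.table releases document `autostart` as deprecated in favour of `skip`, so the string form is also the more future-proof choice.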

And on the autostart argument:

Any line number within the region of machine readable delimited text, by default 30. If the file is shorter or this line is empty (e.g. short files with trailing blank lines) then the last non empty line (with a non empty line above that) is used. This line and the lines above it are used to auto detect sep, sep2 and the number of fields. It's extremely unlikely that autostart should ever need to be changed, we hope.
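
The default-and-fallback rule quoted above can be expressed as a small hypothetical base-R helper, `effective_autostart()`. It follows the wording of the documentation only. Treating a line as empty exactly when it is the zero-length string is an assumption, and data.table's real logic may differ in details.

```r
# Hypothetical helper following the ?fread wording for `autostart`; not
# data.table code. A line is treated as empty when it is "".
effective_autostart <- function(lines, autostart = 30L) {
  non_empty <- nzchar(lines)
  # Use the requested line if it exists and is not empty.
  if (autostart <= length(lines) && non_empty[autostart]) return(autostart)
  # Fallback: the last non-empty line that has a non-empty line above it.
  above_non_empty <- c(FALSE, non_empty[-length(non_empty)])
  candidates <- which(non_empty & above_non_empty)
  if (length(candidates) == 0L) stop("no usable autostart line")
  max(candidates)
}

short_file <- c("a,b", "1,2", "3,4", "", "")  # 5 lines, trailing blanks
long_file  <- sprintf("x%d,y", 1:40)           # 40 delimited lines

effective_autostart(short_file)  # returns 3
effective_autostart(long_file)   # returns 30
```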

In your case, perhaps the human-readable header is much longer than 30 rows, which is why I guess setting autostart=101 might work. There is no need to use skip.

One motivation is for convenience when a file contains multiple tables. By setting autostart to any row inside the table that you want to pluck out of the file, it'll find the first data row and header row for you automatically, and then read just that table. You don't have to worry about getting the exact line number at the start of data like you do with skip. fread can only read one table currently. It could feasibly return a list of tables from a single file, but that's getting a bit complicated and nobody has asked for that.
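
The multiple-table case can be illustrated the same way: start from any line inside the wanted table, walk up while the field count stays the same to reach its header row, and walk down to reach its last row. `pluck_table()` below is a hypothetical base-R sketch of that idea that hands the block to `read.csv()`. It is not how fread is implemented, and it assumes tables are separated by lines with a different number of fields, such as blank lines.

```r
# Hypothetical sketch: read the one table that contains line `inside`.
# Not fread's implementation; quoting is ignored and read.csv does the parsing.
pluck_table <- function(lines, inside, sep = ",") {
  n_fields <- vapply(lines, function(l) {
    if (!nzchar(l)) 0L else length(strsplit(l, sep, fixed = TRUE)[[1]])
  }, integer(1), USE.NAMES = FALSE)
  target <- n_fields[inside]
  first <- inside  # walk up to the header row
  while (first > 1L && n_fields[first - 1L] == target) first <- first - 1L
  last <- inside   # walk down to the last row of the table
  while (last < length(lines) && n_fields[last + 1L] == target) last <- last + 1L
  # Treat the first row of the block as column names, like header = TRUE.
  read.csv(text = paste(lines[first:last], collapse = "\n"), sep = sep)
}

report <- c(
  "Quarterly report",   # 1: banner
  "",                   # 2
  "region,q1,q2",       # 3: table 1 header
  "north,10,12",        # 4
  "south,8,9",          # 5
  "",                   # 6
  "product,units",      # 7: table 2 header
  "widget,5",           # 8
  "gadget,7",           # 9
  "gizmo,2"             # 10
)

pluck_table(report, inside = 8)  # the 3-row product/units table
pluck_table(report, inside = 4)  # the 2-row region table
```

Two adjacent tables with the same number of columns and nothing between them would be merged by this simple field-count rule.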
