Reading aligned column data with fread
Question
I came across a file like this:
COL1 COL2 COL3
weqw asrg qerhqetjw
weweg ethweth rqerhwrtjw
rhqerhqerhq qergqer qerhqew5h
qerh qergqer wetjwryerj
I could not load it directly with fread, so I used sed to replace \s+ with , and then passed the result to fread, which worked. But is there a built-in way of reading this kind of data with data.table?
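For illustration, the whitespace-to-comma workaround described above can be sketched in Python instead of sed. This is only a minimal sketch: the function name whitespace_to_csv is made up for this example, and it assumes that no field contains spaces or commas of its own.

```python
import re


def whitespace_to_csv(text):
    """Replace each run of spaces/tabs between fields with a single comma."""
    lines = []
    for line in text.splitlines():
        # strip() removes leading/trailing padding so no empty fields appear
        lines.append(re.sub(r"[ \t]+", ",", line.strip()))
    return "\n".join(lines) + "\n"


# Sample rows taken from the question; the padding widths are assumed.
sample = (
    "COL1         COL2     COL3\n"
    "weqw         asrg     qerhqetjw\n"
    "rhqerhqerhq  qergqer  qerhqew5h\n"
)
csv_text = whitespace_to_csv(sample)
print(csv_text)
```

The resulting text is ordinary comma-separated data, which is the shape of input fread handled in the question.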
Recommended answer
fread does not (yet) have any capabilities for reading fixed-width files.
I, too, often come across files annoyingly stored like this. Feel free to add a feature request on the GitHub page.
It may not be so in your case, but your solution with sed would not work on a lot of the fixed-width files (FWF) I come across, because there is no space between columns. For example, you'll see strings like 00010 that actually comprise 3 fields.
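A small sketch of why whitespace splitting breaks down here. The widths 2, 1 and 2 are an assumption made up for this example (the answer does not say how 00010 divides), and split_fixed is a hypothetical helper name.

```python
from itertools import accumulate


def split_fixed(line, widths):
    """Slice a line into fields at cumulative character offsets."""
    ends = list(accumulate(widths))   # e.g. [2, 3, 5]
    starts = [0] + ends[:-1]          # e.g. [0, 2, 3]
    return [line[s:e] for s, e in zip(starts, ends)]


record = "00010"
by_whitespace = record.split()                # no delimiter to split on
by_width = split_fixed(record, [2, 1, 2])     # assumed widths
print(by_whitespace)  # ['00010']
print(by_width)       # ['00', '0', '10']
```

Without the widths there is nothing in the string itself that tells you where one field ends and the next begins.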
If that's the case, you'll need a field width dictionary, at which point you have several options:
- read.fwf within R
- Write a fwf -> csv program (I use one I wrote in Python and it's pretty fast; I could share the code if you'd like). This is basically the beefed-up version of your initial approach, so that you never have to deal with the FWF again
- Open it in Excel / LibreOffice / etc.; there's a native FWF reader that tries (usually poorly) to guess the widths of the columns, which at least does half the work of specifying the column widths for you. Then you can save it as .csv or whatever from there.
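The answerer's own Python program is not shown, so the following is not that code. It is only a minimal sketch of what the second option might look like, assuming the FWF has no header row and that the field width dictionary maps column names to widths in file order. The names fwf_to_csv, widths and the sample layout are all hypothetical.

```python
import csv
import io


def fwf_to_csv(src, dst, widths):
    """Convert fixed-width lines from `src` into CSV rows written to `dst`.

    `widths` maps column name -> width in characters, in file order
    (dicts keep insertion order in Python 3.7+).
    """
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(widths.keys())           # header from the dictionary
    for line in src:
        line = line.rstrip("\n")
        row, pos = [], 0
        for width in widths.values():
            # Cut the next `width` characters and drop the padding
            row.append(line[pos:pos + width].strip())
            pos += width
        writer.writerow(row)


# Hypothetical layout and data, for illustration only
widths = {"id": 2, "flag": 1, "code": 2, "name": 8}
src = io.StringIO("00010Alice   \n01199Bob     \n")
dst = io.StringIO()
fwf_to_csv(src, dst, widths)
print(dst.getvalue())
```

With real files you would pass handles from open(...) (using newline="" for the output, as the csv module recommends) instead of the StringIO objects, and the resulting .csv can then be read with fread.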
I personally stick with the second option most often. read.fwf is not optimized like fread, so it will probably be slow. And if you've got a lot (say 20+) of FWF to read, the 3rd option is pretty tedious.
But I agree it would be nice to have something like this built in to fread.