How to read a tab-separated file into a data.table using fread?

Problem description

Sample data (emp.data):

Beth  4.00  0
Dan   3.75  0
Kathy 4.00  10
Mark  5.00  20
Mary  5.50  22
Susie 4.25  18

I can read it into a data.frame using read.table, then convert it to data.table:

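library(data.table)
# Read the file into a data.frame, then convert it to a data.table.
df <- read.table("emp.data", col.names = c("Name", "PayRate", "HoursWorked"))
# To key the table, call setkey(DT, HoursWorked) afterwards; that also sorts the rows.
DT <- as.data.table(df)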

Calculate pay:

DT[HoursWorked > 0, .(Name, Pay = PayRate * HoursWorked)]

    Name   Pay
1: Kathy  40.0
2:  Mark 100.0
3:  Mary 121.0
4: Susie  76.5

That works fine; however, the conversion is an extra step. Since data.table has fread(), why not use it directly?

readDT <- fread("emp.data", header=FALSE, sep="\t")

               V1
1:  Beth  4.00  0
2:  Dan   3.75  0
3: Kathy 4.00  10
4: Mark  5.00  20
5: Mary  5.50  22
6: Susie 4.25  18

 str(readDT)
Classes 'data.table' and 'data.frame':  6 obs. of  1 variable:
 $ V1: chr  "Beth  4.00  0" "Dan   3.75  0" "Kathy 4.00  10" "Mark  5.00  20" ...
 - attr(*, ".internal.selfref")=<externalptr> 

The data is recognized as one column; obviously this doesn't work.
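
One plausible cause, assuming the file is padded with spaces rather than tabs (the listing above cannot distinguish the two), is that sep="\t" finds no separator at all, so every line comes back as a single field. A minimal base R sketch for checking this; the temporary file and the variable names are illustrative stand-ins, not part of the original question:

```r
# Recreate a few sample rows with spaces (an assumption about the real file).
path <- tempfile(fileext = ".data")
writeLines(c("Beth  4.00  0", "Dan   3.75  0", "Kathy 4.00  10"), path)

lines <- readLines(path)

# TRUE for each line that contains at least one tab character.
has_tab <- grepl("\t", lines, fixed = TRUE)

# Number of whitespace-separated fields on each line.
n_fields <- lengths(strsplit(trimws(lines), "[[:space:]]+"))

print(has_tab)
print(n_fields)
```

If has_tab is FALSE for every line while n_fields is 3, the file is whitespace-separated, and forcing a tab separator cannot split it.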

Question

How to read this data using fread() properly? (If possible, set the column names as well.)

Recommended answer

This has been fixed recently in the devel version, v1.9.5 (will be soon available on CRAN as v1.9.6):

require(data.table) # v1.9.5+
fread("~/Downloads/tmp.txt")
#       V1   V2 V3
# 1:  Beth 4.00  0
# 2:   Dan 3.75  0
# 3: Kathy 4.00 10
# 4:  Mark 5.00 20
# 5:  Mary 5.50 22
# 6: Susie 4.25 18

See README.md on the project page for more info. fread gained a strip.white argument (among other features and bug fixes), which is TRUE by default.

Update: there is a col.names argument now:

fread("~/Downloads/tmp.txt", col.names = c("Name", "PayRate", "HoursWorked"))
#     Name PayRate HoursWorked
# 1:  Beth    4.00           0
# 2:   Dan    3.75           0
# 3: Kathy    4.00          10
# 4:  Mark    5.00          20
# 5:  Mary    5.50          22
# 6: Susie    4.25          18
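
To tie this back to the question, here is a self-contained sketch of the whole workflow without the data.frame detour. It assumes data.table v1.9.6 or later; the temporary file stands in for emp.data, and the names emp and pay are illustrative:

```r
library(data.table)

# Write the sample rows to a temporary file (stand-in for emp.data).
path <- tempfile(fileext = ".data")
writeLines(c(
  "Beth  4.00  0",
  "Dan   3.75  0",
  "Kathy 4.00  10",
  "Mark  5.00  20",
  "Mary  5.50  22",
  "Susie 4.25  18"
), path)

# Let fread detect the separator, as in the answer, and name the columns.
emp <- fread(path, col.names = c("Name", "PayRate", "HoursWorked"))

# Same pay calculation as in the question, now on the fread result.
pay <- emp[HoursWorked > 0, .(Name, Pay = PayRate * HoursWorked)]
print(pay)
```

The result should contain the same four rows as the read.table version: Kathy, Mark, Mary and Susie.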
