How to add an index by set of data when using rbindlist?


Problem description

I have several different csv files with the same structure. I read them into R using fread, and then union them into a bigger dataset using rbindlist().

library(data.table)
# pattern is a regular expression, not a shell glob
files <- list.files(pattern = "\\.csv$")
x2csv <- rbindlist(lapply(files, fread, stringsAsFactors = FALSE), fill = TRUE)

The code works well. However, I would like to add a column filled with numbers indicating which csv file each observation came from. For example, the output should be:

       V1        V2         V3  C1
   1:   0 0.2859163 0.55848521   1
   2:   1 1.1616298 0.87571349   1 
   3:   2 2.1122510 0.95062116   2 
   4:   3 2.6832013 0.57095035   2
   5:   4 2.9117493 0.22854804   2 
   6:   5 2.9886040 0.07685464   3

where C1 is the new index column: the first and second observations come from files[1] (the first .csv file), the third and fourth observations come from files[2] (the second .csv file), and so on.

Recommended answer

This is an enhanced version of Nicolás' answer which adds the file names instead of numbers:

x2csv <- rbindlist(lapply(files, fread), idcol = "origin")
x2csv[, origin := factor(origin, labels = basename(files))]

  • fread() uses stringsAsFactors = FALSE by default, so we can save some keystrokes.
  • fill = TRUE is only required if we want to read files with differing structure, e.g., differing position, name, or number of columns.
  • The id column can be named (with idcol = TRUE the name is .id) and, because the list here is unnamed, it is populated with the sequence number of the list element.
  • Then, this number is converted into a factor whose levels are labeled with the file names. A file name might be easier to remember than a mere number. basename() strips the path off the file name.
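
The points above can be tried end to end with a minimal sketch. The directory name, the file names (a.csv, b.csv, c.csv), and the values are hypothetical demo data, not taken from the question. The sketch shows the numeric index the question asks for, plus a variant that names the list before binding, which relies on rbindlist() using the list names as id values when the list is named.

```r
library(data.table)

# Hypothetical demo data: three small CSV files in a temporary directory
demo_dir <- file.path(tempdir(), "rbindlist_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(V1 = 0:1, V2 = c(0.29, 1.16)), file.path(demo_dir, "a.csv"))
fwrite(data.table(V1 = 2:4, V2 = c(2.11, 2.68, 2.91)), file.path(demo_dir, "b.csv"))
fwrite(data.table(V1 = 5L, V2 = 2.99), file.path(demo_dir, "c.csv"))

# pattern is a regular expression, so match the extension explicitly
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Variant 1: unnamed list, so the id column holds the sequence numbers 1, 2, 3
by_number <- rbindlist(lapply(files, fread), idcol = "C1")

# Variant 2: named list, so the id column holds the names (here the file names)
tables <- setNames(lapply(files, fread), basename(files))
by_name <- rbindlist(tables, idcol = "origin")

print(by_number)
print(by_name)
```

Naming the list skips the separate factor() step. It may also be more forgiving when one of the files has no rows, since factor(origin, labels = ...) expects one label per level actually present in the column.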