Read multiple CSV files and create new columns at one time

Question

I have a folder containing many CSV files.
I want to read them all, create new columns at the same time, and then combine everything into one data.table. I explain more below.

  1. See this picture: [image not preserved]

  2. I want to create 2 new columns, YEAR and MONTH, based on the CSV file name.
    e.g. Take 201508 Sales Report(London) as an example. I want to create YEAR = 2015 and MONTH = 8.

I don't know how to do this, but I can read them all at once without creating the new columns.

my_read_data <- function(path){  
data <- data.table::fread(path, header = T, strip.white = T, fill = T)  
data <- data[data[[5]] != 0,]  
data <- subset(data, select = c(-1,-7,-10,-12,-13,-14,-15,-17))  
}  
file.list <- dir(path = "//path/", pattern='\\.csv', full.names = T)  
df.list <- lapply(file.list, my_read_data)  
dt <- rbindlist(df.list)    

How should I modify my code?
Actually, I'm not sure whether my code is correct.
Thanks.

Thanks to @Jaap, my new code is:

my_read_data <- function(x){
data <- data.table::fread(x, header = T, strip.white = T, fill = T)
data <- data[data[[5]] != 0,]
data <- subset(data, select = c(-1,-7,-10,-12,-13,-14,-15,-17))
}
file.list <- list.files(path = "/path/", pattern = '*.csv')
dt.list <- sapply(file.list, my_read_data, simplify=FALSE)

However, I get an error.

Error in data.table::fread(x, header = T, strip.white = T, fill = T) :   
File not found: C:\Users\PECHEN\AppData\Local\Temp\RtmpiihFR4\filea0c4d726488   

In addition: Warning messages:
1: running command 'C:\Windows\system32\cmd.exe /c (TWM-201508 Sales Report(London).csv) > C:\Users\PECHEN\AppData\Local\Temp\RtmpiihFR4\filea0c4d726488' had status 1 
2: In shell(paste("(", input, ") > ", tt, sep = "")) :
  '(TWM-201508 Sales Report(London).csv) > C:\Users\PECHEN\AppData\Local\Temp\RtmpiihFR4\filea0c4d726488' execution failed with error code 1  
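
A note on the error above: it most likely arises because list.files() without full.names = TRUE returns bare file names, so fread looks for them in the working directory rather than in the folder that was listed. When its input is not an existing file and contains a space, fread can fall back to treating it as a shell command, which would explain the cmd.exe warnings. A minimal base-R sketch of the difference follows; the temporary folder and the file in it are stand-ins, not the asker's data.

```r
# Hypothetical setup: a temporary folder holding one CSV file
csv_dir <- file.path(tempdir(), "sales_reports")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("a,b", "1,2"),
           file.path(csv_dir, "TWM-201508 Sales Report(London).csv"))

# Bare file names: only readable if the working directory is csv_dir
bare_names <- list.files(path = csv_dir, pattern = "\\.csv$")

# Full paths: readable from any working directory
full_paths <- list.files(path = csv_dir, pattern = "\\.csv$", full.names = TRUE)

file.exists(bare_names)  # typically FALSE unless getwd() is csv_dir
file.exists(full_paths)  # TRUE
```

Passing the full paths to the reader, and keeping basename(full_paths) for anything that parses the file name, avoids both problems.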

In addition, I edited my code:

my_read_data <- function(x){
data <- data.table::fread(x, header = T, strip.white = T, fill = T)
data <- data[data[[5]] != 0,]
data <- subset(data, select = c(-1,-7,-10,-12,-13,-14,-15,-17))
}
file.list <- dir(path = "/path/", pattern='\\.csv', full.names = T)  
df.list <- lapply(file.list, my_read_data)  
dt <- rbindlist(df.list, idcol = 'id')[, `:=` (YEAR = substr(id,5,8), MONTH = substr(id,9,10))]   

I use YEAR = substr(id,5,8), MONTH = substr(id,9,10) since each file name has four characters before the digits, e.g. AAA-201508 Sales Report.
At first, it didn't work.
Update: thanks to @Peter TW, it works now.

Recommended answer

Expanding on my comment, and supposing that all the files have the same structure, the following should work:

library(data.table)
# get list of file-names ('pattern' is a regular expression, not a glob)
file.list <- list.files(pattern = '\\.csv$')

# read the files with sapply & fread
# this will create a named list of data.tables
dt.list <- sapply(file.list, fread, simplify=FALSE)

# bind the list together to one data.table
# using the 'idcol'-parameter puts the names of the data.tables in the id-column
# create the YEAR & MONTH variables with 'substr'
DT <- rbindlist(dt.list, idcol = 'id')[, `:=` (YEAR = substr(id,1,4), MONTH = substr(id,5,6))]

This will result in one data.table containing all the data, with YEAR and MONTH columns added.
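
The substr positions above assume that each file name starts with the six digits. If the names carry a prefix (such as the TWM- visible in the question's error message) or include a directory path, one option is to locate the first run of six digits instead. This is a rough base-R sketch: extract_year_month is a hypothetical helper, not part of data.table, and it assumes the first six-digit run in the name is always YYYYMM.

```r
# Hypothetical helper: pull YYYYMM out of a file name, wherever it sits
extract_year_month <- function(file_names) {
  base <- basename(file_names)              # drop any directory part
  m <- regexpr("[0-9]{6}", base)            # first 6-digit run, -1 if none
  yyyymm <- rep(NA_character_, length(base))
  yyyymm[m != -1] <- regmatches(base, m)    # fill in only the names that matched
  data.frame(
    YEAR  = as.integer(substr(yyyymm, 1, 4)),
    MONTH = as.integer(substr(yyyymm, 5, 6)),  # integer, so "08" becomes 8
    stringsAsFactors = FALSE
  )
}

extract_year_month(c("201508 Sales Report(London).csv",
                     "/path/TWM-201512 Sales Report(London).csv"))
```

With the combined table, something like DT[, c("YEAR", "MONTH") := extract_year_month(id)] should then add both columns, provided id holds the file names.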

If you want to exclude certain columns from the files, you can use the drop parameter of fread as follows:

dt.list <- sapply(file.list, fread, drop = c(1,7,10,12:15,17), simplify=FALSE)
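
If rows also need filtering, as in the question's my_read_data, a custom reader can be used in place of fread. The sketch below is one way to combine the pieces, under stated assumptions: read_one, drop_cols and filter_col are illustrative names, and the two demo files are made-up data written to a temporary folder. Filtering by column name rather than position avoids the index shift that dropping columns would otherwise cause.

```r
library(data.table)

# Hypothetical reader: drop columns while reading, then filter rows.
# 'drop_cols' and 'filter_col' are illustrative parameters.
read_one <- function(path, drop_cols, filter_col) {
  dt <- fread(path, header = TRUE, strip.white = TRUE, fill = TRUE,
              drop = drop_cols)
  dt[dt[[filter_col]] != 0]  # keep rows where the named column is non-zero
}

# Made-up demo data in a temporary folder
demo_dir <- file.path(tempdir(), "demo_csv")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(junk = 1:3, qty = c(0, 5, 7), city = "London"),
       file.path(demo_dir, "201508 Sales Report(London).csv"))
fwrite(data.table(junk = 4:5, qty = c(2, 0), city = "Paris"),
       file.path(demo_dir, "201509 Sales Report(Paris).csv"))

# Full paths for reading; bare file names as list names (they fill the id column)
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
dt.list <- lapply(setNames(paths, basename(paths)), read_one,
                  drop_cols = 1, filter_col = "qty")

# Bind everything and derive YEAR and MONTH as integers from the file name
DT <- rbindlist(dt.list, idcol = "id")[
  , `:=`(YEAR = as.integer(substr(id, 1, 4)),
         MONTH = as.integer(substr(id, 5, 6)))]
```

Naming the list with basename(paths) keeps the substr positions independent of the folder the files live in.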
