Read CSV into R based on where header begins
Question
I have a large number of CSV files. Some have the header on the first row, others on the 3rd row, others on the 7th, and so on.
The headers all look the same; they just start on different rows in different files. Is there a way to conditionally read.csv a file so that reading starts where the header begins?
For example, if I know that every header has "office#" as its first column name, could I somehow instruct R to start reading the CSV file at the first line where it encounters "office#" and treat that row as the header?
Recommended answer
I have 4 CSV files:
One table with a header beginning on row 1 (iris.csv)
And 3 tables with headers beginning on rows 3, 1, and 5 (sales_1, sales_2, sales_3)
As long as I know the first column name of each table, I can use the smart_csv_reader function below to find where each header begins and read each CSV file starting from the correct row:
# First column name of each file, in the (alphabetical) order list.files() returns them
first_columns <- c('sepal.length', 'month', 'month', 'month')
smart_csv_reader <- function(directory) {
  # full.names=TRUE returns paths that already include the directory
  paths <- list.files(directory, pattern="\\.csv$", full.names=TRUE)
  header_begins <- integer(length(paths))
  for(i in seq_along(paths)) {
    lines_read <- readLines(paths[i], warn=FALSE)
    # Row of the first line that contains the expected first column name
    header_begins[i] <- grep(first_columns[i], lines_read)[1]
  }
  print('headers detected on rows:')
  print(header_begins)
  l <- list()
  for(i in seq_along(paths)) {
    # Skip every line above the header row
    l[[i]] <- read.csv(paths[i], skip=header_begins[i]-1)
  }
  return(l)
}
Just pass in the directory where all your CSVs are.
Usage:
smart_csv_reader('some_csvs/')
[1] "headers detected on rows:"
[1] 1 3 1 5
As you can see, the function reports the correct header row for each table. It also returns a list containing each table, read starting from its header.
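For the single-file case in the question, where the first column is named "office#", the same idea can be sketched as follows. This is a minimal sketch and not part of the original answer: read_csv_from_header is a hypothetical helper, the sample file contents are made up for illustration, and it assumes the header field is unquoted and sits at the very start of its line.

```r
# Hypothetical helper (not from the original answer): read one CSV whose
# header row is the first line starting with a known first column name.
read_csv_from_header <- function(path, first_col) {
  lines_read <- readLines(path, warn = FALSE)
  # startsWith() does a literal prefix match, so characters such as "#" or "."
  # in the column name are not interpreted as regex syntax
  header_row <- which(startsWith(lines_read, first_col))[1]
  if (is.na(header_row)) {
    stop("no line starting with '", first_col, "' in ", path)
  }
  # check.names = FALSE keeps names such as "office#" unchanged
  read.csv(path, skip = header_row - 1, check.names = FALSE)
}

# Demonstration on a temporary file with two junk lines above the header
# (made-up sample data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("report generated by some tool",
             "exported for internal use",
             "office#,city,sales",
             "1,Boston,100",
             "2,Denver,250"), tmp)
offices <- read_csv_from_header(tmp, "office#")
print(names(offices))
print(nrow(offices))
```

Matching on a literal prefix rather than a regular expression avoids false hits when the column name appears somewhere inside a junk line, at the cost of missing headers that are quoted or preceded by whitespace.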