Read CSV into R based on where header begins


Question

I have a large number of CSV files. Some have the header on the first row, others on the 3rd row, others on the 7th, and so on.

The headers all look the same; they just start on different rows in different files. Is there a way to make read.csv conditionally start reading a file at the row where the header begins?

For example, if I know that the first column name in every header is "office#", could I somehow instruct R to start reading the CSV file at the first line where it encounters "office#" and treat that row as the header?

Recommended answer

I have 4 CSV files:

One table with a header beginning on row 1 (iris.csv).

And 3 tables with headers beginning on rows 3, 1, and 5 (sales_1, sales_2, sales_3).

As long as I know the first column name of each table, I can use the smart_csv_reader function below to determine where each header begins and read each CSV file starting from that row:

# First column name of each file, in the order list.files() returns them
# (alphabetical): iris.csv, sales_1.csv, sales_2.csv, sales_3.csv
first_columns <- c('sepal.length', 'month', 'month', 'month')

smart_csv_reader <- function(directory) {
    # Escape the dot so only files ending in ".csv" match
    paths <- list.files(directory, pattern="\\.csv$", full.names=TRUE)
    header_begins <- integer(length(paths))
    for(i in seq_along(paths)) {
        lines_read <- readLines(paths[i], warn=FALSE)
        # Keep only the first matching line in case the name recurs in the data
        header_begins[i] <- grep(first_columns[i], lines_read)[1]
    }
    print('headers detected on rows:')
    print(header_begins)
    l <- list()
    for(i in seq_along(paths)) {
        # Skip everything above the header row
        l[[i]] <- read.csv(paths[i], skip=header_begins[i]-1)
    }
    return(l)
}

Just pass in the directory where all your CSVs are.
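For the situation in the question, where every file shares the same first column name, a single-file helper may be enough. The sketch below is one possible variant, not the code from the answer: `read_from_header` and its arguments are hypothetical names, and it assumes the header line starts with the column name, optionally wrapped in a double quote. The sample file contents are made up for the demo.

```r
# Hypothetical helper: read a CSV starting at the first line that begins
# with the given first column name.
read_from_header <- function(path, first_col) {
  lines_read <- readLines(path, warn = FALSE)
  # Strip an optional leading double quote, then compare literally
  # (no regex), so names like "office#" need no escaping.
  unquoted <- sub('^"', "", lines_read)
  header_row <- which(startsWith(unquoted, first_col))[1]
  if (is.na(header_row)) {
    stop("no header starting with '", first_col, "' found in ", path)
  }
  # Skip every line above the header row.
  read.csv(path, skip = header_row - 1)
}

# Demo on a temporary file with two extra lines above the header
# (made-up contents, for illustration only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("report generated by some tool", "",
             "office#,city,sales",
             "1,Boston,100",
             "2,Denver,250"), tmp)
offices <- read_from_header(tmp, "office#")
print(offices)
```

Note that read.csv, with its default check.names = TRUE, rewrites `office#` into a syntactically valid name (`office.`); pass check.names = FALSE through to read.csv if the original name should be kept.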

Usage:

smart_csv_reader('some_csvs/')

[1] "headers detected on rows:"
[1] 1 3 1 5

As you can see, the function detects the correct header row for each file. It also returns a list containing each table, read starting from its header row.
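The returned list is unnamed, so the tables have to be picked out by position. One way to make them easier to reach is to label each element with its file name. The sketch below shows that step on its own; the data frames and file names are stand-ins for illustration, not the actual output of smart_csv_reader.

```r
# Stand-in for the list returned by smart_csv_reader (illustrative data only).
tables <- list(
  data.frame(month = c("jan", "feb"), sales = c(10, 20)),
  data.frame(month = c("mar", "apr"), sales = c(30, 40))
)
# Hypothetical file names, in the same order list.files() would return them.
file_names <- c("sales_1.csv", "sales_2.csv")

# Name each table after its file, dropping the .csv extension.
names(tables) <- tools::file_path_sans_ext(file_names)

# Tables can now be accessed by name instead of by index.
str(tables$sales_2)
```

This only works if the names are in the same order as the list elements, which holds when both come from the same list.files() call.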
