How to import a CSV file containing multiple sections into R?
Question
I want to import the contents of a CSV file into R. The file contains multiple sections of data stacked vertically, separated by blank lines and rows of asterisks. For example:
********************************************************
* SAMPLE DATA ******************************************
********************************************************
Name, DOB, Sex
Rod, 1/1/1970, M
Jane, 5/7/1980, F
Freddy, 9.12,1965, M
*******************************************************
* Income Data ****************************************
*******************************************************
Name, Income
Rod, 10000
Jane, 15000
Freddy, 7500
I would like to import this into R as two separate data frames. Currently I'm manually cutting the CSV file up into smaller files, but I think I could do it using read.csv with its skip and nrows arguments, if I could work out where the section breaks are.
This gives me a logical TRUE for every blank line:
ifelse(readLines("DATA.csv")=="",TRUE,FALSE)
I'm hoping someone has already solved this problem.
Recommended answer
In this case I would do something like:
# Import the raw lines:
data_raw <- readLines("test.txt")
# Find the separator line (assumes exactly one blank line, between the two sections):
id_sep <- which(data_raw == "")
# Build the line ranges of both data sets (each section starts with a 3-line banner):
data_1_range <- 4:(id_sep - 1)
data_2_range <- (id_sep + 4):length(data_raw)
# Parse each range of raw lines as CSV:
data_1 <- read.csv(textConnection(data_raw[data_1_range]))
data_2 <- read.csv(textConnection(data_raw[data_2_range]))
Actually, your first example set has an inconsistent structure (the Freddy row has four fields under a three-column header), so data_1 looks strange.
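If the file can hold more than two sections, or the blank lines are not reliable, the same idea can be generalized. The following is only a minimal sketch, assuming that every banner line starts with an asterisk and that each run of consecutive non-banner, non-blank lines is one CSV section with its own header. The helper name read_sections and the sample lines are hypothetical, made up for illustration; they are not part of the original answer.

```r
# Hypothetical helper: split raw lines into one data frame per section.
# Assumes banner lines start with "*" and sections are runs of data lines.
read_sections <- function(lines) {
  is_banner <- grepl("^\\s*\\*", lines)   # asterisk banner lines
  is_blank  <- grepl("^\\s*$", lines)     # empty or whitespace-only lines
  is_data   <- !(is_banner | is_blank)

  # A section starts wherever a data line follows a non-data line.
  starts <- is_data & !c(FALSE, head(is_data, -1))
  group  <- cumsum(starts)[is_data]

  # Parse each run of data lines as its own CSV block.
  lapply(split(lines[is_data], group), function(block) {
    read.csv(text = block, strip.white = TRUE, stringsAsFactors = FALSE)
  })
}

# Illustrative input; for a real file use readLines("DATA.csv") instead.
sample_lines <- c(
  "*****",
  "* SAMPLE DATA *****",
  "*****",
  "Name, DOB, Sex",
  "Rod, 1/1/1970, M",
  "Jane, 5/7/1980, F",
  "",
  "*****",
  "* Income Data *****",
  "*****",
  "Name, Income",
  "Rod, 10000",
  "Jane, 15000"
)
sections <- read_sections(sample_lines)
```

Here sections is a list with one data frame per section, in file order, so sections[[1]] holds the sample data and sections[[2]] the income data. A row with an inconsistent number of fields, like the Freddy row in the question, would still need to be cleaned up before parsing.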