R read.csv“比列名更多的列”错误 [英] R read.csv "More columns than column names" error
问题描述
将 .csv
文件导入R时出现问题。我的代码:
I have a problem when importing .csv
file into R. With my code:
t <- read.csv("C:\\N0_07312014.CSV", na.string=c("","null","NaN","X"),
header=T, stringsAsFactors=FALSE,check.names=F)
R报告错误并且没有做我想要的:
R reports an error and does not do what I want:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
我想问题是因为我的数据格式不正确。我只需要来自 [,1:32]
的数据。所有其他数据都应删除。
I guess the problem is because my data is not well formatted. I only need data from [,1:32]
. All others should be deleted.
数据可以从以下网址下载:
https:// drive。 google.com/file/d/0B86_a8ltyoL3VXJYM3NVdmNPMUU/edit?usp=sharing
Data can be downloaded from: https://drive.google.com/file/d/0B86_a8ltyoL3VXJYM3NVdmNPMUU/edit?usp=sharing
非常感谢!
推荐答案
这是一个不稳定的CSV f ile。多个标题被抛出(尝试将其粘贴到 CSV指纹)以了解我的意思。
That's one wonky CSV file. Multiple headers tossed about (try pasting it to CSV Fingerprint) to see what I mean.
由于我不知道数据,因此不可能确定以下内容为您生成准确的结果,但它涉及使用 readLines
和其他R函数预处理文本:
Since I don't know the data, it's impossible to be sure the following produces accurate results for you, but it involves using readLines
and other R functions to pre-process the text:
# use readLines to get the data
dat <- readLines("N0_07312014.CSV")
# i had to do this to fix grep errors
Sys.setlocale('LC_ALL','C')
# filter out the repeating, and wonky headers
dat_2 <- grep("Node Name,RTC_date", dat, invert=TRUE, value=TRUE)
# turn that vector into a text connection for read.csv
dat_3 <- read.csv(textConnection(paste0(dat_2, collapse="\n")),
header=FALSE, stringsAsFactors=FALSE)
str(dat_3)
## 'data.frame': 308 obs. of 37 variables:
## $ V1 : chr "Node 0" "Node 0" "Node 0" "Node 0" ...
## $ V2 : chr "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
## $ V3 : chr "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
## $ V4 : chr "" "" "" "" ...
## .. more
## $ V36: chr "" "" "" "" ...
## $ V37: chr "0" "0" "0" "0" ...
# grab the headers
headers <- strsplit(dat[1], ",")[[1]]
# how many of them are there?
length(headers)
## [1] 32
# limit it to the 32 columns you want (Which matches)
dat_4 <- dat_3[,1:32]
# and add the headers
colnames(dat_4) <- headers
str(dat_4)
## 'data.frame': 308 obs. of 32 variables:
## $ Node Name : chr "Node 0" "Node 0" "Node 0" "Node 0" ...
## $ RTC_date : chr "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
## $ RTC_time : chr "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
## $ N1 Bat (VDC) : chr "" "" "" "" ...
## $ N1 Shinyei (ug/m3): chr "" "" "0.23" "null" ...
## $ N1 CC (ppb) : chr "" "" "null" "null" ...
## $ N1 Aeroq (ppm) : chr "" "" "null" "null" ...
## ... continues
这篇关于R read.csv“比列名更多的列”错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!