`gather` can't handle rownames
Problem description
allcsvs = list.files(pattern = "*.csv$", recursive = TRUE)
library(tidyverse)
##LOOP to redact the snow data csvs##
for(x in 1:length(allcsvs)) {
  df = read.csv(allcsvs[x], check.names = FALSE)
  newdf = df %>%
    gather(COL_DATE, SNOW_DEPTH, -PT_ID, -DATE) %>%
    mutate(
      DATE = as.Date(DATE, format = "%m/%d/%Y"),
      COL_DATE = as.Date(COL_DATE, format = "%Y.%m.%d")
    ) %>%
    filter(DATE == COL_DATE) %>%
    select(-COL_DATE)
  ####TURN DATES UNAMBIGUOUS HERE####
  df$DATE = lubridate::mdy(df$DATE)
  finaldf = merge(newdf, df, all.y = TRUE)
  write.csv(finaldf, allcsvs[x])
  df = read.csv(allcsvs[x])
  newdf = df[, -grep("X20", colnames(df))]
  write.csv(newdf, allcsvs[x])
}
I am using the code above to populate a new column row by row with values from different existing columns, using the date as the selection criterion. If I manually open each .csv in Excel and delete the first column, this code works great. However, if I run it on the .csv files "as is", I get the following message:
Error: Column 1 must be named
So far I've tried putting `-rownames` within the parentheses of `gather`, and I've tried putting `remove_rownames %>%` below `newdf = df %>%`, but nothing seems to work. I also tried reading the csv without the first column (`[,-1]`) or deleting the first column in R (`df[,1] <- NULL`), but for some reason when I do that my code returns an empty table instead of what I want. In other words, if I delete the rownames in Excel it works great, but if I delete them in R something funky happens.
Here is some sample data: https://drive.google.com/file/d/1RiMrx4wOpUdJkN4il6IopciSF6pKeNLr/view?usp=sharing
Recommended answer
You might consider using `readr::read_csv`.
A simple solution using `tidyverse`:
allcsvs %>%
  map(read_csv) %>%
  reduce(bind_rows) %>%
  gather(COL_DATE, SNOW_DEPTH, -PT_ID, -DATE) %>%
  mutate(
    DATE = as.Date(DATE, format = "%m/%d/%Y"),
    COL_DATE = as.Date(COL_DATE, format = "%Y.%m.%d")
  ) %>%
  filter(DATE == COL_DATE) %>%
  select(-COL_DATE)
With `utils::read.csv`, you are importing strings as factors (the default before R 4.0.0), and `as.Date(DATE, format = "%m/%d/%Y")` evaluates to `NA`.
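As a small illustration of the date parsing step in the `mutate` call, here is a base R sketch. The two date strings are made-up values, chosen only to match the two layouts named by the format strings above. It shows that `as.Date` returns `NA` rather than raising an error when the format does not match the string, which is one reason the `filter(DATE == COL_DATE)` step can silently drop every row.

```r
# Date strings in the two layouts used above (values are made up for illustration)
row_date <- "01/15/2018"   # layout of the DATE column
col_date <- "2018.01.15"   # layout of the date-named column headers

parsed_row <- as.Date(row_date, format = "%m/%d/%Y")
parsed_col <- as.Date(col_date, format = "%Y.%m.%d")
mismatched <- as.Date(row_date, format = "%Y.%m.%d")  # wrong format for this string

print(parsed_row == parsed_col)  # both strings parse to the same day
print(is.na(mismatched))         # a format mismatch gives NA, not an error
```

Checking `class(df$DATE)` and a few parsed values before filtering is a cheap way to catch this.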
Update
The solution above returns a single data frame. To write each data file separately with a for loop:
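for (x in seq_along(allcsvs)) {
  # Write next to the source file; prefixing the whole relative path would
  # break for files in subdirectories (allcsvs was listed with recursive = TRUE)
  out_path = file.path(dirname(allcsvs[x]), paste0('tidy_', basename(allcsvs[x])))
  read_csv(allcsvs[x]) %>%
    gather(COL_DATE, SNOW_DEPTH, -PT_ID, -DATE) %>%
    mutate(
      COL_DATE = as.Date(COL_DATE, format = "%Y.%m.%d")
    ) %>%
    filter(DATE == COL_DATE) %>%
    select(-COL_DATE) %>%
    write_csv(out_path)
}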
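A related point about the unnamed first column: the loop in the question calls base `write.csv` without `row.names = FALSE`, so every write adds a leading column with an empty header, whereas `readr::write_csv` never writes row names. The base R sketch below uses a made-up two-row data frame and temporary files (both are assumptions for illustration) to show the difference in the header line.

```r
# Hypothetical stand-in for one of the snow data frames
df <- data.frame(PT_ID = 1:2, SNOW_DEPTH = c(10, 12))

with_rownames <- tempfile(fileext = ".csv")
without_rownames <- tempfile(fileext = ".csv")

write.csv(df, with_rownames)                        # default: row.names = TRUE
write.csv(df, without_rownames, row.names = FALSE)  # no row-name column

# Compare the header lines of the two files
header_with <- readLines(with_rownames, n = 1)
header_without <- readLines(without_rownames, n = 1)
print(header_with)
print(header_without)
```

If the base writer is kept, passing `row.names = FALSE` avoids having to strip that column afterwards.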
Comparison
- `purrr::map` and `purrr::reduce` can be used instead of a for loop in some cases. Those functions take another function as an argument.
- `readr::read_csv` is typically 10x faster than base R equivalents (more info: http://r4ds.had.co.nz/data-import.html). It can also handle CSV files better.
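To make the first point concrete, here is a minimal sketch comparing a for loop with `purrr::reduce`, plus one `map` variant. The list `frames` is a hypothetical in-memory stand-in for the data frames that `map(read_csv)` would produce, and the sketch assumes `purrr` and `dplyr` are installed.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-ins for the data frames read from each csv
frames <- list(
  data.frame(PT_ID = 1:2, SNOW_DEPTH = c(10, 12)),
  data.frame(PT_ID = 3:4, SNOW_DEPTH = c(8, 15))
)

# for-loop version: fold the list into one data frame step by step
combined_loop <- frames[[1]]
for (i in seq_along(frames)[-1]) {
  combined_loop <- bind_rows(combined_loop, frames[[i]])
}

# reduce version: bind_rows is passed as the function argument
combined_reduce <- reduce(frames, bind_rows)

# map applies a function to every element; map_int returns an integer vector
row_counts <- map_int(frames, nrow)

print(combined_reduce)
print(row_counts)
```

Both versions build the same combined data frame; the `reduce` form simply states the folding step in one call.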