Rearrange data: convert from water year to calendar year
Question
I have a table with data from a stream gauge, arranged like this:
Water.Year May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr
1 1953-1954 55.55 43.62 30.46 26.17 26.76 41.74 19.92 41.25 28.77 20.96 12.47 10.51
2 1954-1955 23.49 81.35 46.71 29.33 67.83 133.30 37.62 30.16 21.07 19.38 13.87 10.63
3 1955-1956 9.87 51.59 55.36 63.03 154.08 98.15 104.06 32.85 22.89 17.30 15.68 10.88
> data <- structure(list(Water.Year = structure(1:6, .Label = c("1953-1954", "1954-1955", "1955-1956", "1956-1957", "1957-1958", "1958-1959", "1959-1960", "1960-1961", "1961-1962", "1962-1963", "1963-1964", "1964-1965", "1965-1966", "1966-1967", "1967-1968", "1968-1969", "1969-1970", "1970-1971", "1971-1972", "1972-1973", "1973-1974", "1974-1975", "1975-1976", "1976-1977", "1977-1978", "1978-1979", "1979-1980", "1980-1981", "1981-1982", "1982-1983", "1983-1984", "1984-1985", "1985-1986", "1986-1987", "1987-1988", "1988-1989", "1989-1990", "1990-1991", "1991-1992", "1992-1993", "1993-1994", "1994-1995", "1995-1996", "1996-1997", "1997-1998", "1998-1999", "1999-2000", "2000-2001"), class = "factor"), May = c(55.55, 23.49, 9.87, 18.03, 17.46, 11.37), Jun = c(43.62, 81.35, 51.59, 28.61, 15.14, 29.48), Jul = c(30.46, 46.71, 55.36, 24.36, 20.09, 19.48), Ago = c(26.17, 29.33, 63.03, 22.01, 16.97, 16.86), Set = c(26.76, 67.83, 154.08, 28.51, 27.24, 21.01), Oct = c(41.74, 133.3, 98.15, 53.72, 35.78, 19.78), Nov = c(19.92, 37.62, 104.06, 115.78, 20.35, 18.69), Dic = c(41.25, 30.16, 32.85, 32.04, 22, 18.86), Ene = c(28.77, 21.07, 22.89, 25.44, 13.27, 14.89), Feb = c(20.96, 19.38, 17.3, 14.53, 10.37, 10.4), Mar = c(12.47, 13.87, 15.68, 10.78, 8.77, 8.79), Abr = c(10.51, 10.63, 10.88, 9.33, 7.69, 8.99)), .Names = c("Water.Year", "May", "Jun", "Jul", "Ago", "Set", "Oct", "Nov", "Dic", "Ene", "Feb", "Mar", "Abr"), row.names = c(NA, 6L), class = "data.frame")
It is arranged by "water years": each water year starts in May and ends in April of the following year (this can be seen in the first column). I want to convert it to a data frame with three columns: Calendar.Year, Month, and Flow.Measurement.
I have already split the Water.Year column into two columns using separate from tidyr:
> df = separate(data, Water.Year, c("year1","year2"))
year1 year2 May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr
1 1953 1954 55.55 43.62 30.46 26.17 26.76 41.74 19.92 41.25 28.77 20.96 12.47 10.51
2 1954 1955 23.49 81.35 46.71 29.33 67.83 133.30 37.62 30.16 21.07 19.38 13.87 10.63
Now I'm planning to use gather from tidyr to do the rest of the transformation, but I'm stuck on how to create a Calendar.Year column that takes year1 for the columns May to Dec and year2 for the columns Jan to Apr.
Any help would be greatly appreciated.
Recommended answer
I decided to combine pieces of all the answers I received. This is the code I wrote:
library(dplyr)
library(tidyr)
# Separate the water-year column into its two calendar years.
# `data` is the original table from the question; the column names used below
# assume the English month abbreviations shown in the printed table.
df_years <- data %>%
  separate(Water.Year, c("Year1", "Year2"))
# Create one data frame for each part of the water year
df1 <- subset(df_years, select = c(Year1, May:Dec))
df2 <- subset(df_years, select = c(Year2, Jan:Apr))
# Give the year column the same name in both data frames
colnames(df1)[1] <- "Year"
colnames(df2)[1] <- "Year"
# Join both data frames on the calendar year
cleandata <- full_join(df1, df2, by = "Year")
# Put the month columns in calendar order
cleandata <- cleandata[, c("Year", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")]
# Convert to a tidy (long) data set; factor_key = TRUE stores Month as a factor
# in column order, so months sort chronologically rather than alphabetically
cleandata <- gather(cleandata, "Month", "Flow", 2:13, factor_key = TRUE)
# Sort by year and month
cleandata <- arrange(cleandata, Year, Month)
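For comparison, here is a minimal sketch of the route described in the question: gather first, then pick the calendar year per row depending on the month. It is only one possible approach, not the code from the answer above. The names `flows` and `long` are made up for this example, the data are the first three rows of the printed table, and English month abbreviations are assumed for the column names.

```r
library(dplyr)
library(tidyr)

# First three rows of the table from the question (hypothetical name `flows`;
# English month abbreviations assumed for the column names)
flows <- data.frame(
  Water.Year = c("1953-1954", "1954-1955", "1955-1956"),
  May = c(55.55, 23.49, 9.87),
  Jun = c(43.62, 81.35, 51.59),
  Jul = c(30.46, 46.71, 55.36),
  Aug = c(26.17, 29.33, 63.03),
  Sep = c(26.76, 67.83, 154.08),
  Oct = c(41.74, 133.30, 98.15),
  Nov = c(19.92, 37.62, 104.06),
  Dec = c(41.25, 30.16, 32.85),
  Jan = c(28.77, 21.07, 22.89),
  Feb = c(20.96, 19.38, 17.30),
  Mar = c(12.47, 13.87, 15.68),
  Apr = c(10.51, 10.63, 10.88)
)

long <- flows %>%
  # Split "1953-1954" into two integer columns
  separate(Water.Year, c("Year1", "Year2"), convert = TRUE) %>%
  # One row per water year and month
  gather("Month", "Flow.Measurement", May:Apr) %>%
  mutate(
    # May to Dec belong to the first year, Jan to Apr to the second
    Calendar.Year = ifelse(Month %in% month.abb[5:12], Year1, Year2),
    # Factor levels in calendar order so that arrange() sorts chronologically
    Month = factor(Month, levels = month.abb)
  ) %>%
  select(Calendar.Year, Month, Flow.Measurement) %>%
  arrange(Calendar.Year, Month)
```

Unlike the join-based version, this keeps only months that actually appear in the input, so the partial first and last calendar years contain no NA rows.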