R: Applying cumulative sum function and filling data gaps with NA for plotting

Views: 223 | Published: 2017/3/26 4:40:24 | Tags: r, plot, dataframe, cumsum


Problem description

I have a dataframe which looks like this, and I am trying to calculate the cumulative sum of the column VALUE. The input file can also be found here: https://dl.dropboxusercontent.com/u/16277659/input.csv

df <- read.csv("input.csv", sep=";", header=TRUE)

NAME;       ID; SURVEY_YEAR; REFERENCE_YEAR; VALUE
SAMPLE1;    253;    1880;   1879;           14
SAMPLE1;    253;    1881;   1880;           -10
SAMPLE1;    253;    1882;   1881;           4
SAMPLE1;    253;    1883;   1882;           10
SAMPLE1;    253;    1884;   1883;           10
SAMPLE1;    253;    1885;   1884;           12
SAMPLE1;    253;    1889;   1888;           11
SAMPLE1;    253;    1890;   1889;           12
SAMPLE1;    253;    1911;   1910;          -16
SAMPLE1;    253;    1913;   1911;          -11
SAMPLE1;    253;    1914;   1913;          -8
SAMPLE2;    261;    1992;   1991;          -19
SAMPLE2;    261;    1994;   1992;          -58
SAMPLE2;    261;    1995;   1994;          -40
SAMPLE2;    261;    1996;   1995;          -21
SAMPLE2;    261;    1997;   1996;          -50
SAMPLE2;    261;    1998;   1997;          -60
SAMPLE2;    261;    2005;   2004;          -34
SAMPLE2;    261;    2006;   2005;          -23
SAMPLE2;    261;    2007;   2006;          -19
SAMPLE2;    261;    2008;   2007;          -29
SAMPLE2;    261;    2009;   2008;          -89
SAMPLE2;    261;    2013;   2009;          -14
SAMPLE2;    261;    2014;   2013;          -16

The end product I am aiming for is one plot per SAMPLE, with SURVEY_YEAR on the x axis and the cumulative sum (CUMSUM) of VALUE, calculated later, on the y axis. My code so far to sort out the data:

# Filter out all values with less than 3 measurements by group (in this case does nothing, but is important with the rest of my data)
df <-read.csv("input.csv", sep=";", header=TRUE)
rowsn <- with(df,by(VALUE,ID,function(xx)sum(!is.na(xx))))
names(which(rowsn>=3))
dat <- df[df$ID %in% names(which(rowsn>=3)),]

# write new column which defines the beginning of the group (split by ID) and for the cumsum function(=0)
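dat <- do.call(rbind, lapply(split(dat, dat$ID), function(x) {
  x <- rbind(x[1, ], x)                            # duplicate the first row of the group
  x[1, "VALUE"] <- 0                               # the cumulative sum starts at 0
  x[1, "SURVEY_YEAR"] <- x[1, "SURVEY_YEAR"] - 1   # one year before the first survey
  return(x)
}))
rownames(dat) <- seq_len(nrow(dat))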

# write dat to csv file for inspection
write.table(dat, "dat.csv", sep=";", row.names=FALSE)

This results in the following dataframe, which is the starting point for calculating the cumulative sum of the column VALUE.

NAME;   ID; SURVEY_YEAR;    REFERENCE_YEAR; VALUE
SAMPLE1;    253;    1879;   1879;             0
SAMPLE1;    253;    1880;   1879;            14
SAMPLE1;    253;    1881;   1880;           -10
SAMPLE1;    253;    1882;   1881;             4
SAMPLE1;    253;    1883;   1882;            10
SAMPLE1;    253;    1884;   1883;            10
SAMPLE1;    253;    1885;   1884;            12
SAMPLE1;    253;    1889;   1888;            11
SAMPLE1;    253;    1890;   1889;            12
SAMPLE1;    253;    1911;   1910;           -16
SAMPLE1;    253;    1913;   1911;           -11
SAMPLE1;    253;    1914;   1913;            -8
SAMPLE2;    261;    1991;   1991;             0
SAMPLE2;    261;    1992;   1991;           -19
SAMPLE2;    261;    1994;   1992;           -58
SAMPLE2;    261;    1995;   1994;           -40
SAMPLE2;    261;    1996;   1995;           -21
SAMPLE2;    261;    1997;   1996;           -50
SAMPLE2;    261;    1998;   1997;           -60
SAMPLE2;    261;    2005;   2004;           -34
SAMPLE2;    261;    2006;   2005;           -23
SAMPLE2;    261;    2007;   2006;           -19
SAMPLE2;    261;    2008;   2007;           -29
SAMPLE2;    261;    2009;   2008;           -89
SAMPLE2;    261;    2013;   2009;           -14
SAMPLE2;    261;    2014;   2013;           -16

The problem now is that I would like to calculate the cumulative sum of the column VALUE for each year. As you can see, I have gaps between certain years (for example, in SAMPLE1 between 1890 and 1911, and in SAMPLE2 between 1998 and 2005). I would like to fill every year in between with NA values so that I can plot with type='b' (points and lines) without the segments on either side of a gap being connected. One important detail: when several NA values follow each other, the last NA in the CUMSUM column should be replaced with the last numerical value before the gap.

The normal case is that the difference between the REFERENCE_YEAR and the SURVEY_YEAR equals 1 (e.g. for the first example of SAMPLE1 from 1880 to 1881), but in some cases there are varying periods between the REFERENCE_YEAR and the SURVEY_YEAR (e.g. in SAMPLE1 from 1911 to 1913 and in SAMPLE2 from 2009 to 2013). In that case the VALUE should be added to the cumulative sum only once, and the result should stay the same for the whole period indicated (in the plot this gives a straight, connected line).

It's difficult to explain everything in detail, so it may be easier if I provide an example of what the result should look like:

NAME;       ID; SURVEY_YEAR;    REFERENCE_YEAR; VALUE;  CUMSUM
SAMPLE1;    253;    1879;       1879;            0;     0
SAMPLE1;    253;    1880;       1879;           14;     14
SAMPLE1;    253;    1881;       1880;          -10;     4
SAMPLE1;    253;    1882;       1881;            4;     8
SAMPLE1;    253;    1883;       1882;           10;     18
SAMPLE1;    253;    1884;       1883;           10;     28
SAMPLE1;    253;    1885;       1884;           12;     40
SAMPLE1;    253;    1886;       1885;           NA;     NA
SAMPLE1;    253;    1887;       1886;           NA;     NA
SAMPLE1;    253;    1888;       1887;           NA;     40
SAMPLE1;    253;    1889;       1888;           11;     51
SAMPLE1;    253;    1890;       1889;           12;     63
SAMPLE1;    253;    1891;       1890;           NA;     NA
SAMPLE1;    253;    1892;       1891;           NA;     NA
SAMPLE1;    253;    1893;       1892;           NA;     NA
SAMPLE1;    253;    1894;       1893;           NA;     NA
SAMPLE1;    253;    1895;       1894;           NA;     NA
SAMPLE1;    253;    1896;       1895;           NA;     NA
SAMPLE1;    253;    1897;       1896;           NA;     NA
SAMPLE1;    253;    1898;       1897;           NA;     NA
SAMPLE1;    253;    1899;       1898;           NA;     NA
SAMPLE1;    253;    1900;       1899;           NA;     NA
SAMPLE1;    253;    1901;       1900;           NA;     NA
SAMPLE1;    253;    1902;       1901;           NA;     NA
SAMPLE1;    253;    1903;       1902;           NA;     NA
SAMPLE1;    253;    1904;       1903;           NA;     NA
SAMPLE1;    253;    1905;       1904;           NA;     NA
SAMPLE1;    253;    1906;       1905;           NA;     NA
SAMPLE1;    253;    1907;       1906;           NA;     NA
SAMPLE1;    253;    1908;       1907;           NA;     NA
SAMPLE1;    253;    1909;       1908;           NA;     NA
SAMPLE1;    253;    1910;       1909;           NA;     63
SAMPLE1;    253;    1911;       1910;          -16;     47
SAMPLE1;    253;    1912;       1911;          -11;     36
SAMPLE1;    253;    1913;       1912;          -11;     36
SAMPLE1;    253;    1914;       1913;           -8;     28
SAMPLE2;    253;    1991;       1991;            0;     0
SAMPLE2;    253;    1992;       1991;          -19;     -19
SAMPLE2;    253;    1993;       1992;          -58;     -77
SAMPLE2;    253;    1994;       1993;          -58;     -135
SAMPLE2;    253;    1995;       1994;          -40;     -175
SAMPLE2;    253;    1996;       1995;          -21;     -196
SAMPLE2;    253;    1997;       1996;          -50;     -246
SAMPLE2;    253;    1998;       1997;          -60;     -306
SAMPLE2;    253;    1999;       1998;           NA;     NA
SAMPLE2;    253;    2000;       1999;           NA;     NA
SAMPLE2;    253;    2001;       2000;           NA;     NA
SAMPLE2;    253;    2002;       2001;           NA;     NA
SAMPLE2;    253;    2003;       2002;           NA;     NA
SAMPLE2;    253;    2004;       2003;           NA;     -306
SAMPLE2;    253;    2005;       2004;          -34;     -340
SAMPLE2;    253;    2006;       2005;          -23;     -363
SAMPLE2;    253;    2007;       2006;          -19;     -382
SAMPLE2;    253;    2008;       2007;          -29;     -411
SAMPLE2;    253;    2009;       2008;          -89;     -500
SAMPLE2;    253;    2010;       2009;          -14;     -514
SAMPLE2;    253;    2011;       2010;          -14;     -514
SAMPLE2;    253;    2012;       2011;          -14;     -514
SAMPLE2;    253;    2013;       2012;          -14;     -514
SAMPLE2;    253;    2014;       2013;          -16;     -530

Help with this rather complicated case would be very much appreciated! Thank you!

Solution

BIG EDIT: Posted code, added correct library calls

library(dplyr)
df = read.csv("input.csv", sep=";", stringsAsFactors=FALSE)

#find min/max year for each SAMPLE
df_minmax = df %>% 
group_by(NAME) %>% 
summarise(min_year = min(SURVEY_YEAR), 
          max_year = max(SURVEY_YEAR))

#create an empty dataframe with what we want
df2 = data.frame(NAME = "", 
                 ID = 0, 
                 SURVEY_YEAR = min(df$SURVEY_YEAR):max(df$SURVEY_YEAR), 
                 REFERENCE_YEAR = min(df$SURVEY_YEAR):max(df$SURVEY_YEAR) - 1,
                 VALUE = NA, stringsAsFactors=FALSE)

#fill in the NAMES dataframe - there's probably a better way to do this
for(i in 1:nrow(df_minmax)) {
  min_year = df_minmax[i, ]$min_year
  max_year = df_minmax[i, ]$max_year

  df2[df2$SURVEY_YEAR %in% min_year:max_year, ]$NAME = df_minmax[i, ]$NAME
}

#fill in the values
#this line is a bit dangerous -- it relies on the fact that df and df2 have the same relative ordering
#(and that no SURVEY_YEAR occurs in more than one sample); don't reorder df or df2 before this line.
df2[df2$SURVEY_YEAR %in% df$SURVEY_YEAR, ]$VALUE = df$VALUE

#in this example there is a long period between sample1 and sample2 we can filter those out
df2 = df2 %>% filter(NAME != "")

#Now we can do all the cumulative stuff
#for purposes of cumulative sums, set NA to 0
temp = df2$VALUE
df2[is.na(df2)] = 0
df2 = df2 %>% group_by(NAME) %>% mutate(csum = cumsum(VALUE))

#get back the NA values -- in case the NA values are useful to you
df2$VALUE = temp

Here's `head(df2, 20)`:

      NAME ID SURVEY_YEAR REFERENCE_YEAR VALUE csum
1  SAMPLE1  0        1880           1879    14   14
2  SAMPLE1  0        1881           1880   -10    4
3  SAMPLE1  0        1882           1881     4    8
4  SAMPLE1  0        1883           1882    10   18
5  SAMPLE1  0        1884           1883    10   28
6  SAMPLE1  0        1885           1884    12   40
7  SAMPLE1  0        1886           1885    NA   40
8  SAMPLE1  0        1887           1886    NA   40
9  SAMPLE1  0        1888           1887    NA   40
10 SAMPLE1  0        1889           1888    11   51
11 SAMPLE1  0        1890           1889    12   63
12 SAMPLE1  0        1891           1890    NA   63
13 SAMPLE1  0        1892           1891    NA   63
14 SAMPLE1  0        1893           1892    NA   63
15 SAMPLE1  0        1894           1893    NA   63
16 SAMPLE1  0        1895           1894    NA   63
17 SAMPLE1  0        1896           1895    NA   63
18 SAMPLE1  0        1897           1896    NA   63
19 SAMPLE1  0        1898           1897    NA   63
20 SAMPLE1  0        1899           1898    NA   63
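
Note that `csum` in this output is carried through every gap year, while the question asks for NA inside a gap (except in its last year) and for a constant value across multi-year periods. The following is a minimal base R sketch of one way to read that rule; it is not part of the original answer. The helper name `expand_cumsum` and the output layout are assumptions made for illustration, and the sketch assumes REFERENCE_YEAR is always earlier than SURVEY_YEAR.

```r
# Hypothetical helper: expand one group to yearly rows.
# Assumes REFERENCE_YEAR < SURVEY_YEAR in every row.
expand_cumsum <- function(g) {
  g <- g[order(g$SURVEY_YEAR), ]
  total  <- cumsum(g$VALUE)            # running total after each observation
  before <- c(0, head(total, -1))      # running total before each observation
  years  <- min(g$REFERENCE_YEAR):max(g$SURVEY_YEAR)
  out <- data.frame(NAME = g$NAME[1], ID = g$ID[1],
                    SURVEY_YEAR = years, CUMSUM = NA_real_)
  for (i in seq_len(nrow(g))) {
    # the reference year is the starting point of the line segment:
    # it carries the total reached before this observation
    out$CUMSUM[out$SURVEY_YEAR == g$REFERENCE_YEAR[i]] <- before[i]
    # every year of the period shows the total after adding VALUE once
    period <- (g$REFERENCE_YEAR[i] + 1):g$SURVEY_YEAR[i]
    out$CUMSUM[out$SURVEY_YEAR %in% period] <- total[i]
  }
  out   # years not touched above are gaps and stay NA
}

# SAMPLE1 rows from the question
sample1 <- data.frame(
  NAME = "SAMPLE1", ID = 253,
  SURVEY_YEAR    = c(1880, 1881, 1882, 1883, 1884, 1885, 1889, 1890, 1911, 1913, 1914),
  REFERENCE_YEAR = c(1879, 1880, 1881, 1882, 1883, 1884, 1888, 1889, 1910, 1911, 1913),
  VALUE          = c(14, -10, 4, 10, 10, 12, 11, 12, -16, -11, -8),
  stringsAsFactors = FALSE
)
res <- expand_cumsum(sample1)
```

For SAMPLE1 this gives the CUMSUM column of the expected output in the question (for example 40 in 1888, NA in 1886 and 1887, and 36 in both 1912 and 1913). For SAMPLE2 the expected table adds -58 in both 1993 and 1994, which this sketch does not do. To process all groups, something like `do.call(rbind, lapply(split(df, df$NAME), expand_cumsum))` should work, and `plot(res$SURVEY_YEAR, res$CUMSUM, type = "b")` then leaves the NA years unconnected.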

Here's the outline of the steps I did above as a quick summary:

Find the min/max year for each group in NAME.
Create an empty dataframe that has the total range of all the years we want.
Fill in the NAMES in the correct place in new empty dataframe.
Fill in the VALUES in the correct place in new empty dataframe.
Set NA's to 0 for purposes of cumulative sums
Find cumulative sums by group.
Replace the 0 back into NAs.

It's a bit hackish with the for loop. I'm hoping no one strings me up for it.
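
If the for loop is a concern, the same steps can probably be written without it. This is a hedged base R sketch rather than a drop-in replacement: the function name `fill_years` and the tiny example data are made up for illustration, and only NAME, SURVEY_YEAR and VALUE are carried over.

```r
# Hypothetical helper: one row per NAME and year, with a carried-forward csum.
fill_years <- function(df) {
  # all years between each group's first and last SURVEY_YEAR
  grid <- do.call(rbind, lapply(split(df, df$NAME), function(g) {
    data.frame(NAME = g$NAME[1],
               SURVEY_YEAR = min(g$SURVEY_YEAR):max(g$SURVEY_YEAR),
               stringsAsFactors = FALSE)
  }))
  # left join: years without a survey get VALUE = NA
  out <- merge(grid, df[, c("NAME", "SURVEY_YEAR", "VALUE")],
               by = c("NAME", "SURVEY_YEAR"), all.x = TRUE)
  out <- out[order(out$NAME, out$SURVEY_YEAR), ]
  # cumulative sum per group, counting NA as 0 (VALUE itself keeps its NAs)
  out$csum <- ave(ifelse(is.na(out$VALUE), 0, out$VALUE), out$NAME, FUN = cumsum)
  rownames(out) <- NULL
  out
}

# made-up example data with one gap per group
df_small <- data.frame(
  NAME        = c("A", "A", "A", "B", "B"),
  SURVEY_YEAR = c(2000, 2001, 2004, 2010, 2012),
  VALUE       = c(5, -2, 3, 1, 1),
  stringsAsFactors = FALSE
)
res_small <- fill_years(df_small)
```

Because the join is keyed on both NAME and SURVEY_YEAR, this version does not depend on the row order of the input, and it still works when two samples share a year.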
