How to get the difference in value between subsequent observations (country-years)?

Question

Let's say I have scores for 5 countries over a period of 10 years, such as:

# One row per country-year (long format): 5 countries x 10 years
mydata <- expand.grid(
  country = c('A', 'B', 'C', 'D', 'E'),
  year = as.character(1980:1989))
mydata$score <- round(runif(50, 0, 2), 4)

# reshape() comes with base R (stats), so no extra package is needed
mydata <- reshape(mydata, v.names="score", idvar="year", timevar="country", direction="wide")

> head(mydata)
   year score.A score.B score.C score.D score.E
1  1980  1.0538  1.6921  1.3165  1.7434  1.9687
6  1981  1.4773  1.6479  0.3135  0.6172  0.7704
11 1982  0.8748  1.3704  0.2788  1.6306  1.7237
16 1983  1.1224  1.1340  1.7684  1.3352  0.4317
21 1984  1.5496  1.8706  1.4641  0.5313  0.8590
26 1985  1.7715  1.8953  0.6230  0.3580  1.6313

Now, I would like to create a new variable "period" that is 1 if a year's score differs from the previous year's score by more than 0.5 in either direction, and 0 otherwise. I would like to do so for all 5 countries. It would also be great if it were possible to identify the country-years for which period = 1 and display this information in a table.

> head(mydata)
   year score.A score.B score.C score.D score.E  period.A  period.B ...
1  1980  1.0538  1.6921  1.3165  1.7434  1.9687   NA         NA
6  1981  1.4773  1.6479  0.3135  0.6172  0.7704   0          ....
11 1982  0.8748  1.3704  0.2788  1.6306  1.7237   1
16 1983  1.1224  1.1340  1.7684  1.3352  0.4317   0
21 1984  1.5496  1.8706  1.4641  0.5313  0.8590   0
26 1985  1.7715  1.8953  0.6230  0.3580  1.6313   0

I very much hope that this is not too much to ask. I tried using dist from the proxy package, but I do not know how to restrict the function to pairs of observations rather than the full row. Thanks a million!

Recommended answer

This one uses diff and lapply:

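# Names of the existing score.* columns and of the matching period.* columns
score.cols  <- grep("score", colnames(mydata), value=TRUE)
period.cols <- gsub("score", "period", score.cols)
# 1 if the score changed by more than 0.5 since the previous row; NA for the first row
compute.period <- function(x) as.integer(c(NA, abs(diff(x)) > 0.5))
cbind(mydata, setNames(lapply(mydata[score.cols], compute.period), period.cols))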
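
To see what the helper does on a single column, here is a small sketch that applies it to the six score.A values shown in the question's head(mydata) output. The vector names score.A and period.A are only illustrative.

```r
# Same helper as in the answer above
compute.period <- function(x) as.integer(c(NA, abs(diff(x)) > 0.5))

# score.A for 1980-1985, copied from the head(mydata) output in the question
score.A <- c(1.0538, 1.4773, 0.8748, 1.1224, 1.5496, 1.7715)

# Year-over-year changes; diff() returns one element fewer than its input,
# which is why compute.period() pads the front with NA
diff(score.A)

period.A <- compute.period(score.A)
period.A
# [1] NA  0  1  0  0  0
```

Only the 1981 to 1982 change (about -0.60) exceeds 0.5 in absolute value, which matches the period.A column sketched in the question.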

Edit: It has become more apparent (from your other question posted this morning) that you may not be working with the right data structure. Instead, I would recommend doing your work on the raw data, before it is reshaped:

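# abs() must wrap diff(x) itself, otherwise drops of more than 0.5 are missed
period.fun <- function(x) as.integer(c(NA, abs(diff(x)) > 0.5))
# mydata here is the long data (country, year, score), sorted by year within country
mydata <- within(mydata, period <- ave(score, country, FUN = period.fun))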

Only then would you reshape mydata to get it into its final form.
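
Putting the long-format approach together, a minimal end-to-end sketch might look like the following. The seed, the object names long, flagged and wide, the integer years and the explicit sort are assumptions added for illustration; they are not part of the original answer.

```r
set.seed(1)  # hypothetical seed, only so the sketch is reproducible

# Long format: one row per country-year
long <- expand.grid(country = c("A", "B", "C", "D", "E"),
                    year    = 1980:1989)
long$score <- round(runif(nrow(long), 0, 2), 4)

# Sort so that diff() compares consecutive years within each country
long <- long[order(long$country, long$year), ]

period.fun <- function(x) as.integer(c(NA, abs(diff(x)) > 0.5))
long$period <- ave(long$score, long$country, FUN = period.fun)

# Table of the country-years with period == 1 (subset() drops the NA rows)
flagged <- subset(long, period == 1, select = c(country, year, score))
flagged

# Final wide form: one score.* and one period.* column per country
wide <- reshape(long, v.names = c("score", "period"),
                idvar = "year", timevar = "country", direction = "wide")
head(wide)
```

Keeping the data long until the last step means the table of flagged country-years falls out of a single subset() call, which is harder to get from the wide layout.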
