Easy way to fill in missing data


Problem description

I have a table with results from an optimization algorithm, covering 100 runs. x represents the time, and a row is only stored when the score improves, so each run is missing some x values.

x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
4 ; 90  ; 2  ; 85
7 ; 85  ; 10 ; 60
10; 80  ;

This is just a CSV file. I am looking for a way to process it easily, because I want to calculate the average at each x value. The average at x = 4, for example, needs to take into account that for run 2, y at x = 4 is 85 (the last value stored, at x = 2).

Is there an easy way to do this in Excel, or should I read the file into Java or R? (I will be plotting the average with R's ggplot.)

So the expected output would look like this:

x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
2 ; 100 ; 2  ; 85
4 ; 90  ; 4  ; 85
7 ; 85  ; 7  ; 85
10; 80  ;10 ; 60

--UPDATE

I have applied agstudy's answer below. This is my script:

library(ggplot2)
library(zoo)

data1 = read.table("rundata1", sep= " ", col.names=c("tm1","score1","current1"))
data2 = read.table("rundata1", sep= " ", col.names=c("tm2","score2","current2"))

newdata<- merge(data1[,1:2],data2[,1:2],by=1,all=T)
newdata <- newdata[!is.na(newdata$tm1),]
newdata$score1 <- zoo::na.locf(newdata$score1)
newdata$score2 <- zoo::na.locf(newdata$score2)

It is almost working now; I get just one error:

newdata$score2 <- zoo::na.locf(newdata$score2)
Error in `$<-.data.frame`(`*tmp*`, "score2", value = c(40152.6, 40152.6,  : 
  replacement has 11767 rows, data has 11768

Solution

For example, in R you can do this in two steps. First merge your two runs on the time column, then fill each missing value with the last non-missing value before it. I am using na.locf from the zoo package for the second step.

xx <- read.table(text='x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
4 ; 90  ; 2  ; 85
7 ; 85  ; 10 ; 60
10; 80  ;',sep=';',fill=TRUE,header=TRUE)

dm <- merge(xx[,1:2],xx[,3:4],by=1,all=T)
dm <- dm[!is.na(dm$x1),]
dm$y1 <- zoo::na.locf(dm$y1)
dm$y2 <- zoo::na.locf(dm$y2)
dm
  x1  y1  y2
1  1 100 150
2  2 100  85
3  4  90  85
4  7  85  85
5 10  80  60
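
A possible explanation for the "replacement has 11767 rows, data has 11768" error in the update: by default, zoo::na.locf drops leading NAs (na.rm = TRUE), so if a run has no value at the earliest merged time, the filled vector comes back one element shorter than the data frame. The sketch below assumes that is the cause; the data in it are made up for illustration (run 2 deliberately starts later than run 1), and the column names tm, score1, score2 and avg are placeholders. It passes na.rm = FALSE to keep the length unchanged, then computes the per-time average that the question asks for.

```r
library(zoo)

# Hypothetical data: run 2 has no value at the earliest time stamp.
run1 <- data.frame(tm = c(1, 4, 7, 10), score1 = c(100, 90, 85, 80))
run2 <- data.frame(tm = c(2, 10), score2 = c(85, 60))

# Full outer join on the time column keeps every time stamp from both runs.
dm <- merge(run1, run2, by = "tm", all = TRUE)

# na.rm = FALSE keeps leading NAs, so each filled vector has
# exactly as many elements as dm has rows.
dm$score1 <- zoo::na.locf(dm$score1, na.rm = FALSE)
dm$score2 <- zoo::na.locf(dm$score2, na.rm = FALSE)

# Average over the runs at each time stamp; a run that has not
# produced a value yet (still NA) is ignored for that row.
dm$avg <- rowMeans(dm[, c("score1", "score2")], na.rm = TRUE)
dm
```

Whether a run that has not started yet should be ignored (as here) or treated in some other way depends on what the average is meant to show, so take the na.rm = TRUE in rowMeans as one option rather than the only one.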
