R根据事件更新值 [英] R update values based on event
问题描述
我最近发布了这个问题,该问题已经与我在笔记本电脑上本地使用的Mysql数据库有关。由于我没有在Mysql中找到解决该问题的方法,而其他人似乎也没有找到解决方法,因此我想再次发布它,但现在与R有关。我将数据库与 RMysql一起使用。包。这是问题所在:
I recently posted this question already related to my Mysql database which I am using locally on my laptop. Since I did not find a solution to the problem in Mysql and others did not seem to find one either I would like to post it again but now related to R. I use the database with the "RMysql" package. Here is the problem:
我有一个数据库表,其中包含几年来在多个字段上不同农业实践的数据。农民根据完成某项活动的日期在此输入信息。一个简化的示例如下所示,其中ID是一个自动递增的值。
I have a database table that contain data of different farming pratices during several years on several fields. Farmers entered there information based on the dates when a certain activity was done. A simplified example would look like this, where ID is an auto-increment value.
ID|Field|Date |Activity |
1|A |2012/08/01|Tillage |
2|A |2012/08/24|Seeding |
3|A |2013/03/05|Spraying |
4|A |2013/03/05|Fertilizer|
5|A |2013/07/25|Harvest |
6|B |2012/09/01|Tillage |
7|B |2012/09/05|Seeding |
8|B |2013/04/05|Spraying |
9|B |2013/07/28|Harvest |
10|B |2010/08/24|Tillage |
11|B |2010/09/29|Seeding |
12|B |2011/05/01|Fertilizer|
13|B |2011/07/12|Harvest |
14|A |2011/09/01|Seeding |
15|A |2011/10/10|Spraying |
16|A |2012/04/10|Fertilizer|
17|A |2012/08/02|Harvest |
现在,我想添加一个名为 Season的列,该字段仅包含收获田地的年份,但会针对上一个收获期至此收获的各个田地自动更新所有其他活动。请注意,不同田间的收获日期不同。另一点是,播种可能是在2011年完成的,而田间是在2012年收获的,因此,所有这些都应称为2012赛季。在此示例中,输出看起来像: p>
Now I would like to add a column called "Season" that contains only the year when a field was harvested but is automatically updated for all other activites between the last harvest and this harvest for individual fields. Note that harvest dates are different for different fields.Another point is that the seeding was maybe done in 2011 but the field was harvested in 2012 and all this should then be called season 2012. In this example the output would ideally look like:
ID|Field|Date |Activity |Season
1|A |2012/08/01|Tillage |2013
2|A |2012/08/24|Seeding |2013
3|A |2013/03/05|Spraying |2013
4|A |2013/03/05|Fertilizer|2013
5|A |2013/07/25|Harvest |2013
6|B |2012/09/01|Tillage |2013
7|B |2012/09/05|Seeding |2013
8|B |2013/04/05|Spraying |2013
9|B |2013/07/28|Harvest |2013
10|B |2010/08/24|Tillage |2011
11|B |2010/09/29|Seeding |2011
12|B |2011/05/01|Fertilizer|2011
13|B |2011/07/12|Harvest |2011
14|A |2011/09/01|Seeding |2012
15|A |2011/10/10|Spraying |2012
16|A |2012/04/10|Fertilizer|2012
17|A |2012/08/02|Harvest |2012
有人可以帮忙吗?
编辑:
如果我开始记录更多信息并提供所有活动,我需要更改什么?一个单独的专栏?
我尝试过:
What would I need to change if I start recording more information and give all "Activities" a separate column? I tried:
DF <- read.table(text="ID|Field|Date |Tillage|Seeding|Fertilizer|Spraying|Harvest
1|A |2012/08/01|Plough |NA|NA|NA|NA
2|A |2012/08/24|NA |Wheat|NA|NA|NA
3|A |2013/03/05|NA |NA|NA|ProduktA|NA
4|A |2013/03/05|NA|NA|TypeB|NA|NA
5|A |2013/07/25|NA |NA|NA|NA|9t
6|B |2012/09/01|Plough |NA|NA|NA|NA
7|B |2012/09/05|NA |Barley|NA|NA|NA
8|B |2013/04/05|NA |NA|NA|ProductB|NA
9|B |2013/07/28|NA |NA|NA|NA|10t
10|B |2010/08/24|Cultivator |NA|NA|NA|NA
11|B |2010/09/29|NA |NA|NA|NA|NA
12|B |2011/05/01|NA|NA|TypeB|NA|NA
13|B |2011/07/12|NA |NA|NA|NA|6t
14|A |2011/09/01|NA |Barley|NA|NA|NA
15|A |2011/10/10|NA |NA|NA|ProductC|NA
16|A |2012/04/10|NA|NA|TypeA|NA|NA
17|A |2012/08/02|NA |NA|NA|NA|7t|",
sep="|", header=TRUE, stringsAsFactors=FALSE)
DT <- data.table(DF)
DT[, Harvest:=gsub(" ", "", Harvest, fixed=TRUE)]
DT[, Date:=as.POSIXct(Date)]
setkeyv(DT, c("Field", "Date"))
DT[, Season:=cumsum(c("", !is.na(head(Harvest, -1)))), by=Field]
DT[, Season:=max(year(Date)), by=list(Field, Season)]
不幸的是,这没有用。有人能解决这个问题吗?
This does unfortunately not work. Can someone solve this,too?
推荐答案
假设每个田地每个季节都有收获(危险的假设,但是否则,我看不到这样做的方法):
Assuming there was a harvest recorded for each season on each field (a dangerous assumption, but otherwise I see no way to do this):
DF <- read.table(text="ID|Field|Date |Activity |
1|A |2012/08/01|Tillage |
2|A |2012/08/24|Seeding |
3|A |2013/03/05|Spraying |
4|A |2013/03/05|Fertilizer|
5|A |2013/07/25|Harvest |
6|B |2012/09/01|Tillage |
7|B |2012/09/05|Seeding |
8|B |2013/04/05|Spraying |
9|B |2013/07/28|Harvest |
10|B |2010/08/24|Tillage |
11|B |2010/09/29|Seeding |
12|B |2011/05/01|Fertilizer|
13|B |2011/07/12|Harvest |
14|A |2011/09/01|Seeding |
15|A |2011/10/10|Spraying |
16|A |2012/04/10|Fertilizer|
17|A |2012/08/02|Harvest |",
sep="|", header=TRUE, stringsAsFactors=FALSE)
library(data.table)
DT <- data.table(DF[, 1:4])
DT[, Activity:=gsub(" ", "", Activity, fixed=TRUE)]
DT[, Date:=as.POSIXct(Date)]
setkeyv(DT, c("Field", "Date"))
DT[, Season:=cumsum(c("", head(Activity, -1)) == "Harvest"), by=Field]
DT[, Season:=max(year(Date)), by=list(Field, Season)]
# ID Field Date Activity Season
# 1: 14 A 2011-09-01 Seeding 2012
# 2: 15 A 2011-10-10 Spraying 2012
# 3: 16 A 2012-04-10 Fertilizer 2012
# 4: 1 A 2012-08-01 Tillage 2012
# 5: 17 A 2012-08-02 Harvest 2012
# 6: 2 A 2012-08-24 Seeding 2013
# 7: 3 A 2013-03-05 Spraying 2013
# 8: 4 A 2013-03-05 Fertilizer 2013
# 9: 5 A 2013-07-25 Harvest 2013
# 10: 10 B 2010-08-24 Tillage 2011
# 11: 11 B 2010-09-29 Seeding 2011
# 12: 12 B 2011-05-01 Fertilizer 2011
# 13: 13 B 2011-07-12 Harvest 2011
# 14: 6 B 2012-09-01 Tillage 2013
# 15: 7 B 2012-09-05 Seeding 2013
# 16: 8 B 2013-04-05 Spraying 2013
# 17: 9 B 2013-07-28 Harvest 2013
PS:我认为这应该滚动连接也可以(而且效率更高),但无法立即看到如何做到这一点。
PS: I think this should also be possible (and more efficient) with a rolling join, but couldn't see immediately how to do that.
这篇关于R根据事件更新值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!