根据R中的空间邻域和时间标准将行分配给组 [英] Assign rows to a group based on spatial neighborhood and temporal criteria in R

查看:99
本文介绍了根据R中的空间邻域和时间标准将行分配给组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个我似乎无法解决的问题.我有一个从arcgis中的栅格派生的数据集.该数据集表示10年期间的每次火灾.一些栅格像元在该时间段内发生了多次起火(因此,我的数据集中将有多行),而某些栅格像元没有发生任何起火(因此将不会在我的数据集中表示).因此,数据集中的每一行都有一个列号(顺序整数)和一个分配给它的行号,该行号与栅格中的行ID和列ID相对应.它也有火灾发生的日期.

I have an issue that I just cannot seem to sort out. I have a dataset that was derived from a raster in arcgis. The dataset represents every fire occurrence during a 10-year period. Some raster cells had multiple fires within that time period (and, thus, will have multiple rows in my dataset) and some raster cells will not have had any fire (and, thus, will not be represented in my dataset). So, each row in the dataset has a column number (sequential integer) and a row number assigned to it that corresponds with the row and column ID from the raster. It also has the date of the fire.

我想为彼此之间4天内以及彼此相邻像素(在8单元邻域内)的所有火灾分配唯一的ID(fire_ID),并将其放入新的柱子.

I would like to assign a unique ID (fire_ID) to all of the fires that are within 4 days of each other and in adjacent pixels from one another (within the 8-cell neighborhood) and put this into a new column.

为明确起见,如果在2000年1月1日的第3行第3行中有观察,而在2000年1月4日的第2行第4行中有另一个观察,则将这些观察指定为相同的fire_ID.

To clarify, if there were an observation from row 3, col 3, Jan 1, 2000 and another from row 2, col 4, Jan 4, 2000, those observations would be assigned the same fire_ID.

下面是一个带有行"的示例数据集,行"是栅格的行ID,"cols"是栅格的列ID,日期"是检测到火灾的日期.

Below is a sample dataset with "rows", which are the row IDs of the raster, "cols", which are the column IDs of the raster, and "dates" which are the dates the fire was detected.

rows<-sample(seq(1,50,1),600, replace=TRUE)
cols<-sample(seq(1,50,1),600, replace=TRUE)
dates<-sample(seq(from=as.Date("2000/01/01"), to=as.Date("2000/02/01"), by="day"),600, replace=TRUE)
fire_df<-data.frame(rows, cols, dates)

我尝试按行",列",日期"和循环遍历对数据进行排序,以创建新的fire_ID(如果行和列ID在一个值内并且日期在4天内) ,但这显然行不通,因为如果列表中的观察值之间属于不同的fire_ID,则应分配相同的fire_ID的火灾会分配不同的fire_ID.

I've tried sorting the data by "row", then "column", then "date" and looping through, to create a new fire_ID if the row and column ID were within one value and the date was within 4 days, but this obviously doesn't work, as fires which should be assigned the same fire_ID are assigned different fire_IDs if there are observations in between them in the list that belong to a different fire_ID.

fire_df2<-fire_df[order(fire_df$rows, fire_df$cols, fire_df$date),]
fire_ID=numeric(length=nrow(fire_df2))
fire_ID[1]=1
for (i in 2:nrow(fire_df2)){
fire_ID[i]=ifelse(
fire_df2$rows[i]-fire_df2$rows[i-1]<=abs(1) & fire_df2$cols[i]-fire_df2$cols[i-1]<=abs(1) & fire_df2$date[i]-fire_df2$date[i-1]<=abs(4),
fire_ID[i-1],
i)
}
length(unique(fire_ID))
fire_df2$fire_ID<-fire_ID

如果您有任何建议,请告诉我.

Please let me know if you have any suggestions.

推荐答案

我认为此任务需要一些与层次化集群类似的东西.

I think this task requires something along the lines of hierarchical clustering.

但是,请注意,id中一定存在一定程度的任意性.这是因为,火灾簇本身完全有可能超过4天,而每次火灾与该簇中的其他火灾相距不到4天(因此应该具有相同的ID).

Note, however, that there will be necessarily some degree of arbitrariness in the ids. This is because it is entirely possible that the cluster of fires itself is longer than 4 days yet every fire is less than 4 days away from some other fire in that cluster (and thus should have the same id).

library(dplyr)

# Create the distances
fire_dist <- fire_df %>%
  # Normalize dates
  mutate( norm_dates = as.numeric(dates)/4) %>% 
  # Only keep the three variables of interest
  select( rows, cols, norm_dates ) %>%
  # Compute distance using L-infinite-norm (maximum)
  dist( method="maximum" )

# Do hierarchical clustering with "single" aggl method
fire_clust <- hclust(fire_dist, method="single")

# Cut the tree at height 1 and obtain groups
group_id <- cutree(fire_clust, h=1)

# First attach the group ids back to the data frame
fire_df2 <- cbind( fire_df, group_id ) %>%
  # Then sort the data
  arrange( group_id, dates, rows, cols ) 

# Print the first 20 records
fire_df2[1:10,]

(确保已安装dplyr库.如果未安装,则可以运行install.packages("dplyr",dep=TRUE).这是一个非常好的数据处理库,非常受欢迎)

(Make sure you have dplyr library installed. You can run install.packages("dplyr",dep=TRUE) if not installed. It is a really good and very popular library for data manipulations)

一些简单的测试:

测试#1.同样的森林大火在移动.

rows<-1:6
cols<-1:6
dates<-seq(from=as.Date("2000/01/01"), to=as.Date("2000/01/06"), by="day")
fire_df<-data.frame(rows, cols, dates)

给我这个:

  rows cols      dates group_id
1    1    1 2000-01-01        1
2    2    2 2000-01-02        1
3    3    3 2000-01-03        1
4    4    4 2000-01-04        1
5    5    5 2000-01-05        1
6    6    6 2000-01-06        1

测试2. 6种不同的随机森林大火.

set.seed(1234)

rows<-sample(seq(1,50,1),6, replace=TRUE)
cols<-sample(seq(1,50,1),6, replace=TRUE)
dates<-sample(seq(from=as.Date("2000/01/01"), to=as.Date("2000/02/01"), by="day"),6, replace=TRUE)
fire_df<-data.frame(rows, cols, dates)

输出:

rows cols      dates group_id
1    6    1 2000-01-10        1
2   32   12 2000-01-30        2
3   31   34 2000-01-10        3
4   32   26 2000-01-27        4
5   44   35 2000-01-10        5
6   33   28 2000-01-09        6

测试3:一场不断扩大的森林大火

dates <- seq(from=as.Date("2000/01/01"), to=as.Date("2000/01/06"), by="day")
rows_start <- 50
cols_start <- 50

fire_df <- data.frame(dates = dates) %>%
    rowwise() %>%
    do({
      diff = as.numeric(.$dates - as.Date("2000/01/01"))
      expand.grid(rows=seq(rows_start-diff,rows_start+diff), 
                  cols=seq(cols_start-diff,cols_start+diff),
                  dates=.$dates) 
    })

给我:

  rows cols      dates group_id
1    50   50 2000-01-01        1
2    49   49 2000-01-02        1
3    49   50 2000-01-02        1
4    49   51 2000-01-02        1
5    50   49 2000-01-02        1
6    50   50 2000-01-02        1
7    50   51 2000-01-02        1
8    51   49 2000-01-02        1
9    51   50 2000-01-02        1
10   51   51 2000-01-02        1

,依此类推. (所有正确标识为同一森林大火的记录.)

and so on. (All records identified correctly to belong to the same forest fire.)

这篇关于根据R中的空间邻域和时间标准将行分配给组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆