Point pattern classification with spatstat: what am I doing wrong?


Problem description

I'm trying to classify bivariate point patterns into groups using spatstat. The patterns are derived from whole-slide images of lymph nodes with cancer. I've trained a neural network to recognize cells of three types (cancer cells "LP", immune cells "bcell", and all other cells). I do not wish to analyse the other cells, only to use them to construct a polygonal window in the shape of the lymph node. Thus, the patterns to be analysed consist of immune cells and cancer cells in polygonal windows. Each pattern can have several tens of thousands of cancer cells and up to 2 million immune cells. The patterns are of the "Small World Model" type, as there is no possibility of points lying outside the window.

My classification should be based on the position of the cancer cells in relation to the immune cells. For example, most cancer cells lie on the "islands" of immune cells, but in some cases the cancer cells are (seemingly) uniformly dispersed and there are only a few immune cells. In addition, the patterns are not always uniform across the node. As I'm rather new to spatial statistics, I developed a simple and crude method to classify the patterns. In short:

  1. I calculated a kernel density of the immune cells with sigma=80 because this looked "nice" to me: Den <- density(split(cells)$"bcell", sigma=80, window=cells$window). (Should I have used e.g. sigma=bw.scott instead?)
  2. Then I created a tessellation image by dividing the density range into 3 parts (here again, I experimented with the breaks to get some "good-looking" results).

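# Break points at 1/3 and 2/3 of the maximum density
rangesDenMax <- 2 * range(Den)[2] / 3
rangesDenMin <- range(Den)[2] / 3
map.breaks <- c(-Inf, rangesDenMin, rangesDenMax, Inf)
# Factor-valued image with three density classes
map.cuts <- cut(Den, breaks = map.breaks, labels = c("Low B-cell density", "Medium B-cell density", "High B-cell density"))
# Tessellation whose tiles are the three density classes
map.quartile <- tess(image = map.cuts, window = cells$window)
tessImage <- map.quartile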
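On the bandwidth question in step 1, the following is a minimal sketch of how a data-driven bandwidth could be compared with the hand-picked sigma=80. It runs on simulated data, not on the slides described here: the pattern `Immune`, the window `W` and all simulation parameters are invented for illustration. `bw.scott` is spatstat's implementation of Scott's rule and returns one bandwidth per axis.

```r
library(spatstat)

set.seed(42)
W <- square(1000)
# Simulated stand-in for the immune cells: clustered "islands"
Immune <- rMatClust(kappa = 2e-5, scale = 60, mu = 50, win = W)

sigmaFixed <- 80                 # hand-picked value from step 1
sigmaScott <- bw.scott(Immune)   # Scott's rule: bandwidths for x and y

DenFixed <- density(Immune, sigma = sigmaFixed)
DenScott <- density(Immune, sigma = sigmaScott)

print(sigmaScott)
plot(DenFixed, main = "sigma = 80")
plot(DenScott, main = "sigma = bw.scott")
```

One consideration, offered tentatively: a rule such as `bw.scott` adapts to each pattern, so the amount of smoothing would differ from slide to slide, whereas a fixed sigma in physical units keeps the meaning of "density" the same across slides.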

Here are some examples of plots of the tessellations with the cancer cells overlaid (white dots). The lymph node on the left has typical, uniformly distributed "islands" of immune cells, while the node on the right has only a few dense spots of immune cells, and its cancer cells are not restricted to those spots:

heat map: immune cell kernel density, white dots: cancer cells

  3. Then I measured a silly number of variables, which should give me a clue as to how the cancer cells are distributed across the tessellation tiles (the calculation code is trivial, so I post only the description of my variables):

LPlwB<-c() # proportion of cancer cells in low-b-cell-area 
LPmdB<-c() # proportion of cancer cells in medium-b-cell-area 
LPhiB<-c() # proportion of cancer cells in high-b-cell-area
AlwB<-c()  # proportion of the low-b-cell area
AmdB<-c()  # proportion of the medium-b-cell area
AhiB<-c()  # proportion of the high-b-cell area
LPm1<-c()  # mean distance to the 1st neighbour
LPm2<-c()  # mean distance to the 2nd neighbour
LPm3<-c()  # mean distance to the 3rd neighbour
LPsd1<-c() # standard deviation of the distance to the 1st neighbour
LPsd2<-c() # standard deviation of the distance to the 2nd neighbour
LPsd3<-c() # standard deviation of the distance to the 3rd neighbour
meanQ<-c() # mean quadrat count (I visually chose the quadrat size to be not too large and not too small)
sdevQ<-c() # standard deviation of the quadrat counts
hiSAT<-c() # realised cancer cell saturation in high b-cell-area (number of cells observed divided by the number of cells that could fit into the area, given the observed minimum distance between cells)
mdSAT<-c() # realised cancer cells saturation in medium b-cell-area 
lwSAT<-c() # realised cancer cells saturation in low b-cell-area 
ll<-c() # proportion of LP neighbours of LP (contingency table count divided by total points)
lb<-c() # proportion of b-cell neighbours of LP
bl<-c() # proportion of LP neighbours of b-cells
bb<-c() # proportion of b-cell neighbours of b-cells
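Since only the descriptions are given, here is a minimal sketch of how a few of these variables could be computed with spatstat. It is not the original calculation code: the data are simulated, and the names `tiles`, `LPprop`, `Aprop`, `LPm`, `LPsd` and `nnTable` are made up for this illustration.

```r
library(spatstat)

set.seed(7)
W <- square(1000)
# Simulated stand-ins (not the original data)
bcell <- rMatClust(kappa = 2e-5, scale = 60, mu = 50, win = W)  # clustered
LP    <- rpoispp(lambda = 2e-4, win = W)                        # uniform
cells <- superimpose(LP = LP, bcell = bcell, W = W)  # marks "LP" / "bcell"

# Three-class tessellation of the immune density, as in step 2
Den  <- density(split(cells)$bcell, sigma = 80)
top  <- range(Den)[2]
cuts <- cut(Den, breaks = c(-Inf, top / 3, 2 * top / 3, Inf),
            labels = c("low", "medium", "high"))
tiles <- tess(image = cuts)

# Proportion of cancer cells in each tile (LPlwB, LPmdB, LPhiB)
LPcounts <- as.vector(quadratcount(split(cells)$LP, tess = tiles))
LPprop <- LPcounts / sum(LPcounts)

# Proportion of the window area covered by each tile (AlwB, AmdB, AhiB)
areas <- tile.areas(tiles)
Aprop <- areas / sum(areas)

# Mean and standard deviation of the distance to the 1st, 2nd, 3rd neighbour
nn   <- nndist(split(cells)$LP, k = 1:3)
LPm  <- colMeans(nn)
LPsd <- apply(nn, 2, sd)

# Type of each point's nearest neighbour against its own type,
# as a proportion of all points (ll, lb, bl, bb)
nnType  <- marks(cells)[nnwhich(cells)]
nnTable <- table(own = marks(cells), neighbour = nnType) / npoints(cells)
print(nnTable)
```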

  4. I z-scaled the variables, inspected them on a PCA plot (the vectors pointed in different directions like the spines of a sea urchin) and performed a hierarchical cluster analysis. I chose k by calculating fviz_nbclust(scaled_variables, hcut, method = "silhouette"). After dividing the dendrogram into k clusters and checking the cluster stability, I ended up with my groups, which seemed to make sense, as cases with "islands" were separated from the "more dispersed" ones.
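The clustering step can be sketched roughly as follows. This sketch uses base R plus `cluster::silhouette` instead of `fviz_nbclust`, so it is a stand-in for the approach rather than the same call. The feature matrix `vars` is simulated, and Ward linkage is an assumption made for the example.

```r
library(cluster)   # for silhouette()

set.seed(3)
# Simulated feature matrix standing in for the measured variables:
# 20 slides x 5 variables, two groups with shifted means
vars <- rbind(matrix(rnorm(10 * 5, mean = 0), ncol = 5),
              matrix(rnorm(10 * 5, mean = 3), ncol = 5))
colnames(vars) <- paste0("v", 1:5)

scaled <- scale(vars)       # z-scaling
pca <- prcomp(scaled)       # PCA for visual inspection, e.g. biplot(pca)
d <- dist(scaled)
hc <- hclust(d, method = "ward.D2")

# Average silhouette width for k = 2..6; pick the k that maximises it
ks <- 2:6
avgSil <- sapply(ks, function(k) {
  cl <- cutree(hc, k = k)
  mean(silhouette(cl, d)[, "sil_width"])
})
kBest <- ks[which.max(avgSil)]
groups <- cutree(hc, k = kBest)
table(groups)
```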

However, given the possibilities of the spatstat package, I strongly feel like I am hammering nails into the wall with a smartphone.

Solution

It seems you are trying to quantify the way in which the cancer cells are positioned relative to the immune cells. You could do this by something like

Cancer <- split(cells)[["LP"]]
Immune <- split(cells)[["bcell"]]
Dimmune <- density(Immune, sigma=80)
f <- rhohat(Cancer, Dimmune)
plot(f)

Then f is a function that indicates the intensity (number per unit area) of cancer cells as a function of the density of immune cells. The plot shows the density of cancer cells on the vertical axis, against the density of immune cells on the horizontal axis.

If the graph of this function is flat, it means that the cancer cells are not paying attention to the density of immune cells. If the graph is steeply declining it means that cancer cells tend to avoid immune cells.
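To get a feel for these shapes, here is a minimal sketch on simulated data (not your slides). The patterns, the window and all parameters are invented for illustration: one cancer pattern is generated with intensity proportional to the immune density, the other ignores it.

```r
library(spatstat)

set.seed(1)
W <- square(1000)
# Simulated stand-in for the immune cells: clustered "islands"
Immune <- rMatClust(kappa = 2e-5, scale = 60, mu = 50, win = W)
# positive = TRUE avoids tiny negative values from numerical error
Dimmune <- density(Immune, sigma = 80, positive = TRUE)

# Scenario A: cancer intensity proportional to the immune density
CancerIsland <- rpoispp(eval.im(0.2 * Dimmune))
# Scenario B: cancer cells ignore the immune cells (homogeneous Poisson)
CancerUniform <- rpoispp(lambda = 2e-4, win = W)

fIsland  <- rhohat(CancerIsland, Dimmune)
fUniform <- rhohat(CancerUniform, Dimmune)

plot(fIsland,  main = "Cancer follows immune density")
plot(fUniform, main = "Cancer ignores immune density")
```

In scenario A the curve should tend to rise with immune density, and in scenario B it should be roughly flat, up to sampling noise.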

I suggest you first look at the plot of f for some example datasets to decide whether f has any ability to discriminate between spatial arrangements that you think should be classified as different. If so then you can use as.data.frame to extract the values of f and then use classical discriminant analysis (etc) to classify the slide images into groups.
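A minimal sketch of that workflow, under several assumptions: the slides are simulated, `simulateFeatures` is a hypothetical helper, the curve is summarised at three relative positions along the covariate axis and divided by the overall cancer intensity, and `MASS::lda` stands in for "classical discriminant analysis". Other summaries of f may well work better on real data.

```r
library(spatstat)
library(MASS)   # for lda()

set.seed(2)
W <- square(1000)

# Simulate one slide and summarise its rhohat curve as three numbers.
# follow = TRUE: cancer intensity proportional to the immune density.
simulateFeatures <- function(follow) {
  Immune  <- rMatClust(kappa = 2e-5, scale = 60, mu = 50, win = W)
  Dimmune <- density(Immune, sigma = 80, positive = TRUE)
  Cancer  <- if (follow) rpoispp(eval.im(0.2 * Dimmune)) else
    rpoispp(lambda = 2e-4, win = W)
  f  <- rhohat(Cancer, Dimmune)
  df <- as.data.frame(f)
  x  <- df[[fvnames(f, ".x")]]   # covariate values (immune density)
  y  <- df[[fvnames(f, ".y")]]   # estimated cancer intensity
  # Evaluate at 20%, 50% and 80% of the covariate range
  xout <- min(x) + c(0.2, 0.5, 0.8) * diff(range(x))
  approx(x, y, xout = xout, rule = 2)$y / intensity(Cancer)
}

nPerGroup <- 8
features <- rbind(t(replicate(nPerGroup, simulateFeatures(TRUE))),
                  t(replicate(nPerGroup, simulateFeatures(FALSE))))
colnames(features) <- c("rhoLow", "rhoMid", "rhoHigh")
group <- factor(rep(c("island", "dispersed"), each = nPerGroup))

fit  <- lda(features, grouping = group)
pred <- predict(fit, features)$class
# Resubstitution table only; real use would need held-out slides
print(table(group, pred))
```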

Instead of density(Immune) you could use any other summary of the immune cells. For example D <- distfun(Immune) would give you the distance to the nearest immune cell, and then f would compute the density of cancer cells as a function of the distance to nearest immune cell. And so on.
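A minimal sketch of the distance-based variant, again on simulated patterns with invented parameters:

```r
library(spatstat)

set.seed(5)
W <- square(1000)
# Simulated stand-ins for the two cell types
Immune <- rMatClust(kappa = 2e-5, scale = 60, mu = 50, win = W)
Cancer <- rpoispp(lambda = 2e-4, win = W)

D <- distfun(Immune)     # function(x, y): distance to the nearest immune cell
f <- rhohat(Cancer, D)   # cancer intensity as a function of that distance
plot(f)
```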
