Density clustering around a separate point - Python

Views: 71 Published: 2021/4/22 19:44:00 Tags: python cluster-analysis dbscan optics-algorithm


Problem description

I'm aiming to cluster xy points based on their proximity, that is, to group points that are positioned close to each other. I'm also hoping to use a separate reference point to cluster the data around.

Note: I have multiple sets of data that need to be clustered independently. For example, in the code below, each unique value in Item signifies a different set of data. I could have multiple unique sets of data that all vary in sparsity. Therefore, any technique that takes a predetermined number of clusters isn't realistic, as I'd have to manually check the fit and adjust the appropriate parameter every time.

As such, the best method thus far has been some form of density clustering (DBSCAN, OPTICS).
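A minimal sketch of what clustering each Item independently could look like, assuming DBSCAN and a pandas groupby. The helper name `cluster_per_item`, the toy data, and the `eps`/`min_samples` values are all hypothetical and would need tuning for real data.

```python
import pandas as pd
from sklearn.cluster import DBSCAN


def cluster_per_item(frame, eps=7.5, min_samples=3):
    """Run DBSCAN separately on each Item group; -1 marks noise."""
    labels = pd.Series(-1, index=frame.index, dtype=int)
    for _, group in frame.groupby("Item"):
        model = DBSCAN(eps=eps, min_samples=min_samples)
        labels.loc[group.index] = model.fit_predict(group[["x", "y"]].to_numpy())
    return labels


# Made-up data: two items, each with a tight blob and one far outlier.
toy = pd.DataFrame({
    "Item": [1, 1, 1, 1, 2, 2, 2, 2],
    "x": [0.0, 0.5, -0.5, 20.0, 10.0, 10.5, 9.5, -30.0],
    "y": [0.0, 0.5, 0.5, 20.0, 10.0, 10.5, 10.5, -30.0],
})
toy["cluster"] = cluster_per_item(toy, eps=2.0, min_samples=3)
print(toy)
```

Labels restart at 0 inside every group, so a cluster is only identified by the pair (Item, cluster).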

However, while I'm clustering points that are close together, I'm hoping to pass some cut-off to keep the intended cluster spherical. On the other hand, I don't want to reduce the reachable area too much, because I then miss points that are close to both the reference point and the core points: a small gap is enough to discard points that I'm hoping to include.

The two items below illustrate the dilemma. Item 1 shows how the reachable area should be lower to ensure the clustered points around the reference point are spherical, while Item 2 shows how the reachable area needs to be higher to allow points that are within the dense area to be included.

I'm hoping I can adjust a parameter or include a separate feature rather than force it. Because the dense area around the reference point can vary, I'm reluctant to force every point outside a specific radius to be excluded.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
import seaborn as sns
from sklearn.cluster import OPTICS

fig, ax = plt.subplots(figsize = (6,6))
ax.grid(False)

df = pd.DataFrame({   
    'Item' : [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],                                
    'x' : [-4.0,-1.0,0.5,0.0,0.0,2.0,3.0,5.0,10.0,-2.0,2.0,5.0,7.5,15.0,0.0,-22.0,-20.0,-20.0,-6.5,20.5,0.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0,-2.0,0.0,3.0,-3.0,-7.0,-7.5,-9.0,-4.0,1.5,-1.0,-5.0,-4.5,-3.7,15.0,-20.0,-22.0,-20.0,-20.0,-12.0,20.5,6.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0],
    'y' : [0.0,1.0,-0.5,0.5,-0.5,0.0,1.0,0.0,0.0,-2.0,-2.0,-7.0,-0.5,-10.5,-7.5,0.0,16.0,-15.0,5.0,13.5,3.0,-20.0,2.0,-17.5,-15,19.0,20.0,4.0,-2.0,0.0,0.0,2.5,2.0,-1.5,5.0,0.0,3.5,2.0,-5.5,-6.5,-10.5,-20.5,0.0,16.0,-15.0,5.0,13.5,6.0,-20.0,2.0,-17.5,-15,19.0,20.0],     
    'X_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0],
    'Y_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0],           
   })

# not spherical
# .copy() so that adding columns later does not write into a view of the full frame
df = df[df['Item'] == 1].copy()

# spherical but reachable area too small
#df = df[df['Item'] == 2].copy()

df['distance'] = np.sqrt((df['X_Ref'] - df['x'])**2 + (df['Y_Ref'] - df['y'])**2)

Y_sklearn = df[['x','y']].values

ax.scatter(df['x'], df['y'], marker = 'o', s = 5)
ax.scatter(df['X_Ref'], df['Y_Ref'], c = 'w', edgecolor = 'k', marker = 'o', s = 7.5, zorder = 2)

#clusterer = DBSCAN(eps = 7.5, min_samples = 3)
#labels_clusters = clusterer.fit_predict(Y_sklearn)

clusterer = OPTICS(min_samples = 2, xi = 0.25, min_cluster_size = 0.25, max_eps = 5)
# fit_predict fits the model and returns the labels, so a separate fit() call is not needed
labels_clusters = clusterer.fit_predict(Y_sklearn)

#Add cluster labels as a new column to original DataFrame.
df['cluster'] = labels_clusters
df['cluster'] = df['cluster'].astype('category')

sns.scatterplot(data = df,
            x = 'x',
            y = 'y',
            hue = 'cluster',
            ax = ax,
            legend = 'full',                
            )

Item 1: points to the right of the radius should be excluded from the core points

Item 2: points within the radius should be included in the core points

Solution

I believe we could reformulate the problem. I am not sure a clustering approach is the best fit here.

By clustering using distance

""""
https://stackoverflow.com/questions/66099958/density-clustering-around-a-separate-point-python
"""
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.cluster import KMeans

# not spherical
df = pd.DataFrame({
    'x' : [-4.0,-1.0,0.5,0.0,0.0,2.0,3.0,5.0,12.0,-2.0,2.0,8.0,8.5,15.0,-20.0,-22.0,-20.0,-20.0,-10.0,20.5,0.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0],
    'y' : [0.0,1.0,-0.5,0.5,-0.5,0.0,1.0,0.0,0.0,-2.0,-2.0,-8.0,-0.5,-10.5,-20.5,0.0,16.0,-15.0,5.0,13.5,3.0,-20.0,2.0,-17.5,-15,19.0,20.0],
    'X_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
    'Y_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
       })

# spherical but reachable area too small
df1 = pd.DataFrame({
    'x' : [-2.0,0.0,2.0,-3.0,-7.0,-7.5,-9.0,-4.0,1.5,-1.0,-5.0,-4.5,-3.7,15.0,-20.0,-22.0,-20.0,-20.0,-15.0,20.5,8.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0],
    'y' : [4.0,-2.0,0.0,0.0,2.5,2.0,-2.0,5.0,0.0,3.5,2.0,-5.5,-6.5,-10.5,-20.5,0.0,16.0,-15.0,5.0,13.5,5.0,-20.0,2.0,-17.5,-15,19.0,20.0],
    'X_Ref' : [-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0],
    'Y_Ref' : [-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0],
   })

# Distance of every point from its reference point
def distance_func(df):
    return np.sqrt((df['X_Ref'] - df['x']) ** 2 + (df['Y_Ref'] - df['y']) ** 2)

df['distance'] = distance_func(df)
df1['distance'] = distance_func(df1)

# Change this for the graphs: comment out the next line to plot df instead of df1
df = df1.copy()
# K-means expects a 2-D array, so the distances become a single-feature column
Y_sklearn = df['distance'].values.reshape(-1, 1)
fig, ax = plt.subplots(figsize = (6,6))
ax.grid(False)
ax.scatter(df['x'], df['y'], marker = 'o', s = 5)
ax.scatter(df['X_Ref'], df['Y_Ref'], c = 'w', edgecolor = 'k', marker = 'o', s = 7.5, zorder = 2)
clusterer = KMeans(init='k-means++', n_clusters=2, n_init=10)
# fit_predict fits the model and returns the labels in one call
labels_clusters = clusterer.fit_predict(Y_sklearn)

#Add cluster labels as a new column to original DataFrame.
df['cluster'] = labels_clusters
df['cluster'] = df['cluster'].astype('category')

sns.scatterplot(data = df,
            x = 'x',
            y = 'y',
            hue = 'cluster',
            ax = ax,
            legend = 'full',
            )
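K-means assigns its label numbers arbitrarily, so which of 0 or 1 ends up being the cluster around the reference point can differ between runs. A small sketch of one way to pick the near cluster explicitly, by comparing the fitted cluster centres; the helper name `near_reference_mask` and the toy points are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans


def near_reference_mask(points, reference, random_state=0):
    """Boolean mask of the points in the K-means cluster closest to the reference."""
    # One feature per point: its Euclidean distance from the reference.
    distance = np.linalg.norm(points - reference, axis=1).reshape(-1, 1)
    model = KMeans(n_clusters=2, n_init=10, random_state=random_state)
    labels = model.fit_predict(distance)
    # The cluster whose centre (a mean distance) is smaller is the near one.
    near_label = int(np.argmin(model.cluster_centers_.ravel()))
    return labels == near_label


# Made-up points: four close to the origin, three far away.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0],
                   [20.0, 20.0], [-20.0, 15.0], [18.0, -22.0]])
mask = near_reference_mask(points, reference=np.array([0.0, 0.0]))
print(mask)
```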

For df:

For df1:

By using marginal increase of area

As mentioned earlier, I believe the problem could be reformulated using the idea of marginal area. Each point we add increases the area considered by a different amount.

In other words, sort the points by distance from the reference point and apply the elbow method to the resulting curve.

For the area calculation I will simply use the squared distance as a proxy.
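To make the marginal idea concrete, here is a small sketch that sorts points by squared distance and computes how much each added point increases that proxy. The function name `marginal_area` and the three toy points are hypothetical; the factor of pi is dropped because it does not change where the elbow lies.

```python
import numpy as np


def marginal_area(points, reference):
    """Return the sort order, sorted squared distances, and their increments."""
    d2 = np.sum((points - reference) ** 2, axis=1)
    order = np.argsort(d2)
    sorted_d2 = d2[order]
    # Increase in the area proxy caused by adding the k-th closest point.
    marginal = np.diff(sorted_d2, prepend=0.0)
    return order, sorted_d2, marginal


points = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
order, sorted_d2, marginal = marginal_area(points, np.array([0.0, 0.0]))
print(order)      # indices of the points from closest to farthest
print(sorted_d2)  # squared distances in increasing order
print(marginal)   # step from one squared distance to the next
```

A sudden jump in `marginal` marks the point after which the covered area grows much faster than the number of points.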

Code:

""""
https://stackoverflow.com/questions/66099958/density-clustering-around-a-separate-point-python
"""
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns



# not spherical
df = pd.DataFrame({
    'x' : [-4.0,-1.0,0.5,0.0,0.0,2.0,3.0,5.0,12.0,-2.0,2.0,8.0,8.5,15.0,-20.0,-22.0,-20.0,-20.0,-10.0,20.5,0.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0],
    'y' : [0.0,1.0,-0.5,0.5,-0.5,0.0,1.0,0.0,0.0,-2.0,-2.0,-8.0,-0.5,-10.5,-20.5,0.0,16.0,-15.0,5.0,13.5,3.0,-20.0,2.0,-17.5,-15,19.0,20.0],
    'X_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
    'Y_Ref' : [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
       })

# spherical but reachable area too small
df1 = pd.DataFrame({
    'x' : [-2.0,0.0,2.0,-3.0,-7.0,-7.5,-9.0,-4.0,1.5,-1.0,-5.0,-4.5,-3.7,15.0,-20.0,-22.0,-20.0,-20.0,-15.0,20.5,8.0,20.0,-20.0,-15.0,20.0,-15.0,-10.0],
    'y' : [4.0,-2.0,0.0,0.0,2.5,2.0,-2.0,5.0,0.0,3.5,2.0,-5.5,-6.5,-10.5,-20.5,0.0,16.0,-15.0,5.0,13.5,5.0,-20.0,2.0,-17.5,-15,19.0,20.0],
    'X_Ref' : [-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0,-4.0],
    'Y_Ref' : [-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0],
   })

# Distance of every point from its reference point
def distance_func(df):
    return np.sqrt((df['X_Ref'] - df['x']) ** 2 + (df['Y_Ref'] - df['y']) ** 2)

df['distance'] = distance_func(df)
df1['distance'] = distance_func(df1)

# To switch from one dataset to another.
#df=df1.copy()
# Squared distance is the proxy for the area of the circle through each point
df['distance_2'] = df['distance']**2

# Sort so that row k is the k-th closest point to the reference
df.sort_values('distance',inplace=True)
#pd.DataFrame(df['marginal_change'].values).plot()
aux = pd.DataFrame(df['distance_2'].values, columns=['distance ** 2'])
aux.plot()


fig, ax = plt.subplots(figsize = (6,6))
ax.grid(False)
ax.scatter(df['x'], df['y'], marker = 'o', s = 5)
ax.scatter(df['X_Ref'], df['Y_Ref'], c = 'w', edgecolor = 'k', marker = 'o', s = 7.5, zorder = 2)


# Number of closest points to keep, read off the elbow of the plot above
selected_top = 10
labels_clusters = np.zeros(df.shape[0], dtype=int)
labels_clusters[:selected_top] = 1

#Add cluster labels as a new column to original DataFrame.
df['cluster'] = labels_clusters
df['cluster'] = df['cluster'].astype('category')

sns.scatterplot(data = df,
            x = 'x',
            y = 'y',
            hue = 'cluster',
            ax = ax,
            legend = 'full',
            )

For df:

Scree plot:

From the scree plot you can see where the number of points becomes too large. I would say selecting 10 points could be good. The selection is based on the elbow method.

Final plot:

For df1:

Scree plot:

Following the elbow method criterion, 13 points could be optimal.

Final plot:
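The elbows above were read off the plots by eye. Since the question asks to avoid manual checks for every dataset, one possible heuristic for automating the choice is to take the point of the sorted curve that lies farthest below the straight line joining its first and last values. This is an assumption made for illustration, not the method used in the answer, and `elbow_count` is a hypothetical helper name.

```python
import numpy as np


def elbow_count(sorted_values):
    """Number of leading points up to the elbow of an increasing curve."""
    y = np.asarray(sorted_values, dtype=float)
    n = y.size
    if n < 3 or y[-1] == y[0]:
        return n  # too short or flat: no elbow to find
    # Rescale both axes to [0, 1] so the chord runs from (0, 0) to (1, 1).
    x_norm = np.arange(n) / (n - 1)
    y_norm = (y - y[0]) / (y[-1] - y[0])
    # Vertical gap between the chord and the curve; largest at the elbow.
    gap = x_norm - y_norm
    return int(np.argmax(gap)) + 1


# Made-up squared distances: five close points, then a steep rise.
values = [0.0, 1.0, 2.0, 3.0, 4.0, 50.0, 100.0, 150.0]
print(elbow_count(values))
```

The returned count could be used in place of the hand-picked `selected_top`, although the result should still be checked against the plot for datasets where the curve has no clear bend.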
