Define k-1 cluster centroids -- SKlearn KMeans

Views: 162 Published: 2021/2/15 19:02:57 Tags: python scikit-learn k-means

Question

I am performing a binary classification of a partially labeled dataset. I have a reliable estimate of its 1's, but not of its 0's.

From sklearn KMeans documentation:

init : {‘k-means++’, ‘random’ or an ndarray}
Method for initialization, defaults to ‘k-means++’:   
If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.

I would like to pass an ndarray, but I only have 1 reliable centroid, not 2.

Is there a way to maximize the entropy between the first K-1 centroids and the Kth? Alternatively, is there a way to manually initialize K-1 centroids and use k-means++ for the remaining one?

=======================================================

Recommended answer

I'm reasonably confident this works as intended, but please correct me if you spot an error (cobbled together from GeeksforGeeks):


import numpy as np
import pandas as pd


def distance(p1, p2):
    # Squared Euclidean distance between two points
    return np.sum((p1 - p2)**2)


def find_remaining_centroid(data, known_centroids, k = 1):
    '''
    Find the remaining centroids with a farthest-point variant of
    K-means++ seeding (each new centroid is the data point farthest
    from all centroids chosen so far).
    inputs:
        data - Numpy array or DataFrame containing the feature space
        known_centroids - Numpy array containing the location of one or multiple known centroids
        k - remaining centroids to be found
    returns:
        list of the k newly found centroids
    '''
    # Perform casting if necessary
    if isinstance(data, pd.DataFrame):
        data = np.array(data)

    n_points = data.shape[0]

    # Initialize centroids list with the known centroids
    known_centroids = np.asarray(known_centroids)
    if known_centroids.ndim > 1:
        centroids = [cent for cent in known_centroids]
    else:
        centroids = [known_centroids]

    # The known centroids already anchor the search, so no random
    # starting point is added; compute the k remaining centroids
    for c_id in range(k):
        ## array to store the distance of each data point
        ## from its nearest centroid
        dist = np.empty(n_points)

        for i in range(n_points):
            point = data[i, :]
            d = np.inf

            ## compute distance of 'point' from each of the previously
            ## selected centroids and store the minimum distance
            for j in range(len(centroids)):
                temp_dist = distance(point, centroids[j])
                d = min(d, temp_dist)

            dist[i] = d

        ## select data point with maximum distance as our next centroid
        next_centroid = data[np.argmax(dist), :]
        centroids.append(next_centroid)

    # Return only the newly found centroids
    return centroids[-k:]
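
The selection step above takes the data point with the maximum distance to its nearest centroid, which is deterministic and can latch onto outliers. Standard k-means++ seeding instead samples the next centroid with probability proportional to that squared distance. A minimal vectorized sketch of that variant, assuming NumPy input and at least one data point that does not coincide with a known centroid; the name `sample_remaining_centroids` and the `seed` parameter are hypothetical and not part of the original answer:

```python
import numpy as np


def sample_remaining_centroids(data, known_centroids, k=1, seed=None):
    """Draw k extra centroids by k-means++ style D^2 sampling (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Accept a single centroid (1-D) or several centroids (2-D)
    centroids = np.atleast_2d(np.asarray(known_centroids, dtype=float))

    # Squared distance of every point to its nearest known centroid
    diffs = data[:, None, :] - centroids[None, :, :]
    min_sq_dist = np.min(np.sum(diffs ** 2, axis=2), axis=1)

    new_centroids = []
    for _ in range(k):
        # Sample one data point with probability proportional to D^2
        probs = min_sq_dist / min_sq_dist.sum()
        chosen = data[rng.choice(len(data), p=probs)]
        new_centroids.append(chosen)
        # Refresh nearest-centroid distances to include the new centroid
        min_sq_dist = np.minimum(min_sq_dist, np.sum((data - chosen) ** 2, axis=1))

    return np.array(new_centroids)
```

Points that sit exactly on an existing centroid get probability zero, so a known centroid is never drawn again.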

Usage:

# For finding a third centroid:
third_centroid = find_remaining_centroid(X_train, np.array([presence_seed, absence_seed]), k = 1)

# For finding the second centroid:
second_centroid = find_remaining_centroid(X_train, presence_seed, k = 1)

Where presence_seed and absence_seed are known centroid locations.
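
Once the missing centroid is found, the full set can be handed to KMeans through the `init` parameter quoted in the question. A minimal sketch, assuming scikit-learn is installed; the toy data, the seed value, and the `found` point are hypothetical stand-ins for `X_train`, `presence_seed`, and the output of `find_remaining_centroid`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated blobs
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

presence_seed = np.array([5.0, 5.0])  # the one reliable centroid (assumed value)

# Stand-in for find_remaining_centroid(X_train, presence_seed, k=1)[0]:
# the data point farthest from the known centroid
found = X_train[np.argmax(np.sum((X_train - presence_seed) ** 2, axis=1))]

# init must have shape (n_clusters, n_features)
init_centers = np.vstack([presence_seed, found])

# n_init=1 because the initial centers are given explicitly
km = KMeans(n_clusters=2, init=init_centers, n_init=1).fit(X_train)
labels = km.labels_  # cluster 0 is the one seeded from presence_seed
```

Putting the reliable centroid in the first row of `init` keeps it at cluster index 0 in this sketch, which makes it easy to read label 0 as the class with the reliable estimate.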
