How to export the output (cluster labels) of the k-means algorithm with the ids in the original data

Views: 222 Published: 2020/4/26 10:24:53 Tags: python export-to-csv k-means id

Problem description

I have data summarising a network, including each user's cookie id, session id, number of materials, and number of jumps in the network. I would like to cluster the sessions and analyse them further, so I need to know which cluster each cookie id and session was assigned to. Example data:

cookie_id|ses_num|num_material|num_jump
2345|1|2|1
2345|2|8|12
3456|1|3|2

I have applied k-means clustering using the last two columns, but I cannot map the clustering output back to the right ids, because I cannot use the cookie id and session id as input for the clustering.

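import csv
from collections import defaultdict

import numpy as np
from matplotlib import pyplot
from sklearn.cluster import KMeans

# Read the pipe-delimited file column by column, skipping the header row.
columns = defaultdict(list)
with open('num_jumps_materials_in_network.csv', 'r') as file:
    reader = csv.reader(file, delimiter='|', quotechar='"')
    next(reader)
    for row in reader:
        for i, v in enumerate(row):
            columns[i].append(v)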

cookie_id = columns[0]
ses_num = columns[1]
num_mat = columns[2]
num_jump = columns[3]

# Build the feature matrix from the two numeric columns.
# Row i of X corresponds to cookie_id[i] and ses_num[i].
x1 = [int(v) for v in num_mat]
x2 = [int(v) for v in num_jump]

X = np.column_stack((x1, x2))

# k = 6 was chosen with the elbow method
kmeans = KMeans(n_clusters=6)
kmeans.fit(X)
# y_kmeans[i] is the cluster label of row i of X
y_kmeans = kmeans.predict(X)


fig, (ax1, ax2) = pyplot.subplots(2, figsize=(15,15))
fig.suptitle('Clustering users by k-means (k=6)')
# Top panel: all points
ax1.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=30, cmap='gist_rainbow')
# Bottom panel: the same points, zoomed in via the axis limits below
ax2.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=30, cmap='gist_rainbow')
ax2.set_xlim([0, 500])
ax2.set_ylim([0, 500])

pyplot.savefig('k_means_clusters_demo.png')

I would like to output a result like the following:

cookie_id|ses_num|num_material|num_jump|cluster
2345|1|2|1|0
2345|2|8|12|2
3456|1|3|2|1

Many thanks.
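
One possible approach, sketched here under stated assumptions rather than taken from an accepted answer: k-means returns one label per row of the input matrix, in the same order as the rows, so the ids never need to be passed to the clustering. As long as the feature matrix is built from the rows in their original order, the labels can be attached back by position. The helper name `write_labelled_rows` is hypothetical, and the labels below are the ones from the desired output in the question, standing in for `y_kmeans`.

```python
import csv
import io


def write_labelled_rows(out, header, rows, labels, delimiter="|"):
    """Write each original row followed by its cluster label.

    rows[i] and labels[i] must describe the same observation, which holds
    when the feature matrix was built from the rows in their original order.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerow(list(header) + ["cluster"])
    for row, label in zip(rows, labels):
        # int() turns a NumPy integer label into a plain Python int.
        writer.writerow(list(row) + [int(label)])


# Example data copied from the question. The labels are illustrative
# stand-ins for the array returned by kmeans.predict(X).
header = ["cookie_id", "ses_num", "num_material", "num_jump"]
rows = [
    ["2345", "1", "2", "1"],
    ["2345", "2", "8", "12"],
    ["3456", "1", "3", "2"],
]
labels = [0, 2, 1]

buffer = io.StringIO()
write_labelled_rows(buffer, header, rows, labels)
output_text = buffer.getvalue()
print(output_text)
```

In the script from the question, `rows` could be `list(zip(cookie_id, ses_num, num_mat, num_jump))`, `labels` could be `y_kmeans`, and `out` could be a file opened with `open(path, "w", newline="")` instead of the in-memory buffer.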
