使用C进行K均值聚类的数据挖掘 [英] Data mining with K-means clustering using c

查看:109
本文介绍了使用C进行K均值聚类的数据挖掘的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

大家好!我是Anirban.目前我正在做一个k-means集群项目.有人可以给我用C语言编写的程序代码吗?请帮助我......:((:((: ((< 您可以向我发送程序代码.我的电子邮件ID为[DELETED] @ gmail.com:confused :: confused :: confused ::

[edit]除非您真的喜欢垃圾邮件,否则请不要在任何论坛中发布您的电子邮件地址!如果有人回复您,您将收到一封电子邮件通知您-OriginalGriff [/edit]

Hello everyone!this is Anirban.Currently i am working on a k-means clustering project.Can anyone give me the program code written in C language?please help me...... :(( :(( :((
You can mail me the program code.My email id is [DELETED]@gmail.com:confused::confused::confused::

[edit]Never post your email address in any forum, unless you really like spam! If anyone replies to you, you will receive an email to let you know - OriginalGriff[/edit]

推荐答案

请在发布前阅读FAQ:您不应该这样"在这里要求完整代码.尝试进行开发,每当遇到困难时,都可以在此处发布特定问题.
:)
Please read the FAQ, before posting: you shouldn''t ask here for full code. Try yourself developing and, whenever you''re stuck, post a specific question here.
:)


我希望您会找到一些有用的信息
I expect you will find some useful information here[^].




K-表示聚类是一种简单的算法.没什么好担心的. K代表簇数.所以这里是预定义的簇数.

因此,首先对预期的每个群集采取任意手段.然后计算从该均值到您的数据点的距离,并将该点分配给与聚类均值的距离最小的聚类.对所有点进行分区后,将重新计算每个聚类的均值,并重复第一步,直到聚类的均值不会因可接受的差异而改变.

因为这是迭代的,所以不必分配任意均值,而是分析数据直方图,并采用每个组的模式(如果数据具有聚类,它将显示多种模式)作为初始均值.这样,您可以减少迭代并获得更快的响应.

但是,是的,如果您具有n维数据,则此算法会变得很复杂.对于2D数据,此算法非常简单.再次是,该算法十年来取得了很大的进步,现在可以使用高级算法来提高准确性.我已经给出了基础知识,您现在可以学习更多并编写优化的代码.
Hi,

K- means clustering is a simple algorithm. Nothing much to worry. K stands for number of clusters. So here the number of clusters predefined.

So first take arbitrary means for each cluster expected. Then calculate the distance from that means to your data point and assign that point to the cluster which has minimal distance to the clusters mean. Once you have partitioned all the points recalculate the mean of each cluster and repeat the first step till the cluster mean not change from the accepted difference.

As this is iterative , instead of assign arbitrary means analyst the data histogram and take the modes of each group (if the data has clusters it will show multi modes) as the initial mean. In that way you can reduce the iterations and get faster response.

But yes, this algorithm goes complex if you have n-dimensional data. For a 2D data this very simple algorithm. Again yes, this algorithm improved a lot for a decade and now advanced algorithms available for improved accuracy. I given the basics of it, you can learn more and write your optimized codes now.


这篇关于使用C进行K均值聚类的数据挖掘的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆