Applying K-means clustering to 3-dimensional data


Question

I am trying to apply k-means clustering in sklearn to a dataset of shape (52, 168, 2). As expected, the estimator raises a dimension error, because it expects 2D data. What is the right way forward?

I have one year of wind and load data in two separate files. Each row in either file holds one week of data at one-hour resolution (168 values). The wind and load data are paired: week 1 of the wind data corresponds to week 1 of the load data, and so on. I am trying to use K-means clustering to reduce the operating periods from 52 weeks to a smaller representative number (ideally 12 weeks). Each data point is therefore a 168*2 NumPy array that combines one week of wind and load data.

The resulting data has shape (52, 168, 2), since there are 52 weeks and each data point is 168*2. However, I can't pass this to sklearn's k-means, because it requires 2D data. If I reshape the data with data.reshape(52, 168*2), will that preserve what I am trying to do?

import numpy as np
import pandas as pd

# Each file holds one week per row (168 hourly values per week).
Load_data = pd.read_csv('Scenario_Load_Data.csv', header=None)
Load_data_final = Load_data.to_numpy()
Wind_data = pd.read_csv('Scenario_Wind_Data.csv', header=None)
Wind_data_final = Wind_data.to_numpy()

create_list = []

for i in range(len(Load_data_final)):
    # Pair the hourly load and wind values of week i into a (168, 2) array.
    intermediate_v = np.column_stack((Load_data_final[i, :], Wind_data_final[i, :]))
    create_list.append(intermediate_v)
data = np.array(create_list)  # shape (52, 168, 2)

ValueError: Found array with dim 3. Estimator expected <= 2.

Recommended answer

Since you want to group the data by week, I believe you can concatenate the wind and load data into the same array. That is, each week becomes one row, and the 168 + 168 hourly values become its attributes. You would end up with something like:

Week_1: at1, at2, at3, ..., at336    
Week_2: at1, at2, at3, ..., at336    
...    
Week_52: at1, at2, at3, ..., at336

So I think this is pretty much what you intend to do with reshape.
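A minimal sketch of the idea, using random stand-in arrays instead of the real CSV files (the synthetic data, `n_init`, `random_state`, and every variable name other than `data` are assumptions made for illustration). Reshaping interleaves the load and wind columns while concatenation places them side by side, but each row holds the same 336 values either way, so the Euclidean distance between any two weeks is identical. The last lines show one possible way to pick an actual week per cluster, which the original answer does not cover.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-ins for the CSV contents (assumption: 52 weeks x 168 hours each).
rng = np.random.default_rng(0)
load = rng.random((52, 168))
wind = rng.random((52, 168))

# Shape (52, 168, 2), equivalent to `data` built in the question.
data = np.stack((load, wind), axis=2)

# Option A: flatten each week; columns interleave as load_h0, wind_h0, load_h1, ...
X_reshaped = data.reshape(52, 168 * 2)

# Option B: concatenate; columns are load_h0..load_h167 then wind_h0..wind_h167.
X_concat = np.hstack((load, wind))

# Same values per row in a different column order, so squared distances agree.
d_a = np.sum((X_reshaped[0] - X_reshaped[1]) ** 2)
d_b = np.sum((X_concat[0] - X_concat[1]) ** 2)
assert np.isclose(d_a, d_b)

# Cluster the 52 weeks into 12 groups (12 is the target stated in the question).
kmeans = KMeans(n_clusters=12, n_init=10, random_state=0).fit(X_reshaped)
labels = kmeans.labels_  # cluster index for each week, shape (52,)

# Centroids are averages, not real weeks; reshape them back to (168, 2) profiles.
centroid_profiles = kmeans.cluster_centers_.reshape(12, 168, 2)

# For each centroid, the index of the week that lies nearest to it.
dist_to_centers = kmeans.transform(X_reshaped)  # shape (52, 12)
representative_weeks = dist_to_centers.argmin(axis=0)
```

With the real data, `data` from the question can be passed through the same `reshape` call in place of the synthetic array.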
