How to Score on a New Data Set

Question

We have built clustering models in R. We now want to deploy the model's equation to score the new customers we want to cluster. In SAS, the Cluster node provided clustering SAS code into which we only had to plug the new input variables. Is there a way to do that in R? How can we export the cluster equation?

An example using the standard iris dataset is shown below.

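# kmeans() lives in the base stats package, so no extra library is needed.
# Keep only the four numeric columns; kmeans() cannot cluster the Species factor.
irisnew <- iris[, 1:4]
(kc <- kmeans(irisnew, 3))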

K-means clustering with 3 clusters of sizes 62, 38, 50

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.901613    2.748387     4.393548    1.433871
2     6.850000    3.073684     5.742105    2.071053
3     5.006000    3.428000     1.462000    0.246000

Clustering vector:
  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [39] 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [77] 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 2 2 2 2 2 2 1
[115] 1 2 2 2 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2 1 2 2 2 1 2 2 1

Within cluster sum of squares by cluster:
[1] 39.82097 23.87947 15.15100
 (between_SS / total_SS =  88.4 %)

Now that the clusters are defined, I have a new petal dataset that I need to classify according to the above clustering rules. My question is: how do I export the rules to do that? Typically the rules are defined as

x = a1 * Sepal.Length + a2 * Sepal.Width +a3 * Petal.Length + a4 * Petal.Width + b
Then if x between z1 and z2 then Cluster1
else if x between z3 and z4 then Cluster2
else if x between z5 and z6 then Cluster3
else Cluster4

Thanks, Manish

Recommended answer

For generic models, use predict, for example predict.glm(glm.model, newdata = newdf)
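As a rough illustration of scoring new rows with a fitted model, here is a minimal sketch. The logistic model, the is_virginica column, and the values in newdf are hypothetical and chosen only for the demo; the generic predict() dispatches to predict.glm for a glm object.

```r
# Hypothetical training task: separate versicolor from virginica in iris
train <- subset(iris, Species != "setosa")
train$is_virginica <- as.integer(train$Species == "virginica")

glm.model <- glm(is_virginica ~ Petal.Length + Petal.Width,
                 data = train, family = binomial)

# newdf stands in for the new observations; it must contain the predictor columns
newdf <- data.frame(Petal.Length = c(4.2, 5.8),
                    Petal.Width  = c(1.3, 2.2))

# type = "response" returns probabilities rather than log-odds
scores <- predict(glm.model, newdata = newdf, type = "response")
print(scores)
```

The fitted object can be stored with saveRDS() and reloaded with readRDS() in the scoring environment, so the model itself is deployed rather than a hand-copied equation.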

For clustering, see "Simple approach to assigning clusters for new data after k-means clustering"
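The "rule" of a k-means model is not a single linear equation with cut-offs: a new observation belongs to the cluster whose center is nearest. A minimal sketch of that idea follows, assuming squared Euclidean distance on the raw (unscaled) columns. The helper assign_cluster and the newflowers data are hypothetical names introduced for illustration, and nstart = 25 is an added assumption to make the fit less sensitive to the random start.

```r
irisnew <- iris[, 1:4]
set.seed(1)
kc <- kmeans(irisnew, 3, nstart = 25)

# Assign each row of newdata to the closest cluster center
assign_cluster <- function(centers, newdata) {
  x <- as.matrix(newdata[, colnames(centers), drop = FALSE])
  # d[i, j] = squared Euclidean distance from observation i to center j
  d <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(x, 2, centers[j, ])^2)
  })
  d <- matrix(d, nrow = nrow(x))  # keep matrix shape even for a single row
  max.col(-d, ties.method = "first")  # column index of the smallest distance
}

# Hypothetical new observations with the same columns as the training data
newflowers <- data.frame(Sepal.Length = c(5.0, 6.9),
                         Sepal.Width  = c(3.4, 3.1),
                         Petal.Length = c(1.5, 5.7),
                         Petal.Width  = c(0.2, 2.1))
new_clusters <- assign_cluster(kc$centers, newflowers)
print(new_clusters)
```

To deploy this, only kc$centers needs to be exported (for example with write.csv() or saveRDS()); the scoring side then needs nothing beyond the distance computation. If the training data were scaled before clustering, the same scaling would have to be applied to the new data first.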
