"NAs introduced by coercion" during Cluster Analysis in R


Question

I'm new to R. I'm running a cluster analysis on a data frame, but when I calculate the distance matrix I get the warning "NAs introduced by coercion". What does this mean?

d <- dist(as.matrix(mydata1))

  Warning message:
In dist(as.matrix(mydata1)) : NAs introduced by coercion

A sample of my data:

Metafamily      Total        July cpc     July cse_pla  July offline  July organic
xerox 8560      275.829417   0.20943223   0.032628862   0.169210813   0.1130048
office-supplie  246.9125664  0.057833047  0.020209909   0.535358617   0.136165617

Apart from the Metafamily column, all columns are of class numeric.

Please help me solve this.

Recommended answer

It is the first column that causes the problem:

> a <- c("1", "2",letters[1:5], "3")
> as.numeric(a)
[1]  1  2 NA NA NA NA NA  3
Warning message:
NAs introduced by coercion 

Inside dist the data must be coerced to numeric, which generates the NAs shown above.
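To see where the coercion comes from, here is a minimal sketch using a made-up data frame (`toy` and its column names are hypothetical, not the asker's data). Calling as.matrix on a data frame with a character column produces a character matrix, and converting that back to numeric turns the label strings into NA:

```r
# Hypothetical toy data: one label column and two numeric columns
toy <- data.frame(id = c("a", "b", "c"),
                  x  = c(1, 2, 3),
                  y  = c(4, 5, 6),
                  stringsAsFactors = FALSE)

m <- as.matrix(toy)
typeof(m)            # the whole matrix is now of type character

# Coercing back to numeric is what raises the warning;
# suppressWarnings() only keeps this sketch quiet
num <- suppressWarnings(matrix(as.numeric(m), nrow = nrow(m)))
colSums(is.na(num))  # NA counts per column: only the label column is affected
```

The numeric columns survive the round trip because their string forms can be parsed as numbers again; only the labels cannot.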

I'd suggest applying dist without the first column or, better, moving it to the row names if possible, because otherwise the result will be different:

> dist(df)
          1         2         3         4
2 1.8842186                              
3 1.9262360 1.2856110                    
4 3.2137871 1.7322788 2.9838920          
5 1.3299455 0.9872963 1.9158079 1.8889050
Warning message:
In dist(df) : NAs introduced by coercion
> dist(df[-1])
         1        2        3        4
2 1.538458                           
3 1.572765 1.049697                  
4 2.624046 1.414400 2.436338         
5 1.085896 0.806124 1.564251 1.542284
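The gap between the two outputs has a simple explanation: as documented in ?dist, when columns are excluded because of NA, the sum is scaled up in proportion to the number of columns used. The sketch below rebuilds the example data frame from the values printed in this answer (rounded to seven decimals) and checks that ratio. It assumes your R version, like the one used here, warns rather than stops on the character column:

```r
# Values copied from the example data frame printed in this answer
df <- data.frame(id   = c("A", "B", "C", "D", "E"),
                 var1 = c(-0.6264538, 0.1836433, -0.8356286, 1.5952808, 0.3295078),
                 var2 = c(-0.8204684, 0.4874291, 0.7383247, 0.5757814, -0.3053884),
                 stringsAsFactors = FALSE)

d_with_id    <- suppressWarnings(dist(df))  # id column is coerced to NA
d_without_id <- dist(df[-1])                # numeric columns only

# Three columns in total, two usable: Euclidean distances are
# inflated by sqrt(3 / 2) when the NA column is left in
ratio <- as.vector(d_with_id) / as.vector(d_without_id)
all.equal(ratio, rep(sqrt(3 / 2), length(ratio)))
```

So leaving the label column in does not just produce a warning; it silently rescales every distance.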

By the way, you don't need as.matrix when calling dist; it does that internally anyway.

Edit: using row names

rownames(df) <- df$id

> df
  id       var1       var2
A  A -0.6264538 -0.8204684
B  B  0.1836433  0.4874291
C  C -0.8356286  0.7383247
D  D  1.5952808  0.5757814
E  E  0.3295078 -0.3053884

> dist(df[-1]) # you could also remove the first column entirely with df$id <- NULL
         A        B        C        D
B 1.538458                           
C 1.572765 1.049697                  
D 2.624046 1.414400 2.436338         
E 1.085896 0.806124 1.564251 1.542284
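Applied to the data in the question, the same idea might look like the sketch below. The two rows are copied from the asker's sample; the shortened column names (`cpc`, `cse_pla`, and so on) are an assumption made for readability, and the approach requires the Metafamily values to be unique:

```r
# Two rows copied from the question's sample; column names shortened (assumption)
mydata1 <- data.frame(
  Metafamily = c("xerox 8560", "office-supplie"),
  Total      = c(275.829417, 246.9125664),
  cpc        = c(0.20943223, 0.057833047),
  cse_pla    = c(0.032628862, 0.020209909),
  offline    = c(0.169210813, 0.535358617),
  organic    = c(0.1130048, 0.136165617),
  stringsAsFactors = FALSE
)

rownames(mydata1) <- mydata1$Metafamily  # keep the labels as row names
mydata1$Metafamily <- NULL               # leave only numeric columns

d <- dist(mydata1)   # all columns are numeric, so nothing is coerced to NA
attr(d, "Labels")    # the row names are carried over as labels
```

The labels then show up in the printed distance matrix and in anything built from it, instead of plain row numbers.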
