Inconsistent results in apply

Question

This is basically the question asked here (not by me), but I've simplified the example and I simply can't figure out what is going on, so I decided I'd pose it again in a way that may get more responses.

Take the data dd:

dd <- structure(list(first = c("118751", "55627", NA), one = c(41006L, 
119098L, 109437L), two = c(118751L, 109016L, 109831L), three = c(122631L, 
104639L, 120634L), four = c(38017L, 118950L, 105440L), five = c(114826L, 
122047L, 124347L), six = c(109438L, 55627L, 118679L), seven = c(27094L, 
107044L, 122161L), eight = c(112473L, 116909L, 124363L), nine = c(120586L, 
114711L, 120509L)), row.names = c(NA, 3L), class = "data.frame")

dd
   first    one    two  three   four   five    six  seven  eight   nine
1 118751  41006 118751 122631  38017 114826 109438  27094 112473 120586
2  55627 119098 109016 104639 118950 122047  55627 107044 116909 114711
3   <NA> 109437 109831 120634 105440 124347 118679 122161 124363 120509

Now, we want to find the rows where the number in column first equals the number in column six (which is the seventh column in the data frame), using apply:

apply(dd,1,function(x) as.integer(x["first"])==x[7])

    1     2     3 
FALSE FALSE    NA 

This result is clearly wrong: row 2 should have produced TRUE. Oddly, if I run the same thing on only the second row, I get the correct answer:

apply(dd[2,],1,function(x) as.integer(x["first"])==x[7])

   2 
TRUE 

I also tried other subsets: 1:2, 2:3, and even c(1,3). The last gives me the expected result, while the first two keep insisting on FALSE for row 2.

If I drop the apply, I get the correct response (regardless of subset):

as.integer(dd$first)==dd$six
[1] FALSE  TRUE    NA

What is going on?

Recommended answer

The issue is your data types. Your first column is character, the rest of your columns are integer. You attempt to correct for this with as.integer() inside the apply, but it is too late. apply works on matrices, not data frames. When you give it a data frame, it is immediately converted to a matrix. Matrices can't have different column classes, and (generally) character can't be converted to numeric, so all your data is converted to character.

Here's a window into the conversion:

apply(dd, 1, print)
#       1        2        3       
# first "118751" "55627"  NA      
# one   " 41006" "119098" "109437"
# two   "118751" "109016" "109831"
# three "122631" "104639" "120634"
# four  " 38017" "118950" "105440"
# five  "114826" "122047" "124347"
# six   "109438" " 55627" "118679"
# seven " 27094" "107044" "122161"
# eight "112473" "116909" "124363"
# nine  "120586" "114711" "120509"

Unfortunately, you can also see that padding spaces are added (for example " 55627" in column six), so the string comparison with "55627" is not true.
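The padding also explains why the result depends on which rows are selected. As documented for as.matrix, a data frame with a non-numeric column becomes a character matrix, and its numeric columns are converted with format(), which pads every value to the width of the widest value in that column. A minimal sketch, assuming a two-column copy of the question's data (the names dd2, padded, single, m_all, m_row2, cmp_all and cmp_row2 are introduced here for illustration only):

```r
# Two columns of the question's data are enough to reproduce the effect.
dd2 <- data.frame(first = c("118751", "55627", NA),
                  six   = c(109438L, 55627L, 118679L),
                  stringsAsFactors = FALSE)

# format() pads every element to the width of the widest one ...
padded <- format(c(109438L, 55627L))
# ... so a value formatted on its own is not padded at all.
single <- format(55627L)

# as.matrix() on a mixed-type data frame yields a character matrix;
# this is the conversion apply() performs internally.
m_all  <- as.matrix(dd2)        # all rows: column six is padded
m_row2 <- as.matrix(dd2[2, ])   # one row: nothing to pad against

# Comparing an integer with a string coerces the integer to a string,
# so the padded value no longer matches.
cmp_all  <- as.integer(m_all[2, "first"])  == m_all[2, "six"]
cmp_row2 <- as.integer(m_row2[1, "first"]) == m_row2[1, "six"]
print(c(all_rows = cmp_all, row2_only = cmp_row2))
```

With rows 1 and 3 only, both values in column six have six digits, so no padding is added, which is consistent with that subset behaving as expected.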

Instead, convert your column to its proper type first. Or, better yet, don't bother with apply at all:

# convert
dd[, "first"] = as.integer(dd[, "first"])

# apply now works
apply(dd, 1, function(x) x["first"] == x[7])
#     1     2     3 
# FALSE  TRUE    NA 

# but isn't this easier?
dd[, "first"] == dd[, "six"]
# [1] FALSE  TRUE    NA
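If a row-by-row function is still wanted, one option is to iterate over the two columns in parallel so that no matrix conversion takes place. This is a sketch rather than part of the original answer, assuming the same two columns as in the question; dd2, row_equal and hits are hypothetical names:

```r
# Same data as the question, reduced to the two columns being compared.
dd2 <- data.frame(first = c("118751", "55627", NA),
                  six   = c(109438L, 55627L, 118679L),
                  stringsAsFactors = FALSE)

# mapply() walks both columns element by element; each column keeps its
# own type, so as.integer() sees the unpadded string.
# USE.NAMES = FALSE stops mapply() from naming the result after `first`.
row_equal <- mapply(function(f, s) as.integer(f) == s,
                    dd2$first, dd2$six, USE.NAMES = FALSE)

# which() ignores NA, so only rows with a definite match are returned.
hits <- which(row_equal)
print(hits)
```

For a plain equality test the vectorised comparison remains simpler; the mapply() form is mainly useful when the per-row function is more involved.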
