Inconsistent results in apply

Problem description

This is basically the question asked here (not by me), but I've simplified the example and I still can't figure out what is going on, so I decided to pose it again in a way that may get more responses.

Take the data dd:

dd <- structure(list(first = c("118751", "55627", NA), one = c(41006L, 
119098L, 109437L), two = c(118751L, 109016L, 109831L), three = c(122631L, 
104639L, 120634L), four = c(38017L, 118950L, 105440L), five = c(114826L, 
122047L, 124347L), six = c(109438L, 55627L, 118679L), seven = c(27094L, 
107044L, 122161L), eight = c(112473L, 116909L, 124363L), nine = c(120586L, 
114711L, 120509L)), row.names = c(NA, 3L), class = "data.frame")

dd
   first    one    two  three   four   five    six  seven  eight   nine
1 118751  41006 118751 122631  38017 114826 109438  27094 112473 120586
2  55627 119098 109016 104639 118950 122047  55627 107044 116909 114711
3   <NA> 109437 109831 120634 105440 124347 118679 122161 124363 120509

Now, we want to find the rows where the number in column first equals the number in column six (which is the seventh column in the data frame), using apply:

apply(dd,1,function(x) as.integer(x["first"])==x[7])

    1     2     3 
FALSE FALSE    NA 

This result is clearly wrong: row 2 should have produced TRUE. Oddly, if I run the same thing on the second row ONLY, I get the correct answer:

apply(dd[2,],1,function(x) as.integer(x["first"])==x[7])

   2 
TRUE 

I also tried other subsets: 1:2, 2:3, and even c(1,3). Only the last gives me the expected result, while the first two keep insisting on FALSE for row 2.

If I drop the apply, I get the correct result (regardless of subset):

as.integer(dd$first)==dd$six
[1] FALSE  TRUE    NA

What is going on here?

Recommended answer

The issue is your data types. Your first column is character; the rest of your columns are integer. You attempt to correct for this with as.integer() inside the apply, but by then it is too late. apply works on matrices, not data frames. When you give it a data frame, it is immediately converted to a matrix with as.matrix(). A matrix can hold only one type, and character (generally) can't be converted to numeric, so all your data is converted to character.
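To check this for yourself, here is a minimal sketch that performs the same conversion by hand. It assumes the dd from the question; the names col_classes and m are introduced only for this illustration.

```r
# Rebuild the example data from the question
dd <- structure(list(first = c("118751", "55627", NA), one = c(41006L,
119098L, 109437L), two = c(118751L, 109016L, 109831L), three = c(122631L,
104639L, 120634L), four = c(38017L, 118950L, 105440L), five = c(114826L,
122047L, 124347L), six = c(109438L, 55627L, 118679L), seven = c(27094L,
107044L, 122161L), eight = c(112473L, 116909L, 124363L), nine = c(120586L,
114711L, 120509L)), row.names = c(NA, 3L), class = "data.frame")

# Column classes before any conversion: one character column, nine integer columns
col_classes <- sapply(dd, class)
print(col_classes)

# apply() converts a data frame with as.matrix() before calling the function
m <- as.matrix(dd)
print(typeof(m))   # "character": the whole matrix has a single type
```

Because one column is character, the entire matrix ends up as character, and that is what the function passed to apply receives.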

Here's a window into that conversion:

apply(dd, 1, print)
#       1        2        3       
# first "118751" "55627"  NA      
# one   " 41006" "119098" "109437"
# two   "118751" "109016" "109831"
# three "122631" "104639" "120634"
# four  " 38017" "118950" "105440"
# five  "114826" "122047" "124347"
# six   "109438" " 55627" "118679"
# seven " 27094" "107044" "122161"
# eight "112473" "116909" "124363"
# nine  "120586" "114711" "120509"

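Unfortunately, you can see that spaces are added as well: each integer column is formatted to a common width, so the 55627 in column six becomes " 55627". Inside your function, as.integer(x["first"]) is coerced back to the string "55627" for the comparison with the character value x[7], and "55627" == " 55627" is FALSE. When only row 2 is passed, there is nothing to pad against, which is why that call returns TRUE.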
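The padding and the failed comparison can be reproduced in isolation. This is a small sketch, assuming only the values of column six from the question; the variable names are made up for the illustration.

```r
# The values of column `six` from the question
six <- c(109438L, 55627L, 118679L)

# format() pads to a common width, as the data frame to matrix conversion does
padded <- format(six)
print(padded)      # "109438" " 55627" "118679"

# A single value has nothing to be padded against
single <- format(six[2])
print(single)      # "55627"

# Integer vs. character: the integer is coerced to character before comparing
cmp_padded <- as.integer("55627") == padded[2]   # "55627" == " 55627"
cmp_single <- as.integer("55627") == single      # "55627" == "55627"
print(c(cmp_padded, cmp_single))                 # FALSE TRUE
```

This matches the behaviour in the question: the full data frame gives FALSE for row 2, while dd[2,] on its own gives TRUE.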

Instead, convert your column to its proper type first. Or, better yet, don't bother with apply at all:

# convert
dd[, "first"] = as.integer(dd[, "first"])

# apply now works
apply(dd, 1, function(x) x["first"] == x[7])
#     1     2     3 
# FALSE  TRUE    NA 

# but isn't this easier?
dd[, "first"] == dd[, "six"]
# [1] FALSE  TRUE    NA
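If you need to leave dd unmodified, two further variants may be useful. This is only a sketch under the assumption that dd is as given in the question; res_apply and matches are names chosen for the illustration.

```r
# Rebuild the example data from the question (first is still character here)
dd <- structure(list(first = c("118751", "55627", NA), one = c(41006L,
119098L, 109437L), two = c(118751L, 109016L, 109831L), three = c(122631L,
104639L, 120634L), four = c(38017L, 118950L, 105440L), five = c(114826L,
122047L, 124347L), six = c(109438L, 55627L, 118679L), seven = c(27094L,
107044L, 122161L), eight = c(112473L, 116909L, 124363L), nine = c(120586L,
114711L, 120509L)), row.names = c(NA, 3L), class = "data.frame")

# Variant A: keep apply(), but convert BOTH sides back to integer.
# as.integer() accepts leading whitespace, so " 55627" becomes 55627.
res_apply <- apply(dd, 1, function(x) as.integer(x["first"]) == as.integer(x["six"]))
print(res_apply)

# Variant B: vectorised comparison, then the matching row numbers.
# which() ignores NA, so row 3 is simply not reported.
matches <- which(as.integer(dd$first) == dd$six)
print(matches)
```

Variant B avoids the matrix conversion entirely, so it will usually be the simpler choice; Variant A is mainly of interest if the row-wise function has to do more than one comparison.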
