Inconsistent results in apply
Question
This is basically the question asked here (not by me), but I've simplified the example and I simply can't figure out what is going on, so I decided I'd pose it again in a way that may get more responses.
Take the data frame dd:
dd <- structure(list(first = c("118751", "55627", NA), one = c(41006L,
119098L, 109437L), two = c(118751L, 109016L, 109831L), three = c(122631L,
104639L, 120634L), four = c(38017L, 118950L, 105440L), five = c(114826L,
122047L, 124347L), six = c(109438L, 55627L, 118679L), seven = c(27094L,
107044L, 122161L), eight = c(112473L, 116909L, 124363L), nine = c(120586L,
114711L, 120509L)), row.names = c(NA, 3L), class = "data.frame")
dd
first one two three four five six seven eight nine
1 118751 41006 118751 122631 38017 114826 109438 27094 112473 120586
2 55627 119098 109016 104639 118950 122047 55627 107044 116909 114711
3 <NA> 109437 109831 120634 105440 124347 118679 122161 124363 120509
Now, we want to find the rows where the number in column first equals the number in column six (which is the seventh column in the data frame), using apply:
apply(dd,1,function(x) as.integer(x["first"])==x[7])
1 2 3
FALSE FALSE NA
This result is clearly wrong: row 2 should have produced TRUE. Oddly, if I run the same thing ONLY on the second row, I get the correct answer:
apply(dd[2,],1,function(x) as.integer(x["first"])==x[7])
2
TRUE
I also tried other subsets: 1:2, 2:3, and even c(1,3). The last one gives me the expected result, while the first two keep insisting on FALSE for row 2.
If I drop the apply, I get the correct response (regardless of subset):
as.integer(dd$first)==dd$six
[1] FALSE TRUE NA
What is going on here?
Recommended answer
The issue is your data types. Your first column is character, while the rest of your columns are integer. You attempt to correct for this with as.integer() inside the apply, but by then it is too late. apply works on matrices, not data frames: when you give it a data frame, it is immediately converted to a matrix. A matrix can't have different column classes, and (in general) character can't be converted to numeric, so all your data is converted to character.
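To see where the types change, compare the column types of the data frame with the type of the vector your function actually receives. This is a minimal sketch on a cut-down copy of dd (only the columns first and six, an assumption made to keep it short):

```r
# Cut-down copy of the question's data: only the columns "first" and "six"
dd <- data.frame(
  first = c("118751", "55627", NA),
  six   = c(109438L, 55627L, 118679L),
  stringsAsFactors = FALSE
)

# In the data frame, each column keeps its own type
sapply(dd, typeof)       # first: "character", six: "integer"

# apply() first turns the data frame into a matrix, which has a single type
typeof(as.matrix(dd))    # "character"

# So every row handed to the function is a character vector
apply(dd, 1, typeof)     # "character" for each of the 3 rows
```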
Here's a window into that conversion:
apply(dd, 1, print)
# 1 2 3
# first "118751" "55627" NA
# one " 41006" "119098" "109437"
# two "118751" "109016" "109831"
# three "122631" "104639" "120634"
# four " 38017" "118950" "105440"
# five "114826" "122047" "124347"
# six "109438" " 55627" "118679"
# seven " 27094" "107044" "122161"
# eight "112473" "116909" "124363"
# nine "120586" "114711" "120509"
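Unfortunately, you can also see that spaces have been added. Because dd contains a character column, as.matrix() converts each of the other columns with format(), which right-justifies numbers to the width of the longest value in that column. In row 2, six therefore becomes " 55627" while first stays "55627", and comparing the integer 55627 with " 55627" coerces the integer back to the string "55627", so the equality is not true. This also explains why the result depends on the subset: in dd[2, ] and dd[c(1,3), ] every value of six has the same number of digits, so nothing is padded, whereas rows 1:2 and 2:3 mix 5- and 6-digit numbers.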
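The padding comes from format(), and it only appears when a column mixes numbers of different widths. A small sketch using the values of column six from the question, plus the two comparisons that decide the result for row 2:

```r
# The integer column "six" from the question
six <- c(109438L, 55627L, 118679L)

# format() right-justifies numbers to a common width, so the shorter
# value gets a leading space when longer values are present
format(six)           # "109438" " 55627" "118679"
format(six[2])        # "55627": nothing to pad against
format(six[c(1, 3)])  # "109438" "118679": same width, no padding

# Comparing an integer with a string coerces the integer to character,
# so the padded string no longer matches
55627L == " 55627"    # FALSE
55627L == "55627"     # TRUE
```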
Instead, convert your column to its proper type first. Or, better yet, don't bother with apply at all:
# convert
dd[, "first"] = as.integer(dd[, "first"])
# apply now works
apply(dd, 1, function(x) x["first"] == x[7])
# 1 2 3
# FALSE TRUE NA
# but isn't this easier?
dd[, "first"] == dd[, "six"]
# [1] FALSE TRUE NA
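If what you ultimately want is the matching row numbers (or the matching rows), the vectorized comparison can be wrapped in which(), which returns the positions of TRUE values and skips the NA from row 3. A minimal sketch, again assuming a cut-down copy of dd with only the two relevant columns:

```r
# Cut-down copy of the question's data: only the columns "first" and "six"
dd <- data.frame(
  first = c("118751", "55627", NA),
  six   = c(109438L, 55627L, 118679L),
  stringsAsFactors = FALSE
)

# Vectorized comparison on the original column types (no matrix conversion)
matches <- as.integer(dd$first) == dd$six   # FALSE TRUE NA

# which() returns the indices of TRUE values and ignores NA
rows <- which(matches)                      # 2

# Subset the data frame to the matching rows
dd[rows, ]
```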