Losing class information when I use apply in R

Question

When I pass a row of a data frame to a function using apply, I lose the class information of the elements of that row. They all turn into 'character'. The following is a simple example. I want to add a couple of years to the three stooges' ages. When I try to add 2 to a value that had been numeric, R says "non-numeric argument to binary operator." How do I avoid this?

age = c(20, 30, 50)
who = c("Larry", "Curly", "Mo")
df = data.frame(who, age)
colnames(df) <- c('_who_', '_age_')
dfunc <- function(er) {
  print(er['_age_'])
  print(er[2])
  print(is.numeric(er[2]))
  print(class(er[2]))
  return(er[2] + 2)
}
a <- apply(df, 1, dfunc)

The output is:

_age_ 
 "20" 
_age_ 
 "20" 
[1] FALSE
[1] "character"
Error in er[2] + 2 : non-numeric argument to binary operator

Recommended answer

apply only really works on matrices, which have the same type for all elements. When you run it on a data.frame, it simply calls as.matrix first, and because this data frame contains a character column, every value in the resulting matrix becomes a character string.
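The coercion can be seen directly by calling as.matrix yourself. The following is a minimal sketch using a toy data frame that mirrors the question; the plain column names `who` and `age` and the variables `m_all` and `m_num` are illustrative and not from the original post.

```r
# Toy data frame mirroring the question (names are illustrative)
df <- data.frame(who = c("Larry", "Curly", "Mo"), age = c(20, 30, 50))

# A matrix holds a single type, so with a non-numeric column present
# every cell is converted to character
m_all <- as.matrix(df)
typeof(m_all)   # "character"

# With only the numeric column selected, the matrix stays numeric
m_num <- as.matrix(df[, "age", drop = FALSE])
typeof(m_num)   # "double"
```

This is the same conversion apply performs internally before it hands each row to your function.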

The easiest way around this is to work on the numeric columns only:

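# Skip the first (character) column so the matrix stays numeric.
# Note: the age is then the only element of each row, so dfunc must
# use er[1] or er['_age_'] rather than er[2].
a <- apply(df[, -1, drop=FALSE], 1, dfunc)

# Or in two steps:
m <- as.matrix(df[, -1, drop=FALSE])
a <- apply(m, 1, dfunc)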

The drop=FALSE is needed because only one column remains after removing the first; without it, the result would collapse to a plain vector instead of staying a one-column data frame. -1 means all but the first column; you could instead explicitly specify the columns you want, for example df[, c('foo', 'bar')].
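To illustrate the effect of drop=FALSE, here is a small sketch on a toy data frame. The column names `who` and `age` and the variables `v`, `d`, and `older` are assumptions made for this example only.

```r
df <- data.frame(who = c("Larry", "Curly", "Mo"), age = c(20, 30, 50))

v <- df[, -1]                 # one column left: collapses to a plain vector
d <- df[, -1, drop = FALSE]   # stays a one-column data frame

is.null(dim(v))   # TRUE: no dimensions, so apply() cannot iterate over rows
dim(d)            # 3 rows, 1 column

# Each row now arrives as a numeric vector of length 1, named "age"
older <- apply(d, 1, function(er) er[["age"]] + 2)
older             # the values 22, 32, 52
```

Indexing by name inside the function, rather than by position, keeps it working if the set of selected columns changes later.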

Update

If you want your function to access one full data.frame row at a time, there are (at least) two options:

# "loop" over the index and extract a row at a time
sapply(seq_len(nrow(df)), function(i) dfunc(df[i,]))

# Use split to produce a list where each element is a row
sapply(split(df, seq_len(nrow(df))), dfunc)

The first option is probably better for large data frames, since it doesn't have to create a huge list structure upfront.
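As a rough sketch of why these row-wise approaches preserve types, the example below passes one-row data frames to a hypothetical helper, `describe`, that uses both the character and the numeric column. The helper and the plain column names `who` and `age` are illustrative and not part of the original answer.

```r
df <- data.frame(who = c("Larry", "Curly", "Mo"), age = c(20, 30, 50))

# Hypothetical helper: receives a one-row data frame, so every column
# keeps its own class and arithmetic on age works
describe <- function(row) paste(row$who, "will be", row$age + 2)

# Option 1: loop over the row index
by_index <- sapply(seq_len(nrow(df)), function(i) describe(df[i, ]))

# Option 2: split into a list of one-row data frames
by_split <- sapply(split(df, seq_len(nrow(df))), describe)

by_index   # "Larry will be 22" "Curly will be 32" "Mo will be 52"
```

Both options give the same values; the split version additionally labels the result with the row numbers.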
