Access to column name of dataframe with *apply function

Question

I need to write a tutorial for beginners on R's *apply functions (without using the reshape or plyr packages at first).

I tried to lapply a simple function over this data frame (because I read that apply is not good for data frames), and I want to use the column names to access the data:

fDist <- function(x1,x2,y1,y2) {
  return (0.1*((x1 - x2)^2 + (y1-y2)^2)^0.5)  
}

data <- read.table(textConnection("X1 Y1 X2 Y2
 1 3.5 2.1 4.1 2.9
 2 3.1 1.2 0.8 4.3
 "))

data$dist <- lapply(data,function(df) {fDist(df$X1 , df$X2 , df$Y1 , df$Y2)})

I get the error "$ operator is invalid for atomic vectors", probably because the data frame is modified by lapply? Is there a better way to do this with $ and named columns?

I resolved my first question with @DWin's answer. But I have another problem, a misunderstanding, with a mixed data frame (numeric + character):

In my new use case, I use two functions to compute the distance, because my objective is to compare the distance between one point and all of the other points.

data2 <- read.table(textConnection("X1 Y1 X2 Y2
     1 3.5 2.1 4.1 2.9
     2 3.1 1.2 0.8 4.3
     "))

data2$char <- c("a","b")

fDist <- function(x1,y1,x2,y2) {
 return (0.1*((x1 - x2)^2 + (y1-y2)^2)^0.5) 
}

fDist2 <- function(fixedX,fixedY,vec) { 
 fDist(fixedX,fixedY,vec[['X2']],vec[['Y2']])
}

# works with data (dataframe without character), but not with data2 (dataframe with character)
#ok
data$f_dist <- apply(data, 1, function(df) {fDist2(data[1,]$X1,data[1,]$Y1,df)})
#not ok
data2$f_dist <- apply(data2, 1, function(df) {fDist2(data2[1,]$X1,data2[1,]$Y1,df)})

Solution

In this case apply is what you need. All of the data columns are of the same type, so you don't have to worry about losing attributes, which is where apply causes problems. You will need to write your function differently so that it takes a single vector of length 4:

 fDist <- function(vec) {
   return (0.1*((vec[1] - vec[2])^2 + (vec[3]-vec[4])^2)^0.5)  
 }
 data$f_dist <- apply(data, 1, fDist)
 data
   X1  Y1  X2  Y2    f_dist
1 3.5 2.1 4.1 2.9 0.1843909
2 3.1 1.2 0.8 4.3 0.3982462

If you wanted to use the names of the columns in 'data' then they need to be spelled correctly:

 fDist <- function(vec) {
   return (0.1*((vec['X1'] - vec['X2'])^2 + (vec['Y1']-vec['Y2'])^2)^0.5)  
 }
 data$f_dist <- apply(data, 1, fDist)
 data
#--------    
   X1  Y1  X2  Y2    f_dist
1 3.5 2.1 4.1 2.9 0.1000000
2 3.1 1.2 0.8 4.3 0.3860052
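
As a side note on why the original lapply call failed: lapply walks over the columns of a data frame, so the function receives one plain numeric vector at a time and "$" has nothing to index. The sketch below shows this and, as an alternative that is not part of the approach above, uses mapply to hand the function one value from each named column per row. The data frame is rebuilt with data.frame() only to keep the example self-contained, and fDistXY is a hypothetical name chosen so that it does not overwrite fDist.

```r
# Same values as `data` in the question, rebuilt so the example runs on its own.
data <- data.frame(X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                   X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3))

# lapply()/sapply() iterate over columns: each element is a numeric vector,
# not a data frame, which is why `$` is invalid inside the function.
col_classes <- sapply(data, class)

# Hypothetical helper: four scalar (or vector) arguments, point 1 then point 2.
fDistXY <- function(x1, y1, x2, y2) {
  0.1 * ((x1 - x2)^2 + (y1 - y2)^2)^0.5
}

# mapply() takes one element from each column per call, matched by position.
dist_mapply <- mapply(fDistXY, data$X1, data$Y1, data$X2, data$Y2)

# The arithmetic is already vectorised, so calling it on whole columns also works.
dist_direct <- with(data, fDistXY(X1, Y1, X2, Y2))
```

Both results should agree with the f_dist column of the named-column version above.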

Your updated (and very different) question is easy to resolve. apply converts the data frame to a matrix first, and a matrix can hold only one type, so every column is coerced to the most general one, in this case 'character'. You have two choices: either 1) add as.numeric to all of your arguments inside the functions, or 2) only send the columns that are needed, which I will illustrate:

data2$f_dist <- apply(data2[ , c("X2", "Y2") ], 1, function(coords) 
                                       {fDist2(data2[1,]$X1,data2[1,]$Y1, coords)} )
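
For completeness, here is a minimal sketch of the coercion itself and of choice 1, which is only described in words above. It assumes the same data2 as in the question, rebuilt with data.frame() so that it runs on its own. fDist2num is a hypothetical name for the as.numeric variant of fDist2.

```r
data2 <- data.frame(X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                    X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3),
                    char = c("a", "b"))

# apply() works on as.matrix(data2); one character column is enough to
# turn every cell into a string.
mode_all     <- mode(as.matrix(data2))                   # "character"
mode_numeric <- mode(as.matrix(data2[, c("X2", "Y2")]))  # "numeric"

fDist <- function(x1, y1, x2, y2) {
  0.1 * ((x1 - x2)^2 + (y1 - y2)^2)^0.5
}

# Choice 1: convert the strings back to numbers inside the function.
fDist2num <- function(fixedX, fixedY, vec) {
  fDist(fixedX, fixedY, as.numeric(vec["X2"]), as.numeric(vec["Y2"]))
}

dist_opt1 <- apply(data2, 1, function(row) {
  fDist2num(data2$X1[1], data2$Y1[1], row)
})
```

Choice 2 avoids the round trip through character strings altogether, which is why it is the one illustrated above.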

I really do not like how you are passing parameters to this function. Using "[" and "$" within the argument list "just looks wrong." And you should know that "df" will not be a data frame, but rather a vector. Because it's not a data frame (or a list), you should alter the function inside so that it uses "[" rather than "[[". Since you only want two of the coordinates, only pass the two (numeric) ones that you will be using.
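
One way to act on those suggestions is sketched below. It is only an illustration, assuming the same data2 as in the question (rebuilt here so that it runs on its own). The fixed point is pulled out once before the call, fDist2 indexes with "[", and the extra arguments are handed to fDist2 through apply's "..." instead of an anonymous wrapper.

```r
data2 <- data.frame(X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                    X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3),
                    char = c("a", "b"))

fDist <- function(x1, y1, x2, y2) {
  0.1 * ((x1 - x2)^2 + (y1 - y2)^2)^0.5
}

# `vec` is a named numeric vector (one row of the matrix), so "[" is enough.
fDist2 <- function(fixedX, fixedY, vec) {
  fDist(fixedX, fixedY, vec["X2"], vec["Y2"])
}

# Extract the fixed point once, outside the argument list.
fixedX <- data2$X1[1]
fixedY <- data2$Y1[1]

# Only the two numeric columns are passed, so nothing is coerced to character.
# Arguments after FUN are forwarded to fDist2; each row fills `vec`.
f_dist <- apply(data2[, c("X2", "Y2")], 1, fDist2,
                fixedX = fixedX, fixedY = fixedY)
```

The result can then be assigned to data2$f_dist as before.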
