R programming: How does dataframe$column[<boolean>] = <value> work?


Problem description

This

df = data.frame(c(-2,-1,1,2), NA)
colnames(df) <- c("values", "pos_neg")
flag <- with(df, values < 0)
df$pos_neg[flag] = "negative"
df$pos_neg[!flag] = "positive"

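gives me a data frame whose pos_neg column reads "negative" for the two negative values and "positive" for the two positive ones.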

And it works as intended. The problem is that I'm not really sure how or why it does. What happens exactly if I put a boolean value into the brackets? Up to now I thought a dataframe is an array and I can access values only by number (df[1]) or by name if available (df["pants"]).

Thanks in advance!

Solution

It's a little easier to see what is going on if you look at the subsetting once the column has been filled in and is no longer all NA:

df <- data.frame(values = c(-2,-1,1,2), 
                 pos_neg = NA)
flag <- df$values < 0

df$pos_neg[flag] <- "negative"
df$pos_neg[!flag] <- "positive"

The first important concept here is that a data frame is a list (with a class, some restrictions and lots of methods, but still a list) of variables ("columns"), not a two-dimensional array (a matrix). Thus, $ or [[ subsetting pulls out a single variable, which is a single vector, so

df$pos_neg
#> [1] "negative" "negative" "positive" "positive"
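The list nature of a data frame can be checked directly. The following is a minimal sketch that rebuilds the data frame from above so it runs on its own; the names col_dollar and col_bracket are only illustrative.

```r
# Rebuild the data frame from the answer so this block stands alone.
df <- data.frame(values = c(-2, -1, 1, 2), pos_neg = NA)

is.list(df)      # a data frame is a list under the hood
#> [1] TRUE
length(df)       # one list element per column
#> [1] 2

# `$` and `[[` both extract a single column as a plain vector.
col_dollar  <- df$values
col_bracket <- df[["values"]]
identical(col_dollar, col_bracket)
#> [1] TRUE
is.data.frame(col_dollar)
#> [1] FALSE
```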

You can subset any vector with a logical vector, so logical subsetting works just like c('a', 'b')[c(FALSE, TRUE)] does (which returns "b"):

df$pos_neg[flag]
#> [1] "negative" "negative"
df$pos_neg[!flag]
#> [1] "positive" "positive"
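The same mechanism can be seen on a bare vector, with no data frame involved. This is a small sketch; x is a hypothetical stand-in for the values column.

```r
x <- c(-2, -1, 1, 2)
flag <- x < 0          # comparison yields a logical vector, one entry per element
flag
#> [1]  TRUE  TRUE FALSE FALSE

x[flag]                # keeps the elements where flag is TRUE
#> [1] -2 -1
x[!flag]               # `!` negates the mask
#> [1] 1 2
x[which(flag)]         # equivalent selection by integer position
#> [1] -2 -1
```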

Using <- to assign to those subsets works here because you are supplying a length-1 vector that is getting recycled to fit the subset.
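One way to picture the assignment is to spell it out in three steps. The sketch below is a long-hand equivalent for illustration, not a literal trace of R's internals; col is a hypothetical temporary variable. It also shows a side effect worth knowing: the column starts out logical (a bare NA) and is coerced to character as soon as a string is assigned into it.

```r
df <- data.frame(values = c(-2, -1, 1, 2), pos_neg = NA)
flag <- df$values < 0

class(df$pos_neg)            # a bare NA column starts out logical
#> [1] "logical"

# Long-hand version of: df$pos_neg[flag] <- "negative"
col <- df$pos_neg            # 1. pull the column out as a vector
col[flag] <- "negative"      # 2. the length-1 value is recycled over both TRUE slots
df$pos_neg <- col            # 3. write the modified vector back into the list

df$pos_neg                   # first two entries are "negative", the rest are still NA
class(df$pos_neg)            # assigning a string coerced the column to character
#> [1] "character"
```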


Using [ subsetting with two parameters (for rows and columns) on a data frame, e.g. df[2:3, 'values'], is in some regards more complicated, even if it is more intuitive by analogy with matrices. In particular, the [.data.frame method defaults to drop = TRUE when the result has a single column, so it is not always obvious whether you will get back a data frame or a vector. Most of the time this doesn't matter, but it can cause bugs in programmatic usage.
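The effect of drop is easiest to see side by side. A minimal sketch, with a, b and c2 as illustrative names:

```r
df <- data.frame(values = c(-2, -1, 1, 2), pos_neg = NA)

a <- df[2:3, "values"]                 # single column selected: dropped to a vector
is.data.frame(a)
#> [1] FALSE
a
#> [1] -1  1

b <- df[2:3, "values", drop = FALSE]   # ask explicitly to keep the data frame
is.data.frame(b)
#> [1] TRUE

c2 <- df[2:3, ]                        # several columns: stays a data frame
is.data.frame(c2)
#> [1] TRUE
```

Passing drop = FALSE explicitly is a common way to keep the return type predictable in functions that select columns by a variable.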

Using [ subsetting with a single parameter on a data frame, e.g. df[1], works like [ does on a list, subsetting columns by name, index, or logical mask and always returning another list of the same class (i.e. another data frame).
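The three ways of selecting columns with a single index can be compared directly. Again a sketch with illustrative variable names:

```r
df <- data.frame(values = c(-2, -1, 1, 2), pos_neg = NA)

by_index <- df[1]                 # first column, by position
by_name  <- df["values"]          # same column, by name
by_mask  <- df[c(TRUE, FALSE)]    # same column, by logical mask over the columns

is.data.frame(by_index)
#> [1] TRUE
identical(by_index, by_name) && identical(by_name, by_mask)
#> [1] TRUE

# Contrast with `[[`, which returns the column itself rather than a data frame.
is.data.frame(df[[1]])
#> [1] FALSE
```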
