Behavior of <- NULL on lists versus data.frames for removing data

Question

Many R users eventually figure out lots of ways to remove elements from their data. One way is to use NULL, particularly when you want to do something like drop a column from a data.frame or drop an element from a list.

Eventually, a user comes across a situation where they want to drop several columns from a data.frame at once, and they hit upon <- list(NULL) as the solution (since using <- NULL will result in an error).

A data.frame is a special type of list, so it wouldn't be too tough to imagine that the approaches for removing items from a list should be the same as removing columns from a data.frame. However, they produce different results, as can be seen in the example below.

```r
## Make some small data--two data.frames and two lists
cars1 <- cars2 <- head(mtcars)[1:4]
cars3 <- cars4 <- as.list(cars2)

## Demonstration that the `list(NULL)` approach works
cars1[c("mpg", "cyl")] <- list(NULL)
cars1
#                   disp  hp
# Mazda RX4          160 110
# Mazda RX4 Wag      160 110
# Datsun 710         108  93
# Hornet 4 Drive     258 110
# Hornet Sportabout  360 175
# Valiant            225 105

## Demonstration that simply using `NULL` does not work
cars2[c("mpg", "cyl")] <- NULL
# Error in `[<-.data.frame`(`*tmp*`, c("mpg", "cyl"), value = NULL) :
#   replacement has 0 items, need 12
```

Switch to applying the same concept to a list, and compare the difference in behavior.

```r
## Does not fully drop the items, but sets them to `NULL`
cars3[c("mpg", "cyl")] <- list(NULL)
cars3
# $mpg
# NULL
# 
# $cyl
# NULL
# 
# $disp
# [1] 160 160 108 258 360 225
# 
# $hp
# [1] 110 110  93 110 175 105

## *Does* drop the `list` items while this would
##   have produced an error with a `data.frame`
cars4[c("mpg", "cyl")] <- NULL
cars4
# $disp
# [1] 160 160 108 258 360 225
# 
# $hp
# [1] 110 110  93 110 175 105
```


My main questions are these: if a data.frame is a list, why does it behave so differently in this scenario? Is there a foolproof way of knowing when an element will be dropped, when it will produce an error, and when it will simply be given a NULL value? Or do we depend on trial and error for this?

Solution

DISCLAIMER: This is a relatively long answer, not very clear, and not very interesting, so feel free to skip it or to read only the (sort of) conclusion.

I've tried a bit of tracing on [<-.data.frame, as suggested by Ari B. Friedman. Debugging starts on line 162 of the function, where there is a test to determine if value (the replacement value argument) is not a list.
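The reason tracing `[<-.data.frame` is the right place to look: a data frame is stored as a list (`typeof()` reports "list"), but it carries the class "data.frame", so `[<-` dispatches to the `[<-.data.frame` S3 method instead of using the default list assignment. A minimal sketch on the question's `head(mtcars)[1:4]` data, using `unclass()` to strip the class, shows that the same NULL assignment removes elements once that method is out of the picture:

```r
# A data frame is a list underneath, but its class attribute routes
# `[<-` to the `[<-.data.frame` method.
df <- head(mtcars)[1:4]
typeof(df)                      # "list"
class(df)                       # "data.frame"
is.function(`[<-.data.frame`)   # the S3 method that handles df[...] <- value

# Stripping the class leaves a plain list, so the default list
# assignment applies and NULL removes the named elements.
lst <- unclass(df)
lst[c("mpg", "cyl")] <- NULL
names(lst)                      # "disp" "hp"
```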

Case 1: value is not a list

Then it is treated as a vector. Matrices and arrays are also treated as a single vector, as the help page says:

Note that when the replacement value is an array (including a matrix) it is not treated as a series of columns (as ‘data.frame’ and ‘as.data.frame’ do) but inserted as a single column.

If only one column of the data frame is selected in the LHS, then the only constraint is that the number of rows to be replaced must be equal to or a multiple of length(value). If this is the case, value is recycled with rep if necessary and converted to a list. If length(value)==0, there is no recycling (as it is impossible), and value is just converted to a list.
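A small sketch of the one-column case on the question's 6-row data: a length-3 value divides 6 and is recycled, while a length-4 value does not and is rejected. The `tryCatch()` wrapper is only there to capture the error message without stopping the script; `df` and `res` are just local names.

```r
df <- head(mtcars)[1:4]   # 6 rows

# One column selected: 6 is a multiple of length(1:3), so the value
# is recycled with rep() to 1 2 3 1 2 3.
df[, "mpg"] <- 1:3
df$mpg

# 6 is not a multiple of 4, so the assignment stops with an error.
res <- tryCatch({
  df[, "cyl"] <- 1:4
  "assigned"
}, error = function(e) conditionMessage(e))
res
```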

If several columns of the data frame are selected in the LHS, the constraint is a bit more complex: the total number of elements to be replaced, i.e. the number of rows * the number of columns, must be a multiple of length(value), unless length(value) is at least that large.

The exact test is the following:

```r
(m < n * p && (m == 0L || (n * p) %% m))
```

Here n is the number of rows, p the number of columns, and m the length of value. If the condition is TRUE, the function stops with an error. If it is FALSE, value is converted into an n x p matrix (thus recycled if necessary) and the matrix is split by columns into a list.
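To make the condition concrete, the sketch below restates it as a standalone predicate for a 6-row, 2-column selection. The name `fails_check` is hypothetical (R has no such function); it only mirrors the quoted expression so that different lengths can be tried. The last lines reproduce the reshape-and-split step on a small value.

```r
# Hypothetical restatement of the multi-column check (not part of R).
# n = rows, p = selected columns, m = length(value).
# TRUE means `[<-.data.frame` would stop with "replacement has m items, need n*p".
fails_check <- function(m, n, p) {
  m < n * p && (m == 0L || (n * p) %% m != 0)
}

fails_check(m = 1,  n = 6, p = 2)   # FALSE: 12 is a multiple of 1
fails_check(m = 3,  n = 6, p = 2)   # FALSE: 12 is a multiple of 3
fails_check(m = 5,  n = 6, p = 2)   # TRUE:  12 %% 5 is 2
fails_check(m = 12, n = 6, p = 2)   # FALSE: m is not smaller than n * p
fails_check(m = 0,  n = 6, p = 2)   # TRUE:  zero-length value, e.g. NULL

# When the check passes, value is reshaped into an n x p matrix
# (recycling 1:3 to fill 12 cells) and split by column into a list.
value <- matrix(1:3, nrow = 6, ncol = 2)
parts <- split(c(value), col(value))
lengths(parts)                      # both elements have length 6
```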

If value is NULL, the condition is TRUE because m == 0, and the function stops with an error. Note that the problem occurs for every value of length 0, not just NULL. For example,

```r
cars1[,c("mpg")] <- numeric(0)
```

runs without an error, whereas:

```r
cars1[,c("mpg","disp")] <- numeric(0)
```

fails in the same way as cars1[,c("mpg","disp")] <- NULL.
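What NULL and numeric(0) share here is only their length. A quick base R check shows why they fail identically in the multi-column branch, and why list(NULL), the workaround from the question, avoids this branch altogether:

```r
# NULL and numeric(0) both have length 0, so both hit the m == 0 part
# of the multi-column check when passed directly as value.
length(NULL)          # 0
length(numeric(0))    # 0

# Wrapping NULL in a list changes two things: the value is now a list
# (so it is handled by the list branch) and its length is 1, not 0.
is.list(NULL)         # FALSE
is.list(list(NULL))   # TRUE
length(list(NULL))    # 1
```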

Case 2: value is a list

If value is a list, then it is used to replace several columns at the same time. For example:

```r
cars1[,c("mpg","disp")] <- list(1,2)
```

will replace cars1$mpg with a vector of 1s, and cars1$disp with a vector of 2s.
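The same call can be checked on a fresh copy of the question's data (a minimal sketch; `df` is just a local name):

```r
df <- head(mtcars)[1:4]

# Two list elements for two columns: each length-1 element is
# recycled down all 6 rows of its column.
df[, c("mpg", "disp")] <- list(1, 2)
df$mpg    # 1 1 1 1 1 1
df$disp   # 2 2 2 2 2 2
```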

A sort of "double recycling" happens here:

  • first, the length of the value list must be less than or equal to the number of columns to be replaced. If it is less, a classic recycling is done.
  • second, for each element of the value list, its length must be equal to or greater than the number of rows to be replaced, or the number of rows must be a multiple of it. If it is shorter, the element is recycled to match the number of rows. If it is longer, a warning is displayed.
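Both levels of recycling can be seen in one short sketch on the question's 6-row data: a single list element fills three columns, and a length-2 element is repeated down a column because 6 is a multiple of 2.

```r
df <- head(mtcars)[1:4]   # 6 rows

# Column recycling: the one list element is reused for all three
# columns, and row recycling stretches it from length 1 to 6 rows.
df[, c("mpg", "cyl", "disp")] <- list(0)

# Row recycling of a longer element: 6 is a multiple of 2, so
# c(1, 2) becomes 1 2 1 2 1 2 in the hp column.
df[, "hp"] <- list(c(1, 2))
df
```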

When the RHS value is list(NULL), the recycling steps do nothing, since NULL cannot be recycled (rep(NULL, 10) is always NULL). The code then continues, and in the end each column to be replaced is assigned NULL, i.e. it is removed.
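A quick check of the two facts this relies on, using the question's own list(NULL) call:

```r
# rep() cannot recycle NULL: the result is still NULL.
is.null(rep(NULL, 10))          # TRUE

# So list(NULL) passes through the recycling steps untouched, and each
# selected column ends up assigned NULL, which removes it.
df <- head(mtcars)[1:4]
df[c("mpg", "cyl")] <- list(NULL)
names(df)                       # "disp" "hp"
```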

Summary and (sort of) conclusion

data.frame and list behave differently because of the specific constraint on data frames, where every column must have the same length. Removing several columns by assigning NULL fails not because of the NULL value itself, but because NULL has length 0. The error comes from a test which checks whether the number of elements to be replaced (number of rows * number of columns) is a multiple of the length of the assigned value.

Handling value = NULL for multiple columns doesn't seem difficult (it would take about four lines of simple code), but it requires treating NULL as a special case. I can't tell whether it is left unhandled because it would break the logic of the function's implementation or because it would have side effects I'm not aware of.
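The special case described above can be sketched outside base R as a small wrapper. The function name `drop_or_assign` and its arguments are hypothetical, not an existing API; it simply converts a bare NULL into list(NULL) before the usual assignment, which is the conversion the question's working example performs by hand.

```r
# Hypothetical wrapper (not part of base R): treat NULL as a special
# case, then delegate to the normal data frame assignment.
drop_or_assign <- function(df, cols, value) {
  stopifnot(is.data.frame(df))
  if (is.null(value)) {
    value <- list(NULL)   # list(NULL) removes every selected column
  }
  df[cols] <- value
  df
}

cars1 <- drop_or_assign(head(mtcars)[1:4], c("mpg", "cyl"), NULL)
names(cars1)    # "disp" "hp"

# Non-NULL values go through the ordinary assignment unchanged.
cars2 <- drop_or_assign(head(mtcars)[1:4], "hp", 0)
cars2$hp        # 0 0 0 0 0 0
```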
