How to Correctly Use Lists in R?

Question

Brief background: Many (most?) contemporary programming languages in widespread use have at least a handful of ADTs [abstract data types] in common, in particular,

  • string (a sequence comprised of characters)

  • list (an ordered collection of values), and

  • map-based type (an unordered array that maps keys to values)

In the R programming language, the first two are implemented as character and vector, respectively.

When I began learning R, two things were obvious almost from the start: first, list is the most important data type in R (because it is the parent class for the R data.frame), and second, I just couldn't understand how lists worked, at least not well enough to use them correctly in my code.

For one thing, it seemed to me that R's list data type was a straightforward implementation of the map ADT (dictionary in Python, NSMutableDictionary in Objective-C, hash in Perl and Ruby, object literal in JavaScript, and so forth).

For instance, you create them just like you would a Python dictionary, by passing key-value pairs to a constructor (which in Python is dict not list):

x = list("ev1"=10, "ev2"=15, "rv"="Group 1")

And you access the items of an R list just like you would those of a Python dictionary, e.g., x['ev1']. Likewise, you can retrieve just the 'keys' or just the 'values' by:

names(x)    # fetch just the 'keys' of an R list
# [1] "ev1" "ev2" "rv"

unlist(x)   # fetch just the 'values' of an R list
#   ev1       ev2        rv 
#  "10"      "15" "Group 1" 

x = list("a"=6, "b"=9, "c"=3)  

sum(unlist(x))
# [1] 18
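The quoted "10" and "15" in the unlist output above come from type coercion. A minimal sketch of that behaviour, where the variable names vals and nums are only illustrative: unlist returns an atomic vector, so a list that mixes numbers and strings is flattened to character, while an all-numeric list stays numeric.

```r
# Mixed list: numbers and a string
x <- list(ev1 = 10, ev2 = 15, rv = "Group 1")
vals <- unlist(x)        # atomic vectors hold one type, so the numbers become strings
print(class(vals))       # "character"

# All-numeric list
y <- list(a = 6, b = 9, c = 3)
nums <- unlist(y)        # stays numeric, and the names are carried over
print(class(nums))       # "numeric"
print(sum(nums))         # 18
```

This is why sum(unlist(x)) works in the second case but would fail for the first list.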

But R lists are also unlike other map-type ADTs (from among the languages I've learned anyway). My guess is that this is a consequence of the initial spec for S, i.e., an intention to design a data/statistics DSL [domain-specific language] from the ground up.

Three significant differences between R lists and mapping types in other languages in widespread use (e.g., Python, Perl, JavaScript):

First, lists in R are an ordered collection, just like vectors, even though the values are keyed (i.e., the keys can be any hashable value, not just sequential integers). Nearly always, the mapping data type in other languages is unordered.
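A short sketch of the ordering point, using illustrative variable names (unnamed, dup). It shows that elements keep their insertion order and can be reached by position or by name. It also shows two details worth knowing: names are stored as character strings, and a list built without names has no names at all (names() returns NULL) rather than default integer names.

```r
x <- list(ev1 = 10, ev2 = 15, rv = "Group 1")

# Position and name reach the same element
print(x[[2]])            # 15
print(x[["ev2"]])        # 15

# A new element is appended at the end, so order is preserved
x$zz <- TRUE
print(names(x))          # "ev1" "ev2" "rv" "zz"

# An unnamed list has no names; unlist still works positionally
unnamed <- list(10, 20, 30, 40)
print(names(unnamed))    # NULL
print(unlist(unnamed))   # 10 20 30 40

# Unlike a dictionary, duplicate names are allowed; lookup by name takes the first match
dup <- list(a = 1, a = 2)
print(dup[["a"]])        # 1
```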

Second, lists can be returned from functions even though you never passed in a list when you called the function, and even though the function that returned the list doesn't contain an (explicit) list constructor (of course, you can deal with this in practice by wrapping the returned result in a call to unlist):

x = strsplit(LETTERS[1:10], "")     # passing in an object of type 'character'

class(x)                            # returns 'list', not a vector of length 2
# [1] "list"
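One way to see why strsplit returns a list: each input string can split into a different number of pieces, and a list is the structure that can hold one result of any length per input. The sketch below uses made-up input (words) to show this, then returns to the LETTERS example.

```r
# Each input string yields a different number of pieces
words <- c("a b", "c d e", "f")
pieces <- strsplit(words, " ")
print(class(pieces))     # "list"
print(lengths(pieces))   # 2 3 1  (one entry per input string)
print(unlist(pieces))    # "a" "b" "c" "d" "e" "f"

# The example from the question: ten single letters give a list of length 10
letters_split <- strsplit(LETTERS[1:10], "")
print(length(letters_split))   # 10
flat <- unlist(letters_split)  # flattened back to a character vector
```

A function's documented return value (the Value section of its help page) is the reliable place to check whether it returns a list; the inputs alone do not determine it.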

A third peculiar feature of R's lists: it doesn't seem that they can be members of another ADT, and if you try to do that then the primary container is coerced to a list. E.g.,

x = c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))

class(x)
# [1] "list"
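A sketch of how c() behaves when one of its arguments is a list, with and without recursive = TRUE. The variable names (mixed, flat, nested) are illustrative. Without the flag the result is promoted to a list; with it, c() descends into the list and returns a plain atomic vector. The last lines show that a list can itself be an element of another list.

```r
# Combining numbers with a list promotes the whole result to a list
mixed <- c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))
print(class(mixed))      # "list"
print(length(mixed))     # 6

# recursive = TRUE flattens the list elements into an atomic vector
flat <- c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9), recursive = TRUE)
print(class(flat))       # "numeric"

# A list nested inside a list
nested <- list(1, list(2, 3))
print(class(nested[[2]]))   # "list"
```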

My intention here is not to criticize the language or how it is documented; likewise, I'm not suggesting there is anything wrong with the list data structure or how it behaves. All I'm after is to correct my understanding of how they work so I can correctly use them in my code.

Here are the sorts of things I'd like to better understand:

  • What are the rules which determine when a function call will return a list (e.g., the strsplit expression shown above)?

  • If I don't explicitly assign names to a list (e.g., list(10,20,30,40)), are the default names just sequential integers beginning with 1? (I assume, but I am far from certain, that the answer is yes, otherwise we wouldn't be able to coerce this type of list to a vector with a call to unlist.)

  • Why do these two different operators, [], and [[]], return the same result?

    x = list(1, 2, 3, 4)

    both expressions return "1":

    x[1]

    x[[1]]

  • Why do these two expressions not return the same result?

    x = list(1, 2, 3, 4)

    x2 = list(1:4)

Please don't point me to the R Documentation (?list, R-intro)--I have read it carefully and it does not help me answer the type of questions I recited just above.

(Lastly, I recently learned of and began using an R package (available on CRAN) called hash, which implements conventional map-type behavior via an S4 class; I can certainly recommend this package.)

Solution

Just to address the last part of your question, since that really points out the difference between a list and a vector in R:

Why do these two expressions not return the same result?

x = list(1, 2, 3, 4); x2 = list(1:4)

A list can contain any other class as each element. So you can have a list where the first element is a character vector, the second is a data frame, etc. In this case, you have created two different lists. x has four vectors, each of length 1. x2 has 1 vector of length 4:

> length(x[[1]])
[1] 1
> length(x2[[1]])
[1] 4

So these are completely different lists.
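The same comparison as a runnable sketch. It also touches the single-bracket versus double-bracket question: as far as the printed output goes the two look alike, but x[1] is a list of length 1 while x[[1]] is the element itself.

```r
x  <- list(1, 2, 3, 4)
x2 <- list(1:4)

# Four elements versus one element
print(length(x))         # 4
print(length(x2))        # 1

# Single bracket keeps the container; double bracket extracts the element
print(class(x[1]))       # "list"
print(class(x[[1]]))     # "numeric"

# x2 holds one integer vector of length 4
print(x2[[1]])           # 1 2 3 4
```

A practical consequence: x[[1]] + 1 works, whereas x[1] + 1 is an error because arithmetic is not defined on a list.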

R lists are very much like a hash map data structure in that each index value can be associated with any object. Here's a simple example of a list that contains 3 different classes (including a function):

> complicated.list <- list("a"=1:4, "b"=1:3, "c"=matrix(1:4, nrow=2), "d"=search)
> lapply(complicated.list, class)
$a
[1] "integer"
$b
[1] "integer"
$c
[1] "matrix" "array"    # R >= 4.0.0; earlier versions print only "matrix"
$d
[1] "function"

Given that the last element is the search function, I can call it like so:

> complicated.list[["d"]]()
[1] ".GlobalEnv" ...
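Because functions are ordinary objects, a list can serve as a small lookup table of operations. A minimal sketch with hypothetical names (stats, values, results):

```r
# A named list of functions, including an anonymous one
stats <- list(
  mean   = mean,
  total  = sum,
  spread = function(v) max(v) - min(v)
)

values <- c(2, 4, 9)

# Apply every stored function to the same input; the result is again a named list
results <- lapply(stats, function(f) f(values))
print(results$mean)               # 5
print(results$total)              # 15

# Or look one function up by name and call it directly
print(stats[["spread"]](values))  # 7
```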

As a final comment on this: it should be noted that a data.frame is really a list (from the data.frame documentation):

A data frame is a list of variables of the same number of rows with unique row names, given class ‘"data.frame"’

That's why columns in a data.frame can have different data types, while columns in a matrix cannot. As an example, here I try to create a matrix with numbers and characters:

> a <- 1:4
> class(a)
[1] "integer"
> b <- c("a","b","c","d")
> d <- cbind(a, b)
> d
     a   b  
[1,] "1" "a"
[2,] "2" "b"
[3,] "3" "c"
[4,] "4" "d"
> class(d[,1])
[1] "character"

Note how I cannot change the data type in the first column to numeric because the second column has characters:

> d[,1] <- as.numeric(d[,1])
> class(d[,1])
[1] "character"
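For contrast, a sketch of the same data held in a data.frame (the names df and m are illustrative). Each column is a separate list element, so converting one column's type does not affect the other, whereas the matrix has a single type for all cells.

```r
a <- 1:4
b <- c("a", "b", "c", "d")

# A data.frame is a list with one element per column
df <- data.frame(a = a, b = b, stringsAsFactors = FALSE)
print(is.list(df))          # TRUE
print(sapply(df, class))    # a: "integer", b: "character"

# Converting one column leaves the other untouched
df$a <- as.numeric(df$a)
print(class(df$a))          # "numeric"
print(class(df$b))          # "character"

# The matrix version: the assigned numbers are coerced back to character
m <- cbind(a, b)
m[, 1] <- as.numeric(m[, 1])
print(class(m[, 1]))        # "character"
```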
