How to Correctly Use Lists in R?


Question

Brief background: Many (most?) contemporary programming languages in widespread use have at least a handful of ADTs [abstract data types] in common, in particular,

  • string (a sequence comprised of characters)

  • list (an ordered collection of values), and

  • map-based type (an unordered array that maps keys to values)

In the R programming language, the first two are implemented as character and vector, respectively.

When I began learning R, two things were obvious almost from the start: first, list is the most important data type in R (because it is the parent class for the R data.frame), and second, I just couldn't understand how lists worked, at least not well enough to use them correctly in my code.

For one thing, it seemed to me that R's list data type was a straightforward implementation of the map ADT (dictionary in Python, NSMutableDictionary in Objective-C, hash in Perl and Ruby, object literal in JavaScript, and so forth).

For instance, you create them just like you would a Python dictionary, by passing key-value pairs to a constructor (which in Python is dict not list):

x = list("ev1"=10, "ev2"=15, "rv"="Group 1")

And you access the items of an R list just like you would those of a Python dictionary, e.g., x['ev1']. Likewise, you can retrieve just the 'keys' or just the 'values' by:

names(x)    # fetch just the 'keys' of an R list
# [1] "ev1" "ev2" "rv"

unlist(x)   # fetch just the 'values' of an R list
#   ev1       ev2        rv 
#  "10"      "15" "Group 1" 

x = list("a"=6, "b"=9, "c"=3)  

sum(unlist(x))
# [1] 18
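The quoted values in the first unlist() output are not an accident: unlist() has to return a single atomic vector, so when a list mixes numbers and strings, everything is coerced to a common type (character here). A minimal sketch; the variable names mixed and nums are illustrative only:

```r
# A list that mixes numbers and a string
mixed <- list(ev1 = 10, ev2 = 15, rv = "Group 1")
mixed_vals <- unlist(mixed)
class(mixed_vals)        # "character": 10 and 15 become "10" and "15"

# A list holding only numbers stays numeric after unlist()
nums <- list(a = 6, b = 9, c = 3)
num_vals <- unlist(nums)
class(num_vals)          # "numeric"
sum(num_vals)            # 18

# The list names survive unlist() and can still be used as keys
num_vals[["b"]]          # 9
```

If you need arithmetic on list values, it is worth checking that the list is homogeneous before calling unlist().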

But R lists are also unlike other map-type ADTs (among the languages I've learned, anyway). My guess is that this is a consequence of the initial spec for S, i.e., an intention to design a data/statistics DSL [domain-specific language] from the ground up.

Here are three significant differences between R lists and mapping types in other widely used languages (e.g., Python, Perl, JavaScript):

First, lists in R are an ordered collection, just like vectors, even though the values are keyed (i.e., elements can be retrieved by a name, not just by sequential integer position). Nearly always, the mapping data type in other languages is unordered.

Second, lists can be returned from functions even though you never passed in a list when you called the function, and even though the function that returned the list doesn't contain an (explicit) list constructor. (Of course, you can deal with this in practice by wrapping the returned result in a call to unlist.)

x = strsplit(LETTERS[1:10], "")     # passing in an object of type 'character'

class(x)                            # returns 'list', not a character vector
# [1] "list"
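One rule of thumb that covers strsplit(): a function returns a list when the result for each input element can have a different length (or type), because an atomic vector cannot hold ragged results. lapply() always returns a list, while sapply() simplifies to a vector only when every result has the same length. A sketch with made-up inputs:

```r
words <- c("a b", "c d e", "f")
pieces <- strsplit(words, " ")   # one list element per input string
lengths(pieces)                  # 2 3 1: ragged, so a plain vector cannot hold them

# lapply() always returns a list; sapply() simplifies when it can
lapply(1:3, function(i) i^2)         # list of length 3
sapply(1:3, function(i) i^2)         # numeric vector: 1 4 9
sapply(1:3, function(i) seq_len(i))  # ragged results, so sapply() falls back to a list

# unlist() flattens when you just want all the pieces
unlist(pieces)                   # "a" "b" "c" "d" "e" "f"
```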

A third peculiar feature of R's lists: it doesn't seem that they can be members of another ADT, and if you try to do that then the primary container is coerced to a list. E.g.,

x = c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))

class(x)
# [1] "list"
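For comparison, here is c() on the same values with and without recursive = TRUE. Without it, any list argument promotes the whole result to a list; with it, c() descends into the list and returns a plain numeric vector. Lists can also be nested inside other lists without being flattened. Variable names are illustrative:

```r
# Combining atomic values with a list: c() promotes the result to a list
x <- c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))
class(x)       # "list"
length(x)      # 6

# recursive = TRUE descends into the list and returns an atomic vector
y <- c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9), recursive = TRUE)
class(y)       # "numeric"

# A list can be an element of another list when built with list()
nested <- list(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))
length(nested) # 4: the inner list is kept as a single element
```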

My intention here is not to criticize the language or how it is documented; likewise, I'm not suggesting there is anything wrong with the list data structure or how it behaves. All I'm after is correcting my understanding of how lists work so I can use them correctly in my code.

Here are the sorts of things I'd like to better understand:

  • What are the rules that determine when a function call will return a list (e.g., the strsplit expression above)?

  • If I don't explicitly assign names to a list (e.g., list(10,20,30,40)), are the default names just sequential integers beginning with 1? (I assume, though I am far from certain, that the answer is yes; otherwise we wouldn't be able to coerce this type of list to a vector with a call to unlist.)

  • Why do these two different operators, [] and [[]], return the same result?

    x = list(1, 2, 3, 4)

    Both expressions return "1":

    x[1]

    x[[1]]

  • Why do these two expressions not return the same result?

    x = list(1, 2, 3, 4)

    x2 = list(1:4)
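Two of the questions above can be checked directly in the console. A list built without names has names(x) equal to NULL (there are no hidden integer names; unlist() simply does not need them), and [ and [[ only look alike for a single element: [ returns a sub-list, while [[ returns the element itself. A minimal sketch:

```r
x <- list(1, 2, 3, 4)

# No names are created by default; elements are addressed by position
names(x)            # NULL
unlist(x)           # 1 2 3 4: unlist() does not need names

# [ returns a sub-list (the container); [[ returns the element itself
class(x[1])         # "list"
class(x[[1]])       # "numeric"
identical(x[1], list(1))   # TRUE

# The difference is clearer when selecting more than one element
x[2:3]              # a list of length 2
# x[[2:3]] would instead index recursively, as x[[2]][[3]], and fail here

# The same holds for named lists; $ is shorthand for [[ with a name
y <- list(ev1 = 10, rv = "Group 1")
y["ev1"]            # a list of length 1 whose element is named ev1
y[["ev1"]]          # 10
y$ev1               # 10
```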

Please don't point me to the R documentation (?list, R-intro); I have read it carefully and it does not help me answer the type of questions I recited just above.

(Lastly, I recently learned of and began using an R package (available on CRAN) called hash, which implements conventional map-type behavior via an S4 class; I can certainly recommend this package.)
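Base R also has a built-in hashed key-value store that needs no extra package: an environment created with new.env(hash = TRUE). Unlike a list, it has reference semantics and no element order. A sketch with illustrative keys:

```r
# An environment created with hash = TRUE works as a hashed key-value store
h <- new.env(hash = TRUE)

assign("ev1", 10, envir = h)   # insert with assign()
h[["ev2"]] <- 15               # [[<- also works on environments
h$rv <- "Group 1"              # ...as does $<-

get("ev1", envir = h)          # 10
exists("missing", envir = h, inherits = FALSE)   # FALSE
ls(h)                          # "ev1" "ev2" "rv": keys, sorted alphabetically by ls()
mget(c("ev1", "ev2"), envir = h)  # named list of the requested values

rm("ev2", envir = h)           # delete a key
```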

Solution

Just to address the last part of your question, since that really points out the difference between a list and a vector in R:

Why do these two expressions not return the same result?

x = list(1, 2, 3, 4); x2 = list(1:4)

Each element of a list can be an object of any class. So you can have a list where the first element is a character vector, the second is a data frame, etc. In this case, you have created two different lists: x has four vectors, each of length 1, while x2 has one vector of length 4:

> length(x[[1]])
[1] 1
> length(x2[[1]])
[1] 4
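The difference is easy to confirm with length() and lengths(), and each shape can be converted into the other. Note that list(1, 2, 3, 4) stores doubles while 1:4 is an integer vector, so an exact round trip also needs a type conversion. A sketch:

```r
x  <- list(1, 2, 3, 4)   # four elements, each a length-1 numeric vector
x2 <- list(1:4)          # one element, an integer vector of length 4

length(x)      # 4
length(x2)     # 1
lengths(x)     # 1 1 1 1
lengths(x2)    # 4
str(x2)        # prints the nesting: a list of 1 holding an int vector

# Splitting x2's single vector into one element per value
as.list(x2[[1]])                   # list(1L, 2L, 3L, 4L): integers, so not identical to x
as.list(as.numeric(x2[[1]]))       # same shape and type as x

# Going the other way: collapse x into a single vector inside a list
list(unlist(x))                    # same shape as x2, but holding doubles
```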

So these are completely different lists.

R lists are very much like a hash map data structure in that each index value can be associated with any object. Here's a simple example of a list that contains 3 different classes (including a function):

> complicated.list <- list("a"=1:4, "b"=1:3, "c"=matrix(1:4, nrow=2), "d"=search)
> lapply(complicated.list, class)
$a
[1] "integer"
$b
[1] "integer"
$c
[1] "matrix"
$d
[1] "function"

Given that the last element is the search function, I can call it like so:

> complicated.list[["d"]]()
[1] ".GlobalEnv" ...
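One place where the hash-map analogy breaks down: list names are not required to be unique, and a lookup by name returns the first match rather than a single hashed entry. $ also matches partial names, while [[ matches exactly by default. A sketch with a hypothetical list l:

```r
l <- list(a = 1, a = 2, beta = 3)

names(l)             # "a" "a" "beta": duplicate names are allowed
length(l)            # 3: nothing was overwritten
l[["a"]]             # 1: lookup returns the first element whose name matches
l[names(l) == "a"]   # logical indexing returns every match as a sub-list

# $ allows partial matching of names; [[ matches exactly by default
l$be                 # 3
l[["be"]]            # NULL
```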

As a final comment on this: it should be noted that a data.frame is really a list (from the data.frame documentation):

A data frame is a list of variables of the same number of rows with unique row names, given class ‘"data.frame"’

That's why columns in a data.frame can have different data types, while columns in a matrix cannot. As an example, here I try to create a matrix with numbers and characters:

> a <- 1:4
> class(a)
[1] "integer"
> b <- c("a","b","c","d")
> d <- cbind(a, b)
> d
     a   b  
[1,] "1" "a"
[2,] "2" "b"
[3,] "3" "c"
[4,] "4" "d"
> class(d[,1])
[1] "character"

Note how I cannot change the data type in the first column to numeric because the second column has characters:

> d[,1] <- as.numeric(d[,1])
> class(d[,1])
[1] "character"
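By contrast, a data.frame really is a list of column vectors, so each column keeps its own type and can be converted independently. A sketch reusing the same values (stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version's default):

```r
df <- data.frame(a = 1:4, b = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

is.list(df)            # TRUE: a data.frame is a list of columns
typeof(df)             # "list"
length(df)             # 2: one element per column
sapply(df, class)      # a: "integer", b: "character"

# Each column is its own vector, so one column can change type on its own
df$a <- as.numeric(df$a)
sapply(df, class)      # a: "numeric", b: "character"

# The same conversion is lost in a matrix, which has a single type
m <- cbind(a = 1:4, b = c("a", "b", "c", "d"))
m[, 1] <- as.numeric(m[, 1])
class(m[, 1])          # "character"
```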
