Is there syntactic sugar to define a data frame in R?


Question

I want to regroup US states by region, so I need to define a "US state" -> "US region" mapping, which I do by setting up an appropriate data frame.

The basis is this exercise (apparently a map of the "Commonwealth of the Fallout"):

One starts off with the original list in raw form:

Alabama = "Gulf"
Arizona = "Four States"
Arkansas = "Texas"
California = "South West"
Colorado = "Four States"
Connecticut = "New England"
Delaware = "Columbia"

which eventually leads to this R code:

us_state <- c("Alabama","Arizona","Arkansas","California","Colorado","Connecticut",
"Delaware","District of Columbia","Florida","Georgia","Idaho","Illinois","Indiana",
"Iowa","Kansas","Kentucky","Louisiana","Maine","Maryland","Massachusetts","Michigan",
"Minnesota","Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire",
"New Jersey","New Mexico","New York","North Carolina","North Dakota","Ohio","Oklahoma",
"Oregon","Pennsylvania","Rhode Island","South Carolina","South Dakota","Tennessee",
"Texas","Utah","Vermont","Virginia","Washington","West Virginia","Wisconsin","Wyoming")

us_region <- c("Gulf","Four States","Texas","South West","Four States","New England",
"Columbia","Columbia","Gulf","Southeast","North West","Midwest","Midwest","Plains",
"Plains","East Central","Gulf","New England","Columbia","New England","Midwest",
"Midwest","Gulf","Plains","North","Plains","South West","New England","Eastern",
"Four States","Eastern","Southeast","North","East Central","Plains","North West",
"Eastern","New England","Southeast","North","East Central","Texas","Four States",
"New England","Columbia","North West","Eastern","Midwest","North")

us_state_to_region_map <- data.frame(us_state, us_region, stringsAsFactors=FALSE)

This is supremely ugly and unmaintainable, because the state -> region mapping is effectively obfuscated: each state and its region are matched only by their positions in two separate vectors.

I actually wrote a Perl program to generate the above from the original list.

In Perl, one would write something like:

#!/usr/bin/perl

$mapping = {
"Alabama"=> "Gulf",
"Arizona"=> "Four States",
"Arkansas"=> "Texas",
"California"=> "South West",
"Colorado"=> "Four States",
"Connecticut"=> "New England",
# ...etc...etc...
"West Virginia"=> "Eastern",
"Wisconsin"=> "Midwest",
"Wyoming"=> "North" };

which is maintainable because the mapping can be verified line by line.

Surely there is something similar to this Perl goodness in R?

Recommended answer

As @tim-biegeleisen says, it may be more appropriate to maintain this dataset in a database, a CSV file, or a spreadsheet and load it into R (with readxl::read_excel(), readr::read_csv(), ...).
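A minimal sketch of the CSV route, assuming a two-column file with a header row. The column names and the temporary file are illustrative assumptions, and base R's read.csv() is used here only to avoid extra packages; readr::read_csv() would be called on a file path in much the same way.

```r
# Hypothetical file contents: in practice these lines would live in their own
# CSV file, maintained by hand or exported from a spreadsheet.
csv_lines <- c(
  "us_state,us_region",
  "Alabama,Gulf",
  "Arizona,Four States",
  "Arkansas,Texas",
  "California,South West"
)

# Write the lines to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# Read the mapping back as a data frame with character columns.
us_state_to_region_map <- read.csv(csv_path, stringsAsFactors = FALSE)
print(us_state_to_region_map)
```

Each state sits on the same line as its region, so the file can be checked line by line just like the Perl hash.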

However, if you want to write it directly in your code, you can use tibble::tribble(), which lets you write a data frame row by row:

library(tibble)
tribble(~ state, ~ region,
        "Alabama", "Gulf",
        "Arizona", "Four States",
        # (...)
        "Wisconsin", "Midwest",
        "Wyoming", "North")
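If an extra package is not wanted, a base R named character vector may come closest to the Perl hash. This is an alternative sketch and not part of the original answer; the variable name state_region is an assumption, and only the rows shown in the question's raw list are included.

```r
# Named character vector: each line pairs a state (name) with its region (value).
state_region <- c(
  "Alabama"     = "Gulf",
  "Arizona"     = "Four States",
  "Arkansas"    = "Texas",
  "California"  = "South West",
  "Colorado"    = "Four States",
  "Connecticut" = "New England",
  "Delaware"    = "Columbia"
)

# Build the two-column data frame from the names and the values.
us_state_to_region_map <- data.frame(
  us_state  = names(state_region),
  us_region = unname(state_region),
  stringsAsFactors = FALSE
)

# The vector itself also works as a direct lookup table.
print(state_region[["Arizona"]])
```

The vector doubles as the mapping function: indexing it with a vector of state names returns the matching regions.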
