Is there syntactic sugar to define a data frame in R?
Question
I want to regroup US states by region, so I need to define a "US state" -> "US region" mapping, which I do by setting up an appropriate data frame.
The basis is this exercise (apparently this is a map of the "Commonwealth of the Fallout"):
One starts off with the original list in raw form:
Alabama = "Gulf"
Arizona = "Four States"
Arkansas = "Texas"
California = "South West"
Colorado = "Four States"
Connecticut = "New England"
Delaware = "Columbia"
which eventually leads to this R code:
us_state <- c("Alabama","Arizona","Arkansas","California","Colorado","Connecticut",
"Delaware","District of Columbia","Florida","Georgia","Idaho","Illinois","Indiana",
"Iowa","Kansas","Kentucky","Louisiana","Maine","Maryland","Massachusetts","Michigan",
"Minnesota","Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire",
"New Jersey","New Mexico","New York","North Carolina","North Dakota","Ohio","Oklahoma",
"Oregon","Pennsylvania","Rhode Island","South Carolina","South Dakota","Tennessee",
"Texas","Utah","Vermont","Virginia","Washington","West Virginia","Wisconsin","Wyoming")
us_region <- c("Gulf","Four States","Texas","South West","Four States","New England",
"Columbia","Columbia","Gulf","Southeast","North West","Midwest","Midwest","Plains",
"Plains","East Central","Gulf","New England","Columbia","New England","Midwest",
"Midwest","Gulf","Plains","North","Plains","South West","New England","Eastern",
"Four States","Eastern","Southeast","North","East Central","Plains","North West",
"Eastern","New England","Southeast","North","East Central","Texas","Four States",
"New England","Columbia","North West","Eastern","Midwest","North")
us_state_to_region_map <- data.frame(us_state, us_region, stringsAsFactors=FALSE)
This is supremely ugly and unmaintainable, because the State -> Region mapping is effectively obfuscated: a state and its region sit at the same position in two separate vectors.
I actually wrote a Perl program to generate the above from the original list.
In Perl, one would write something like:
#!/usr/bin/perl
$mapping = {
"Alabama" => "Gulf",
"Arizona" => "Four States",
"Arkansas" => "Texas",
"California" => "South West",
"Colorado" => "Four States",
"Connecticut" => "New England",
...etc...etc...
"West Virginia" => "Eastern",
"Wisconsin" => "Midwest",
"Wyoming" => "North" };
which is maintainable because one can verify the mapping on a line-by-line basis.
Surely there is something similar to this Perl goodness in R?
Recommended answer
As @tim-biegeleisen says, it may be more appropriate to maintain this dataset in a database, a CSV file, or a spreadsheet and load it into R (with readxl::read_excel(), readr::read_csv(), ...).
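As a minimal sketch of the file-based approach, the snippet below writes a small CSV to a temporary file and reads it back. It uses base R's read.csv() in place of the readr function mentioned above so that it runs without extra packages. The column names us_state and us_region are taken from the question; the temporary file stands in for a real, hand-maintained file, and only the seven pairs listed in the question are included.

```r
# Stand-in for a hand-maintained CSV file: one "state,region" pair per line.
csv_lines <- c(
  "us_state,us_region",
  "Alabama,Gulf",
  "Arizona,Four States",
  "Arkansas,Texas",
  "California,South West",
  "Colorado,Four States",
  "Connecticut,New England",
  "Delaware,Columbia"
)

# Assumption for illustration: a temporary file instead of a real project file.
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# Read the mapping; keep both columns as character rather than factor.
us_state_to_region_map <- read.csv(csv_path, stringsAsFactors = FALSE)
print(us_state_to_region_map)
```

Each line of the file holds exactly one state and its region, so the mapping can be checked line by line, just as in the Perl hash.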
However, if you want to write it directly in your code, you can use tibble::tribble(), which lets you write a data frame row by row:
library(tibble)
tribble(~ state, ~ region,
"Alabama", "Gulf",
"Arizona", "Four States",
(...)
"Wisconsin", "Midwest",
"Wyoming", "North")
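If adding a package dependency is not desirable, a base R alternative (not part of the answer above, offered only as a sketch) is a named character vector, which is probably the closest analogue to the Perl hash. The names region_of, states, and regions are illustrative, and only the seven pairs listed in the question are included.

```r
# Named character vector: one "state = region" pair per line, like the Perl hash.
# State names containing spaces must be quoted, e.g. "New York" = "Eastern".
region_of <- c(
  Alabama     = "Gulf",
  Arizona     = "Four States",
  Arkansas    = "Texas",
  California  = "South West",
  Colorado    = "Four States",
  Connecticut = "New England",
  Delaware    = "Columbia"
)

# Convert to the two-column data frame used in the question.
us_state_to_region_map <- data.frame(
  us_state  = names(region_of),
  us_region = unname(region_of),
  stringsAsFactors = FALSE
)

# Direct lookup by state name; states missing from the vector yield NA.
states  <- c("Colorado", "Alabama", "Ohio")
regions <- unname(region_of[states])
print(regions)
```

The vector can be used for lookups as is, while the data frame form is convenient for joining onto another table with merge().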