R dummies package weird column names when knitted via .Rmd
Problem description
I've just noticed a very weird behavior in the `dummies` package of R when knitted in `.Rmd`. Here's the reproducible example.
---
title: "Dummies Package Behavior"
author: "Kim"
date: '`r Sys.Date()`'
output:
  pdf_document:
    toc: yes
    toc_depth: '3'
---
Load the libraries
```{r}
library(tidyverse)
library(dummies)
```
Main data wrangling
```{r}
df <- data_frame(year = c(2016, 2017, 2018))
temp <- dummy(df$year)
temp <- as_data_frame(temp)
df <- bind_cols(df, temp)
```
View output
```{r}
df
```
What I expect to see when I view `df` are 0-1 columns named `year2016`, `year2017`, and `year2018`, which is the normal behavior of the `dummies` package.
When you knit this R Markdown document in RStudio, the columns are instead named `C:/Users/Kim/Desktop/dummies.Rmd2016`, `C:/Users/Kim/Desktop/dummies.Rmd2017`, and `C:/Users/Kim/Desktop/dummies.Rmd2018`. That is, it uses the full path of the document to build the column names.
I don't understand why such behavior occurs. Obviously, I want the column names to be `year2016`, `year2017`, and `year2018`.
Recommended answer
The problem is not related to `dplyr`, because we can reproduce it with `data.frame()`. Apparently the `dummy()` function has a problem assigning column labels when it is executed as part of an R Markdown document. As noted in Luke's answer, one workaround is to use `dummy.data.frame()`. Another is to rename the columns with `colnames()` after binding the year and dummy variables together with `cbind()`; the same renaming also works in a `dplyr`-based solution.
This should probably be submitted as a bug report for the `dummies` package.
---
title: "Behavior of dummies package"
author: "anAuthor"
date: "12/26/2017"
output:
  html_document: default
  pdf_document: default
  word_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# first, reproduce error with data.frame()
```{r}
library(dummies)
df <- data.frame(year = c(2016, 2017, 2018))
df
dummyCols <- dummy(df$year)
dummyCols <- as.data.frame(dummyCols)
dummyCols
```
# data.frame() approach to fix the error
```{r}
df <- data.frame(year = c(2016, 2017, 2018))
df
dummyCols <- dummy.data.frame(data=df,dummy.classes="ALL")
dummyCols
df <- cbind(df, dummyCols)
df
```
...and the output, first reproducing the error.
...second, using `dummy.data.frame()` to avoid the error.
The `dplyr` correction works as follows.
# dplyr approach
```{r}
library(tidyverse)
df <- data_frame(year = c(2016, 2017, 2018))
temp <- dummy(df$year)
temp <- as_data_frame(temp)
df <- bind_cols(df, temp)
# paste0() is vectorized, so it builds "year2016", "year2017", "year2018" directly
colnames(df) <- c("year", paste0("year", 2016:2018))
df
```
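As a further sketch, not part of the original answer: the same 0-1 columns can be built in base R without calling `dummy()` at all, so the labels no longer depend on how `dummy()` derives them. The names `years` and `dummy_cols` are illustrative only, and the code assumes the single numeric `year` column from the example above.

```r
# Base R sketch: build the indicator columns and their names explicitly.
df <- data.frame(year = c(2016, 2017, 2018))

# Distinct values that each get their own indicator column.
years <- sort(unique(df$year))

# outer() compares every row's year with every distinct year, giving a
# logical matrix; multiplying by 1L turns TRUE/FALSE into 1/0 integers.
dummy_cols <- outer(df$year, years, "==") * 1L

# Name the columns explicitly instead of relying on dummy()'s labels.
colnames(dummy_cols) <- paste0("year", years)

df <- cbind(df, dummy_cols)
colnames(df)  # "year" "year2016" "year2017" "year2018"
```

Because the labels are built directly from the data, this should give the same column names whether the chunk is run interactively or knitted.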