KnitR HTML output showing incorrect/strange results. Inline code and modifying options not yielding the correct output


Problem description


I'm creating a report on statistical analysis of several distributions; more specifically random populations and how their samples differ from them with the latter adhering to properties of normal distributions while their larger populations remain skewed in most cases.

Although I'm more than satisfied with the rest of the output, I'm unable to figure out why certain numeric values and their visualisations differ from the ones produced through the command line. Here's some code that reproduces the discrepancy (first I generate 1000 random exponentials):

set.seed(1000)
pop <- rexp(1000, 0.2)

When extracting, say, the mean of pop, I get the exact correct result through the console, which is 4.76475. This is the value I should be getting in the markdown output, but knitr displays it as 5.015616 instead.

mean(pop)
[1] 4.76475

```{r, echo = T}
mean(pop)
```
[1] 5.015616

It's not just the mean; almost all of the other required statistics for the population and the sample are affected as well. In addition, I also get wrong visualisations in the knitted output:

Original/correct plot

Knitted plot

The plots themselves look different because of the incorrect results. I thought this was a problem with the digits setting, but changing it through options() doesn't solve it, and neither does the default scipen = 0 setting. I've tried inserting inline code, but it still shows the incorrect values. I referred to knitr's manual in case a chunk setting was missing, but couldn't find a fault there. Is something missing here, or is this a bug related to random distributions?

EDIT: I noticed another peculiar property. I created a new markdown file to see whether the results vary with each new output that I create. Let's call it test.Rmd; it contains the same commands that I've reproduced here, with the same seed. I'm now getting a totally different result, still different from the original value from the command session.

EDIT: Roman's point seems to be working. The knitted results are coming closer to the original values but still don't match exactly. With the seed set to 357 I get a mean(pop) of 4.881604, which differs from the original value only in the first decimal place. But why is the seed the game changer here? I thought it had to be 1000.

EDIT: Here's some of the code from the .Rmd file as requested by Phil.

# Load packages
library(ggplot2)
library(knitr)
library(gridExtra)

# Generate random exponentials
set.seed(357)
pop = rexp(1000, 0.2) # lambda is 0.2 with n = 1000
pop.table <- as.data.frame(pop)

# Take a sample simulating 1000 averages of 40 exponentials
sample.exp = NULL
for (i in 1:1000) {
     # append the mean of 40 draws (n = 40 here)
     sample.exp = c(sample.exp, mean(rexp(40, 0.2)))
}
sample.df <- as.data.frame(sample.exp)

# Generate means and compare
mean(pop) # 4.881604
mean(sample.exp) # 4.992426

# Generate variances and compare
var(pop) # 26.07005
var(sample.exp) # 0.6562298

# Some plots
plot.means.pop <- ggplot(pop.table, aes(pop.table$pop)) +
     geom_histogram(binwidth = 0.9, fill = 'white', colour = 'black') +
     geom_vline(aes(xintercept = mean(pop.table$pop), colour = 'red')) +
     labs(title = 'Population Mean', x = 'Exponential', y = 'Frequency') +
     theme(legend.position = 'none') +
     theme(plot.title = element_text(hjust = 0.5))

plot.means.sample <- ggplot(sample.df, aes(sample.df$sample.exp)) +
     geom_histogram(binwidth = 0.2, fill = 'white', colour = 'black') +
     geom_vline(aes(xintercept = mean(sample.df$sample.exp)), colour = 'red', size = 0.8) +
     labs(title = 'Sample Mean', x = 'Exponential', y = 'Frequency') +
     guides(fill = F) +
     theme(plot.title = element_text(hjust = 0.5))

grid.arrange(plot.means.sample, plot.means.pop, ncol = 2, nrow = 1)

So that's pretty much the main portion of the file; it gives me values that are 'close' to, but not exactly, the results from the command line. Note: the annotated values are the new values after setting the seed to 357, and I've set the same seed in the global environment. The values I'm getting at the console are:

  • 4.76475 for population mean
  • 5.00238 for sample mean
  • 21.80913 for population variance
  • 0.6492991 for sample variance

Solution

When asking a question on Stack Overflow it's essential to provide a minimal reproducible example. In particular, have a good read of the first answer and this advice and this will guide you through the process.

I think we've all struggled to help you (and we want to!) because we can't reproduce your issue. Compare the following R and Rmd code when run or knitted, respectively:

# Generate random exponentials
set.seed(1000)
pop = rexp(1000, 0.2) # lambda is 0.2 with n = 1000
mean(pop)
## [1] 5.015616
var(pop)
## [1] 26.07005

and the Rmd:

---
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
    echo = TRUE,
    message = TRUE,
    warning = TRUE
)
```

```{r}
# Generate random exponentials
set.seed(1000)
pop = rexp(1000, 0.2) # lambda is 0.2 with n = 1000
mean(pop)
var(pop)
```

Which produces the following output:

# Generate random exponentials
set.seed(1000)
pop = rexp(1000, 0.2) # lambda is 0.2 with n = 1000
mean(pop)
## [1] 5.015616
var(pop)
## [1] 26.07005

As you can see, the results are identical from a clean R session and a clean knitr session. This is as expected, because set.seed(), when given the same seed, should produce the same results every time (see the set.seed man page). When you change the seed to 357, the results vary together:

              | mean    | var      |
console (`R`) | 4.88... | 22.88... |
knitr (`Rmd`) | 4.88... | 22.88... |
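
The reproducibility claim can be checked directly in any R session. The following is a minimal sketch, not part of the original answer; the variable names `a`, `b` and `c2` are made up for illustration. It shows that resetting the seed reproduces the same draws, while calling rexp() again without resetting it does not, which is what happens if the set.seed() line is skipped at the console.

```r
# Same seed, same draws: the two vectors are identical element by element.
set.seed(1000)
a <- rexp(1000, 0.2)
set.seed(1000)
b <- rexp(1000, 0.2)
identical(a, b)    # TRUE

# Without resetting the seed, the generator continues its stream,
# so a further call returns different numbers (and a different mean).
c2 <- rexp(1000, 0.2)
identical(a, c2)   # FALSE
```

If `identical(a, b)` is TRUE at the console but mean(a) still differs from the knitted value, the difference most likely comes from the session state rather than from knitr.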

In your second code block, the knitr chunk result is correct for seed 1000, but the console result of 4.76 is not, which suggests to me that your console is the one producing the incorrect output. This could be for one of a few reasons:

  • You forgot to set the seed in the console before running the rexp() function. If you run this line without setting the seed, the result will vary every time. Make sure you run set.seed(1000) first, or put the code in an R script and source it so that the steps run in order.
  • There's something in your global R environment that is affecting your results. This is less likely because you cleared your R environment, but it is one of the reasons it's important to start a new session from time to time, either by closing and re-opening RStudio or by pressing CTRL + Shift + F10.
  • There might be something in your Rprofile.site or .Rprofile that sets an option on startup and affects your results. Have a look at Customizing startup to open and check your startup options, and correct them if necessary.
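
The three possibilities above can be inspected from the console with base R alone. This is only a rough sketch under stated assumptions: the list of function names in `fns` and the startup file locations in `startup` are illustrative choices, not an exhaustive check, and none of this comes from the original answer.

```r
# 1. Has the RNG been seeded or used in this session at all?
#    .Random.seed only exists in the global environment once it has.
seed_present <- exists(".Random.seed", envir = globalenv())
seed_present

# 2. Which generator is in use? Compare this between the console
#    and a knitted chunk.
RNGkind()

# 3. Is an object in the global environment masking a function used here?
fns <- c("rexp", "mean", "var", "set.seed")   # illustrative list
masked <- intersect(ls(globalenv()), fns)
masked

# 4. Which of the usual startup files exist on this machine?
startup <- c(
  file.path(R.home("etc"), "Rprofile.site"),   # site-wide profile
  path.expand("~/.Rprofile"),                  # user profile
  file.path(getwd(), ".Rprofile")              # project profile
)
startup_found <- file.exists(startup)
data.frame(file = startup, exists = startup_found)
```

Any file reported as existing is worth opening to see whether it sets a seed, changes the RNG kind, or redefines a function.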

The output you're seeing isn't because of scipen because there are no numbers in scientific/engineering notation, and it's not digits because the differences you're seeing are more than differences in rounding.
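
To illustrate why neither option explains the discrepancy, here is a small sketch (the names `x`, `fixed` and `sci` are made up for this example). Both options change only how a number is printed; the stored value stays the same, and rounding 5.015616 never gets anywhere near 4.76475.

```r
x <- 5.015616

# digits controls how many significant digits are shown, not the value itself.
format(x, digits = 3)   # "5.02"
abs(x - 4.76475)        # about 0.25, far more than a rounding difference

# scipen only shifts the preference between fixed and scientific notation.
old <- options(scipen = 100)   # strongly prefer fixed notation
fixed <- format(1e10)
options(old)                   # restore the previous setting
sci <- format(1e10)

# Both strings represent the same number.
as.numeric(fixed) == as.numeric(sci)
```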

If these suggestions still don't solve your issue, post a minimal reproducible example and try it on other computers.
