Transitioning a research project to a knitr-based setup

Problem description

Finally, I've decided to move my dissertation research closer to the goal of making it as reproducible as it can be, given my circumstances. Since I don't currently use LaTeX for my dissertation report (though I'm considering this option), I believe that knitr is the best way to go.

The software project implementing the empirical part of my dissertation research (data analysis) is being written in R. The project contains multiple files within a directory structure, which is rather typical for scientific workflows (top-level sub-directories: analysis, cache, data, figures, import, prepare, present, results, sandbox, utils).

I have read a lot of information (including examples) on using knitr for auto-generating reports and reproducible research in general. However, I'm somewhat overwhelmed by the multitude of configuration options and, more importantly, still confused about the best approach for using knitr in projects like mine, which contain multiple files and directories. In particular, I'm interested in advice on a framework and steps for transitioning an existing codebase without too many modifications to the R modules.

As an example, let's consider my modules related to exploratory data analysis (EDA). My current EDA workflow includes:

  • preliminary data, transformed from the original raw data (located in "data/transform" sub-directories);

  • module "eda.R", located in the "analysis" directory;

  • directory "results/eda", where my current code generates figures (SVG files) of univariate and multivariate EDA, as well as a single-document report (PDF file) with the same graphics-only information (descriptive statistics are produced as console output when running the "eda.R" script).

In order to transition to a knitr-based project, I have created the file "eda-report.Rmd" with R Markdown statements for setting local knitr options, including read_chunk("eda.R"). My understanding is that I now need to define the existing blocks of R code in "eda.R" as knitr chunks and then call these named chunks, according to my EDA workflow.
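The chunk-labelling step described above might look roughly like the following. This is only a sketch: the chunk labels and the use of the built-in mtcars data set as a stand-in for the preliminary data are assumptions made for illustration, not part of the original project. knitr's read_chunk() treats a comment line of the form "## ---- label ----" as the start of a named chunk, and because these markers are ordinary R comments, the script still runs unchanged with source() or Rscript.

```r
# eda.R: existing analysis code, split into named chunks for knitr.
# A comment line "## ---- label ----" starts a chunk that
# knitr::read_chunk("eda.R") registers under that label.
# ASSUMPTION: the labels and the mtcars stand-in data are illustrative only.

## ---- load-data ----
# Stand-in for the preliminary data kept under "data/transform".
eda_data <- mtcars

## ---- univariate-summary ----
# Descriptive statistics that were previously only printed to the console.
uni_summary <- summary(eda_data$mpg)
print(uni_summary)

## ---- univariate-plot ----
# One univariate figure; knitr captures it when the chunk is evaluated.
hist(eda_data$mpg, main = "Distribution of mpg", xlab = "mpg")
```

In "eda-report.Rmd", an otherwise empty chunk whose header carries the same label, for example {r univariate-plot}, would then pull in the matching code. Note that the path passed to read_chunk() is resolved relative to the working directory at knit time, so with "eda.R" living in the "analysis" directory the call may need to be read_chunk("analysis/eda.R") or similar, depending on where the .Rmd file sits.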

Questions:

Is this the correct approach? What are the best practices for using knitr with regard to setting up project paths, using source(), grouping some plots via gridExtra, and preventing potential issues? It seems to me that, in addition to "eda-report.Rmd", I need to create another R module that will initiate processing of the .Rmd file by knitr. If so, which call should I use: rmarkdown::render() or knitr::knit() (while I use RStudio for development, I want my code to be independent of the development environment)?

Update 1 (additional question):

Why does processing an .Rmd file in RStudio via the "Knit HTML" button produce an HTML document, while processing it via the Makefile command Rscript -e 'library("knitr"); knit("eda-report.Rmd")' produces an .md file but no HTML, despite the presence of the output: html_document directive?

Thank you for reading! Your advice will be much appreciated!

Recommended answer

In order to transition your workflow to using knitr, I suggest that rather than trying to make every last piece of code you write reproducible, you should start with the bits that will be most useful.

Since knitr is a report generation tool, the best place to start is by writing your dissertation in knitr. (You mention that you don't use LaTeX at the moment. That's fine: knitr also supports AsciiDoc, which I find easier to write. If your dissertation doesn't have many equations or tables, you might also get away with writing it in Markdown or Textile, which are even easier.)

Similarly, knitr is good for any reports or papers that you might write.

For more advanced usage, you can create presentations using knitr. (I sometimes knit xhtml Slidy presentations.)

What I wouldn't bother with is trying to knit all your exploratory data analysis. Most things you'll find are boring or dead ends, so it isn't worth the extra effort. Concentrate on exploring as fast as you can, then knit the interesting bits afterwards. Likewise, data cleaning isn't usually that interesting, so well-commented code often suffices.

To answer your question about directory structure, my preference is that since knitr reports are for final output, they should be sandboxed away from scrappier exploratory work. That is, they can have their own directory, and produce their own copies of figures.
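One possible way to keep a report in its own directory, and to build it outside RStudio, is a small driver script along the lines below. Treat it as a sketch under stated assumptions: the file name "build-report.R", the function name build_report(), and the output location under "present" are hypothetical choices, not something the original project defines. It uses rmarkdown::render() rather than knitr::knit(), because knit() only converts the .Rmd file to Markdown, whereas render() also reads the output: field in the YAML header and runs pandoc to produce the final document; the "Knit HTML" button in RStudio calls render() as well.

```r
# build-report.R: hypothetical driver script for rendering the report.
# ASSUMPTION: the function name, default input file, and output directory
# are illustrative choices and should be adapted to the actual project.

build_report <- function(input = "eda-report.Rmd",
                         output_dir = file.path("present", "eda-report")) {
  # rmarkdown is only needed at build time, so check for it explicitly.
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("Package 'rmarkdown' is required to render the report.")
  }
  # Outside RStudio, pandoc has to be installed and discoverable.
  if (!rmarkdown::pandoc_available()) {
    stop("pandoc was not found; install it or make it available on the PATH.")
  }
  # Keep the rendered report apart from the exploratory output.
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  # render() knits the .Rmd and then converts the result with pandoc,
  # using the output format declared in the YAML header.
  rmarkdown::render(input = input, output_dir = output_dir)
}
```

From a Makefile or a shell, this could be invoked as Rscript -e 'source("build-report.R"); build_report()', which does not depend on RStudio being present.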
