How to create a different report for each subset of a data frame with R markdown?

Question

I have a dataset that looks like this:

 City       Score  Count  Returns
 Dallas     2.9    61     21
 Phoenix    2.6    52     14
 Milwaukee  1.7    38     7
 Chicago    1.2    95     16
 Phoenix    5.9    96     16
 Dallas     1.9    45     12
 Dallas     2.7    75     45
 Chicago    2.2    75     10
 Milwaukee  2.6    12     2
 Milwaukee  4.5    32     0
 Dallas     1.9    65     12
 Chicago    4.9    95     13
 Chicago    5      45     5
 Phoenix    5.2    43     5

I would like to build a report using R markdown; however, for each city I need to build a report. The reason for this is that one city cannot see the report for another city. How do I build a report and save a PDF of it for each city?

Each report would need the median Score, mean Count, and mean Returns. I know that using dplyr I could simply use

library(dplyr)

# One summary row per city
finaldat <- dat %>%
            group_by(City) %>%
            summarise(Score  = median(Score),
                      Count  = mean(Count),
                      Return = mean(Returns))

The difficulty is producing a separate report for each City. Also, the data shown here is only a subset of the full data. The real report is extensive, and its structure is the same for every City; only the data differs.

Recommended answer

It looks like a parameterized report might be what you need. See the R Markdown documentation on parameterized reports for details, but the basic idea is that you set a parameter in the yaml of your rmarkdown report and use that parameter within the report to customize it (in your case, by filtering the data by City). Then, in a separate R script, you render the report multiple times, once for each value of City, passing that value as a parameter to the render function. Here's a basic example:

In your Rmarkdown report you would declare the parameter in the yaml. The listed value, Dallas in this case, is just the default value if no other value is input when you render the report:

---
title: My Document
output: pdf_document
params:
   My_City: Dallas
---

Then, in the same Rmarkdown document you would have your entire report--whatever calculations depend on City, plus the boilerplate that's the same for any City. You access the parameter with params$My_City. The code below will filter the data frame to the current value of the My_City parameter:

```{r}
library(dplyr)

# Keep only the rows for the city passed in as a parameter, then summarise
dat %>%
    filter(City == params$My_City) %>%
    summarise(Score  = median(Score),
              Count  = mean(Count),
              Return = mean(Returns))
```
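
To try the filtering step outside of a render, one option is to mimic the params object with a plain list. The sketch below rebuilds the sample data from the question and runs the same pipeline for one city. The hand-made params list and the city_summary name are assumptions for illustration only; during a real render, rmarkdown creates params for you.

```r
library(dplyr)

# Sample data copied from the table in the question
dat <- data.frame(
  City = c("Dallas", "Phoenix", "Milwaukee", "Chicago", "Phoenix", "Dallas",
           "Dallas", "Chicago", "Milwaukee", "Milwaukee", "Dallas", "Chicago",
           "Chicago", "Phoenix"),
  Score = c(2.9, 2.6, 1.7, 1.2, 5.9, 1.9, 2.7, 2.2, 2.6, 4.5, 1.9, 4.9, 5, 5.2),
  Count = c(61, 52, 38, 95, 96, 45, 75, 75, 12, 32, 65, 95, 45, 43),
  Returns = c(21, 14, 7, 16, 16, 12, 45, 10, 2, 0, 12, 13, 5, 5)
)

# Stand-in for the params object (assumption: built by hand because
# we are not inside rmarkdown::render here)
params <- list(My_City = "Dallas")

# Same pipeline as in the report chunk: filter to one city, then summarise
city_summary <- dat %>%
  filter(City == params$My_City) %>%
  summarise(Score  = median(Score),
            Count  = mean(Count),
            Return = mean(Returns))

print(city_summary)
```

For Dallas this gives a median Score of 2.3, a mean Count of 61.5, and a mean Returns of 22.5, which is a quick way to check the chunk before rendering every report.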

Then, in a separate R script, you would do something like the following to produce a separate report for each City (where I've assumed the Rmarkdown file above is called MyReport.Rmd):

for (i in unique(dat$City)) {
    rmarkdown::render("MyReport.Rmd", 
                      params = list(My_City = i),
                      output_file=paste0(i, ".pdf"))
}

In the code above, I've assumed the dat data frame is in the global environment of this separate R script that renders MyReport.Rmd. However, you could also just provide a vector of city names instead of getting the names from unique(dat$City).
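
As a variation on the loop above, here is a minimal sketch that starts from a plain vector of city names and builds the output file names up front. The report_file helper, the "reports" folder name, and the choice to replace non-alphanumeric characters with underscores are all assumptions for illustration, not part of the original answer. The render call is guarded so the script still runs when the rmarkdown package or MyReport.Rmd is not available.

```r
# Hypothetical helper: turn a city name into a file-system-friendly PDF name
report_file <- function(city) {
  paste0(gsub("[^A-Za-z0-9]+", "_", city), ".pdf")
}

# City names supplied directly instead of taken from unique(dat$City)
cities <- c("Dallas", "Phoenix", "Milwaukee", "Chicago")
report_files <- vapply(cities, report_file, character(1), USE.NAMES = FALSE)

# Render one PDF per city, but only if rmarkdown and the template exist
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    file.exists("MyReport.Rmd")) {
  for (k in seq_along(cities)) {
    rmarkdown::render("MyReport.Rmd",
                      params      = list(My_City = cities[k]),
                      output_file = report_files[k],
                      output_dir  = "reports",  # assumed folder name
                      envir       = new.env())  # fresh environment per report
  }
}
```

Passing a fresh environment for each render keeps objects created while knitting one city's report from carrying over into the next, while objects in the calling script such as dat remain visible through the parent environment.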
