How to add the results of applying a function to an existing data frame?


Problem description

I am trying to calculate the confidence intervals of some rates. I am using tidyverse and epitools to calculate CI from Byar's method.

I am almost certainly doing something wrong.

library(tidyverse)
library(epitools)


# here's my made up data

DISEASE = c("Marco Polio","Marco Polio","Marco Polio","Marco Polio","Marco Polio",
            "Mumps","Mumps","Mumps","Mumps","Mumps",
            "Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox")
YEAR = c(2011, 2012, 2013, 2014, 2015,
         2011, 2012, 2013, 2014, 2015,
         2011, 2012, 2013, 2014, 2015)
VALUE = c(82,89,79,51,51,
          79,91,69,89,78,
          71,69,95,61,87)
AREA = c("A", "B", "C")  # length 3; data.frame() recycles it across the 15 rows

DATA = data.frame(DISEASE, YEAR, VALUE,AREA)


# this is a simplification, I have the population values in another table, which I've merged 
# to give me the dataframe I then apply pois.byar to.
DATA$POPN = ifelse(DATA$AREA == "A",2.5,
              ifelse(DATA$AREA == "B",3,
                     ifelse(DATA$AREA == "C",7,0)))


# this bit calculates the number of things per area
rates<-DATA%>%group_by(DISEASE,AREA,POPN)%>%
  count(AREA)

Then if I want to calculate CI I thought this would work

rates<-DATA%>%group_by(DISEASE,AREA,POPN)%>%
  count(AREA) %>%
  mutate(pois.byar(rates$n,rates$POPN))

But I get this error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: arguments imply differing number of rows: 0, 1.

This works:

pois.byar(rates$n,rates$POPN)

It seems daft to then say: "turn the results of the pois.byar function into a dataframe and then merge back to the original". I may have tried that just to get some data.... I don't want to do that. It's not the right way to do things.

Any advice gratefully received. I think it's a fairly basic problem. And indicative of my not sitting and learning but trying to do things as I go.

Here's what I want: a data frame with the columns Disease, Year, n, area, popn, x, pt, rate, lower, upper and conf.level.

Recommended answer

It's not clear to me what your expected output is supposed to be. Your comment does not really help. It is best to explicitly include the expected output for the sample data you give.

The issue here is that pois.byar returns a data.frame rather than a vector. So in order for mutate to be able to use the output of pois.byar, we need to store the data.frames in a list.
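To see why the return type matters, it can help to look at what pois.byar gives back for a couple of counts. This is a minimal sketch, assuming epitools is installed; the input values are arbitrary and `ci` is just an illustrative name.

```r
library(epitools)

# Two counts with their matching population (person-time) values.
ci <- pois.byar(x = c(1, 2), pt = c(2.5, 3))

class(ci)  # "data.frame", not a plain vector
names(ci)  # includes x, pt, rate, lower, upper and conf.level
nrow(ci)   # one row per input count
```

Because the result has several columns, it cannot be dropped into a single ordinary column; it either has to be wrapped in a list or have its columns bound on to the existing data.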

Here is a tidied-up version of your code:

library(tidyverse)
library(epitools)  # provides pois.byar
DATA %>%
    mutate(POPN = case_when(
        AREA == "A" ~ 2.5,
        AREA == "B" ~ 3,
        AREA == "C" ~ 7,
        TRUE ~ 0)) %>%
    group_by(DISEASE,AREA,POPN) %>%
    count(AREA) %>%
    mutate(res = list(pois.byar(n, POPN)))

This creates a column res which contains the data.frame output of pois.byar.

Or perhaps you'd like to unnest the list column to expand entries into different columns?

library(tidyverse)
library(epitools)  # provides pois.byar
DATA %>%
    mutate(POPN = case_when(
        AREA == "A" ~ 2.5,
        AREA == "B" ~ 3,
        AREA == "C" ~ 7,
        TRUE ~ 0)) %>%
    group_by(DISEASE,AREA,POPN) %>%
    count(AREA) %>%
    mutate(res = list(pois.byar(n, POPN))) %>%
    unnest(res)  # name the list column explicitly; a bare unnest() is deprecated in newer tidyr
## A tibble: 9 x 10
## Groups:   DISEASE, AREA, POPN [9]
#  DISEASE     AREA   POPN     n     x    pt  rate  lower upper conf.level
#  <fct>       <fct> <dbl> <int> <int> <dbl> <dbl>  <dbl> <dbl>      <dbl>
#1 Chicky Pox  A       2.5     1     1   2.5 0.4   0.0363 1.86        0.95
#2 Chicky Pox  B       3       2     2   3   0.667 0.133  2.14        0.95
#3 Chicky Pox  C       7       2     2   7   0.286 0.0570 0.916       0.95
#4 Marco Polio A       2.5     2     2   2.5 0.8   0.160  2.56        0.95
#5 Marco Polio B       3       2     2   3   0.667 0.133  2.14        0.95
#6 Marco Polio C       7       1     1   7   0.143 0.0130 0.666       0.95
#7 Mumps       A       2.5     2     2   2.5 0.8   0.160  2.56        0.95
#8 Mumps       B       3       1     1   3   0.333 0.0302 1.55        0.95
#9 Mumps       C       7       2     2   7   0.286 0.0570 0.916       0.95
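Since the question notes that calling pois.byar on the whole `n` and `POPN` columns works, the list column can probably be skipped altogether. The sketch below is one possible alternative, not the answer's own method: it assumes dplyr and epitools are installed, rebuilds only the columns needed for the count, and uses the illustrative names `popn_by_area`, `result` and `result2`.

```r
library(dplyr)
library(epitools)

# Rebuild the question's grouping columns (VALUE is omitted: the count does not use it).
DATA <- data.frame(
  DISEASE = rep(c("Marco Polio", "Mumps", "Chicky Pox"), each = 5),
  YEAR    = rep(2011:2015, times = 3),
  AREA    = rep(c("A", "B", "C"), length.out = 15),
  stringsAsFactors = FALSE
)

# Look up the population for each area from a named vector.
popn_by_area <- c(A = 2.5, B = 3, C = 7)
DATA$POPN <- unname(popn_by_area[DATA$AREA])

# One row per DISEASE/AREA/POPN combination, with the count in n.
rates <- DATA %>% count(DISEASE, AREA, POPN)

# pois.byar accepts whole vectors, so call it once and bind its columns on.
result <- bind_cols(rates, pois.byar(rates$n, rates$POPN))

# With dplyr 1.0.0 or later, an unnamed data frame inside mutate() is
# spliced into separate columns, which gives the same shape.
result2 <- rates %>% mutate(pois.byar(n, POPN))
```

Both variants should give one row per disease and area with the x, pt, rate, lower, upper and conf.level columns alongside n, without any merge step.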
