How to add the results of applying a function to an existing data frame?
Question
I am trying to calculate confidence intervals for some rates. I am using tidyverse and epitools to calculate the CIs using Byar's method.
I am almost certainly doing something wrong.
library(tidyverse)
library(epitools)
# here's my made up data
DISEASE = c("Marco Polio","Marco Polio","Marco Polio","Marco Polio","Marco Polio",
"Mumps","Mumps","Mumps","Mumps","Mumps",
"Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox","Chicky Pox")
YEAR = c(2011, 2012, 2013, 2014, 2015,
2011, 2012, 2013, 2014, 2015,
2011, 2012, 2013, 2014, 2015)
VALUE = c(82,89,79,51,51,
79,91,69,89,78,
71,69,95,61,87)
AREA =c("A", "B","C")
DATA = data.frame(DISEASE, YEAR, VALUE,AREA)
# this is a simplification, I have the population values in another table, which I've merged
# to give me the dataframe I then apply pois.byar to.
DATA$POPN = ifelse(DATA$AREA == "A", 2.5,
            ifelse(DATA$AREA == "B", 3,
            ifelse(DATA$AREA == "C", 7, 0)))
# this bit calculates the number of things per area
rates <- DATA %>%
  group_by(DISEASE, AREA, POPN) %>%
  count(AREA)
Then, if I want to calculate the CIs, I thought this would work:
rates<-DATA%>%group_by(DISEASE,AREA,POPN)%>%
count(AREA) %>%
mutate(pois.byar(rates$n,rates$POPN))
But I get:
Error in mutate_impl(.data, dots) :
Evaluation error: arguments imply differing number of rows: 0, 1.
This works:
pois.byar(rates$n,rates$POPN)
It seems daft to then say: "turn the results of the pois.byar function into a dataframe and then merge back to the original". I may have tried that just to get some data.... I don't want to do that. It's not the right way to do things.
Any advice gratefully received. I think it's a fairly basic problem. And indicative of my not sitting and learning but trying to do things as I go.
Here's what I want (column headings):
Disease Year n area popn x pt rate lower upper conf.level
Recommended answer
It's not clear to me what your expected output is supposed to be. Your comment does not really help. It is best to explicitly include the expected output for the sample data you give.
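The issue here is that pois.byar returns a data.frame. So in order for mutate to be able to use the output of pois.byar, we need to store the data.frames in a list. Note also that inside mutate the columns should be referred to by their bare names (n, POPN) rather than as rates$n and rates$POPN, so that each group sees only its own values.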
Here is a tidied-up version of your code:
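library(tidyverse)
library(epitools)  # provides pois.byar()

DATA %>%
  # look up the population for each area
  mutate(POPN = case_when(
    AREA == "A" ~ 2.5,
    AREA == "B" ~ 3,
    AREA == "C" ~ 7,
    TRUE ~ 0)) %>%
  group_by(DISEASE, AREA, POPN) %>%
  count(AREA) %>%
  # wrap the data.frame returned by pois.byar() in a list so it fits in one cell
  mutate(res = list(pois.byar(n, POPN)))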
This creates a column res which contains the data.frame output of pois.byar.
Or perhaps you'd like to unnest the list column to expand its entries into separate columns?
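library(tidyverse)
library(epitools)  # provides pois.byar()

DATA %>%
  mutate(POPN = case_when(
    AREA == "A" ~ 2.5,
    AREA == "B" ~ 3,
    AREA == "C" ~ 7,
    TRUE ~ 0)) %>%
  group_by(DISEASE, AREA, POPN) %>%
  count(AREA) %>%
  mutate(res = list(pois.byar(n, POPN))) %>%
  # name the list column explicitly; a bare unnest() is deprecated in tidyr 1.0.0 and later
  unnest(res)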
## A tibble: 9 x 10
## Groups: DISEASE, AREA, POPN [9]
# DISEASE AREA POPN n x pt rate lower upper conf.level
# <fct> <fct> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Chicky Pox A 2.5 1 1 2.5 0.4 0.0363 1.86 0.95
#2 Chicky Pox B 3 2 2 3 0.667 0.133 2.14 0.95
#3 Chicky Pox C 7 2 2 7 0.286 0.0570 0.916 0.95
#4 Marco Polio A 2.5 2 2 2.5 0.8 0.160 2.56 0.95
#5 Marco Polio B 3 2 2 3 0.667 0.133 2.14 0.95
#6 Marco Polio C 7 1 1 7 0.143 0.0130 0.666 0.95
#7 Mumps A 2.5 2 2 2.5 0.8 0.160 2.56 0.95
#8 Mumps B 3 1 1 3 0.333 0.0302 1.55 0.95
#9 Mumps C 7 2 2 7 0.286 0.0570 0.916 0.95
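Since the question notes that calling pois.byar(rates$n, rates$POPN) directly works, another option may be to bind its output column-wise onto the counts, which avoids both the list column and a merge. The sketch below is base R only and makes some assumptions: byar_ci is a hypothetical stand-in for epitools::pois.byar (Byar's approximation with a 0.5 continuity correction, which reproduces the intervals printed above), and the small rates table is typed in by hand from the first three rows of that output.

```r
# Hypothetical stand-in for epitools::pois.byar(), written out so the sketch
# runs without extra packages. Assumption: Byar's approximation with a 0.5
# continuity correction, which matches the intervals printed above.
byar_ci <- function(x, pt = 1, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  xc <- x + 0.5                       # continuity-corrected count
  centre <- 1 - 1 / (9 * xc)
  spread <- (z / 3) * sqrt(1 / xc)
  data.frame(
    x = x, pt = pt, rate = x / pt,
    lower = xc * (centre - spread)^3 / pt,
    upper = xc * (centre + spread)^3 / pt,
    conf.level = conf.level
  )
}

# Counts per disease and area, copied from the first three output rows above.
rates <- data.frame(
  DISEASE = "Chicky Pox",
  AREA = c("A", "B", "C"),
  POPN = c(2.5, 3, 7),
  n = c(1L, 2L, 2L)
)

# The function is vectorised, so one call returns one row per input row,
# and cbind() attaches those rows column-wise without any merge.
result <- cbind(rates, byar_ci(rates$n, rates$POPN))
print(result)
```

With the real package, the same idea would be cbind(rates, pois.byar(rates$n, rates$POPN)). If rates is a grouped tibble, calling ungroup() on it first is probably the safer choice.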