Separating data into bins and calculating averages

Question

I have this sample data:

Time(s) Bacteria count
0.4 2
0.82    5
6.67    8
7.55    11
8.21    14
8.89    17
9.4 20
10.18   23
10.85   26
11.35   29
11.85   32
12.41   35
13.36   38
13.86   41
14.57   44
15.08   47
15.67   50
16.09   53
16.59   56
18.53   59
24.43   62
25.32   65
25.97   68
26.37   71
26.93   74
27.87   77
28.33   80
29.1    83
29.88   84
30.88   85
31.99   86
35.65   87
36.06   88
36.46   89
36.96   90
37.39   91
37.95   92
38.56   93
39.22   94
39.79   95
40.56   96
41.47   97
42.02   98
42.73   99
43.4    100
43.93   101
44.67   102
45.24   103
45.9    104
46.58   105
47.22   106
47.89   107
48.64   108
49.13   109
49.91   110
50.48   111
51.25   112
53.35   113
53.98   114
54.69   115
55.82   116
56.38   117
56.99   118
62.09   119
63.1    120
63.84   121
64.64   122
65.37   123
66.61   124
69  125
69.72   126
70.78   126
73.32   126
74.65   126
75.12   126
75.45   126
75.94   126
76.38   126
76.84   126
77.95   126
78.61   126
79.06   126
79.62   126
80.19   126
82.73   126
85.3    126
85.68   126
86.42   126
87.41   126
88.08   126
91.74   126
92.81   126
93.21   126
94.32   126
96.32   126
102.03  126
102.71  126
104.45  126
105.04  126
105.65  126
106.16  126
107.44  126
107.9   126
109.72  126
110.24  126
111.24  126
111.84  126
112.45  126
113.12  126
114.02  126
114.67  126
115.24  126
115.85  126
117 126
121.26  126
121.8   126
125.8   126
127.26  126
128.37  126
129.48  126
130.27  126
131.04  126
131.72  126
132.47  126
133.21  126
134.27  126
134.87  126
136.04  126
136.6   126
137.27  126
140.83  126
142.05  126
143.63  126
144.12  126
149.83  126
151.07  126
151.79  126
153.24  126
154.14  126
155.24  126
156.58  126
157.51  126
158.25  126
161.43  126
162.14  126
162.8   126
164.26  126
165.09  126
165.76  126
166.83  126
167.42  126
168.94  126
169.75  126
170.52  126
171.19  126
172.67  126
173.44  126

I have data from time 0 s to time 2000 s. The program we are using records the number of bacteria in a dish whenever it detects them multiplying; when it detects nothing, it prints nothing, so those times are simply skipped in the output. I want to use R to separate the data into 30-second intervals and calculate the average number of bacteria spores in each interval. How would I go about doing this?

Recommended answer

I did a bit of modelling, under some assumptions. I've modelled this system as though you start off with 126 bacteria, each of which has some probability of becoming 'active', and by the end of the trial all of the bacteria are 'active'. I've called your data bacteria, with columns Time and Bacteria_count:

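# Binomial GLM: at each time, Bacteria_count "active" out of 126 in total
bacteria.glm <- glm(cbind(Bacteria_count, 126 - Bacteria_count) ~ Time, 
                    data=bacteria, family=binomial(logit))

# Observed proportion active, with the fitted logistic curve in red
plot(Bacteria_count/126 ~ Time, data=bacteria)
lines(bacteria$Time, fitted(bacteria.glm), col="red")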

Given this, we can evaluate the fitted curve at 30-second intervals:

bacteria_intervals <- seq(0, 173.44, 30)
bac_predict<-data.frame(Time=bacteria_intervals, 
                        Bacteria_count=predict(bacteria.glm, data.frame(Time=bacteria_intervals), 
                                               type="response")*126)

plot(bacteria)
points(Bacteria_count~Time, data=bac_predict, col="red", pch=16)

bac_predict
##   Time Bacteria_count
## 1    0       12.39587
## 2   30       76.11856
## 3   60      120.36021
## 4   90      125.57925
## 5  120      125.96982
## 6  150      125.99784
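
To make explicit what predict(..., type="response") is doing here, the following is a minimal sketch that reproduces the prediction by hand from the fitted coefficients. It assumes the same logistic model as above but fits it to a ten-row subset of the sample data (the name bacteria_subset is only for illustration), so the numbers will not match the table above.

```r
# Ten rows copied from the sample data above (a subset, for illustration only)
bacteria_subset <- data.frame(
  Time = c(0.4, 6.67, 10.18, 16.09, 25.32, 29.88, 39.22, 50.48, 62.09, 69.72),
  Bacteria_count = c(2, 8, 23, 53, 65, 84, 94, 111, 119, 126)
)

# Same model as in the answer: "active" vs. "not yet active" out of 126
fit <- glm(cbind(Bacteria_count, 126 - Bacteria_count) ~ Time,
           data = bacteria_subset, family = binomial(link = "logit"))

new_times <- c(0, 30, 60)

# By hand: inverse logit of the linear predictor, scaled to a count
b <- unname(coef(fit))
by_hand <- 126 * plogis(b[1] + b[2] * new_times)

# The same quantity via predict()
via_predict <- unname(126 * predict(fit, data.frame(Time = new_times),
                                    type = "response"))

all.equal(by_hand, via_predict)
```

Because the fitted curve is a smooth function of Time, it can be evaluated at any time point, including ones where the program printed nothing.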

Alternatively, for linear interpolation:

bacteria_linear <- approx(bacteria, xout=seq(0, 173.44, 30))
setNames(as.data.frame(bacteria_linear), c("Time", "Bacteria_count"))
##   Time Bacteria_count
## 1    0             NA
## 2   30        84.1200
## 3   60       118.5902
## 4   90       126.0000
## 5  120       126.0000
## 6  150       126.0000
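
The NA at Time 0 arises because 0 lies before the first observation (0.4 s), and approx() returns NA outside the range of the data by default. If carrying the nearest observed value outwards is acceptable for your purposes (an assumption, not something the data can confirm), the rule argument does that. A small sketch using the first three rows of the sample data:

```r
# First three rows of the sample data above
x <- c(0.4, 0.82, 6.67)
y <- c(2, 5, 8)

# Default (rule = 1): NA for points outside the observed time range
outside_default <- approx(x, y, xout = 0)$y

# rule = 2: use the value at the closest data extreme instead of NA
outside_clamped <- approx(x, y, xout = 0, rule = 2)$y

outside_default
outside_clamped
```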

Or even spline interpolation:

bacteria_spline <- spline(bacteria, xout=seq(0, 173.44, 30))
setNames(as.data.frame(bacteria_spline), c("Time", "Bacteria_count"))
##   Time Bacteria_count
## 1    0      -1.672644
## 2   30      84.110483
## 3   60     118.854542
## 4   90     126.000000
## 5  120     126.000000
## 6  150     126.000000
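
All of the approaches above estimate the count at the 30-second marks rather than averaging within each interval. If a plain per-bin average is what you are after, one possible sketch uses cut() to assign each row to a 30-second bin and aggregate() to take the mean. The ten rows below are copied from the sample data purely for illustration, and the column name bin is my own.

```r
# Ten rows copied from the sample data above (a subset, for illustration only)
bacteria <- data.frame(
  Time = c(0.4, 0.82, 6.67, 7.55, 29.88, 30.88, 31.99, 35.65, 62.09, 63.1),
  Bacteria_count = c(2, 5, 8, 11, 84, 85, 86, 87, 119, 120)
)

# Bin edges every 30 s, extended to cover the last observation
breaks <- seq(0, 30 * ceiling(max(bacteria$Time) / 30), by = 30)

# Label each row with its bin: [0,30), [30,60), [60,90)
bacteria$bin <- cut(bacteria$Time, breaks = breaks, right = FALSE)

# Mean of the recorded counts within each bin
bin_means <- aggregate(Bacteria_count ~ bin, data = bacteria, FUN = mean)
bin_means
```

Two caveats with this approach: a bin with no recorded rows does not appear in the result at all, and because the program only prints at irregular times, this is an average over the recorded rows, not a time-weighted average.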
