Create a while loop function that takes the next largest value until a condition is met
Problem description
I want to create a function that adds a new column to a data frame. The column starts as 0 in every row, and 1s are set as follows. The function first finds the highest value in the percent column and puts a 1 in the new "algorithm" column of that row. It then looks at the rows directly before and after the starting row. Say the highest value (the starting value) is 13.8% in row 6; the next rows to look at are 5 and 7. It compares the percentages of those two rows, picks the higher one, and puts a 1 in the "algorithm" column there (say it is 8.3% in row 7). Next it looks at the rows just outside the selection again (rows 5 and 8, because rows 6 and 7 have already been taken into account).
Another important requirement is that it has to stop looking for more rows at a certain percentage, say 95%. The stopping rule is based on the "percent" column: the percentages of the selected rows should sum to 95%.
This is the main idea, but I'm not sure how to do this.
Moreover, it eventually has to look further than the two adjacent rows, since those 2 rows can have the same value, for example 8% each. In that case it has to look 1 row further on each side and choose based on the highest value there.
Not tested yet, but this is what I'm thinking about currently.
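while (total_perc < p_min_performance) {
  prev_row_value <- percent[minrow - 1]  # row just before the selection
  next_row_value <- percent[maxrow + 1]  # row just after the selection
  if (prev_row_value > next_row_value) {
    minrow <- minrow - 1                 # extend the selection upwards
    total_perc <- total_perc + prev_row_value
  } else {
    maxrow <- maxrow + 1                 # extend the selection downwards
    total_perc <- total_perc + next_row_value
  }
}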
Example code:
library(dplyr)  # provides %>% and mutate()

algorithm <- data.frame(pc4 = c(5464),
                        timeinterval = c('08:45:00', '09:00:00', '09:15:00', '09:30:00',
                                         '09:45:00', '10:00:00', '10:15:00', '10:30:00',
                                         '10:45:00', '11:00:00', '11:15:00', '11:30:00'),
                        stops = c(1, 5, 8, 7, 5, 10, 6, 4, 7, 6, 5, 8)) %>%
  mutate(percent = round(stops/sum(stops), digits = 6)*100) %>%
  mutate(idgroup = seq_along(timeinterval))
Not sure where to start yet. The 1s in algorithm_column could also be the corresponding percentages, which might make it easier to add them up until, say, 95%.
The structure should look like this (it's an example; the data in algorithm_column could be anything, depending on what it finds in the data)
EDIT:
algorithm
# pc4 timeinterval stops percent idgroup algorithm_column
#1 5464 08:45:00 1 1.3889 1 0
#2 5464 09:00:00 5 6.9444 2 1
#3 5464 09:15:00 8 11.1111 3 1
#4 5464 09:30:00 7 9.7222 4 1
#5 5464 09:45:00 5 6.9444 5 1
#6 5464 10:00:00 10 13.8889 6 1
#7 5464 10:15:00 6 8.3333 7 1
#8 5464 10:30:00 4 5.5556 8 1
#9 5464 10:45:00 7 9.7222 9 1
#10 5464 11:00:00 6 8.3333 10 1
#11 5464 11:15:00 5 6.9444 11 1
#12 5464 11:30:00 8 11.1111 12 0
Ronak's code works:
algorithm$algorithm_column <- 0
output <- do.call(rbind, lapply(split(algorithm, algorithm$pc4),
  function(x) {
    all_index <- x$idgroup
    next_comb <- all_index
    while (sum(x$percent[x$algorithm_column == 1]) <= 95) {
      inds <- next_comb[which.max(x$percent[next_comb])]
      x$algorithm_column[inds] <- 1
      nos <- which(all_index == inds)
      next_comb <- all_index[c(nos - 1, nos + 1)]
      all_index <- setdiff(all_index, inds)
    }
    x
  }))
The function does not work in some cases: when both neighbouring rows are 0, which.max() takes the FIRST maximum of those rows, so it works through all the 0s in the first part of the data set before continuing to the next highest value. For example, this dataset:
algorithm1 <- data.frame(pc4 = c(8035),
                         timeinterval = c('03:00:00', '03:30:00', '04:00:00', '04:30:00',
                                          '05:00:00', '05:30:00', '06:00:00', '06:30:00',
                                          '07:00:00', '07:30:00', '08:00:00', '08:30:00',
                                          '09:00:00', '09:30:00', '10:00:00', '10:30:00',
                                          '11:00:00', '11:30:00', '12:00:00', '12:30:00',
                                          '13:00:00', '13:30:00', '14:00:00', '14:30:00',
                                          '15:00:00', '15:30:00', '16:00:00', '16:30:00'),
                         stops = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 70, 0, 6, 0, 0, 0, 3, 0, 3, 3, 0, 5, 0, 0, 0)) %>%
  group_by(pc4) %>%
  mutate(percent = round(stops/sum(stops), digits = 6)*100) %>%
  mutate(idgroup = seq_along(timeinterval)) %>%
  mutate(algorithm_column = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
I added an iteration counter to the function to show the order, so you can see that it takes the 0s first.
p_min_performance <- 95 # SET PERCENTAGE!
# Reset to 0
algorithm1$algorithm_column <- 0
algorithm1$iteration <- 0
it <- 0
algorithm1 <- do.call(rbind, lapply(split(algorithm1, algorithm1$pc4),
  function(x) {
    # Index for the maximum percentage
    all_index <- x$idgroup
    next_comb <- all_index
    # While loop algorithm
    while (sum(x$percent[x$algorithm_column == 1]) <= p_min_performance) {
      it <- it + 1
      inds <- next_comb[which.max(x$percent[next_comb])]
      x$algorithm_column[inds] <- 1
      x$iteration[inds] <- it
      nos <- which(all_index == inds)
      next_comb <- all_index[c(nos - 1, nos + 1)]
      all_index <- setdiff(all_index, inds)
    }
    x
  }))
Current output (the selected time interval runs from 03:00 to 15:00):
pc4 tinterval stops percen id_g a_col iteration
1 8035 03:00:00 0 0.0000 1 1 14
2 8035 03:30:00 0 0.0000 2 1 13
3 8035 04:00:00 0 0.0000 3 1 12
4 8035 04:30:00 0 0.0000 4 1 11
5 8035 05:00:00 0 0.0000 5 1 10
6 8035 05:30:00 0 0.0000 6 1 9
7 8035 06:00:00 0 0.0000 7 1 8
8 8035 06:30:00 0 0.0000 8 1 7
9 8035 07:00:00 0 0.0000 9 1 6
10 8035 07:30:00 0 0.0000 10 1 5
11 8035 08:00:00 0 0.0000 11 1 4
12 8035 08:30:00 0 0.0000 12 1 3
13 8035 09:00:00 9 9.0909 13 1 2
14 8035 09:30:00 70 70.7071 14 1 1
15 8035 10:00:00 0 0.0000 15 1 15
16 8035 10:30:00 6 6.0606 16 1 16
17 8035 11:00:00 0 0.0000 17 1 17
18 8035 11:30:00 0 0.0000 18 1 18
19 8035 12:00:00 0 0.0000 19 1 19
20 8035 12:30:00 3 3.0303 20 1 20
21 8035 13:00:00 0 0.0000 21 1 21
22 8035 13:30:00 3 3.0303 22 1 22
23 8035 14:00:00 3 3.0303 23 1 23
24 8035 14:30:00 0 0.0000 24 1 24
25 8035 15:00:00 5 5.0505 25 1 25
26 8035 15:30:00 0 0.0000 26 0 0
27 8035 16:00:00 0 0.0000 27 0 0
28 8035 16:30:00 0 0.0000 28 0 0
But it should be (the selected time interval runs from 09:00 to 15:00):
pc4 tinterval stops percen id_g a_col iteration
1 8035 03:00:00 0 0.0000 1 0 0
2 8035 03:30:00 0 0.0000 2 0 0
3 8035 04:00:00 0 0.0000 3 0 0
4 8035 04:30:00 0 0.0000 4 0 0
5 8035 05:00:00 0 0.0000 5 0 0
6 8035 05:30:00 0 0.0000 6 0 0
7 8035 06:00:00 0 0.0000 7 0 0
8 8035 06:30:00 0 0.0000 8 0 0
9 8035 07:00:00 0 0.0000 9 0 0
10 8035 07:30:00 0 0.0000 10 0 0
11 8035 08:00:00 0 0.0000 11 0 0
12 8035 08:30:00 0 0.0000 12 0 0
13 8035 09:00:00 9 9.0909 13 1 2
14 8035 09:30:00 70 70.7071 14 1 1
15 8035 10:00:00 0 0.0000 15 1 3
16 8035 10:30:00 6 6.0606 16 1 4
17 8035 11:00:00 0 0.0000 17 1 5
18 8035 11:30:00 0 0.0000 18 1 6
19 8035 12:00:00 0 0.0000 19 1 7
20 8035 12:30:00 3 3.0303 20 1 8
21 8035 13:00:00 0 0.0000 21 1 9
22 8035 13:30:00 3 3.0303 22 1 10
23 8035 14:00:00 3 3.0303 23 1 11
24 8035 14:30:00 0 0.0000 24 1 12
25 8035 15:00:00 5 5.0505 25 1 13
26 8035 15:30:00 0 0.0000 26 0 0
27 8035 16:00:00 0 0.0000 27 0 0
28 8035 16:30:00 0 0.0000 28 0 0
So when both rows next to the current selection are 0, the algorithm should look further out instead of only at the adjacent rows.
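One way to read this requirement is sketched below. It is not code from the thread: the function name expand_window() and the tie-breaking rule (compare the rows at distance 2, 3, ... on both sides until they differ, then extend by one row towards the larger side) are assumptions made for illustration. Rows outside the data are treated as -Inf so the selection never grows past either end.

```r
# Hypothetical sketch: grow a contiguous window around the peak row,
# looking further out whenever the two adjacent rows are tied.
expand_window <- function(pct, min_performance = 95) {
  n <- length(pct)
  algorithm_column <- integer(n)
  iteration <- integer(n)

  # value at row i, or -Inf when i falls outside the data
  value_at <- function(i) if (i >= 1 && i <= n) pct[i] else -Inf

  lo <- hi <- which.max(pct)   # window starts at the highest percentage
  it <- 1L
  algorithm_column[lo] <- 1L
  iteration[lo] <- it

  while (sum(pct[lo:hi]) <= min_performance && (lo > 1 || hi < n)) {
    # compare both sides at increasing distance k until they differ
    k <- 1L
    repeat {
      left <- value_at(lo - k)
      right <- value_at(hi + k)
      if (left != right || (lo - k < 1 && hi + k > n)) break
      k <- k + 1L
    }
    # extend the window by ONE row towards the winning side
    if (left > right) {
      lo <- lo - 1L; new_row <- lo
    } else if (hi < n) {
      hi <- hi + 1L; new_row <- hi
    } else {
      lo <- lo - 1L; new_row <- lo
    }
    it <- it + 1L
    algorithm_column[new_row] <- 1L
    iteration[new_row] <- it
  }
  data.frame(algorithm_column, iteration)
}

# The pc4 = 8035 data from the question
stops <- c(rep(0, 12), 9, 70, 0, 6, 0, 0, 0, 3, 0, 3, 3, 0, 5, 0, 0, 0)
pct <- round(stops / sum(stops), digits = 6) * 100
res <- expand_window(pct)
```

On this data the sketch selects rows 13 to 25 (09:00 to 15:00) and leaves the leading zeros untouched. Other tie-breaking rules are possible, for example preferring the side whose first non-zero value is nearest.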
I have been working on it in chunks, but I'm a bit stuck.
runAlgorithm <- function(x, min_performance = 95) {
  x$algorithm_column <- 0
  x$iteration <- 0
  it <- 0
  all_index <- x$idgroup
  next_comb <- all_index
  inds <- next_comb[which.max(x$percent[next_comb])]
  x$algorithm_column[inds] <- 1
  x$iteration[inds] <- it
  # While loop algorithm
  while (sum(x$percent[x$algorithm_column == 1]) <= min_performance) {
    prev_values <- x$percent[1:inds - 1]
    next_values <- x$percent[inds + 1:length(x$percent)]
    first_non_zero_prev <- if_else(sum(prev_values) > 0L, which.max(prev_values > 0), NA)
    first_non_zero_next <- if_else(sum(next_values) > 0L, which.max(next_values > 0), NA)
    next_value <- case_when(
      is.na(first_non_zero_prev) & !is.na(first_non_zero_next) ~ next_comb[2],
      !is.na(first_non_zero_prev) & is.na(first_non_zero_next) ~ next_comb[1],
      first_non_zero_prev <= first_non_zero_next ~ next_comb[2],
      first_non_zero_prev > first_non_zero_next ~ next_comb[1]
    )
    inds <- next_comb[which.max(x$percent[next_value])]
    x$algorithm_column[inds] <- 1
    x$iteration[inds] <- it
    nos <- which(all_index == inds)
    next_comb <- all_index[c(nos - 1, nos + 1)]
    all_index <- setdiff(all_index, inds)
  }
  return(x)
}
df_test <- groep_test[1:48,]
output <- runAlgorithm(df_test)
Recommended answer
Here's a solution that is not based on a loop. Basically, it uses cumsum() to determine which rows cross max_threshold. The rowSums(matrix(...)) call combines rows 5 and 7, then 4 and 8, etc. from your example.
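To make the cumsum() and rowSums(matrix(...)) step concrete, here is a small worked illustration. It uses the raw stop counts around row 6 of the example instead of percentages so the arithmetic is easy to follow; the variable names and the threshold of 30 are chosen only for this illustration.

```r
# Peak row 6 has 10 stops; neighbours are listed nearest first.
peak  <- 10
above <- c(5, 7, 8)   # rows 5, 4, 3
below <- c(6, 4, 7)   # rows 7, 8, 9

# A two-column matrix puts each "above" value next to the "below" value
# at the same distance, so rowSums() adds rows 5+7, 4+8, 3+9.
ring_totals <- rowSums(matrix(c(above, below), ncol = 2))
# ring_totals is 11 11 15

# Running total as the window grows outwards by one ring at a time
running <- cumsum(c(peak, ring_totals))
# running is 10 21 32 47

# Rings that stay under the (illustrative) threshold are kept
keep <- running < 30
# keep is TRUE TRUE FALSE FALSE
```

Because the window grows by one row on each side per step, this approach always selects a symmetric range around the peak, which differs from choosing one side at a time.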
Based on your comments, you could add this to a dplyr chain, including with group_by().
f_algo_return <- function(pct, max_threshold = 70) {
  # initialize return variable
  algo <- vector(mode = 'integer', length = length(pct))

  # row holding the highest percentage
  max_row <- which.max(pct)

  # number of symmetric steps available before reaching either end of pct
  # (this prevents subsetting pct[0] or beyond the last row)
  len_out <- min(abs(max_row - c(1, length(pct))))
  steps <- seq_len(len_out)

  # order rows as: peak, rows before it (nearest first), rows after it (nearest first)
  all_rows <- c(max_row, max_row - steps, max_row + steps)

  # subset the pct
  pct <- pct[all_rows]

  # pair the rows at equal distance from the peak, then accumulate outwards
  thresh <- cumsum(c(pct[1], rowSums(matrix(pct[-1], ncol = 2)))) < max_threshold
  sub_rows <- all_rows[c(thresh[1], thresh[-1], thresh[-1])]

  # flag the selected rows
  algo[sub_rows] <- 1L
  return(algo)
}
f_algo_return(DF[['percent']])
# [1] 0 0 1 1 1 1 1 1 1 0 0 0
Data:
library(dplyr)  # provides %>% and mutate()

DF <- data.frame(pc4 = c(5464),
                 timeinterval = c('08:45:00', '09:00:00', '09:15:00', '09:30:00',
                                  '09:45:00', '10:00:00', '10:15:00', '10:30:00',
                                  '10:45:00', '11:00:00', '11:15:00', '11:30:00'),
                 stops = c(1, 5, 8, 7, 5, 10, 6, 4, 7, 6, 5, 8)) %>%
  mutate(percent = round(stops/sum(stops), digits = 6)*100) %>%
  mutate(idgroup = seq_along(timeinterval))
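The grouped dplyr usage mentioned in the answer might look like the sketch below. To keep the example self-contained, flag_rows() is a hypothetical stand-in that only flags the peak row of each group; in practice any function that takes a percent vector and returns a 0/1 vector of the same length (such as the answer's function) could be used in its place. The six-row data frame is made up from a few stop counts in the question.

```r
library(dplyr)

# Hypothetical stand-in: flags only the row with the highest percentage.
flag_rows <- function(pct) as.integer(seq_along(pct) == which.max(pct))

# Two postcode groups, values borrowed from the question's examples
stops_df <- data.frame(
  pc4   = c(5464, 5464, 5464, 8035, 8035, 8035),
  stops = c(1, 5, 8, 9, 70, 6)
)

result <- stops_df %>%
  group_by(pc4) %>%                        # percent and flags are computed per pc4
  mutate(percent = round(stops / sum(stops), digits = 6) * 100,
         algorithm_column = flag_rows(percent)) %>%
  ungroup()
```

Inside a grouped mutate() the function receives only the rows of one pc4 at a time, so no explicit split() and do.call(rbind, ...) is needed.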