Counting text values across different columns, into new columns
Question
EDIT 1: Improved clarity, fixed typos, and expanded the example.
I have a dataset with one column (Action) that holds text values. For each Operatie, I want to count how often each unique value occurs and place these counts in new columns, keyed by the ID (== Operatie). There are 21 unique values in Action.
In the new dataset it is important that each new column (which counts a single text value from Action) is linked to the value of Q.Operatie (which takes the values Q1, Q2, Q3, Q4) and to Operatie (1:100).
Thus, if we take the first 4 rows of the example, we would have a column named Q1.Delegerend == 2, and the next column would be Q1.Goedaardig == 1, because Operatie == 1 has 2 occurrences of Delegerend and 1 occurrence of Goedaardig. I ignore Instruerend for this example.
This leads to 4 columns per text value (Q1.X to Q4.X), each holding the count of that value within its respective Q.Operatie range. For example, for Delegerend (one of the 21 unique values), the row for Operatie == 1 has the columns Q1.Delegerend, Q2.Delegerend, Q3.Delegerend and Q4.Delegerend. We might need to set up 'transition matrices', hence the split of the observations.
An example of the raw data is shown below. The new dataset will have a column for each unique value, all in a single row; see the example below the raw data.
   Operatie Tijdstip Berekening.voor.D Minuut.van.de.Operatie Berekening.voor.F Q.Operatie Actor Responder        Action Focus InterTeam
 1        1 08:44:56             00:00                      1                1%         Q1     C        OA    Delegerend     1         b
 2        1 08:45:43             00:00                      2                2%         Q1    C*        AM    Goedaardig     1         a
 3        1 08:46:45             00:01                      3                4%         Q1    OA       OA*   Instruerend     3
 4        1 08:47:10             00:02                      3                4%         Q1     C       OA*    Delegerend     1         b
 5        1 08:48:03             00:03                      4                6%         Q1     C      Team  Onderwijzend     1         b
 6        1 08:48:44             00:03                      5                7%         Q1     C      Team Bewustwording     1         b
 7        1 08:49:28             00:04                      6                8%         Q1    C*         C   Instruerend     1         b
 8        1 08:50:30             00:05                      7                9%         Q1     C        C*  Onderwijzend     1         b
 9        1 08:50:47             00:05                      7               10%         Q1     C        AM    Delegerend     1         a
10        1 08:51:47             00:06                      8               11%         Q1     C        OA   Instruerend     1         b
Thus, ultimately, I'd like to have one row per Operatie, with 21 columns for each level of Q.Operatie, each holding the frequency of one unique text value (taken from the Action column). Yes, this will lead to a lot of columns, 21 unique values times 4, but that's fine.
Operatie Minuten Chirurg1 Chirurg2 Q1.Delegerend Q2.Delegerend Q3.Delegerend Q4.Delegerend Q1.Goedaardig
1 1 72 10 11 2 4 5 5
2 2 30 10 11 2 2 6 12
3 3 102 1 2 1 5 12 ...
4 4 212 2 NA 3 13 13
5 5 37 4 NA 1 2 ...
6 6 57 2 NA 3 9
7 7 120 3 NA 1 9
8 8 146 3 NA 1 6
9 9 143 2 9 3 10
10 10 189 9 2 3 12
So I tried making a list for dplyr to work with, see below, but I didn't manage to get it to work. I am under the impression that it is possible to pass a list of values to count over, but I'm not sure how to write that with dplyr. I looked at a few posts, but I couldn't find anything about counting across multiple rows in order to move the counts to a new dataset. The latter part is easy enough, though; I just need the columns.
my_list <- list(unique(sort(obs_IND$Action)))
obs_IND %>%
count(my_list) %>%
group_by(Operatie) %>%
tally()
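Sources used:
- https://datascience.stackexchange.com/questions/6773/如何计算每个id在r中的计数
- Counting unique values across columns in R
- How to count the number of observations in R (like the Stata command count)
- Count number of rows within each group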
Recommended answer
I created some sample data:
operatie <- rep(c(rep(1,10), rep(2,10)),2)
Q <- rep(rep(c(rep('Q1',5),rep('Q2',5)),2),2)
action <- rep(rep(paste('action', 1:4),5),2)
df <- data.frame(operatie, Q, action)
library(dplyr)
library(tidyr)
We can group by operatie, Q and action, and then count the instances with tally().
df_long <- df %>% group_by(operatie, Q, action) %>% tally()
df_long$action.Q <- paste(df_long$action,df_long$Q)
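As a side note, the group-and-tally step can also be written with dplyr's count(), which returns an ungrouped result. The sketch below rebuilds the same toy data so it runs on its own; the name df_counts is only illustrative and is not part of the original answer.

```r
library(dplyr)

# Same toy data as in the answer
operatie <- rep(c(rep(1, 10), rep(2, 10)), 2)
Q <- rep(rep(c(rep("Q1", 5), rep("Q2", 5)), 2), 2)
action <- rep(rep(paste("action", 1:4), 5), 2)
df <- data.frame(operatie, Q, action)

# count() groups by the listed columns, tallies the rows per group,
# and drops the grouping again, so later verbs see a plain data frame
df_counts <- df %>% count(operatie, Q, action)

# Combined label per row, mirroring the action.Q column built above
df_counts$action.Q <- paste(df_counts$action, df_counts$Q)
```

Because the result is ungrouped, a later select() can drop Q or action without dplyr adding them back as grouping variables.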
Now we can use the function spread to create a wide data frame with a column for each combination of Q and action:
# Q is still a grouping variable after tally(), so select() keeps it in the output
df_wide <- df_long %>% spread(action.Q, n, fill=0) %>% select(-c(Q,action))
Result
Q operatie `action 1 Q1` `action 1 Q2` `action 2 Q1` ...
<fct> <dbl> <dbl> <dbl> <dbl> ...
1 Q1 1 4 0 0 ...
2 Q1 1 0 0 2 ...
3 Q1 1 0 0 0 ...
4 Q1 1 0 0 0 ...
5 Q2 1 0 2 0 ...
6 Q2 1 0 0 0 ...
...
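The result above still has one row per combination of operatie, Q and action. If a single row per operatie is wanted, as the question asks, one possible sketch uses pivot_wider() from tidyr, which supersedes spread(). The name df_one_row and the "Q.action" column naming are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the answer
operatie <- rep(c(rep(1, 10), rep(2, 10)), 2)
Q <- rep(rep(c(rep("Q1", 5), rep("Q2", 5)), 2), 2)
action <- rep(rep(paste("action", 1:4), 5), 2)
df <- data.frame(operatie, Q, action)

df_one_row <- df %>%
  # one count per operatie / Q / action combination, ungrouped
  count(operatie, Q, action) %>%
  # one row per operatie; column names are built as "<Q>.<action>"
  pivot_wider(
    id_cols = operatie,
    names_from = c(Q, action),
    values_from = n,
    names_sep = ".",
    values_fill = list(n = 0)  # combinations that never occur become 0
  )
```

With the toy data this gives 2 rows (one per operatie) and 9 columns: operatie plus the 8 Q/action combinations. On the real data the same call would yield columns such as Q1.Delegerend, assuming the columns are named Operatie, Q.Operatie and Action as in the question.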