Counting text values across different columns, into new columns


Problem description

EDIT 1: Improved clarity, fixed typos, and expanded the example.

I have a dataset with one column (Action) that holds text values. I want to count how often each unique value occurs for a given Operatie and place these counts in new columns, keyed by their ID (== Operatie). There are 21 unique values within Action.

In the new dataset it is important that each new column (which counts a single text value from Action) is linked to the value of Q.Operatie (which takes the values Q1, Q2, Q3, Q4) and to Operatie (1:100).

Thus, if we take the first 4 rows of the example, we would have a column named Q1.Delegerend with the value 2, and the next column, Q1.Goedaardig, would have the value 1, because there are 2 occurrences of Delegerend and 1 occurrence of Goedaardig for Operatie == 1. I ignore Instruerend in this example.
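The counting described above can be sketched on just those first four rows. This is a minimal illustration, assuming dplyr is available; the names `first4`, `counts` and `column` are made up for this sketch, and only the three columns needed for the count are reproduced.

```r
library(dplyr)

# First four rows of the example data, reduced to the columns needed here
first4 <- data.frame(
  Operatie   = c(1, 1, 1, 1),
  Q.Operatie = c("Q1", "Q1", "Q1", "Q1"),
  Action     = c("Delegerend", "Goedaardig", "Instruerend", "Delegerend")
)

# Count each Action within each Operatie / Q.Operatie combination,
# then build the intended column name, e.g. "Q1.Delegerend"
counts <- first4 %>%
  count(Operatie, Q.Operatie, Action) %>%
  mutate(column = paste(Q.Operatie, Action, sep = "."))

print(counts)
```

Each row of `counts` corresponds to one of the desired new columns for Operatie == 1, with `n` holding the value that column should contain.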

This leads to 4 columns (Q1.X:Q4.X) per text value, each holding the count of that value within its respective Q.Operatie level. Thus, for Delegerend (one of the 21 unique values), the row for Operatie == 1 has the columns Q1.Delegerend, Q2.Delegerend, Q3.Delegerend and Q4.Delegerend. We might need to set up 'transition matrices', hence the split of the observations.

An example of the raw data is shown below. The new dataset will have a column for each unique value, all in a single row; see the example below the raw data.

   Operatie Tijdstip Berekening.voor.D Minuut.van.de.Operatie Berekening.voor.F Q.Operatie Actor Responder        Action Focus InterTeam
1         1 08:44:56             00:00                      1                1%        Q1      C        OA    Delegerend     1         b
2         1 08:45:43             00:00                      2                2%        Q1     C*        AM    Goedaardig     1         a
3         1 08:46:45             00:01                      3                4%        Q1     OA       OA*   Instruerend     3          
4         1 08:47:10             00:02                      3                4%        Q1      C       OA*    Delegerend     1         b
5         1 08:48:03             00:03                      4                6%        Q1      C      Team  Onderwijzend     1         b
6         1 08:48:44             00:03                      5                7%        Q1      C      Team Bewustwording     1         b
7         1 08:49:28             00:04                      6                8%        Q1     C*         C   Instruerend     1         b
8         1 08:50:30             00:05                      7                9%        Q1      C        C*  Onderwijzend     1         b
9         1 08:50:47             00:05                      7               10%        Q1      C        AM    Delegerend     1         a
10        1 08:51:47             00:06                      8               11%        Q1      C        OA   Instruerend     1         b

Thus, ultimately, I'd like to have one row per Operatie with 21 columns, each holding the frequency of one unique text value (taken from the Action column), split by the levels of Q.Operatie. Yes, this leads to a lot of columns (21 unique values times 4), but that's fine.

   Operatie Minuten Chirurg1 Chirurg2 Q1.Delegerend Q2.Delegerend Q3.Delegerend Q4.Delegerend Q1.Goedaardig
1         1      72       10       11           2          4            5            5
2         2      30       10       11           2          2            6            12
3         3     102        1        2           1          5            12            ...
4         4     212        2       NA           3         13            13
5         5      37        4       NA           1          2            ...
6         6      57        2       NA           3          9
7         7     120        3       NA           1          9
8         8     146        3       NA           1          6
9         9     143        2        9           3         10
10       10     189        9        2           3         12

So I tried making a list for dplyr to work with, see below, but I didn't manage to get it to work. I am under the impression that it is possible to pass a list of unique values to count over, but I'm not sure how to write that with dplyr. I looked at a few posts, but I couldn't find anything about counting across multiple rows in order to move the counts into a new dataset. The latter part is easy enough; I just need the columns.

my_list <- list(unique(sort(obs_IND$Action)))

obs_IND %>% 
count(my_list) %>%
group_by(Operatie) %>%
tally()


Recommended answer

I created some sample data:

operatie <- rep(c(rep(1,10), rep(2,10)),2)
Q <- rep(rep(c(rep('Q1',5),rep('Q2',5)),2),2)
action <- rep(rep(paste('action', 1:4),5),2)
df <- data.frame(operatie, Q, action)

library(dplyr)
library(tidyr)

We can group by operatie, Q and action, and then count the instances with tally().

df_long <- df %>% group_by(operatie, Q, action) %>% tally()
df_long$action.Q <- paste(df_long$action,df_long$Q)

Now we can use the function spread to create a wide data frame with a column for each combination of Q and action:

df_wide <- df_long %>% spread(action.Q, n, fill=0) %>% select(-c(Q,action))

Result

  Q     operatie `action 1 Q1` `action 1 Q2` `action 2 Q1` ...
  <fct>    <dbl>         <dbl>         <dbl>         <dbl> ...
1 Q1           1             4             0             0 ...
2 Q1           1             0             0             2 ...
3 Q1           1             0             0             0 ...
4 Q1           1             0             0             0 ...
5 Q2           1             0             2             0 ...
6 Q2           1             0             0             0 ...
...
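
Note that the result above still has one row per combination of operatie, Q and action, whereas the question asks for a single row per operatie. A minimal sketch of one way to get there, assuming the same sample data: keep only operatie, the combined column name and the count before spreading, so that spread() places all counts for an operatie on one row. The names `df_one_row` and `Q.action` are illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer
operatie <- rep(c(rep(1, 10), rep(2, 10)), 2)
Q <- rep(rep(c(rep("Q1", 5), rep("Q2", 5)), 2), 2)
action <- rep(rep(paste("action", 1:4), 5), 2)
df <- data.frame(operatie, Q, action)

df_one_row <- df %>%
  # count() gives the same counts as group_by() + tally(), but ungrouped
  count(operatie, Q, action) %>%
  # Build the new column name in the "Q1.<action>" style from the question
  mutate(Q.action = paste(Q, action, sep = ".")) %>%
  # Drop Q and action so that operatie is the only identifying column
  select(operatie, Q.action, n) %>%
  spread(Q.action, n, fill = 0)

print(df_one_row)
```

With this sample data, `df_one_row` has one row per operatie and one column per Q/action combination. In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), which can be used for the same reshaping step.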
