Counting text values across different columns, into new columns

Views: 60 Published: 2021/5/2 20:57:01 r dplyr

Problem description

EDIT 1: Clarity and typos, expanded more on the example.

I have a dataset with one column (Action) that holds text values. I want to count the occurrences of each unique value for a given Operatie and place these counts in new columns, keyed by their ID (== Operatie). There are 21 unique values within Action.

In the new data set it is important that each new column (which counts a single text value from Action) is linked to the value of Q.Operatie (which has values Q1, Q2, Q3, Q4) and to Operatie (1:100).

Thus, if we take the first 4 rows of the example, we would have a column named Q1.Delegerend with the value 2, and the next column, Q1.Goedaardig, with the value 1, because there are 2 occurrences of Delegerend and 1 occurrence of Goedaardig for Operatie == 1. I ignore Instruerend for this example.

This leads to 4 columns (Q1.X:Q4.X) per unique value, each holding the count of that text value within its respective Q.Operatie level. Thus, for Delegerend (one of the 21 unique values), the row for Operatie == 1 has the columns Q1.Delegerend, Q2.Delegerend, Q3.Delegerend and Q4.Delegerend. We might need to set up 'transition matrices', hence the split of the observations.

An example of the raw data is shown below. The new data set will have one column for each unique value, all in a single row; see the example below the raw data.

   Operatie Tijdstip Berekening.voor.D Minuut.van.de.Operatie Berekening.voor.F Q.Operatie Actor Responder        Action Focus InterTeam
1         1 08:44:56             00:00                      1                1%        Q1      C        OA    Delegerend     1         b
2         1 08:45:43             00:00                      2                2%        Q1     C*        AM    Goedaardig     1         a
3         1 08:46:45             00:01                      3                4%        Q1     OA       OA*   Instruerend     3          
4         1 08:47:10             00:02                      3                4%        Q1      C       OA*    Delegerend     1         b
5         1 08:48:03             00:03                      4                6%        Q1      C      Team  Onderwijzend     1         b
6         1 08:48:44             00:03                      5                7%        Q1      C      Team Bewustwording     1         b
7         1 08:49:28             00:04                      6                8%        Q1     C*         C   Instruerend     1         b
8         1 08:50:30             00:05                      7                9%        Q1      C        C*  Onderwijzend     1         b
9         1 08:50:47             00:05                      7               10%        Q1      C        AM    Delegerend     1         a
10        1 08:51:47             00:06                      8               11%        Q1      C        OA   Instruerend     1         b

Thus, ultimately, I'd like to have one row per Operatie, with a column for each of the 21 unique texts (taken from column Action) holding its frequency, split by the levels of Q.Operatie. Yes, this leads to a lot of columns, 21 unique values times 4, but that's fine.

   Operatie Minuten Chirurg1 Chirurg2 Q1.Delegerend Q2.Delegerend Q3.Delegerend Q4.Delegerend Q1.Goedaardig
1         1      72       10       11           2          4            5            5
2         2      30       10       11           2          2            6            12
3         3     102        1        2           1          5            12            ...
4         4     212        2       NA           3         13            13
5         5      37        4       NA           1          2            ...
6         6      57        2       NA           3          9
7         7     120        3       NA           1          9
8         8     146        3       NA           1          6
9         9     143        2        9           3         10
10       10     189        9        2           3         12

So I tried making a list for dplyr to work with, see below, but I didn't manage to get it to work. I am under the impression that it is possible to pass a list of values to count over, but I am not sure how to write that using dplyr. I looked at a few posts, but I couldn't find anything about counting across multiple rows in order to move the counts into a new dataset. The latter part is easy enough; I just need the columns.

my_list <- list(unique(sort(obs_IND$Action)))

obs_IND %>% 
count(my_list) %>%
group_by(Operatie) %>%
tally()
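One possible way to get the layout described above is sketched here; it is not a confirmed answer. It assumes the tidyr package is available in addition to dplyr, and it uses a small toy data frame that mirrors only the first four rows of the raw-data example (the name wide_counts is made up for illustration). The idea is that count() already groups by columns of the data frame, so a separate list of unique values is not needed; pivot_wider() then spreads the counts into Q1.X style columns.

```r
library(dplyr)
library(tidyr)

# Toy data: only the first four rows of the raw-data example,
# reduced to the three columns that matter here (an assumption for brevity).
obs_IND <- data.frame(
  Operatie   = c(1, 1, 1, 1),
  Q.Operatie = c("Q1", "Q1", "Q1", "Q1"),
  Action     = c("Delegerend", "Goedaardig", "Instruerend", "Delegerend")
)

wide_counts <- obs_IND %>%
  # One row per Operatie / Q.Operatie / Action combination, n = frequency
  count(Operatie, Q.Operatie, Action) %>%
  # Spread into one row per Operatie with columns such as Q1.Delegerend
  pivot_wider(
    names_from  = c(Q.Operatie, Action),
    values_from = n,
    names_sep   = ".",
    values_fill = 0   # combinations that never occur get a count of 0
  )

print(wide_counts)
```

On the toy data this gives a single row for Operatie == 1 with Q1.Delegerend equal to 2 and Q1.Goedaardig equal to 1, matching the worked example in the question. On the full data, columns are created only for combinations that actually occur, so some of the 21 times 4 columns may be missing unless they are added separately.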

Sources used:
