How to create categories conditionally using other variables' values and sequence


Problem description

I would appreciate any help creating a function that builds the categories of one variable from the order in which the values of a set of other variables appear.


Specifically, I want a function that:


  1. creates category E1 of the column "variable" the first time that each combination of values of the variables A, B, and ID appears in the dataset.
  2. creates category E2 of the column "variable" the second time that each combination of values of the variables A, B, and ID appears in the dataset.
  3. creates category E3 of the column "variable" the third time that each combination of values of the variables A, B, and ID appears in the dataset.
  4. creates category En of the column "variable" the nth time that each combination of values of the variables A, B, and ID appears in the dataset.


# Sample data:

library(data.table)  # provides the data.table class, melt() and :=

rowdT<-structure(list(A = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", 
            "a2", "a1"), B = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", 
            "b2", "b1"), ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"
            ), E = c(0.621142094943352, 0.742109450696123, 0.39439152996948, 
            0.40694392882818, 0.779607277916503, 0.550579323666347, 0.352622183880119, 
            0.690660491345867, 0.23378944873769)), class = c("data.table", 
            "data.frame"), row.names = c(NA, -9L))
# Reshape to long format: column E becomes the variable/value pair
sampleDT <- melt(rowdT, id.vars = c("A", "B", "ID"))

# Input data:

    A  B  ID variable    value
1: a1 b2  3        E 0.6211421
2: a2 b2  4        E 0.7421095
3: a1 b2  3        E 0.3943915
4: a1 b1  1        E 0.4069439
5: a2 b2  4        E 0.7796073
6: a1 b2  3        E 0.5505793
7: a1 b1  1        E 0.3526222
8: a2 b2  4        E 0.6906605
9: a1 b1  1        E 0.2337894

# Expected output:

    A  B  ID variable    value
4: a1 b1  1        E1 0.4069439
1: a1 b2  3        E1 0.6211421
2: a2 b2  4        E1 0.7421095
7: a1 b1  1        E2 0.3526222
3: a1 b2  3        E2 0.3943915
5: a2 b2  4        E2 0.7796073
9: a1 b1  1        E3 0.2337894
6: a1 b2  3        E3 0.5505793
8: a2 b2  4        E3 0.6906605

Thanks in advance for your help.

Recommended answer

First convert the variable column to a character vector so that the new labels are coerced properly (melt() returns it as a factor by default), and then use data.table's grouped assignment:

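# Convert the factor column to character so new labels such as "E1" can be assigned
sampleDT[, variable := as.character(variable)]

# Append each row's position within its (A, B, ID) group to the variable name
sampleDT[, variable := paste0(variable, seq_len(.N)), by = c("A", "B", "ID")]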

This numbers each row by its order of appearance within its combination of A, B, and ID, and appends that running count to the variable name.

This gives the following output:

    A  B ID variable     value
1: a1 b2  3       E1 0.6211421
2: a2 b2  4       E1 0.7421095
3: a1 b2  3       E2 0.3943915
4: a1 b1  1       E1 0.4069439
5: a2 b2  4       E2 0.7796073
6: a1 b2  3       E3 0.5505793
7: a1 b1  1       E2 0.3526222
8: a2 b2  4       E3 0.6906605
9: a1 b1  1       E3 0.2337894
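
Since the question asks for a function, the same numbering can be wrapped in one. The following is only a sketch: the name add_occurrence_suffix and its arguments are hypothetical and not part of the original answer. It assumes data.table's rowidv(), which numbers rows 1, 2, 3, ... within each combination of the given columns, and it rebuilds the melted sample table so that it runs on its own.

```r
library(data.table)

# Hypothetical helper (not from the original answer): appends a per-group
# running count to the values of `col`, grouping by the columns named in `by`.
add_occurrence_suffix <- function(dt, col = "variable", by = c("A", "B", "ID")) {
  out <- copy(dt)  # work on a copy so the caller's table is not modified
  # rowidv() gives each row its order of appearance within its `by` combination
  occurrence <- rowidv(out, cols = by)
  # Ungrouped assignment replaces the whole column, so a factor becomes character
  set(out, j = col, value = paste0(as.character(out[[col]]), occurrence))
  out
}

# The melted sample data from the question, rebuilt directly
sampleDT <- data.table(
  A = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", "a2", "a1"),
  B = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", "b2", "b1"),
  ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"),
  variable = factor(rep("E", 9)),
  value = c(0.621142094943352, 0.742109450696123, 0.39439152996948,
            0.40694392882818, 0.779607277916503, 0.550579323666347,
            0.352622183880119, 0.690660491345867, 0.23378944873769)
)

result <- add_occurrence_suffix(sampleDT)
print(result)
```

Because the helper works on a copy, sampleDT keeps its original factor column and only result carries the E1, E2, E3 labels.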

The resulting table can be reordered if necessary.
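
One way to get the row order shown in the expected output is data.table's setorder(), sorting by the new category first and then by the grouping columns. This is a sketch rather than part of the original answer; the table below is the answer's result rebuilt by hand so that the snippet runs on its own. One assumption to keep in mind: the labels are character strings, so with ten or more occurrences "E10" would sort before "E2".

```r
library(data.table)

# The answer's result, rebuilt here so the snippet is self-contained
sampleDT <- data.table(
  A = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", "a2", "a1"),
  B = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", "b2", "b1"),
  ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"),
  variable = c("E1", "E1", "E2", "E1", "E2", "E3", "E2", "E3", "E3"),
  value = c(0.621142094943352, 0.742109450696123, 0.39439152996948,
            0.40694392882818, 0.779607277916503, 0.550579323666347,
            0.352622183880119, 0.690660491345867, 0.23378944873769)
)

# Sort in place: category first, then the grouping columns
setorder(sampleDT, variable, A, B, ID)
print(sampleDT)
```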
