R: Assigning Data to their Percentiles


Question

I am working with the R programming language. Suppose I have the following data frame:

var_1 = rnorm(100,10,10)
var_2 = rnorm(100,10,10)
var_3 = rnorm(100,10,10)

d = data.frame(var_1, var_2, var_3)

head(d)


      var_1     var_2      var_3
1 14.251923 14.877801  22.636207
2  7.325137  8.513718  21.021522
3  3.400001 -3.400397  11.274797
4 16.400597  8.623980   9.366115
5  7.065583 13.155570  17.891432
6 21.297912  4.341385 -11.337330

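My question: for each element of each variable, I would like to replace that element with the percentile band (e.g. 5th, 10th, 15th, etc.) it belongs to.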

For example:

probs = c(0.05, 0.10, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1)
a = quantile(d$var_1, probs)
b = quantile(d$var_2, probs)
c = quantile(d$var_3, probs)

new = data.frame(a,b,c)

              a           b          c
5%   -0.8806901 -7.40560488 -4.7353920
10%   0.3595086 -3.77910527 -0.6874766
15%   1.1201300 -2.91946322  0.9584040
20%   3.0581928  0.05127097  2.1457693
25%   5.0901641  1.91719913  4.6997966
30%   7.0056228  2.56215345  6.2691894
35%   7.6089831  3.58688942  7.1900823
40%   8.9853805  5.00957881  7.8488446
45%   9.9264540  5.73653135  8.6135093
50%  10.2235212  7.43425669  9.6063344
55%  11.5707533  8.54160196 10.9239040
60%  13.2422940  9.65006232 11.7036647
65%  15.1076889 11.07081528 13.2440004
70%  16.5354881 12.38804922 15.2585324
75%  17.9336020 13.16121940 17.6656208
80%  19.5312682 15.31472178 18.4820207
85%  21.9264905 17.99689941 19.3347983
90%  24.4511364 20.47478783 22.0647173
95%  26.6820271 25.27082341 24.4473033
100% 41.4419744 39.75848302 34.5105183
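The three near-identical `quantile()` calls above can also be collapsed into one `sapply()` over the columns. A minimal sketch; the fixed seed is an addition made only so the example is reproducible, so its numbers will not match the table above:

```r
set.seed(1)  # assumption: fixed seed only to make the sketch reproducible
d = data.frame(var_1 = rnorm(100, 10, 10),
               var_2 = rnorm(100, 10, 10),
               var_3 = rnorm(100, 10, 10))

# One quantile() call per column; sapply() stacks the 20 cut points
# (5%, 10%, ..., 100%) into a 20 x 3 matrix, one column per variable
thresholds = sapply(d, quantile, probs = (1:20) / 20)
thresholds
```

The result holds the same information as `new`, but as a matrix whose row names ("5%" ... "100%") come straight from `quantile()`.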

Now, whenever a value falls within one of these percentile ranges, I want to make the following replacement:

  • IF d$var_1 < -0.8806901, THEN d$var_1 = as.factor("5th percentile")
  • IF d$var_1 > -0.8806901 AND d$var_1 < 0.3595086, THEN d$var_1 = as.factor("10th percentile")

...

  • IF d$var_1 > 15.1076889 AND d$var_1 < 16.5354881, THEN d$var_1 = as.factor("65th percentile")

  • IF d$var_2 < -7.40560488, THEN d$var_2 = as.factor("5th percentile")

  • IF d$var_3 < -4.7353920, THEN d$var_3 = as.factor("5th percentile")

Can someone show me how to do this?

Recommended Answer

This might be what you want. Note that ntile() comes from the dplyr package, so dplyr must be loaded:

library(dplyr)  # provides ntile(): splits each column into 20 equal-sized rank groups
apply(d, 2, function(x) paste0(ntile(x, n = 20L) / 20 * 100, "th percentile"))

Output

       var_1              var_2              var_3             
  [1,] "60th percentile"  "100th percentile" "25th percentile" 
  [2,] "80th percentile"  "60th percentile"  "100th percentile"
  [3,] "45th percentile"  "90th percentile"  "75th percentile" 
  [4,] "70th percentile"  "85th percentile"  "35th percentile" 
  [5,] "30th percentile"  "5th percentile"   "55th percentile" 
  ...
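`ntile()` groups values by rank. To follow the IF ... THEN thresholds in the question literally, and to get factors back as the question asked, one option is base R's `cut()` with each column's own `quantile()` values as the break points. A minimal sketch: `to_percentile()` is a hypothetical helper, and the fixed seed is only for reproducibility.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
d = data.frame(var_1 = rnorm(100, 10, 10),
               var_2 = rnorm(100, 10, 10),
               var_3 = rnorm(100, 10, 10))

# Hypothetical helper: the 20 quantiles of x (5%, ..., 100%) become the upper
# edges of the bands and -Inf closes the bottom band. cut() uses right-closed
# intervals, so a value equal to the 5% quantile still counts as "5th percentile".
to_percentile = function(x) {
  breaks = c(-Inf, quantile(x, probs = (1:20) / 20))
  cut(x, breaks = breaks,
      labels = paste0(seq(5, 100, by = 5), "th percentile"))
}

# Apply the helper to every column; the result is a data frame of factors
d_pct = as.data.frame(lapply(d, to_percentile))
head(d_pct)
```

With 100 distinct values, each band receives exactly five rows, which is the same grouping `ntile(x, 20)` produces. With ties or other sample sizes, the two approaches can disagree near band edges, and `cut()` will stop with an error if two quantiles coincide.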

To overwrite only selected columns in place (here var_1 and var_3), you can use data.table:

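library(data.table)
library(dplyr)  # ntile() comes from dplyr, not data.table
cols = c("var_1", "var_3")
# := overwrites the listed columns by reference with their percentile labels
setDT(d)[, (cols) := lapply(.SD, function(x) paste0(ntile(x, n = 20L) / 20 * 100, "th percentile")), .SDcols = cols]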
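Without data.table, the same selective update can be sketched in base R by assigning to `d[cols]`. This sketch leaves `var_2` numeric and replaces `ntile()` with rank-based banding. That substitution is an assumption. When the row count is a multiple of 20, as with the 100 rows here, it should form the same equal-sized groups. For other row counts, the group sizes may differ from `ntile()`'s. Like the data.table code, it produces character labels.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
d = data.frame(var_1 = rnorm(100, 10, 10),
               var_2 = rnorm(100, 10, 10),
               var_3 = rnorm(100, 10, 10))

cols = c("var_1", "var_3")
# Overwrite just the selected columns; var_2 is left untouched
d[cols] = lapply(d[cols], function(x) {
  # rank (ties broken by position) scaled to a band number 1..20
  band = ceiling(rank(x, ties.method = "first") * 20 / length(x))
  paste0(band * 5, "th percentile")  # "5th percentile" ... "100th percentile"
})
head(d)
```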

