Obtaining nice cuts in Hmisc with cut2 (without the [ ) signs)

Question

I'm currently trying to cut data neatly using the Hmisc package, as in the example below:

library(Hmisc)
dummy <- data.frame(important_variable = 1:1000)
# g = 4 asks cut2 for four quantile groups
dummy$cuts <- cut2(dummy$important_variable, g = 4)

The produced cuts are correct with respect to the values:

  important_variable       cuts
1                  1 [  1, 251)
2                  2 [  1, 251)
3                  3 [  1, 251)
4                  4 [  1, 251)
5                  5 [  1, 251)
6                  6 [  1, 251)
> table(dummy$cuts)
[  1, 251) [251, 501) [501, 751) [751,1000] 
       250        250        250        250 

However, I would like the data to be presented slightly differently. For instance, instead of

[  1, 251)
[251, 501)

I would like to use the notation

1-250
251-500

As I'm doing a lot of this on multiple variables, I'm interested in a reproducible solution that is easy to apply across them.

Following the discussion in the comments, the solution would also have to work on messier variables, such as x2 <- runif(100, 5.0, 7.5).

Recommended answer

We could use gsubfn to remove the brackets and to change the numeric part by subtracting one from the second number of each pair:

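 library(gsubfn)
 v1 <- dummy$cuts
 # x and y receive the two captured numbers; the trailing ")" or "]" is dropped
 v1New <-  gsubfn('\\[\\s*(\\d+),\\s*(\\d+)[^0-9]+', ~paste0(x, '-', 
                     as.numeric(y)-1), as.character(v1))
 table(v1New)
 # 1-250 251-500 501-750 751-999 
 #  250     250     250     250 
 # Note: the closed last bin [751,1000] is shifted as well, hence 751-999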
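The pattern above treats the closing `]` of the last interval the same way as `)`, so the top bin is labelled 751-999 even though it contains 1000. A minimal base-R sketch of one way to avoid that is shown here: it relabels the factor levels instead of every element and subtracts one only when the interval is right-open. The helper name `relabel_int_cuts` is hypothetical (it is not part of Hmisc or gsubfn), and the levels are copied from the table in the question so that the sketch runs without Hmisc.

```r
# Hypothetical helper, not part of Hmisc: relabel integer cut2-style levels.
relabel_int_cuts <- function(f) {
  lev <- levels(f)
  # Capture groups: lower bound, upper bound, closing bracket
  pattern <- "^\\[\\s*(\\d+),\\s*(\\d+)(\\)|\\])$"
  parts <- regmatches(lev, regexec(pattern, lev))
  new_lev <- vapply(seq_along(lev), function(i) {
    p <- parts[[i]]
    if (length(p) == 0) return(lev[i])    # leave unrecognised labels untouched
    upper <- as.integer(p[3])
    if (p[4] == ")") upper <- upper - 1L  # right-open: upper bound is excluded
    paste0(p[2], "-", upper)
  }, character(1))
  levels(f) <- new_lev
  f
}

# Levels copied from the table in the question (assumed to match cut2's output)
lab <- c("[  1, 251)", "[251, 501)", "[501, 751)", "[751,1000]")
cuts <- factor(lab, levels = lab)
levels(relabel_int_cuts(cuts))
```

Working on `levels()` touches only four strings here rather than all 1000 rows, and the result stays a factor with its original ordering.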

For the second case, which involves decimals, we need to match numbers that have a decimal part and capture them by placing the sub-patterns in parentheses: ([0-9.]+) and (\\d+\\.\\d+). We change the second capture group by converting it to numeric and subtracting 0.01 from it (as.numeric(y)-0.01); this step size matches labels printed to two decimal places, as they are here. \\s* denotes zero or more whitespace characters. The spacing in the labels is uneven, so we use it rather than \\s+, which requires at least one.

 # v2 is defined in the Data section below
 v2New <- gsubfn('\\[\\s*([0-9.]+),(\\d+\\.\\d+).*', ~paste0(x,
                 '-',as.numeric(y)-0.01), as.character(v2))
 table(v2New)
 # 5.00-5.59 5.60-6.12 6.13-6.71 6.72-7.49 
 #     25        25        25        25 
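Since the question asks for something that is easy to apply across several variables, the two cases can be folded into one function. The following is a sketch only, in base R: it infers the step (1, 0.1, 0.01, ...) from the number of decimals printed in each label instead of hard-coding 0.01, and it leaves the closed last interval unchanged. The name `relabel_cuts` is hypothetical, and the decimal levels are reconstructed from the output shown above rather than taken from a fresh cut2 run.

```r
# Hypothetical helper, not part of Hmisc: relabel "[a,b)" levels as "a-b'".
relabel_cuts <- function(f) {
  lev <- levels(f)
  pattern <- "^\\[\\s*(-?[0-9.]+),\\s*(-?[0-9.]+)(\\)|\\])$"
  parts <- regmatches(lev, regexec(pattern, lev))
  new_lev <- vapply(seq_along(lev), function(i) {
    p <- parts[[i]]
    if (length(p) == 0) return(lev[i])   # leave unrecognised labels untouched
    upper <- p[3]
    if (p[4] == ")") {
      # Decimals shown in the label decide the step: 0 -> 1, 2 -> 0.01
      digits <- if (grepl(".", upper, fixed = TRUE)) {
        nchar(sub("^[^.]*\\.", "", upper))
      } else 0
      upper <- formatC(as.numeric(upper) - 10^(-digits),
                       format = "f", digits = digits)
    }
    paste0(p[2], "-", upper)
  }, character(1))
  levels(f) <- new_lev
  f
}

# Levels assumed from the outputs shown in this thread
int_lab <- c("[  1, 251)", "[251, 501)", "[501, 751)", "[751,1000]")
dec_lab <- c("[5.00,5.60)", "[5.60,6.13)", "[6.13,6.72)", "[6.72,7.50]")
df <- data.frame(a = factor(int_lab, levels = int_lab),
                 b = factor(dec_lab, levels = dec_lab))

# Apply the same relabelling to every factor column
df[] <- lapply(df, relabel_cuts)
levels(df$a)
levels(df$b)
```

The step is only as fine as the labels themselves, so this remains a presentation choice: a value such as 5.595 still belongs to the first bin even though the label reads 5.00-5.59.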

Data

 set.seed(24)
 x2 <- runif(100, 5.0, 7.5)
 v2 <- cut2(x2, g=4)
