Create binned variable from results of class interval determination


Question

I want to create a binned variable from a continuous variable. I want 10 bins, with the break points taken from whatever a Jenks classification returns. How do I assign each value to one of these 10 bins?

# dataframe w/ values (AllwdAmt)
df <- structure(list(X = c(2078L, 2079L, 2080L, 2084L, 2085L, 2086L, 
2087L, 2092L, 2093L, 2094L, 2095L, 4084L, 4085L, 4086L, 4087L, 
4088L, 4089L, 4091L, 4092L, 4093L, 4094L, 4095L, 4096L, 4097L, 
4098L, 4099L, 4727L, 4728L, 4733L, 4734L, 4739L, 4740L, 4741L, 
4742L, 4743L, 4744L, 4745L, 4746L, 4747L, 4748L, 4749L, 4750L, 
4751L, 4752L, 4753L, 4754L, 4755L, 4756L, 4757L, 4758L), AllwdAmt = c(34.66, 
105.56, 105.56, 473.93, 108, 1669.23, 201.5, 62.67, 61.54, 601.28, 
236.96, 108, 40.28, 29.32, 483.6, 236.96, 6072.4, 25.97, 120.9, 
61.54, 32.18, 473.93, 302.25, 0, 8.48, 3140.18, 0, 0, 6.83, 6.83, 
895.44, 895.44, 24.11, 24.11, 32.18, 32.18, 236.96, 236.96, 11.96, 
11.96, 80.08, 80.08, 3140.18, 3140.18, 163.62, 163.62, 236.96, 
236.96, 216.01, 216.01)), .Names = c("X", "AllwdAmt"), row.names = c(1137L, 
1138L, 1139L, 1140L, 1141L, 1142L, 1143L, 1144L, 1145L, 1146L, 
1147L, 1945L, 1946L, 1947L, 1948L, 1949L, 1950L, 1951L, 1952L, 
1953L, 1954L, 1955L, 1956L, 1957L, 1958L, 1959L, 2265L, 2266L, 
2267L, 2268L, 2269L, 2270L, 2271L, 2272L, 2273L, 2274L, 2275L, 
2276L, 2277L, 2278L, 2279L, 2280L, 2281L, 2282L, 2283L, 2284L, 
2285L, 2286L, 2287L, 2288L), class = "data.frame")

# get class intervals. This shows where the breaks should be.
library(classInt)
classIntervals(df$AllwdAmt, n = 10, style = 'jenks')

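# Output (from the full dataset; the 50-row sample above is only a subset,
# so running this on the sample gives different breaks):
style: jenks
       [0,140.48]   (140.48,396.26]   (396.26,799.55]  (799.55,1338.18] (1338.18,1864.02]
             1109               423               111                97                58
   (1864.02,2586]    (2586,3451.35] (3451.35,5049.74]  (5049.74,6342.5] (6342.5,10407.88]
               12                44                20                33                 2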

# A very inefficient way to assign breaks, based on output from above classIntervals:
df$AllwdBin <- ifelse(df$AllwdAmt <= 140.48,1,
                          ifelse(df$AllwdAmt > 140.48 & df$AllwdAmt <= 396.26,2,
                                 ifelse(df$AllwdAmt > 396.26 & df$AllwdAmt <= 799.55,3…

# The start of my code & associated error for automatically assigning a bin with breaks     
# coming from Jenks classification:

df$AllwdBin <- cut(df$AllwdAmt,
                   breaks = classIntervals(df$AllwdAmt, n = 10, style = 'jenks'),
                   labels = c(as.character(1:10)))

# Output
Error in is.factor(x) : (list) object cannot be coerced to type 'double'

I understand that the error above arises because classIntervals() returns a list rather than a numeric vector of breaks, but how do I turn that list into usable breaks?

Recommended answer

Look at the names of the classIntervals() output, like this:

foo <- classIntervals(df$AllwdAmt, n=10, style='jenks')
names(foo)

This will tell you that foo has two entries, var and brks.

You need to use the $brks component of this output:

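# include.lowest = TRUE keeps the minimum value (here 0, the first Jenks break)
# in bin 1 instead of returning NA for it
df$AllwdBin <- cut(df$AllwdAmt, breaks = foo$brks, labels = as.character(1:10),
                   include.lowest = TRUE)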
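A minimal base-R sketch of why include.lowest matters. So that it runs without classInt, it hard-codes the break points printed by classIntervals() in the question instead of using foo$brks. With style = 'jenks' the first break equals the minimum of the data, and cut() builds intervals of the form (a,b] by default, so the zeros in AllwdAmt would become NA. The sketch also shows findInterval() as an integer-valued replacement for the nested ifelse() in the question; the variable names x, brks, bins_na, bins and bin_num are illustrative only.

```r
# Breaks copied from the classIntervals() output shown in the question
# (in practice, use foo$brks instead of typing them in)
brks <- c(0, 140.48, 396.26, 799.55, 1338.18, 1864.02,
          2586, 3451.35, 5049.74, 6342.5, 10407.88)

# A few values taken from df$AllwdAmt, including the minimum 0
x <- c(0, 34.66, 140.48, 473.93, 1669.23, 3140.18, 6072.4)

# Default cut(): intervals are (a,b], so 0 falls outside (0,140.48] and gives NA
bins_na <- cut(x, breaks = brks, labels = as.character(1:10))

# include.lowest = TRUE closes the first interval: [0,140.48]
bins <- cut(x, breaks = brks, labels = as.character(1:10), include.lowest = TRUE)

# Integer bin numbers without nested ifelse(): left.open = TRUE gives (a,b]
# intervals, and rightmost.closed = TRUE then closes the *leftmost* interval
bin_num <- findInterval(x, brks, left.open = TRUE, rightmost.closed = TRUE)

print(data.frame(x, bins_na, bins, bin_num))
```

cut() returns a factor with the labels you supply, while findInterval() returns plain integers. Both assign 0 to bin 1 only once the lowest interval is closed.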
