Error in scale.default: length of 'center' must equal the number of columns of 'x'


Question

I am using the mboost package to do some classification. Here is the code:

library('mboost')
load('so-data.rdata')
model <- glmboost(is_exciting~., data=training, family=Binomial())
pred <- predict(model, newdata=test, type="response")

But R complains when making the prediction:

Error in scale.default(X, center = cm, scale = FALSE) : 
  length of 'center' must equal the number of columns of 'x'

The data (training and test) can be downloaded here (7z, zip). What is the reason for the error, and how can I get rid of it? Thank you.

Update:

> str(training)
'data.frame':   439599 obs. of  24 variables:
 $ is_exciting                           : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_state                          : Factor w/ 52 levels "AK","AL","AR",..: 15 5 5 23 47 5 44 42 42 5 ...
 $ school_charter                        : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_magnet                         : Factor w/ 2 levels "f","t": 1 1 1 1 2 1 1 1 1 1 ...
 $ school_year_round                     : Factor w/ 2 levels "f","t": 1 1 1 1 1 2 1 1 1 2 ...
 $ school_nlns                           : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_charter_ready_promise          : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ teacher_prefix                        : Factor w/ 6 levels "","Dr.","Mr.",..: 5 5 3 5 6 5 6 6 5 6 ...
 $ teacher_teach_for_america             : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 2 1 2 1 ...
 $ teacher_ny_teaching_fellow            : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ primary_focus_subject                 : Factor w/ 28 levels "","Applied Sciences",..: 19 17 18 18 10 4 17 17 18 17 ...
 $ primary_focus_area                    : Factor w/ 8 levels "","Applied Learning",..: 6 5 5 5 5 4 5 5 5 5 ...
 $ secondary_focus_subject               : Factor w/ 28 levels "","Applied Sciences",..: 28 18 17 19 26 18 18 28 24 25 ...
 $ secondary_focus_area                  : Factor w/ 8 levels "","Applied Learning",..: 7 5 5 6 8 5 5 7 7 4 ...
 $ resource_type                         : Factor w/ 7 levels "","Books","Other",..: 4 4 2 5 5 2 2 5 5 5 ...
 $ poverty_level                         : Factor w/ 4 levels "high poverty",..: 2 2 4 2 1 2 2 1 2 1 ...
 $ grade_level                           : Factor w/ 5 levels "","Grades 3-5",..: 5 5 2 5 5 2 3 2 4 2 ...
 $ fulfillment_labor_materials           : num  30 35 35 30 30 35 30 35 35 35 ...
 $ total_price_excluding_optional_support: num  1274 477 892 548 385 ...
 $ total_price_including_optional_support: num  1499 562 1050 645 453 ...
 $ students_reached                      : int  31 20 250 36 19 28 90 21 60 56 ...
 $ eligible_double_your_impact_match     : Factor w/ 2 levels "f","t": 1 2 1 2 1 2 1 1 1 1 ...
 $ eligible_almost_home_match            : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 2 2 1 1 ...
 $ essay_length                          : int  236 285 194 351 383 273 385 437 476 159 ...


> str(test)
'data.frame':   44772 obs. of  23 variables:
 $ school_state                          : Factor w/ 51 levels "AK","AL","AR",..: 22 35 11 46 5 35 11 28 28 10 ...
 $ school_charter                        : Factor w/ 2 levels "f","t": 1 1 1 1 2 1 1 1 1 1 ...
 $ school_magnet                         : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_year_round                     : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_nlns                           : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ school_charter_ready_promise          : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ teacher_prefix                        : Factor w/ 6 levels "","Dr.","Mr.",..: 3 5 6 6 3 5 5 5 3 5 ...
 $ teacher_teach_for_america             : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ teacher_ny_teaching_fellow            : Factor w/ 2 levels "f","t": 1 2 1 1 1 1 1 1 1 1 ...
 $ primary_focus_subject                 : Factor w/ 28 levels "","Applied Sciences",..: 5 16 17 17 18 11 16 17 2 17 ...
 $ primary_focus_area                    : Factor w/ 8 levels "","Applied Learning",..: 2 4 5 5 5 2 4 5 6 5 ...
 $ secondary_focus_subject               : Factor w/ 28 levels "","Applied Sciences",..: 25 1 19 1 17 9 17 11 1 1 ...
 $ secondary_focus_area                  : Factor w/ 8 levels "","Applied Learning",..: 4 1 6 1 5 6 5 2 1 1 ...
 $ resource_type                         : Factor w/ 7 levels "","Books","Other",..: 5 5 5 2 5 6 4 5 5 4 ...
 $ poverty_level                         : Factor w/ 4 levels "high poverty",..: 1 2 4 4 1 2 2 2 1 2 ...
 $ grade_level                           : Factor w/ 5 levels "","Grades 3-5",..: 4 3 3 5 4 5 5 4 3 5 ...
 $ fulfillment_labor_materials           : num  30 30 30 30 30 30 30 30 30 30 ...
 $ total_price_excluding_optional_support: num  2185 149 1017 156 860 ...
 $ total_price_including_optional_support: num  2571 175 1197 183 1012 ...
 $ students_reached                      : int  200 110 10 22 180 51 30 15 260 20 ...
 $ eligible_double_your_impact_match     : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
 $ eligible_almost_home_match            : Factor w/ 2 levels "f","t": 2 1 1 1 1 1 1 1 2 1 ...
 $ essay_length                          : int  221 137 313 243 373 344 304 431 231 173 ...


> summary(model)

     Generalized Linear Models Fitted via Gradient Boosting

Call:
glmboost.formula(formula = is_exciting ~ ., data = training,     family = Binomial())


     Negative Binomial Likelihood 

Loss function: { 
     f <- pmin(abs(f), 36) * sign(f) 
     p <- exp(f)/(exp(f) + exp(-f)) 
     y <- (y + 1)/2 
     -y * log(p) - (1 - y) * log(1 - p) 
 } 


Number of boosting iterations: mstop = 100 
Step size:  0.1 
Offset:  -1.197806 

Coefficients: 

NOTE: Coefficients from a Binomial model are half the size of coefficients
 from a model fitted via glm(... , family = 'binomial').
See Warning section in ?coef.mboost

                       (Intercept)                     school_stateDC 
                     -0.5250166130                       0.0426909965 
                    school_stateIL                    school_chartert 
                      0.0084191638                       0.0729272310 
                teacher_prefixMrs.                  teacher_prefixMs. 
                     -0.0181489492                       0.0438425925 
        teacher_teach_for_americat                 resource_typeBooks 
                      0.2593005345                       0.0046126706 
           resource_typeTechnology        fulfillment_labor_materials 
                     -0.0313904871                       0.0120086140 
eligible_double_your_impact_matcht        eligible_almost_home_matcht 
                     -0.0316376431                      -0.0522717398 
                      essay_length 
                      0.0004993224 
attr(,"offset")
[1] -1.197806

Selection frequencies:
       fulfillment_labor_materials         teacher_teach_for_americat 
                              0.24                               0.15 
                      essay_length                    school_chartert 
                              0.15                               0.09 
                 teacher_prefixMs.            resource_typeTechnology 
                              0.08                               0.07 
eligible_double_your_impact_matcht        eligible_almost_home_matcht 
                              0.07                               0.07 
                teacher_prefixMrs.                     school_stateDC 
                              0.04                               0.02 
                    school_stateIL                 resource_typeBooks 
                              0.01                               0.01 

I also tried glm, but it says:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor teacher_prefix has new levels 

But I don't see any new levels in the teacher_prefix variable:

> levels(training$teacher_prefix)
[1] ""           "Dr."        "Mr."        "Mr. & Mrs." "Mrs."       "Ms."       
> levels(test$teacher_prefix)
[1] ""           "Dr."        "Mr."        "Mr. & Mrs." "Mrs."       "Ms."       

Answer

Actually, the problems with glmboost and glm are related: both come from your teacher_prefix variable.

As the glm error points out, test contains levels that are not in training (in a sense). Although both factors have the same levels(), the training set has no observations where teacher_prefix == "" while the test set does. Compare

table(test$teacher_prefix)
table(training$teacher_prefix)
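The same situation can be reproduced with a few rows of made-up toy data (the data frames `train` and `test` and the column `prefix` below are hypothetical, not from the question's files). It shows that identical levels() can hide a level with zero training observations, that the fitted glm only remembers the observed levels, and that predicting on a row with the empty level fails:

```r
# Hypothetical toy data: the level "" is declared in both data sets
# but never observed in the training data.
lv <- c("", "Mr.", "Ms.")
train <- data.frame(
  y      = c(0, 1, 0, 1, 1, 0),
  prefix = factor(c("Mr.", "Ms.", "Mr.", "Ms.", "Mr.", "Ms."), levels = lv)
)
test <- data.frame(prefix = factor(c("", "Mr."), levels = lv))

# Same levels() in both sets ...
same_levels <- identical(levels(train$prefix), levels(test$prefix))
# ... but table() shows that "" has zero training observations.
train_counts <- table(train$prefix)

# glm() drops unused levels when it builds its model frame,
# so the fitted model only records "Mr." and "Ms.".
fit <- glm(y ~ prefix, data = train, family = binomial())
known <- fit$xlevels$prefix

# Predicting on a row with prefix == "" fails with
# "factor prefix has new levels " (the new level's name is empty).
err <- tryCatch(predict(fit, newdata = test),
                error = function(e) conditionMessage(e))

print(same_levels)
print(train_counts)
print(known)
print(err)
```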

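So glm is actually giving the more accurate, more helpful error message. glm() drops unused factor levels when it fits the model, so "" is not among the levels the fitted model knows about and counts as a new level at prediction time; because that level's name is the empty string, the message appears to end without naming it. The problem is the same with glmboost, although it isn't as direct about saying so.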
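A minimal sketch of how an unused level can turn into the scale.default message, under a stated assumption: mboost's internals are not shown in the question, so this assumes that the centering vector `cm` in `scale.default(X, center = cm, scale = FALSE)` is derived from the training design matrix (glmboost centers its covariates) after the unused level was dropped, while the test design matrix still contains a column for it. The toy data and names are hypothetical; the code only uses base R:

```r
# Hypothetical toy data: "" is a declared level of `prefix` that never
# occurs in the training data but does occur in the test data.
lv <- c("", "Mr.", "Ms.")
train <- data.frame(prefix = factor(c("Mr.", "Ms.", "Mr.", "Ms."), levels = lv),
                    x = c(1, 2, 3, 4))
test  <- data.frame(prefix = factor(c("", "Mr.", "Ms."), levels = lv),
                    x = c(5, 6, 7))

# Assumption: the fitting step drops unused levels before building the
# design matrix, so the training matrix has the columns
# (Intercept), prefixMs., x
X_train <- model.matrix(~ prefix + x, data = droplevels(train))

# The test matrix is built from all three levels present in `test`:
# (Intercept), prefixMr., prefixMs., x
X_test <- model.matrix(~ prefix + x, data = test)

# Centering vector learned on the training matrix (one entry per column)
cm <- colMeans(X_train)

# Reusing it on the wider test matrix reproduces the error message
err <- tryCatch(scale(X_test, center = cm, scale = FALSE),
                error = function(e) conditionMessage(e))

print(c(ncol(X_train), ncol(X_test)))
print(err)
```

The mismatch is purely dimensional: three centering values versus four columns, which is exactly what scale.default checks.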

Doing this seems to "fix" it:

test2 <- subset(test, teacher_prefix %in% c("Dr.","Mr.","Mrs.","Ms."))
test2$teacher_prefix <- droplevels(test2$teacher_prefix)
pred <- predict(model, newdata=test2, type="response")

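We keep only the test rows whose teacher_prefix is one of the four listed levels, get rid of the now-unused levels, and then do the standard prediction. Note that the dropped rows receive no prediction.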
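The subset-and-droplevels step can be generalized so that the allowed levels need not be listed by hand. The sketch below is an illustration, not part of the original answer: the helpers `unseen_levels()` and `keep_seen_rows()` and the toy data are hypothetical. It checks every factor column shared by the two data frames, reports levels that occur in the test data but never in the training data, and keeps only the test rows that avoid them:

```r
# Hypothetical helper: for every factor column shared by `train` and `test`,
# return the levels that occur in `test` but never occur in `train`.
unseen_levels <- function(train, test) {
  cols <- intersect(names(train), names(test))
  cols <- cols[vapply(cols, function(nm) is.factor(test[[nm]]), logical(1))]
  out <- lapply(cols, function(nm) {
    seen <- unique(as.character(train[[nm]]))  # levels actually observed
    used <- unique(as.character(test[[nm]]))
    setdiff(used, seen)
  })
  names(out) <- cols
  out[lengths(out) > 0]  # keep only columns with unseen levels
}

# Hypothetical helper: keep only the test rows whose factor values were all
# observed in training, then drop the levels that are no longer used.
keep_seen_rows <- function(train, test) {
  bad <- unseen_levels(train, test)
  keep <- rep(TRUE, nrow(test))
  for (nm in names(bad)) {
    keep <- keep & !(as.character(test[[nm]]) %in% bad[[nm]])
  }
  droplevels(test[keep, , drop = FALSE])
}

# Toy usage (hypothetical data)
lv <- c("", "Mr.", "Ms.")
train <- data.frame(prefix = factor(c("Mr.", "Ms.", "Mr."), levels = lv),
                    flag   = factor(c("f", "t", "f")))
test  <- data.frame(prefix = factor(c("", "Mr.", "Ms.", ""), levels = lv),
                    flag   = factor(c("t", "f", "f", "t")))

bad   <- unseen_levels(train, test)
test2 <- keep_seen_rows(train, test)
print(bad)
print(test2)
```

With the question's data this would be called as `keep_seen_rows(training, test)` before `predict()`. As with the hand-written subset, any removed rows get no prediction, so it is worth inspecting `unseen_levels()` first.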
