Create aggregate column based on variables with R

Question

I apologize in advance if this is a beginner question, but I searched the forum and couldn't find a way to describe what I am trying to do. I have a training set and I want to reduce the number of levels in my categorical variables (in the example below, the category is the state). I would like to map each state to the mean, or rate, of the class variable for that level. Once read into a data frame, my training set looks like this:

    state class mean
1      CA     1    0
2      AZ     1    0
3      NY     0    0
4      CA     0    0
5      NY     0    0
6      AZ     0    0
7      AZ     1    0
8      AZ     0    0
9      CA     0    0
10     VA     1    0

I would like the third column in my data frame to be the mean of the class variable within each state (the first column). For example, the mean for the CA rows would be 0.333, so that the mean column could be used as a replacement for the state column. Is there a good way of doing this in R without writing an explicit loop?

How does one go about mapping new levels (for example, new states) if my training set didn't include them? Any links to approaches in R would be greatly appreciated.

Recommended answer

This is really what the ave function was designed for. It can be used to construct any functional result by category, but its default function is mean, hence the name, i.e., ave-(rage):

dfrm$mean <- with( dfrm, ave( class, state ) ) #FUN=mean is the default "setting"
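As a rough, self-contained sketch of the line above, the following rebuilds the question's ten rows and applies ave to them. The second half is not part of the original answer: it shows one possible way to handle states that never appeared in the training set, under the assumption that falling back to the overall training rate is acceptable. The names state_rate, new_states, overall_rate and new_rate are hypothetical, chosen only for illustration.

```r
# Rebuild the training set from the question (dfrm is the name used in the answer).
dfrm <- data.frame(
  state = c("CA", "AZ", "NY", "CA", "NY", "AZ", "AZ", "AZ", "CA", "VA"),
  class = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# ave() returns a vector as long as its input, where each element is the
# mean of its group (FUN = mean is the default). No explicit loop is needed.
dfrm$mean <- with(dfrm, ave(class, state))
# CA rows get 1/3, AZ rows 0.5, NY rows 0, and the single VA row 1.

# Keep one rate per state as a plain named numeric vector (a lookup table).
state_rate <- vapply(split(dfrm$class, dfrm$state), mean, numeric(1))

# Hypothetical new data containing a state (TX) absent from the training set.
new_states <- c("CA", "TX", "AZ")

# Assumption: unseen states fall back to the overall training rate.
overall_rate <- mean(dfrm$class)
new_rate <- unname(state_rate[new_states])   # NA where the state is unknown
new_rate[is.na(new_rate)] <- overall_rate
```

The fallback value is a modelling choice rather than something the answer prescribes; a dedicated "other" level or a smoothed estimate would be equally reasonable alternatives depending on the model.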
