How to create a spouse variable?


Problem description

I have data on couples, with the variables 'household number', 'head of household', 'education', and 'income'. 'household number' is an ID number uniquely assigned to each household. 'head of household' indicates the person's role in the household (1 = head of household, 2 = spouse of the head of household). 'education' and 'income' are the individual's education level and income, respectively. For example, the data looks like this:

'household_number'  'head_of_household'  'education'  'income'
        1                     1              high       1000
        1                     2              low        100
        3                     1              medium     500
        3                     2              high       800
        4                     2              high       800
        4                     1              high       800
        9                     1              low        150
        9                     2              low        200

I want to create spouse variables for each individual, so that the data looks like the table below, where 'spouse edu' is the spouse's education level and 'spouse inc' is the spouse's income.

'household_number'  'head_of_household'  'education'  'income' 'spouse_edu' 'spouse_inc'
        1                     1              high       1000      low         100
        1                     2              low        100       high        1000
        3                     1              medium     500       high        800
        3                     2              high       800       medium      500
        4                     2              high       800       high        800
        4                     1              high       800       high        800
        9                     1              low        150       low         200
        9                     2              low        200       low         150

I have a very large dataset, so I am looking for a simple way to do this. Is there an elegant way to do it?

Below is code for a reproducible example.

household_number <- c(1,1,3,3,4,4,9,9)
head_of_household <- c(1,2,1,2,2,1,1,2)
education <- c("high", "low", "medium", "high", "high", "high", "low", "low")
income <- c(1000, 100, 500, 800, 800, 800, 150, 200)

data <- data.frame(household_number, head_of_household, education, income)

Recommended answer

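You can use base::rev and dplyr here. Reversing the values within each household puts the partner's values on each person's row, provided every household has exactly two rows.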

library(dplyr)
data %>% 
 group_by(household_number) %>% 
 mutate(spouse_income = rev(income),
        spouse_education = rev(education)) %>% 
 ungroup()

# A tibble: 8 x 6
#  household_number head_of_household education income spouse_income spouse_education
#             <dbl>             <dbl>    <fctr>  <dbl>         <dbl>           <fctr>
#1                1                 1      high   1000           100              low
#2                1                 2       low    100          1000             high
#3                3                 1    medium    500           800             high
#4                3                 2      high    800           500           medium
#5                4                 2      high    800           800             high
#6                4                 1      high    800           800             high
#7                9                 1       low    150           200              low
#8                9                 2       low    200           150              low
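
Since the rev() approach only swaps partners when a household has exactly two rows, it may be worth checking that assumption before applying it to a large dataset. The following is a minimal sketch in base R, not part of the original answer; the names rows_per_household, bad_households, and roles_ok are hypothetical and chosen only for illustration.

```r
# Rebuild the example data from the question.
data <- data.frame(
  household_number  = c(1, 1, 3, 3, 4, 4, 9, 9),
  head_of_household = c(1, 2, 1, 2, 2, 1, 1, 2),
  education = c("high", "low", "medium", "high", "high", "high", "low", "low"),
  income    = c(1000, 100, 500, 800, 800, 800, 150, 200)
)

# Number of rows per household; rev() only swaps partners when this is 2.
rows_per_household <- table(data$household_number)

# Households that break the two-person assumption (hypothetical name).
bad_households <- names(rows_per_household)[rows_per_household != 2]

# Each household should also contain exactly the roles 1 (head) and 2 (spouse).
roles_ok <- tapply(data$head_of_household, data$household_number,
                   function(r) setequal(r, c(1, 2)))
```

If bad_households is non-empty or any element of roles_ok is FALSE, those households would need to be handled separately before reversing within groups.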


A solution using data.table:

library(data.table)
setDT(data)[, c("spouse_income", "spouse_education") := .(rev(income), rev(education)),
            by = household_number][]

# same as
# setDT(data)[, `:=`(spouse_income = rev(income), 
#                    spouse_education = rev(education)),
#             by = household_number][]


In base R, this can be done with:

transform(data, 
          spouse_income = ave(income, household_number, FUN = rev),
          spouse_education = ave(education, household_number, FUN = rev)) 
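
If some households might have a missing partner, one alternative is to look up each person's partner by key instead of reversing within groups. The sketch below is an assumption-laden illustration rather than part of the original answer: it relies on the role codes being exactly 1 and 2 (so that 3 minus the role gives the partner's role), and the names own_key, partner_key, and idx are hypothetical.

```r
# Rebuild the example data from the question.
data <- data.frame(
  household_number  = c(1, 1, 3, 3, 4, 4, 9, 9),
  head_of_household = c(1, 2, 1, 2, 2, 1, 1, 2),
  education = c("high", "low", "medium", "high", "high", "high", "low", "low"),
  income    = c(1000, 100, 500, 800, 800, 800, 150, 200)
)

# Key identifying each person: household plus role.
own_key <- paste(data$household_number, data$head_of_household)

# Key of the partner: same household, opposite role (assumes roles are 1 and 2).
partner_key <- paste(data$household_number, 3 - data$head_of_household)

# Row index of each person's partner; NA when no partner row exists.
idx <- match(partner_key, own_key)

# Pull the partner's values onto each person's row, keeping the row order.
data$spouse_edu <- data$education[idx]
data$spouse_inc <- data$income[idx]
```

Unlike rev(), this leaves NA for anyone without a recorded partner rather than silently pairing them with the wrong row. Building string keys with paste() may be slow on very large data, where a keyed join could be a better fit.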
