拼写数据管理:过去24个月在给定状态下花费的月份 [英] Management of spell data: months spent in given state in the past 24 months

查看:86
本文介绍了拼写数据管理:过去24个月在给定状态下花费的月份的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用具有以下形式的咒语数据集:

I am working with a spell dataset that has the following form:

    clear all

input persid    start   end t_start t_end   spell_type  year    spell_number    event
    1   8   9   44  45  1   1999    1   0
    1   12  12  60  60  1   2000    1   0
    1   1   1   61  61  1   2001    1   0
    1   7   11  67  71  1   2001    2   0
    1   1   4   85  88  2   2003    1   0
    1   5   7   89  91  1   2003    2   1
    1   8   11  92  95  2   2003    3   0
    1   1   1   97  97  2   2004    1   0
    1   1   3   121 123 1   2006    1   1
    1   4   5   124 125 2   2006    2   0
    1   6   9   126 129 1   2006    3   1
    1   10  11  130 131 2   2006    4   0
    1   12  12  132 132 1   2006    5   1
    1   1   12  157 168 1   2009    1   0
    1   1   12  169 180 1   2010    1   0
    1   1   12  181 192 1   2011    1   0
    1   1   12  193 204 1   2012    1   0
    1   1   12  205 216 1   2013    1   0
end

lab define lab_spelltype 1 "unemployment spell" 2 "employment spell"
lab val spell_type lab_spelltype

其中 persid 是该人的ID; 开始结束分别是年度失业/就业期开始和结束的月份; t_start t_end 是相同的度量,但从1996年1月1日开始计算; 事件对于上一行为失业符的就业条目等于1。

where persid is the id of the person; start and end are the months when the yearly unemployment/employment spell starts and ends, respectively; t_start and t_end are the same measures but starting to count from 1st January 1996; event is equal to 1 for the employment entries for which the previous row was an unemployment spell.

数据为这样,在给定的一年中没有重叠的咒语,并且每年相同类型的连续咒语都被合并在一起。

The data is such that there are no overlapping spells during a given year, and each year contiguous spells of the same type have been merged together.

我的目标是,对于每一行, event 为1,以计算最近6个月和24个月的工作月数。
在此特定示例中,我想得到的是:

My goal is, for each row such that event is 1, to compute the number of months spent as employed in the last 6 months and 24 months. In this specific example, what I would like to get is:

clear all
input persid    start   end t_start t_end   spell_type  year    spell_number    event   empl_6  empl_24
    1   8   9   44  45  1   1999    1   0   .   .
    1   12  12  60  60  1   2000    1   0   .   .
    1   1   1   61  61  1   2001    1   0   .   .
    1   7   11  67  71  1   2001    2   0   .   .
    1   1   4   85  88  2   2003    1   0   .   .
    1   5   7   89  91  1   2003    2   1   0   5
    1   8   11  92  95  2   2003    3   0   .   .
    1   1   1   97  97  2   2004    1   0   .   .
    1   1   3   121 123 1   2006    1   1   0   0
    1   4   5   124 125 2   2006    2   0   .   .
    1   6   9   126 129 1   2006    3   1   3   3
    1   10  11  130 131 2   2006    4   0   .   .
    1   12  12  132 132 1   2006    5   1   4   7
    1   1   12  157 168 1   2009    1   0   .   .
    1   1   12  169 180 1   2010    1   0   .   .
    1   1   12  181 192 1   2011    1   0   .   .
    1   1   12  193 204 1   2012    1   0   .   .
    1   1   12  205 216 1   2013    1   0   .   .
end 

所以,我的想法是我必须返回到每个<$之前的行c $ c> event == 1 条目,并计算该人被雇用了多少个月。

So the idea is that I have to go back to rows preceding each event==1 entry and count how many months the individual was employed.

您能建议一种获得最终成绩的方法吗?结果?
有些人建议扩展数据集,但是也许有更好的方法来解决该问题(特别是因为数据集很大)。

Can you suggest a way to obtain this final result? Some suggested to expand the dataset, but perhaps there are better ways to tackle the problem (especially because the dataset is quite large).

编辑

正确的就业身份标签是:

The correct labeling of the employment status is:

lab define lab_spelltype 1 "employment spell" 2 "unemployment spell"

过去几个月在工作上花费的次数( empl_6 empl_24 )和事件现在使用此标签正确。

The number of past months spent in employment (empl_6 and empl_24) and the definition of event are now correct with this label.

推荐答案

问题是:


  • 扩展数据以使其每月一次,

  • 填补空白 tsfill 的月份,最后

  • 使用 sum()和滞后运算符以获得最近6个月和24个月的运行总金额。

  • expand the data so to have it monthly,
  • fill in the gap months with tsfill and finally,
  • use sum() and lag operators to get the running sum for the last 6 and 24 months.

有关我借用的一些想法,另请参阅Robert解决方案。

See also Robert solution for some ideas I borrowed.

<转ng>重要:这几乎肯定不是解决问题的有效方法,尤其是在数据量很大(例如我的情况)的情况下。
但是,加号是实际上可以看到后台发生的事情,以确保最终结果是所需的结果。

Important: this is almost surely not an efficient way to solve the issue, especially if the data is large (as in my case). However, the plus is that one actually "sees" what happens in background to make sure the final result is the one desired.

另外,重要的是,解决方案考虑了彼此之间在6(或24)个月内发生2(或更多)事件的情况。

Also, importantly, this solution takes into account cases where 2 (or more) events happen within 6 (or 24) months from each other.

clear all

input persid    start   end t_start t_end   spell_type  year    spell_number    event
    1   8   9   44  45  1   1999    1   0
    1   12  12  60  60  1   2000    1   0
    1   1   1   61  61  1   2001    1   0
    1   7   11  67  71  1   2001    2   0
    1   1   4   85  88  2   2003    1   0
    1   5   7   89  91  1   2003    2   1
    1   8   11  92  95  2   2003    3   0
    1   1   1   97  97  2   2004    1   0
    1   1   3   121 123 1   2006    1   1
    1   4   5   124 125 2   2006    2   0
    1   6   9   126 129 1   2006    3   1
    1   10  11  130 131 2   2006    4   0
    1   12  12  132 132 1   2006    5   1
    1   1   12  157 168 1   2009    1   0
    1   1   12  169 180 1   2010    1   0
    1   1   12  181 192 1   2011    1   0
    1   1   12  193 204 1   2012    1   0
    1   1   12  205 216 1   2013    1   0
end

lab define lab_spelltype 1 "employment" 2 "unemployment"
lab val spell_type lab_spelltype
list

* generate Stata monthly dates
gen spell_start = ym(year,start)
gen spell_end = ym(year,end)
format %tm spell_start spell_end
list

* expand to monthly data
gen n = spell_end - spell_start + 1
expand n, gen(expanded)
sort persid year spell_number (expanded)
bysort persid year spell_number: gen month = spell_start + _n - 1
by persid year spell_number: replace event = 0 if _n > 1
format %tm month

* xtset, fill months gaps with "empty" rows, use lags and cumsum to count past months in employment
xtset persid month, monthly // %tm format
tsfill
bysort persid (month): gen cumsum = sum(spell_type) if spell_type==1
bysort persid (month): replace cumsum = cumsum[_n-1] if cumsum==.
bysort persid (month): gen m6  = cumsum-1 - L7.cumsum if event==1  // "-1" otherwise it sums also current empl month
bysort persid (month): gen m24 = cumsum-1 - L25.cumsum if event==1
drop if event==.
list persid start end year m* if event

这篇关于拼写数据管理:过去24个月在给定状态下花费的月份的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆