Stata: foreach creates too many variables


Question

I created a toy example of my code below. In this toy example I would like to create a measure of all higher prices minus lower prices within a self-created reference group. So within each reference group, I would like to take each individual and subtract its price value from all higher price values from other individuals in the same group. I do not want to have negative differences. Then I would like to sum all these differences. In creating this code I found some help here: http://www.stata.com/support/faqs/data-management/try-all-values-with-foreach/

However, the code didn't work for me, because my dataset is quite large (several hundred thousand observations), and both the examples on that page and my code only work up to Stata's numlist limit of 1,600 elements (I am using version 12). The toy example with the auto dataset works only because that dataset is small.

Does anyone have an idea how to code this more efficiently, so that I can get around the numlist restriction? I thought about summing the differences directly without saving them in intermediate variables, but that also runs into the numlist limit.

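clear all
sysuse auto

* headroom stands in for the self-created reference group
ren headroom refgroup

bysort refgroup : egen pricerank = rank(price)
* sort by price within each group so that later observations have higher prices
sort refgroup price
* the largest rank bounds how many offsets the loop has to try
qui: su pricerank, meanonly
gen test = `r(max)'
su test
* this numlist is what breaks once groups grow past the limit
foreach i of num 1/`r(max)' {
    * difference to the price a fixed number of places higher in the same group
    qui: by refgroup: gen intermediate`i' = price[_n+`i'] - price ///
        if price[_n+`i'] > price & !missing(price[_n+`i'])
}
* sum (rather than take the maximum of) the positive differences
egen price_diff = rowtotal(intermediate*)
drop intermediate*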

Answer

If I understand this correctly, this isn't even a problem that requires explicit loops. The sum of all higher prices is just the difference between two cumulative sums. You might need to think through what you want to do if prices are tied.
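The identity behind the two `by` lines can be written out for a single group whose prices are sorted in ascending order. The symbols p_n, S_n, H_n and D_n are introduced here only for illustration: S_n is the running sum that `sum(price)` builds, H_n is what the listing below calls `sumhigherprices`, and D_n is the OP's measure.

```latex
% one group, prices sorted so that p_1 \le p_2 \le \dots \le p_N
S_n = \sum_{k=1}^{n} p_k
\qquad
H_n = \sum_{k=n+1}^{N} p_k = S_N - S_n
\qquad
D_n = \sum_{k=n+1}^{N} \left(p_k - p_n\right) = H_n - (N - n)\, p_n

% worked example, prices 218, 264, 301, 335, 548 (S_N = 1666):
H_1 = 1666 - 218 = 1448,
\qquad
D_1 = 1448 - 4 \cdot 218 = 576 = 46 + 83 + 117 + 330
```

On ties: if p_k = p_n for some k > n, that term of D_n is zero, so D_n does not depend on how tied prices happen to be ordered, whereas H_n does.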

. clear

. set obs 10 
obs was 0, now 10

. gen group = _n > 5 

. set seed 2803

. gen price = ceil(1000 * runiform()) 

. bysort group (price) : gen sumhigherprices = sum(price) 

. by group : replace sumhigherprices = sumhigherprices[_N] - sumhigherprices 
(10 real changes made)

. list 

     +--------------------------+
     | group   price   sumhig~s |
     |--------------------------|
  1. |     0     218       1448 |
  2. |     0     264       1184 |
  3. |     0     301        883 |
  4. |     0     335        548 |
  5. |     0     548          0 |
     |--------------------------|
  6. |     1     125       3027 |
  7. |     1     213       2814 |
  8. |     1     828       1986 |
  9. |     1     988        998 |
 10. |     1     998          0 |
     +--------------------------+

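Edit: For what the OP needs (the sum of each higher price minus the observation's own price), there is one extra line, which subtracts the own price once for each of the _N - _n observations above it: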

. by group : replace sumhigherprices = sumhigherprices - (_N - _n) * price 
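A minimal sketch that combines the answer's lines for the OP's own setup, assuming the auto data with `headroom` renamed to `refgroup` as in the question and reusing the question's name `price_diff`. Storing the sums as `double` is an added assumption: Stata's default `float` type holds integers exactly only up to 16,777,216, which sums over several hundred thousand prices can exceed.

```stata
* Sketch, assuming the question's setup: auto data, headroom as the group
clear all
sysuse auto
rename headroom refgroup

* running sum of prices within each group, lowest price first
* (double avoids float rounding once the sums get large)
bysort refgroup (price) : gen double sumhigherprices = sum(price)

* sum of all prices that come after this observation in its group
by refgroup : replace sumhigherprices = sumhigherprices[_N] - sumhigherprices

* subtract the own price once per later observation, giving the
* sum over later observations of (their price - this price)
by refgroup : gen double price_diff = sumhigherprices - (_N - _n) * price

list refgroup price sumhigherprices price_diff in 1/10, sepby(refgroup)
```

No loop or numlist is involved, so the 1,600-element limit never comes into play; the work is one sort plus a few passes over the data.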
