使用vlookup在另一个变量中包含多个条件来填充新变量 [英] Populating new variable using vlookup with multiple criteria in another variable

查看:63
本文介绍了使用vlookup在另一个变量中包含多个条件来填充新变量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

1)应该为变量 sku 中列出的每个唯一观测值创建一个新变量,该变量包含重复的值.

1) A new variable should be created for each unique observation listed in variable sku, which contains repeated values.

2)只要在观察值的 sku 值相同,这些新创建的变量应在商店/周级别上为其自身产品的 price 分配值子类别( subc )作为变量本身.例如,在 eta2,3 中,第3、4和5行中的观测值具有相同的值,因为它们都与sku#3属于同一子类别.[ eta2,3 表示sku 3,subc 2.]

2) These newly created variables should be assigned the value of own product's price at the store/week level, as long as observations' sku value is in the same subcategory (subc) as the variable itself. For example, in eta2,3, observations in line 3, 4, and 5 have the same value because they all belong to the same subcategory as sku #3. [eta2,3 indicates sku 3, subc 2.]

3) x 表示这是当前正在复制的产品/子类别的原始值.

3) x indicates that this is the original value for the product/subcategory that is currently being replicated.

4)如果观察值不属于同一子类别,则应反映为"0".

4) If an observation doesn't belong to the same subcategory, it should reflect "0".

橙色是给定的数据.绿色是步骤1、2和3的值.白色单元格是步骤4.

Orange is the given data. In green are the values from the steps 1, 2, and 3. White cells are step 4.

我无法提供自己的解决方案,因为正在寻找使用现有观察生成变量的方法没有给我结果.

I am unable to offer a solution of my own, as searching for a way to generate a variable using existing observations hasn't given me results.

我还了解它必须是 forvalues foreach levelsof 命令的组合?

I also understand that it must be a combination of forvalues, foreach, and levelsof commands?

clear
input units price   sku week    store   subc
3   4.3 1   1   1   1
2   3   2   1   1   1
1   2.5 3   1   1   2
4   12  5   1   1   2
5   12  6   1   1   3
35  4.3 1   1   2   1
23  3   2   1   2   1
12  2.5 3   1   2   2
35  12  5   1   2   2
35  12  6   1   2   3   
3   20  1   2   1   1
2   30  2   2   1   1
4   40  3   2   2   2
1   50  4   2   2   2
9   10  5   2   2   2
2   90  6   2   2   3
end

UPDATE 根据尼克·考克斯(Nick Cox)的反馈,这是给出我一直在寻找的结果的最终代码:

UPDATE Based on Nick Cox' feedback, this is the final code that gives the result I have been looking for:

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
35  5.3 1   2   1   1
23  4   2   2   1   1
12  3.5 3   2   1   2
10  2   4   2   1   2
35  13  5   2   1   2
35  13  6   2   1   3
end

egen joint = group(subc sku), label 

bysort store week : gen freq = _N
su freq, meanonly 
local jmax = r(max) 
drop freq

tostring subc sku, replace
gen new = subc + "_"+sku 


su joint, meanonly 
forval j = 1/`r(max)'{     
 local J = new[`j'] 
    gen eta`J' = . 
} 

sort  subc week store sku 
egen joint1 = group(subc week store), label 

gen long id = _n 
su joint1, meanonly  

quietly forval i = 1/`r(max)' { 
   su id if joint1 == `i', meanonly
   local jmin = r(min) 
   local jmax = r(max) 

   forval j = `jmin'/`jmax' {  
   local subc = subc[`j'] 
   local sku = sku[`j'] 
   replace eta`subc'_`sku' = price[`j'] in `jmin'/`jmax' 
   replace eta`subc'_`sku' = 0 in `j'/`j'  
   }
}    

推荐答案

我要添加另一个答案,以解决 subc week 的组合.先前的讨论确定,您要尝试执行的操作将为每个观察值添加一个额外的变量.这不是一个好主意!充其量,您可能只拥有许多新变量,大多数为零.最糟糕的是,您将遇到Stata的极限.

I am adding another answer that tackles combinations of subc and week. Previous discussion establishes that what you are trying to do would add an extra variable for every observation. This can't be a good idea! At best, you might just have many new variables, mostly zeros. At worst, you will run into Stata's limits.

因此,我将不支持您进一步努力,但要说明如何生成我在上一个答案中讨论的第二个数据结构.确实,您没有指出(a)为什么要所有这些变量,它们只是重新分配的现有数据;(b)您与他们打交道的策略是什么;(c)为什么 rangestat (SSC)或某些其他程序无法消除首先创建它们的需要.

Hence I won't support your endeavour to go further down the same road, but show how the second data structure I discuss in my previous answer can be produced. Indeed, you haven't indicated (a) why you want all these variables, which are just the existing data redistributed; (b) what your strategy is for dealing with them; (c) why rangestat (SSC) or some other program could not remove the need to create them in the first place.

clear
input units price   sku week    store   subc
35  4.3 1   1   1   1
23  3   2   1   1   1
12  2.5 3   1   1   2
10  1   4   1   1   2
35  12  5   1   1   2
35  12  6   1   1   3
35  5.3 1   2   1   1
23  4   2   2   1   1
12  3.5 3   2   1   2
10  2   4   2   1   2
35  13  5   2   1   2
35  13  6   2   1   3
end

sort subc week sku 
egen joint = group(subc week), label 

bysort joint : gen freq = _N
su freq, meanonly 
local jmax = r(max) 
drop freq

forval j = 1/`jmax' { 
    gen eta`j' = . 
    gen which`j' = . 
} 

gen long id = _n 
su joint, meanonly  

quietly forval i = 1/`r(max)' { 
   su id if joint == `i', meanonly
   local jmin = r(min) 
   local jmax = r(max) 

   local k = 1 
   forval j = `jmin'/`jmax' { 
       replace which`k' = sku[`j'] in `jmin'/`jmax' 
       replace eta`k' = price[`j'] in `jmin'/`jmax' 
       local ++k 
   }
}    

list subc week sku *1 *2 *3 , sepby(subc week)

     +-------------------------------------------------------------------+
     | subc   week   sku   eta1   which1   eta2   which2   eta3   which3 |
     |-------------------------------------------------------------------|
  1. |    1      1     1    4.3        1      3        2      .        . |
  2. |    1      1     2    4.3        1      3        2      .        . |
     |-------------------------------------------------------------------|
  3. |    1      2     1    5.3        1      4        2      .        . |
  4. |    1      2     2    5.3        1      4        2      .        . |
     |-------------------------------------------------------------------|
  5. |    2      1     3    2.5        3      1        4     12        5 |
  6. |    2      1     4    2.5        3      1        4     12        5 |
  7. |    2      1     5    2.5        3      1        4     12        5 |
     |-------------------------------------------------------------------|
  8. |    2      2     3    3.5        3      2        4     13        5 |
  9. |    2      2     4    3.5        3      2        4     13        5 |
 10. |    2      2     5    3.5        3      2        4     13        5 |
     |-------------------------------------------------------------------|
 11. |    3      1     6     12        6      .        .      .        . |
     |-------------------------------------------------------------------|
 12. |    3      2     6     13        6      .        .      .        . |
     +-------------------------------------------------------------------+

这篇关于使用vlookup在另一个变量中包含多个条件来填充新变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆