Stata Nested foreach loop substring comparison
Question
I have just started learning Stata and I'm having a hard time.
My problem is this: I have two different variables, ATC and A, where A is potentially a substring of ATC.
Now I want to mark all the observations in which A is a substring of ATC with OK = 1.
I tried this using a simple nested loop:
foreach x in ATC {
    foreach j in A {
        replace OK = 1 if strpos(`x',`j')!=0
    }
}
However, whenever I run this loop, no changes are made, even though there should be plenty.
I feel like I should probably give an index specifying which OK is being changed (the one belonging to the ATC/x), but I have no idea how to do this. This is probably really simple, but I've been struggling with it for some time.
I should have clarified: my A list is separate from the main list (simply appended to it) and only contains the unique keys I use to identify the ATCs I want. So I have ~120 A keys and a couple million ATC keys. What I wanted to do was iterate over every ATC key for every single A key and mark those ATC keys that contain a qualifying A.
That means I don't have complete tuples of (ATC, A, OK) but instead separate lists of different sizes.
For example, I have
ATC   OK   A
ABCD  0    .
EFGH  0    .
...   ...  ...
.     .    AB
.     .    ET
and want the result that "ABCD" has OK marked as 1 while "EFGH" remains at 0.
Recommended answer
We can separate your question into two parts. Your title implies a problem with loops, but your loops are just equivalent to
replace OK = 1 if strpos(ATC, A)!=0
so the use of looping appears irrelevant. That leaves the substring comparison.
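To see why the loops collapse to a single command: foreach ... in iterates over the words typed after in, not over the values stored in a variable. A minimal sketch that only displays what each local macro holds:

```stata
* foreach ... in loops over the listed words; each list here has one word
foreach x in ATC {
    foreach j in A {
        * show what the local macros expand to on each pass
        display "x is `x', j is `j'"
    }
}
```

The body runs exactly once, with `x' expanding to the variable name ATC and `j' to the variable name A, which is why the replace inside the loops is the same as a single replace on the two variables.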
Let's try an example:
. set obs 3
obs was 0, now 3
. gen OK = 0
. gen A = cond(_n == 1, "42", "something else")
. gen ATC = "answer is 42"
. replace OK = 1 if strpos(ATC, A) != 0
(1 real change made)
. list
     +------------------------------------+
     | OK                A            ATC |
     |------------------------------------|
  1. |  1               42   answer is 42 |
  2. |  0   something else   answer is 42 |
  3. |  0   something else   answer is 42 |
     +------------------------------------+
So it works fine, and you really need to give a reproducible example if you think you have something different.
As for specifying where the variable should be changed: your code does precisely that, as the example above shows.
The update makes the problem clear. With the syntax you gave, Stata only looks in the same observation for a matching substring. A variable in Stata is a field in a dataset. To cycle over a set of values, something like this should suffice:
gen byte OK = 0
levelsof A, local(Avals)
quietly foreach A of local Avals {
    replace OK = 1 if strpos(ATC, `"`A'"') > 0
}
Notes:
- Specifying byte cuts down on storage.
- You may need an if or in restriction on levelsof.
- quietly cuts out messages about changed values. When debugging, it is often better left out.
- > 0 could be omitted, as a positive result from strpos() is automatically treated as true in logical comparisons. See this FAQ.
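As a hedged illustration, the sketch below rebuilds the small example from the question and contrasts the row-wise comparison with the loop over the distinct values of A. The input data and the lowercase macro name a are my own choices for the sketch, not part of the original post.

```stata
clear
* Toy data mimicking the question: A keys appended below the ATC rows
input str4 ATC byte OK str2 A
"ABCD" 0 ""
"EFGH" 0 ""
""     . "AB"
""     . "ET"
end

* Row-wise check: counts rows whose own nonempty A occurs in their own ATC.
* No row here holds both an ATC and an A, so nothing qualifies.
count if strpos(ATC, A) > 0 & A != ""

* Loop over the distinct nonmissing values of A instead
levelsof A, local(Avals)
foreach a of local Avals {
    replace OK = 1 if strpos(ATC, `"`a'"') > 0
}

list ATC OK A
```

With these data, ABCD contains the key AB and should end with OK equal to 1, while EFGH contains neither key and stays at 0. quietly is left off so that the messages about changed values stay visible.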