Stata Nested foreach loop substring comparison

Question

I have just started learning Stata and I'm having a hard time. My problem is this: I have two different variables, ATC and A, where A is potentially a substring of ATC. Now I want to mark all the observations in which A is a substring of ATC with OK = 1.

I tried this using a simple nested loop:

foreach x in ATC {
    foreach j in A {
        replace OK = 1 if strpos(`x',`j')!=0
    }
}

However, whenever I run this loop no changes are being made even though there should be plenty. I feel like I should probably give an index specifying which OK is being changed (the one belonging to the ATC/x), but I have no idea how to do this. This is probably really simple but I've been struggling with it for some time.

I should have clarified: my A list is separate from the main list (simply appended to it) and only contains unique keys which I use to identify the ATCs which I want. So I have ~120 A-keys and a couple million ATC keys. What I wanted to do was iterate over every ATC key for every single A-key and mark those ATC-keys with A that qualify.

That means I don't have complete tuples of (ATC,A,OK) but instead separate lists of different sizes. For example: I have

ATC    OK  A 
ABCD   0   .
EFGH   0   .
...   ...  ...
.     .    AB
.     .    ET

and want "ABCD" to end up with OK marked as 1 while "EFGH" remains at 0.

Recommended answer

We can separate your question into two parts. Your title implies a problem with loops, but your loops are just equivalent to

  replace OK = 1 if strpos(ATC, A)!=0

so the use of looping appears irrelevant. That leaves the substring comparison.
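To see why the loops add nothing: foreach ... in iterates over the words typed after in, not over the values stored in a variable. A minimal sketch (the counter runs is only an illustrative name, not part of the original code) that displays what each loop actually sees:

```stata
* foreach ... in loops over the literal words after "in".
* Here each list holds a single word, so the body runs once.
local runs = 0
foreach x in ATC {
    foreach j in A {
        local ++runs
        * x expands to the variable name ATC, j to the variable name A,
        * so the replace in the question reads strpos(ATC, A).
        display "x is `x' and j is `j'"
    }
}
display "loop body ran `runs' time(s)"
```

The body executes a single time with x set to ATC and j set to A, which is why the nested loops reduce to the one-line replace shown above.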

Let's take an example:

. set obs 3 
obs was 0, now 3

. gen OK = 0 

. gen A = cond(_n == 1, "42", "something else")  

. gen ATC = "answer is 42"

. replace OK = 1 if strpos(ATC, A) != 0 
(1 real change made)

. list 

    +------------------------------------+
    | OK                A            ATC |
    |------------------------------------|
 1. |  1               42   answer is 42 |
 2. |  0   something else   answer is 42 |
 3. |  0   something else   answer is 42 |
    +------------------------------------+

So it works fine; and you really need to give a reproducible example if you think you have something different.

As for specifying where the variable should be changed: your code does precisely that, as again the example above shows.

The update makes the problem clear. Stata will only look in the same observation for a matching substring when you specify the syntax you gave. A variable in Stata is a field in a dataset. To cycle over a set of values, something like this should suffice:

 gen byte OK = 0 
 levelsof A, local(Avals) 

 quietly foreach A of local Avals { 
     replace OK = 1 if strpos(ATC, `"`A'"') > 0 
 } 

Notes:

  1. Specifying byte cuts down storage.

  2. You may need an if or in restriction on levelsof.

  3. quietly cuts out messages about changed values. When debugging, it is often better left out.

  4. > 0 could be omitted, as a positive result from strpos() is automatically treated as true in logical comparisons. See this FAQ.
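As a rough check, the approach can be tried on toy data laid out as in the question, with the A keys appended below the ATC rows. This is only a sketch under assumptions: the values and string widths are made up for illustration, and the loop macro is named key here purely to keep it visually distinct from the variable A.

```stata
clear
* Toy data in the question's layout: ATC rows first, A keys appended below.
input str4 ATC str2 A
"ABCD" ""
"EFGH" ""
""     "AB"
""     "ET"
end

gen byte OK = 0

* Collect the distinct non-missing values of A into a local macro.
levelsof A, local(Avals)

* Test every ATC value against each key in turn.
* "> 0" is omitted: a nonzero strpos() result counts as true.
* quietly is left out so the "real changes made" messages are visible.
foreach key of local Avals {
    replace OK = 1 if strpos(ATC, `"`key'"')
}

list ATC A OK
```

Only the ABCD observation should end up with OK equal to 1, because AB is the only key that occurs inside any ATC value. With a couple of million ATC rows and about 120 keys, this amounts to about 120 passes over the data, one replace per key.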
