How to see if all values within a group are unique/identify those that aren't


Problem description

Say I have data like this:

group value
1     fox
1     fox
1     fox
2     dog
2     cat
3     frog
3     frog
4     dog
4     dog

I want to be able to tell whether all values of value are the same within each group. Another way to look at it: I would like to create a new variable that contains all the unique values of value within each group, like the following:

group value all_values
1     fox    fox
1     fox    fox
1     fox    fox
2     dog    dog cat
2     cat    dog cat
3     frog   frog
3     frog   frog
4     dog    dog
4     dog    dog

As we can see, every group except group 2 has only one distinct entry for value.

One way I thought of doing something similar (though not as good) is the following:

bys group: egen tag = tag(value)
bys group: egen sum = sum(tag)

Then, based on the value of sum, I could determine whether a group has more than one distinct entry.

However, egen's tag() function does not work with bysort. Is there another efficient way to get the information I need?

Recommended answer

There are several ways to do this. One is:

clear
set more off

input ///
group str5 value
1     fox
1     fox
1     fox
2     dog
2     cat
3     frog
3     frog
4     dog
4     dog
end

*-----

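* Sort by value within group; if the first and last values match, every value in the group is the same
bysort group (value) : gen onevalue = value[1] == value[_N]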

list, sepby(group)
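
The question also asks for a variable listing every distinct value within each group. The following is a minimal sketch of one way to build it, not part of the original answer. It assumes the variables are named group and value as in the example, that value is a string variable with no missing (empty) entries, and that all_values is a new, hypothetical variable name.

```stata
* Sort by value within group and start the list with the first value
bysort group (value) : gen all_values = value if _n == 1

* Walk down each group, appending a value only when it differs from the previous one
by group : replace all_values = all_values[_n-1] + cond(value != value[_n-1], " " + value, "") if _n > 1

* The last observation now holds the complete list; copy it to the whole group
by group : replace all_values = all_values[_N]

list, sepby(group)
```

Because the data are sorted by value first, the distinct values appear in alphabetical order within each group rather than in their original order.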

Suppose you have missing values and want to ignore them (not drop them); then the following works:

clear
set more off

input ///
group str5 value
1     fox
1     fox
1     fox
2     dog
2     cat
3     frog
3     frog
4     dog
4     dog
5     ox
5     ox
5     
6     cow
6     goat
6      
end

*-----

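* Convert the string variable to numeric; empty strings become numeric missing
encode value, gen(value2)

* Missings sort last within group; overwrite each with the preceding value
bysort group (value2) : replace value2 = value2[_n-1] if missing(value2)
* The first and last values agree when all nonmissing values in the group are the same
by group: gen onevalue = value2[1] == value2[_N]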

list, sepby(group)

See also this FAQ, which has a technique that resembles your original strategy.
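
For reference, here is a minimal sketch of how the tag-and-total idea from the question might be made to work. It is an illustration rather than a quotation from the FAQ. It assumes the variables group and value from the example; ndistinct and onevalue are hypothetical names chosen here. The key change is that tag() takes a variable list, so group goes inside tag() instead of in a by: prefix.

```stata
* Tag one observation per distinct (group, value) pair; no by: prefix is needed
egen byte tag = tag(group value)

* Count the tagged observations within each group
egen ndistinct = total(tag), by(group)

* 1 if the group holds exactly one distinct value, 0 otherwise
gen byte onevalue = ndistinct == 1

list, sepby(group)
```

By default tag() does not tag observations with missing values, so missings are ignored in the count. A group whose values are all missing would get ndistinct equal to 0, and therefore onevalue equal to 0, under this sketch.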
