Stata: compare two datasets and drop different variables


Question

I have two large datasets (more than 1,000 variables each). One of them contains all the variables of the other, plus some additional variables. I would like to get a list of these additional variables, drop them, and then append one dataset to the other. I tried the dta_equal command, but ran into the same problem described here: http://www.stata.com/statalist/archive/2011-08/msg00308.html

I suspect that append with the keep() option cannot do what I want directly, that is, append a dataset while dropping the additional variables, because I would have to type the variables one by one into keep(), which is not realistic for datasets this large.

Is there any way to deal with this?

Recommended answer

Several Stata commands can be useful here.

The unab command is used in the first and third examples to build a list of the variables in the dataset that has fewer variables, once that dataset is in memory. The second example uses the describe command instead, which obtains the list of variables in a dataset that is not currently in memory.

The final part of the example shows how to use extended macro list functions (again starting from describe) to obtain the list of variables common to both datasets and the sets of variables found in only one of them.

* simulate 2 datasets, one has more variables than the other
sysuse auto, clear
save "data1.dta", replace
gen x = _n
gen y = -_n
save "data2.dta", replace

* example 1: drop after append
use "data1.dta", clear
unab vcommon : *
gen source = 1
append using "data2.dta"
replace source = 2 if mi(source)
keep `vcommon' source

* example 2: drop first then append
clear
describe using "data1.dta", varlist short
local vcommon `r(varlist)'
use `vcommon' using "data2.dta", clear
gen source = 2
append using "data1.dta"
replace source = 1 if mi(source)

* example 3: append and keep on the fly
use "data1.dta", clear
unab vcommon : *
gen source = 1
append using "data2.dta", keep(`vcommon')
replace source = 2 if mi(source)

* use extended macro list functions to manipulate variable list
clear
describe using "data1.dta", varlist short
local vlist1 `r(varlist)'
describe using "data2.dta", varlist short
local vlist2 `r(varlist)'
local vcommon : list vlist1 & vlist2
local vinonly1 : list vlist1 - vlist2
local vinonly2 : list vlist2 - vlist1
dis "common variables = `vcommon'"
dis "variables in data1 not found in data2 = `vinonly1'"
dis "variables in data2 not found in data1 = `vinonly2'"
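The pieces above can also be combined into a single pass that does not depend on knowing in advance which file is the smaller one. The following is only a sketch under the same assumptions as the examples above: the file names data1.dta and data2.dta and the source marker variable come from those examples, while the macro names vextra, nextra and issubset are introduced here purely for illustration.

```stata
* recreate the two example files used above
sysuse auto, clear
save "data1.dta", replace
gen x = _n
gen y = -_n
save "data2.dta", replace

* read the variable names of each file without loading the data
describe using "data1.dta", varlist short
local vlist1 `r(varlist)'
describe using "data2.dta", varlist short
local vlist2 `r(varlist)'

* shared variables, and the extras in data2 that will be left out
local vcommon : list vlist1 & vlist2
local vextra  : list vlist2 - vlist1
local nextra  : list sizeof vextra
dis "leaving out `nextra' variable(s) found only in data2: `vextra'"

* evaluates to 1 if every variable of data1 also exists in data2
local issubset : list vlist1 in vlist2
dis "data1 is a subset of data2 = `issubset'"

* load only the shared variables, then append only the shared variables
use `vcommon' using "data1.dta", clear
gen source = 1
append using "data2.dta", keep(`vcommon')
replace source = 2 if mi(source)
```

Because both use and append are restricted to the variables in vcommon, the result should contain only the shared variables plus source, whichever file happens to carry the extras. Checking issubset first is a cheap way to confirm the assumption that one dataset really contains all the variables of the other.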
