Subsetting panel data

Problem description

I'm very new to this, so let me know if this is asking too much. I am trying to subset panel data in R into two categories: one with complete information for its variables and one with incomplete information. My data looks like this:

Person  Year  Income  Age  Sex
     1  2003    1500   15    1
     1  2004    1700   16    1
     1  2005    2000   17    1
     2  2003    1400   25    0
     2  2004    1900   26    0
     2  2005    2000   27    0

What I need to do is go through each column (except columns 1 and 2). If the data for a variable is full, I want to return that variable to one data set. Otherwise, I want to put it in a different data set. A variable is defined by the id in the first column plus the column name; in the data above, one example is person1Income. Here is my meta code, along with an example of what it should produce for the data above. Note: I refer to a variable by its id followed by the column name. For instance, the variable person1Income is the first three rows of column three.

for (each variable in all columns except 1 and 2 of the data set) {
    if (variable is FULL) { put it in data set "completes" }
    else { put it in data set "incompletes" }
}
completes   = person1Income, person2Income, person1Age, person2Age, person1Sex, person2Sex
incompletes = {empty, because all of the above information is full}
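Below is a minimal base-R sketch of this meta code. It assumes that missing values are coded as NA, and it uses a hypothetical data frame name, `panel`, for the table above. Like the completes/incompletes lists in the example, it records variable names rather than copying values:

```r
# The question's example panel (Person, Year, Income, Age, Sex); 'panel' is a hypothetical name
panel <- data.frame(
  Person = c(1, 1, 1, 2, 2, 2),
  Year   = c(2003, 2004, 2005, 2003, 2004, 2005),
  Income = c(1500, 1700, 2000, 1400, 1900, 2000),
  Age    = c(15, 16, 17, 25, 26, 27),
  Sex    = c(1, 1, 1, 0, 0, 0)
)

completes   <- character(0)
incompletes <- character(0)

# Loop over every column except the first two (Person and Year) ...
for (col in names(panel)[-(1:2)]) {
  # ... and over every person, so each (person, column) pair is one "variable"
  for (id in unique(panel$Person)) {
    values <- panel[panel$Person == id, col]
    label  <- paste0("person", id, col)   # e.g. "person1Income"
    # assumption: a variable is "full" when none of its rows are NA
    if (all(!is.na(values))) {
      completes <- c(completes, label)
    } else {
      incompletes <- c(incompletes, label)
    }
  }
}

completes
incompletes
```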

I understand if someone can't answer this question completely, but any help is appreciated. Also if my goal is not clear, let me know and I will try to clarify.

tl;dr I can't yet explain it in one sentence so...sorry.

(Screenshot: a visualization of what I mean by complete and incomplete variables.)

Recommended answer

Using your picture, here's a stab at what you want. It may be long-winded and others may have a more elegant way of doing it, but it gets the job done:

library("reshape2")

con <- textConnection("Person Year Income Age Sex
  1      2003  1500   15  1
  1      2004  1700   16  1
  1      2005  2000   17  1
  2      2003  1400   25  0
  2      2004  1900   NA  0
  2      2005  2000   27  0
  3      2003  NA   25  0
  3      2004  1900   NA  0
  3      2005  2000   27  0")
pnls <- read.table(con, header=TRUE)

# reshape the table into long format (one row per Person, variable and value)
pnls2 <- melt(pnls, id.vars = "Person")
# and select those rows that relate to values
# of income and age
pnls2 <- subset(pnls2,
              variable == "Income" | variable == "Age")

# create a column of names in the desired format (e.g. Person1Age)
pnls2$name <- paste("Person", pnls2$Person, pnls2$variable, sep="")

# Collect full set of unique names
name.set <- unique(pnls2$name)
# find the incomplete set
incomplete <- unique( pnls2$name[ is.na(pnls2$value) ]) 
# then find the complement of the incomplete set
complete <- setdiff(name.set, incomplete) 

# These two now hold the names of the complete and incomplete variables
complete
incomplete

If you are not familiar with melting and the reshape2 package, you may want to run it line by line, and examine the value of pnls2 at different stages to see how this works.
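Base R's stack() can illustrate the reshaping that melting performs. It is swapped in here only so the snippet needs no extra package; it is an analogue of melt(), not the call the answer uses. The data frame below is a small hypothetical subset shaped like the answer's example:

```r
# A few rows shaped like the answer's example data (hypothetical subset)
pnls <- data.frame(
  Person = c(1, 1, 2, 2),
  Income = c(1500, 1700, 1400, NA),
  Age    = c(15, 16, 25, 26)
)

# stack() turns the measure columns into two columns:
# 'values' (the data) and 'ind' (the column each value came from)
long <- stack(pnls[, c("Income", "Age")])

# Repeat the id column once per stacked column so every row keeps its Person
long$Person <- rep(pnls$Person, times = 2)

long
```

Each row of `long` now carries one value, the Person id and the source column (`ind`). This is the shape that lets the answer build names such as Person1Income with a single paste() call and check for NA in one column.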

EDIT: adding code to compile the values, as requested by @bstockton. I am sure there is a much more appropriate R idiom for this, but once again, in the absence of better answers, this works:

# use these lists of complete and incomplete variable names
# as keys to collect lists of values for each variable name
compile <- function(keys) {
    holder <- list()
    for (n in keys) {
        # pull the 'value' column for every row belonging to variable n
        holder[[ n ]] <- subset(pnls2, pnls2$name == n)[, "value"]
    }
    return( as.data.frame(holder) )
}

complete.recs <- compile(complete)
incomplete.recs <- compile(incomplete)
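The answer suspects that a more idiomatic R approach exists. One possible alternative, which is not part of the original answer, skips melting entirely. It uses split() and sapply() to compute a completeness flag for each person and column. It assumes missing values are NA and reuses the answer's example data for Income and Age:

```r
pnls <- read.table(text = "Person Year Income Age
1 2003 1500 15
1 2004 1700 16
1 2005 2000 17
2 2003 1400 25
2 2004 1900 NA
2 2005 2000 27
3 2003 NA 25
3 2004 1900 NA
3 2005 2000 27", header = TRUE)

vars <- c("Income", "Age")

# One data frame per person; for each, TRUE where a column has no NA
flags <- t(sapply(split(pnls[vars], pnls$Person),
                  function(d) colSums(is.na(d)) == 0))

# Matrix of names with the same shape as 'flags' (rows = persons, cols = variables)
labels <- outer(paste0("Person", rownames(flags)), colnames(flags), paste0)

complete   <- labels[flags]
incomplete <- labels[!flags]
complete
incomplete
```

With this data, it should give the same name lists as the melt-based code: Person1Income, Person2Income and Person1Age are complete, and the rest are incomplete. Both approaches treat a variable as complete only when none of its rows are NA.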
