How to find common variables in a list of datasets & reshape them in R?


Question

    library(readstata13)
    setwd("C:\\Users\\DATA")
    temp = list.files(pattern="*.dta")
    for (i in 1:length(temp)) assign(temp[i], read.dta13(temp[i], nonint.factors = TRUE))
    grep(pattern="_m", temp, value=TRUE)


Here I create a list of my datasets and read them into R. I then try to use grep to find all variable names with the pattern _m, but this obviously doesn't work: grep only searches the file names in temp, so it simply returns every file name containing _m. What I want is for my code to loop through the list of databases, find the variables ending with _m, and return a list of the databases that contain these variables.


I'm not sure how to do this; I'm quite new to coding and R.


Apart from needing to know which databases contain these variables, I also need to be able to make changes to (reshape) these variables.

Answer


First, assign may not work as you think: it expects the variable name as a string (or "character", as strings are called in R), and if you pass it a character vector it will use only the first element as the variable name (see here for more info).
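A common alternative to assign is to keep all datasets in one named list, so that each one is reachable by its file name instead of through a separate global variable. A minimal sketch: the toy data frames and file names below (wave1.dta etc.) are hypothetical stand-ins for the real files; with the real files the list could be built with lapply(temp, read.dta13, nonint.factors = TRUE) followed by names(datasets) <- temp.

```r
# Hypothetical toy data standing in for the .dta files; with real files:
#   datasets <- lapply(temp, read.dta13, nonint.factors = TRUE)
#   names(datasets) <- temp
datasets <- list(
  "wave1.dta" = data.frame(id = 1:3, income_m = c(10, 20, 30)),
  "wave2.dta" = data.frame(id = 1:3, age = c(25, 40, 31)),
  "wave3.dta" = data.frame(id = 1:3, income_m = c(12, 22, 32), hours_m = c(40, 35, 38))
)

# Each dataset is looked up by its file name
print(names(datasets))
print(head(datasets[["wave3.dta"]], 2))
```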


What you can do depends on the structure of your data. read.dta13 (from the readstata13 package) loads each file as a data.frame.


If you are looking for column names, you can do something like this (the pattern _m$ only matches names that end with _m):

    library(readstata13)

    myList <- character()
    for (i in seq_along(temp)) {

        # save the content of your file in a data frame
        df <- read.dta13(temp[i], nonint.factors = TRUE)

        # identify the names of the columns ending with "_m"
        varMatch <- grep(pattern = "_m$", colnames(df), value = TRUE)

        # check if at least one of the columns matches the pattern
        if (length(varMatch) > 0) {
            myList <- c(myList, temp[i]) # save the file name if it matches
        }

    }
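If the datasets are kept in a named list, the same check can be written without an explicit loop: lapply collects the matching variable names per dataset, and Filter keeps only the datasets with at least one match. A sketch under that assumption; the toy datasets and their names are hypothetical stand-ins for the files read with read.dta13.

```r
# Hypothetical toy datasets standing in for the files read with read.dta13()
datasets <- list(
  "wave1.dta" = data.frame(id = 1:3, income_m = c(10, 20, 30)),
  "wave2.dta" = data.frame(id = 1:3, age = c(25, 40, 31)),
  "wave3.dta" = data.frame(id = 1:3, income_m = c(12, 22, 32), hours_m = c(40, 35, 38))
)

# For every dataset, the names of the columns ending with "_m"
vars_m <- lapply(datasets, function(df) grep("_m$", colnames(df), value = TRUE))

# Keep only the datasets where at least one such column was found
vars_m <- Filter(function(v) length(v) > 0, vars_m)

print(vars_m)        # which variables live in which dataset
print(names(vars_m)) # the datasets that contain them
```

This also answers the "in which databases" part directly, since the result keeps the variable names per file.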


If you are looking for the content of a column, have a look at the dplyr package, which is very useful for data frame manipulation.
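For example, with dplyr you can keep only the _m columns (plus an identifier) and then filter rows on their content. A minimal sketch, assuming dplyr is installed; the data frame, its columns and the threshold 15 are hypothetical.

```r
library(dplyr)

# Hypothetical dataset with two "_m" variables
df <- data.frame(id = 1:4,
                 age = c(25, 40, 31, 58),
                 income_m = c(10, 20, 30, 40),
                 hours_m = c(40, 35, 38, 20))

result <- df %>%
  select(id, ends_with("_m")) %>%  # keep the id and every column ending with "_m"
  filter(income_m > 15)            # keep rows based on the content of a column

print(result)
```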


A good introduction to dplyr is available in the package vignette here.
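Since the question also asks about reshaping the _m variables: one option in base R is reshape(), which can stack several columns into a long format with one row per id and variable. A sketch with a hypothetical dataset; tidyr's pivot_longer() is the tidyverse counterpart.

```r
# Hypothetical dataset with an id column and two "_m" variables
df <- data.frame(id = 1:3,
                 income_m = c(10, 20, 30),
                 hours_m = c(40, 35, 38))

# The columns to stack, found by their "_m" suffix
vars_m <- grep("_m$", colnames(df), value = TRUE)

# Wide -> long: one row per id and "_m" variable
long <- reshape(df,
                direction = "long",
                varying   = vars_m,
                v.names   = "value",
                timevar   = "variable",
                times     = vars_m,
                idvar     = "id")

print(long)
```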


Note that in R, growing a vector by appending to it inside a loop can become very slow, because each append copies the whole vector (see this SO question for more details).
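One way to avoid growing myList inside the loop is to preallocate a logical vector with one entry per file and subset the file names afterwards. A sketch; the file names and toy data frames are hypothetical stand-ins for read.dta13(temp[i], ...).

```r
# Hypothetical file names and toy data standing in for read.dta13(temp[i], ...)
temp <- c("wave1.dta", "wave2.dta", "wave3.dta")
toy <- list(
  data.frame(id = 1:3, income_m = c(10, 20, 30)),
  data.frame(id = 1:3, age = c(25, 40, 31)),
  data.frame(id = 1:3, income_m = c(12, 22, 32), hours_m = c(40, 35, 38))
)

# Preallocate one TRUE/FALSE slot per file instead of appending in the loop
hasMatch <- logical(length(temp))
for (i in seq_along(temp)) {
  df <- toy[[i]]  # in practice: read.dta13(temp[i], nonint.factors = TRUE)
  hasMatch[i] <- any(grepl("_m$", colnames(df)))
}

# Subset the file names once, after the loop
myList <- temp[hasMatch]
print(myList)
```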
