R - Getting Column of Dataframe from String


Question

I am trying to create a function that allows the conversion of selected columns of a data frame to categorical data type (factor) before running a regression analysis.

The question is how to select a particular column of a data frame using a string (character).

Example:

  strColumnNames <- "Admit,Rank"
  strDelimiter <- ","
  strSplittedColumnNames <- strsplit(strColumnNames, strDelimiter)
  for( strColName in strSplittedColumnNames[[1]] ){
    dfData$as.name(strColName) <- factor(dfData$get(strColName))
  }

Tried:

dfData$as.name()
dfData$get(as.name())
dfData$get()

Error Msg: Error: attempt to apply non-function

Any help would be greatly appreciated! Thank you!!!

Solution

You need to change

dfData$as.name(strColName) <- factor(dfData$get(strColName))

to

dfData[[strColName]] <- factor(dfData[[strColName]])

You may read ?"[[" for more.
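
Wrapped into the kind of function the question asks for, the fix might look like the sketch below. The function name convert_to_factor, its arguments, and the sample data are assumptions made for illustration; they are not part of the original post.

```r
# Hypothetical helper: convert the columns named in a delimited string to factors.
convert_to_factor <- function(df, col_names, delimiter = ",") {
  # fixed = TRUE treats the delimiter as a literal string, not a regular expression
  cols <- trimws(strsplit(col_names, delimiter, fixed = TRUE)[[1]])

  # Fail early if a requested column does not exist
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }

  for (col in cols) {
    df[[col]] <- factor(df[[col]])  # [[ ]] evaluates col, unlike $
  }
  df
}

# Made-up data, for illustration only
dfData <- data.frame(Admit = c(0, 1, 1, 0),
                     Rank  = c(1, 2, 3, 2),
                     Gre   = c(380, 660, 800, 640))
dfData <- convert_to_factor(dfData, "Admit,Rank")
sapply(dfData, class)
```

Only the named columns are converted; Gre stays numeric. The check for missing columns is a design choice of this sketch, added so that a typo in the string fails loudly.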

In your case, column names are generated programmatically, so [[ ]] is the way to go: $ takes the name written after it literally and does not evaluate a variable. Maybe this example will be clear enough to illustrate the problem with $:

dat <- data.frame(x = 1:5, y = 2:6)
z <- "x"

dat$z
# [1] NULL

dat[[z]]
# [1] 1 2 3 4 5
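
As a side note, single-bracket indexing also evaluates its argument, so it works with a character variable as well. A small sketch reusing dat and z from the example above (the names col_vec, col_vec2, and col_df are made up here):

```r
dat <- data.frame(x = 1:5, y = 2:6)
z <- "x"

col_vec  <- dat[[z]]   # the column itself, as an integer vector
col_vec2 <- dat[, z]   # also a vector, because a single column is dropped to a vector by default
col_df   <- dat[z]     # a one-column data frame, not a vector

identical(col_vec, col_vec2)
is.data.frame(col_df)
```

For assigning a factor back into one column, [[ ]] is the most direct of the three forms.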


Regarding the other answer

apply definitely does not work here, because the function you apply is as.factor (or factor). apply always works on a matrix (if you feed it a data frame, it converts it to a matrix first) and in this case returns a matrix, and a matrix cannot hold factors. Consider this example:

x <- data.frame(x1 = letters[1:4], x2 = LETTERS[1:4], x3 = 1:4, stringsAsFactors = FALSE)
x[, 1:2] <- apply(x[, 1:2], 2, as.factor)

str(x)
#'data.frame':  4 obs. of  3 variables:
# $ x1: chr  "a" "b" "c" "d"
# $ x2: chr  "A" "B" "C" "D"
# $ x3: int  1 2 3 4

Note that x1 and x2 are still character variables rather than factors. As I said, we have to use lapply:

x[1:2] <- lapply(x[1:2], as.factor)

str(x)
#'data.frame':  4 obs. of  3 variables:
# $ x1: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
# $ x2: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# $ x3: int  1 2 3 4

Now we see the factor class in x1 and x2.

Using apply for a data frame is never a good idea. If you read the source code of apply:

    dl <- length(dim(X))
    if (is.object(X))
        X <- if (dl == 2L)
            as.matrix(X)
        else as.array(X)

You can see that a data frame (which has two dimensions) is coerced to a matrix first. This can be slow for large data frames. If the data frame columns have several different classes, the resulting matrix has only one: as.matrix converts everything to a common type, so a single character column turns the whole matrix into character.
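
The coercion is easy to see on a small mixed data frame. This is a sketch with made-up data; the names mixed, m, col_classes_apply, and col_classes_sapply are not from the original answer.

```r
# Made-up data frame with integer, character, and double columns
mixed <- data.frame(id    = 1:3,
                    name  = c("a", "b", "c"),
                    score = c(1.5, 2.5, 3.5),
                    stringsAsFactors = FALSE)

m <- as.matrix(mixed)
typeof(m)   # "character": the numeric columns were converted to strings

# apply sees only the coerced matrix, so every column looks like character
col_classes_apply <- apply(mixed, 2, class)

# sapply iterates over the data frame's columns and keeps their real classes
col_classes_sapply <- sapply(mixed, class)
```

Comparing col_classes_apply with col_classes_sapply shows what apply loses: the first reports "character" for all three columns, the second reports "integer", "character", and "numeric".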

Yet apply is written in R not C, with an ordinary for loop:

    for (i in 1L:d2) {
        tmp <- forceAndCall(1, FUN, newX[, i], ...)
        if (!is.null(tmp))
            ans[[i]] <- tmp
    }

so it is no better than an explicit for loop you write yourself.
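
For comparison, such an explicit loop over the columns might look like this. It is a sketch that reuses the example data frame x from above and has the same effect as the lapply call.

```r
x <- data.frame(x1 = letters[1:4], x2 = LETTERS[1:4], x3 = 1:4,
                stringsAsFactors = FALSE)

# Convert each selected column in place; the data frame is never turned into a matrix
for (col in c("x1", "x2")) {
  x[[col]] <- as.factor(x[[col]])
}

sapply(x, class)
```

Because each column is handled on its own, x3 keeps its integer class and x1 and x2 become factors.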
