Apply function to each column in a data frame observing each column's existing data type


Question

I'm trying to get the min/max for each column in a large data frame, as part of getting to know my data. My first try was:

apply(t,2,max,na.rm=1)

It treats everything as a character vector, because the first few columns are character types. So the max of some of the numeric columns comes out as " -99.5".
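The coercion described above can be reproduced on a small scale. The following is a minimal sketch, assuming a made-up data frame `d` (the column names and values are illustrative, not the asker's data): `apply` converts a data frame to a matrix first, and a matrix can hold only one type, so a single character column turns every value into a string.

```r
# Hypothetical data frame: one character column, one numeric column.
d <- data.frame(name  = c("ANT", "ZEBRA", "CAT"),
                score = c(-99.5, 3, 12),
                stringsAsFactors = FALSE)

m <- as.matrix(d)              # the conversion apply() performs on a data frame
typeof(m)                      # "character": the numbers are now strings

by_apply  <- apply(d, 2, max)  # compares strings, even in the score column
by_column <- lapply(d, max)    # each column is passed with its own type

is.character(by_apply)         # TRUE
by_column$score                # 12, still numeric
```

`lapply` is used for the per-column version because it returns a list; `sapply` would simplify a mix of character and numeric results back into a single character vector.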

Then I tried this:

sapply(t,max,na.rm=1)

but it complains that max is not meaningful for factors. (lapply does the same.) What confuses me is that apply thought max was perfectly meaningful for factors: for example, it returned "ZEBRA" for column 1.

BTW, I took a look at "Using sapply on vector of POSIXct", and one of the answers says "When you use sapply, your objects are coerced to numeric,...". Is this what is happening to me? If so, is there an alternative apply function that does not coerce? Surely this is a common need, as one of the key features of the data frame type is that each column can be a different type.

Recommended answer

If it were an "ordered factor", things would be different. That is not to say I like ordered factors (I don't), only that some relationships are defined for ordered factors that are not defined for plain factors. Factors are treated as ordinary categorical variables. What you are seeing from apply is the sort order of the factor labels as character strings, which is alphabetical (lexical) order for your locale; apply converts the data frame to a matrix first, so factor columns reach max as character strings. If you want an automatic coercion to "numeric" for every column, dates and factors and all, then try:

sapply(df, function(x) max(as.numeric(x)) )   # not generally a useful result

Or, if you want to test for factors first and return what you expect, then:

sapply( df, function(x) if("factor" %in% class(x) ) { 
            max(as.numeric(as.character(x)))
            } else { max(x) } )
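The difference between the two conversions above is easy to miss, so here is a small sketch, assuming a made-up factor `f` whose labels happen to look like numbers. `as.numeric()` on a factor returns its internal integer codes, while going through `as.character()` first parses the labels themselves.

```r
# Hypothetical factor whose labels happen to be numbers.
f <- factor(c("10", "5", "20"))

levels(f)                               # labels, sorted as strings

codes  <- as.numeric(f)                 # internal integer codes, not the labels
values <- as.numeric(as.character(f))   # the labels parsed as numbers

max(codes)    # 3  (the position of the last level, not a data value)
max(values)   # 20
```

If a factor's labels are not numeric (for example animal names), the `as.character()` route yields `NA` with a warning, so it only makes sense for factors that really encode numbers.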

@Darrens comment does work better:

 sapply(df, function(x) max(as.character(x)) )  

max does succeed with character vectors.
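For the original goal of getting min/max per column without forcing every column into one type, one possible approach is sketched below. The helper `col_range` and the data frame `d` are hypothetical names introduced for illustration; the idea is to skip plain factors, for which min/max are undefined, and to collect results in a list so that each one keeps its own type.

```r
# Hypothetical helper: the range where it is defined, NULL for plain factors.
col_range <- function(x) {
  if (is.factor(x) && !is.ordered(x)) return(NULL)
  range(x, na.rm = TRUE)
}

# Made-up data with one column of each kind discussed above.
d <- data.frame(
  name  = c("ANT", "ZEBRA", "CAT"),
  score = c(-99.5, NA, 12),
  size  = factor(c("low", "high", "mid"),
                 levels = c("low", "mid", "high"), ordered = TRUE),
  group = factor(c("a", "b", "a")),
  stringsAsFactors = FALSE
)

ranges <- lapply(d, col_range)   # a list, so nothing is coerced

ranges$score                # -99.5 12
as.character(ranges$size)   # "low" "high": ordered factors support range()
is.null(ranges$group)       # TRUE: plain factor skipped
```

Returning `NULL` for plain factors is only one choice; converting them with `as.character()` first, as in the comment quoted above, would give a lexical min/max instead.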
