What is a faster way to manipulate one column in a data.frame row by row?
Question
I have a data frame with numeric latitude and longitude values. The data.frame has 14K rows and 40 columns.
I'd like to add a category column called "hemisphere" to the data frame in order to easily distinguish between northern (latitude > 0) and southern locations (latitude < 0). This is what I do:
for (r in 1:nrow(myDataFrame)) {
  if (myDataFrame[r, "latitude"] > 0) {
    myDataFrame[r, "hemisphere"] <- "North"
  } else {
    myDataFrame[r, "hemisphere"] <- "South"
  }
}
Running this code block takes about a minute, maybe two, on my MacBook Pro - much longer than I'd expect. It seems as if something makes it very inefficient and there should be a better way. Any hints?
Recommended answer
@baptiste's ifelse solution is the general idiom for speeding up comparisons with vectorisation, but in this case, some judicious subsetting and the use of sign might be faster:
myDataFrame$hemisphere <- c("South", "Equator", "North")[sign(myDataFrame$latitude) + 2]
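For comparison, here is a minimal sketch of both vectorised approaches side by side. The sample data is made up for illustration (random latitudes standing in for the asker's real data frame), and the names hemisphereIfelse and hemisphereSign are hypothetical. The ifelse call is my guess at what @baptiste's answer looks like, since that answer is not quoted here. Note that sign() returns -1, 0 or 1, so adding 2 turns it into an index of 1, 2 or 3 into the label vector.

```r
# Hypothetical sample data standing in for the asker's myDataFrame
set.seed(1)
myDataFrame <- data.frame(latitude = runif(14000, min = -90, max = 90))

# Vectorised ifelse: one comparison over the whole column.
# Like the original loop, this labels latitude == 0 as "South".
hemisphereIfelse <- ifelse(myDataFrame$latitude > 0, "North", "South")

# sign() gives -1, 0 or 1; adding 2 maps that to index 1, 2 or 3.
labels <- c("South", "Equator", "North")
hemisphereSign <- labels[sign(myDataFrame$latitude) + 2]

# The two approaches agree wherever latitude is non-zero.
nonZero <- myDataFrame$latitude != 0
stopifnot(identical(hemisphereIfelse[nonZero], hemisphereSign[nonZero]))

# Assign the whole column in one step instead of row by row.
myDataFrame$hemisphere <- hemisphereSign
```

The two differ only at a latitude of exactly 0: ifelse (as written) follows the original loop and returns "South", while the sign version returns "Equator". Wrapping either line in system.time() on the real data should show which is faster there.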