Calculate sunrise function that works with dataframe with dplyr::mutate?


Question

I am having trouble applying a function I wrote to a data frame when I try to add a new column with mutate.

I want to add a column to a data frame that calculates the sunrise/sunset time for every row, based on existing columns for Latitude, Longitude and Date. The sunrise/sunset calculation comes from the sunriset function in the maptools package.

Here is my function:

library(maptools)
library(tidyverse)

sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"), num.days = 1) 
{
        lat.long <- matrix(c(long, lat), nrow = 1)
        day <- as.POSIXct(date, tz = timezone)
        sequence <- seq(from = day, length.out = num.days, by = "days")
        sunrise <- sunriset(lat.long, sequence, direction = "sunrise", 
                            POSIXct = TRUE)
        sunset <- sunriset(lat.long, sequence, direction = "sunset", 
                           POSIXct = TRUE)
        ss <- data.frame(sunrise, sunset)
        ss <- ss[, -c(1, 3)]
        colnames(ss) <- c("sunrise", "sunset")

        if (direction == "sunrise") {
                return(ss[1,1])     
        } else {
                return(ss[1,2])
        }       
}

When I run the function for a single input, I get the expected output:

sunrise.set2(41.2, -73.2, "2018-12-09 07:34:0", timezone="EST", 
    direction = "sunset", num.days = 1)
[1] "2018-12-09 16:23:46 EST"

However, when I try to apply it to a data frame to add a new column with mutate, like so:

df <- df %>% 
    mutate(set = sunrise.set2(Latitude, Longitude, LocalDateTime, timezone="UTC", num.days = 1, direction = "sunset"))

I get the following error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: 'from' must be of length 1.

The dput of my df is below. I suspect I am not doing something that is needed to properly vectorize my function, but I'm not sure what.

Thanks

dput(df):

structure(list(Latitude = c(20.666, 20.676, 20.686, 20.696, 20.706, 
20.716, 20.726, 20.736, 20.746, 20.756, 20.766, 20.776), Longitude = c(-156.449, 
-156.459, -156.469, -156.479, -156.489, -156.499, -156.509, -156.519, 
-156.529, -156.539, -156.549, -156.559), LocalDateTime = structure(c(1534318440, 
1534404840, 1534491240, 1534577640, 1534664040, 1534750440, 1534836840, 
1534923240, 1535009640, 1535096040, 1535182440, 1535268840), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), .Names = c("Latitude", "Longitude", 
"LocalDateTime"), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"), spec = structure(list(cols = structure(list(
    Latitude = structure(list(), class = c("collector_double", 
    "collector")), Longitude = structure(list(), class = c("collector_double", 
    "collector")), LocalDateTime = structure(list(format = "%m/%d/%Y %H:%M"), .Names = "format", class = c("collector_datetime", 
    "collector"))), .Names = c("Latitude", "Longitude", "LocalDateTime"
)), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))


Recommended answer

The problem is indeed that your function, as written, is not vectorized: it breaks if you give it more than one value. A workaround (as Suliman suggested) is to use rowwise() or a variant of apply, but that makes your function do a lot of unnecessary work.
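The error from the question can be reproduced with base R alone, which makes the cause easier to see. The sketch below is only an illustration: first_day is a hypothetical stand-in for the seq() step inside sunrise.set2, not part of maptools, and the two dates are shortened from the question's data.

```r
# Hypothetical stand-in for the seq() step in sunrise.set2
first_day <- function(date) seq(from = date, length.out = 1, by = "days")

dates <- as.POSIXct(c("2018-08-15 07:34:00", "2018-08-16 07:34:00"), tz = "UTC")

# Passing the whole column at once (which is what mutate does) fails in seq()
err_msg <- tryCatch(
  {
    first_day(dates)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# Row-by-row workaround: one call per element, then glue the results together
per_row <- do.call(c, lapply(seq_along(dates), function(i) first_day(dates[i])))
print(per_row)
```

The row-by-row version works, but it calls the function once per row, which is the unnecessary work mentioned above.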

So it is better to vectorize it, since maptools::sunriset is itself vectorized. First suggestion: debug or rewrite it with vectors as input, and you will easily see the lines where something unexpected happens. Let's go through it line by line; I've commented out your lines wherever I replace them with something else:

library(maptools)
library(tidyverse)

# sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"), num.days = 1) 
sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"))
# Why an argument saying how many days? You have the length of your dates
{
        # Reduce the default c("sunrise", "sunset") to one value so the if() below gets a single string
        direction <- match.arg(direction)
        #lat.long <- matrix(c(long, lat), nrow = 1)
        lat.long <- cbind(long, lat)
        day <- as.POSIXct(date, tz = timezone)
        # sequence <- seq(from = day, length.out = num.days, by = "days") # Your days object is fine
        sunrise <- sunriset(lat.long, day, direction = "sunrise", 
                            POSIXct.out = TRUE)
        sunset <- sunriset(lat.long, day, direction = "sunset", 
                           POSIXct.out = TRUE)
        # I've replaced sequence with day here
        ss <- data.frame(sunrise, sunset)
        ss <- ss[, -c(1, 3)]
        colnames(ss) <- c("sunrise", "sunset")

        if (direction == "sunrise") {
                #return(ss[1,1])
                return(ss[,1])
        } else {
                #return(ss[1,2])
                return(ss[,2])
        }       
}
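To see why the matrix() line had to change, it helps to compare the shapes of the two constructions when the inputs are vectors. This is a small base R sketch using the first three coordinate pairs from the question's dput; it does not call maptools.

```r
lat  <- c(20.666, 20.676, 20.686)
long <- c(-156.449, -156.459, -156.469)

# Original construction: everything is squeezed into a single row
one_row <- matrix(c(long, lat), nrow = 1)
print(dim(one_row))

# Vectorized construction: one row per location, longitude first, then latitude
two_col <- cbind(long, lat)
print(dim(two_col))
```

With scalar input both give a 1 x 2 matrix, which is why the original function worked for a single location.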

But looking at your function, I think it still does a lot of extra work that doesn't serve any purpose.


  • You're calculating both sunrise and sunset, only to use one of them. You can just pass your direction argument on to sunriset, without even looking at it.
  • Is it useful to ask for a separate date and timezone? When your users give you a POSIXt object, the timezone is included. It's nice to be able to pass a string as a date, but that only works if it's in the right format. To keep it simple, I'd just ask for a POSIXct as input (which is what your example data.frame has).
  • Why are you making a data.frame and assigning names before returning? As soon as you subset, it all gets dropped again.
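The last point can be checked in base R. In this sketch the numbers are made-up placeholders rather than real sunrise or sunset times; the only thing it shows is what survives the subsetting.

```r
# Placeholder values, not real sunrise/sunset output
ss <- data.frame(sunrise = c(6.1, 6.2), sunset = c(18.4, 18.5))

# Selecting a single column of a plain data.frame returns a bare vector
out <- ss[, 2]

print(is.data.frame(out))  # the data.frame wrapper is gone
print(names(out))          # and so is the column name
```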

Which means your function can be a lot shorter:

sunrise.set2 <- function(lat, lon, date, direction = c("sunrise", "sunset")) {
  lat.long <- cbind(lon, lat)
  sunriset(lat.long, date, direction=direction, POSIXct.out=TRUE)[,2]
}
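For completeness, here is a sketch of how the short version could be used with mutate on the first three rows of the question's data. It assumes that maptools and dplyr are installed, so the call is guarded and simply skipped otherwise. The match.arg() line is an addition of this sketch, not part of the answer's function.

```r
# First three rows of the dput() from the question
df <- data.frame(
  Latitude = c(20.666, 20.676, 20.686),
  Longitude = c(-156.449, -156.459, -156.469),
  LocalDateTime = as.POSIXct(c(1534318440, 1534404840, 1534491240),
                             origin = "1970-01-01", tz = "UTC")
)

sunrise.set2 <- function(lat, lon, date, direction = c("sunrise", "sunset")) {
  direction <- match.arg(direction)  # added here: settle on a single direction
  lat.long <- cbind(lon, lat)
  maptools::sunriset(lat.long, date, direction = direction, POSIXct.out = TRUE)[, 2]
}

# Only run when both packages are available (assumption: they are installed)
if (requireNamespace("maptools", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  df <- dplyr::mutate(
    df,
    set = sunrise.set2(Latitude, Longitude, LocalDateTime, direction = "sunset")
  )
  print(df)
}
```

Because the whole column is handed to sunriset in one call, no rowwise() is needed.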

If you have no control over your input, you might need to add some checks, but I usually find it most useful to stay focused on just the thing you want to accomplish.
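If checks do turn out to be necessary, one possible shape is a small helper that is called at the top of the function. This is only a sketch: check_sun_inputs is a hypothetical name, and the particular checks (types, equal lengths, coordinate ranges) are assumptions about what "valid input" means here.

```r
# Hypothetical helper: stops with an error if the inputs look wrong
check_sun_inputs <- function(lat, lon, date) {
  stopifnot(
    is.numeric(lat), is.numeric(lon),
    inherits(date, "POSIXct"),          # a timezone-aware date-time, not a string
    length(lat) == length(lon),
    length(lat) == length(date),
    all(abs(lat) <= 90, na.rm = TRUE),
    all(abs(lon) <= 180, na.rm = TRUE)
  )
  invisible(TRUE)
}

good_date <- as.POSIXct("2018-08-15 07:34:00", tz = "UTC")
ok <- check_sun_inputs(20.666, -156.449, good_date)

# A plain string is rejected instead of being silently parsed
bad <- tryCatch(check_sun_inputs(20.666, -156.449, "2018-08-15"),
                error = function(e) e)
print(inherits(bad, "error"))
```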
