for loop crashes in Rcpp

Question

I am trying to replicate the following code in Rcpp (the original pandas source is at https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6):

library(data.table)
library(microbenchmark)
deg2rad <- function(deg) {(deg * pi) / (180)}

haversine = function(lat1, lon1, lat2, lon2) {
  MILES = 3959
  lat1 = deg2rad(lat1)
  lon1 = deg2rad(lon1)
  lat2 = deg2rad(lat2)
  lon2 = deg2rad(lon2)
  dlat = lat2 - lat1
  dlon = lon2 - lon1
  a = sin(dlat/2)^2 + cos(lat1) * cos(lat2) * sin(dlon/2)^2
  c = 2 * asin(sqrt(a))
  total_miles = MILES * c
  return(total_miles)
}

# get data from here
download.file("https://raw.githubusercontent.com/sversh/pycon2017-optimizing-pandas/master/new_york_hotels.csv",
              "new_york_hotels.csv")
nyc_hotels = fread("new_york_hotels.csv", na.strings = c("NA", "N/A", "NULL"))

summary(microbenchmark({
nyc_hotels[, greater_circle := haversine(40.671, -73.985, latitude, 
longitude)]
},times=1000))[,-1]
# min      lq     mean  median       uq      max neval
# 290.161 318.559 366.6786 329.491 345.0295 4365.697  1000
##########
#version 2 - invoke update differently, no change to function
summary(microbenchmark({
set(nyc_hotels,j="greater_circle",value=haversine(40.671, -73.985, 
nyc_hotels[['latitude']], nyc_hotels[['longitude']]))
},times=1000))[,-1]
# min      lq     mean  median      uq      max neval
# 81.395 89.5985 123.2211 96.1635 103.476 3670.193  1000

I created a haversine.cpp file in my home directory as follows:

#include <Rcpp.h>
#include <iostream>
using namespace Rcpp;


// [[Rcpp::export]]
NumericVector haversine_cpp_fun(double lat1_cpp,double lon1_cpp,NumericVector lat2_cpp,NumericVector lon2_cpp){
double Miles = 3959.0;
int n = lat2_cpp.size();
NumericVector dlat_cpp;
NumericVector dlon_cpp;
NumericVector a_cpp;
NumericVector c_cpp;
NumericVector total_mile_cpp;
lat1_cpp = (lat1_cpp*3.14159)/180.0;
lon1_cpp = (lon1_cpp*3.14159)/180.0;
for (int i=0 ; i<n ; ++i){
  lat2_cpp[i] = (lat2_cpp[i]*3.14159)/180.0;
  lon2_cpp[i] = (lon2_cpp[i]*3.14159)/180.0;
  dlat_cpp[i] = lat2_cpp[i] - lat1_cpp;
  dlon_cpp[i] = lon2_cpp[i] - lon1_cpp;
  a_cpp[i] = pow(sin(dlat_cpp[i]/2.0),2.0) + cos(lat1_cpp) * cos(lat2_cpp[i]) * pow(sin(dlon_cpp[i]/2.0),2.0);
  c_cpp[i] = 2 * asin(sqrt(a_cpp[i]));
  total_mile_cpp[i] = Miles * c_cpp[i];
  }
 return total_mile_cpp;

}
/***R
# Approach 1: Trying to use the set statement from data.table--- fails without giving error. The session just crashes
summary(microbenchmark({
set(nyc_hotels,j="greater_circle",value=haversine_cpp_fun(40.671, -73.985, 
nyc_hotels[['latitude']], nyc_hotels[['longitude']]))
},times=1000))[,-1]
# Approach 2: Without using the set statement from data.table and doing thing in a simple way by a simple function call--- again fails without giving error. The R session just crashes again. 
microbenchmark({
nyc_hotels[, greater_circle := haversine_cpp_fun(40.671, -73.985, latitude, 
longitude)]
})
*/  

and called it using sourceCpp as:

 sourceCpp('./haversine.cpp')

I believe something is wrong with the for loop and that this is what causes the crash, but I can't work out what it is. I say this because when I did a dry run without the loop, using only the single vector element at index 0, the Rcpp function ran. The only useful link I have found is a question where the for loop was not written correctly ("Rcpp function crashes"), but I have tried everything it suggests and still can't find the reason for the crash. Please help!

Recommended answer

Your session crashes because you create NumericVector objects of length zero and then assign to their elements using the unsafe bracket ([i]) notation, which does no bounds checking and so writes past the end of the vector. If you initialize the NumericVectors with the correct length, your code runs (I haven't checked its accuracy, though):

#include <Rcpp.h>
#include <iostream>
using namespace Rcpp;


// [[Rcpp::export]]
NumericVector haversine_cpp_fun(double lat1_cpp, double lon1_cpp,
                            NumericVector lat2_cpp, NumericVector lon2_cpp){
  double Miles = 3959.0;
  int n = lat2_cpp.size();
  NumericVector dlat_cpp(n);
  NumericVector dlon_cpp(n);
  NumericVector a_cpp(n);
  NumericVector c_cpp(n);
  NumericVector total_mile_cpp(n);
  lat1_cpp = (lat1_cpp*3.14159)/180.0;
  lon1_cpp = (lon1_cpp*3.14159)/180.0;
  for (int i=0 ; i<n ; ++i){
    lat2_cpp[i] = (lat2_cpp[i]*3.14159)/180.0;
    lon2_cpp[i] = (lon2_cpp[i]*3.14159)/180.0;
    dlat_cpp[i] = lat2_cpp[i] - lat1_cpp;
    dlon_cpp[i] = lon2_cpp[i] - lon1_cpp;
    a_cpp[i] = pow(sin(dlat_cpp[i]/2.0),2.0) + cos(lat1_cpp) * cos(lat2_cpp[i]) * pow(sin(dlon_cpp[i]/2.0),2.0);
    c_cpp[i] = 2 * asin(sqrt(a_cpp[i]));
    total_mile_cpp[i] = Miles * c_cpp[i];
  }
  return total_mile_cpp;
}
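One thing worth checking in the function above: it overwrites lat2_cpp and lon2_cpp inside the loop. As far as I understand, an Rcpp NumericVector built from an R double vector shares memory with it rather than copying, so those assignments may also change the caller's latitude and longitude data. The following is only a sketch of the same calculation that keeps the inputs untouched and sizes the result up front. It uses std::vector instead of NumericVector so that it compiles without Rcpp, and the names haversine_miles, kDegToRad and kEarthRadiusMiles are made up for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical standalone sketch: std::vector stands in for Rcpp::NumericVector.
// Inputs are taken by const reference, so they cannot be modified here.
std::vector<double> haversine_miles(double lat1_deg, double lon1_deg,
                                    const std::vector<double>& lat2_deg,
                                    const std::vector<double>& lon2_deg) {
    if (lat2_deg.size() != lon2_deg.size()) {
        throw std::invalid_argument("latitude and longitude lengths differ");
    }
    const double kDegToRad = std::acos(-1.0) / 180.0;  // pi / 180
    const double kEarthRadiusMiles = 3959.0;           // same constant as the question

    const double lat1 = lat1_deg * kDegToRad;
    const double lon1 = lon1_deg * kDegToRad;

    const std::size_t n = lat2_deg.size();
    std::vector<double> miles(n);  // allocated with n elements before the loop
    for (std::size_t i = 0; i < n; ++i) {
        // Local copies in radians; the input vectors stay in degrees.
        const double lat2 = lat2_deg[i] * kDegToRad;
        const double lon2 = lon2_deg[i] * kDegToRad;
        const double sin_dlat = std::sin((lat2 - lat1) / 2.0);
        const double sin_dlon = std::sin((lon2 - lon1) / 2.0);
        const double a = sin_dlat * sin_dlat
                       + std::cos(lat1) * std::cos(lat2) * sin_dlon * sin_dlon;
        miles[i] = kEarthRadiusMiles * 2.0 * std::asin(std::sqrt(a));
    }
    return miles;
}
```

Only the final distance needs to be a vector here; the intermediate values (dlat, dlon, a, c) can be plain doubles inside the loop, which also removes four of the five allocations.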

On a more general note: using the safer .at(i) method makes your code fail more gracefully, because it checks the index and raises an error instead of crashing the session.
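To illustrate the difference between checked and unchecked access, here is a small sketch using std::vector, whose at() behaves analogously: it throws std::out_of_range for a bad index, whereas operator[] on the same index is undefined behavior. This is an analogy rather than Rcpp code, and the helper name checked_write is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: tries to write v[i] through the bounds-checked at().
// Returns true if the write succeeded, false if at() rejected the index.
bool checked_write(std::vector<double>& v, std::size_t i, double value) {
    try {
        v.at(i) = value;  // throws std::out_of_range when i >= v.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;     // a recoverable error instead of a silent bad write
    }
}
```

With a zero-length vector, checked_write(v, 0, 1.0) reports failure, which is the situation in the question's loop; with a vector constructed with n elements, every index from 0 to n - 1 succeeds.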
