Rcpp: Recommended code structure when using data frames with Rcpp (inline)


Question

[I had this sketched out as a comment elsewhere but decided to create a proper question...]

What is currently considered "best practice" in terms of code structuring when using data frames in Rcpp? The ease with which one can "beam over" an input data frame from R to the C++ code is remarkable, but if the data frame has n columns, is the current thinking that this data should be split up into n separate (C++) vectors before being used?

The response to my previous question on making use of a string (character vector) column in a data frame suggests to me that yes, this is the right thing to do. In particular, there doesn't seem to be support for a notation such as df.name[i] to refer to the data frame information directly (as one might have in a C structure), unless I'm mistaken.

However, this leads us into a situation where subsetting down the data is much more cumbersome - instead of being able to subset a data frame in one line, each variable must be dealt with separately. So, is the thinking that subsetting in Rcpp is best done implicitly, via boolean vectors, say?

To summarise, I guess in a nutshell I wanted to check my current understanding that although a data frame can be beamed over to the C++ code, there is no way to refer directly to the individual elements of its columns in a "df.name[i]" fashion, and no simple method of generating a sub-dataframe of the input df by selecting rows satisfying simple criteria (e.g. df.date being in a given range).

Recommended answer

Because data frames are in fact represented internally as a list of vectors, access by vector really is the best you can do. There simply is no way to subset by row at the C or C++ level.
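To make the column-wise layout concrete, here is a minimal sketch in plain standard C++, not the Rcpp API. The `Frame` struct, its column names (`date`, `name`, `value`) and the helper functions are all hypothetical, chosen only to mirror the question's example of keeping rows whose date falls in a range. In Rcpp itself the columns would first be pulled out of the data frame by name (for example `Rcpp::NumericVector v = df["value"];`), but the shape of the work is the same: build one boolean mask, then apply it to every column in turn.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a data frame: one vector per column,
// all of equal length. This is an illustration, not an Rcpp type.
struct Frame {
    std::vector<int> date;          // e.g. days since some reference date
    std::vector<std::string> name;
    std::vector<double> value;
};

// Build a boolean mask marking the rows whose date lies in [lo, hi].
std::vector<bool> rows_in_range(const Frame& df, int lo, int hi) {
    std::vector<bool> keep(df.date.size());
    for (std::size_t i = 0; i < df.date.size(); ++i) {
        keep[i] = (df.date[i] >= lo && df.date[i] <= hi);
    }
    return keep;
}

// Apply the mask to a single column, copying the selected elements.
template <typename T>
std::vector<T> filter_column(const std::vector<T>& col,
                             const std::vector<bool>& keep) {
    std::vector<T> out;
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (keep[i]) {
            out.push_back(col[i]);
        }
    }
    return out;
}

// "Row subsetting" is really one filtering pass per column.
Frame subset_rows(const Frame& df, const std::vector<bool>& keep) {
    return Frame{filter_column(df.date, keep),
                 filter_column(df.name, keep),
                 filter_column(df.value, keep)};
}
```

A call such as `subset_rows(df, rows_in_range(df, 15, 35))` then returns a new `Frame` holding only the matching rows. The mask is computed once and reused, so the per-column handling the question describes stays confined to a single helper.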

There was a good discussion about that on r-devel a few weeks ago in the context of a transpose of a data.frame (which you cannot do 'cheaply' for the same reason).

