Ruby C extensions API questions


Problem description

So, recently I had the unfortunate need to write a C extension for Ruby (because of performance). Since I was having problems understanding VALUE (and still do), I looked into the Ruby source and found: typedef unsigned long VALUE; (source: https://github.com/ruby/ruby/blob/2b44bbf6970fbc9f5bf82f7316a8b2a5a8b460d4/include/ruby/ruby.h#L88; you will notice that there are a few other 'ways' it's done, but I think it's essentially a long; correct me if I'm wrong). While investigating this further I found an interesting blog post, which says:

"...in some cases the VALUE object could BE the data instead of POINTING TO the data."

What confuses me is that, when I attempt to pass a string to C from Ruby, and use RSTRING_PTR(); on the VALUE (passed to the C-function from Ruby), and try to 'debug' it with strlen(); it returns 4. Always 4.

example code:

VALUE test(VALUE inp) {
    unsigned char* c = RSTRING_PTR(inp);
    //return rb_str_new2(c); //this returns some random gibberish
    return INT2FIX(strlen(c));
}

This example returns always 1 as the string length:

VALUE test(VALUE inp) {
    unsigned char* c = (unsigned char*) inp;
    //return rb_str_new2(c); // Always "\x03" in Ruby.
    return INT2FIX(strlen(c));
}

Sometimes in Ruby I see an exception saying "Can't convert Module to String" (or something along those lines). I was messing with the code so much trying to figure this out that I am unable to reproduce the error now, but it would happen when I tried StringValuePtr(); on inp (I'm a bit unclear what exactly this does; the documentation says it changes the passed parameter to char*):

VALUE test(VALUE inp) {
    StringValuePtr(inp);
    return rb_str_new2((char*)inp); //Without the cast, I would get compiler warnings
} 

So, the Ruby code in question is: MyMod::test("blahblablah")

EDIT: Fixed a few typos and updated the post a little.


The questions

  1. What exactly does VALUE inp hold? A pointer to the object/value? The value itself?
  2. If it holds the value itself: when does it do that, and is there a way to check for it?
  3. How do I actually access the value (since I seem to be accessing almost everything but the value)?

P.S: My understanding of C isn't really the best, but it's a work in progress; also, read the comments in the code snippets for some additional description (if it helps).

Thanks!

Solution

Ruby Strings vs. C strings

Let's start with strings. Before trying to retrieve a string in C, it is a good habit to call StringValue(obj) on your VALUE first. This ensures that you are really dealing with a Ruby string in the end: if the object is not already a string, it is coerced into one by calling its to_str method, and a TypeError is raised if that is not possible. This makes things safer and prevents the occasional segfault you might get otherwise.

The next thing to watch out for is that Ruby strings are not guaranteed to be \0-terminated the way C functions like strlen expect, and they may contain embedded \0 bytes. Ruby's strings carry their length information with them instead - that's why in addition to RSTRING_PTR(str) there is also the RSTRING_LEN(str) macro to determine the actual length.
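To make the difference concrete, here is a minimal standalone sketch in plain C. It does not use ruby.h: struct counted_str and both helper names are hypothetical stand-ins for a pointer/length pair like the one RSTRING_PTR and RSTRING_LEN give you.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Ruby string: a byte pointer plus an explicit
 * length, analogous to what RSTRING_PTR and RSTRING_LEN expose. */
struct counted_str {
    const char *ptr;
    size_t      len;
};

/* Count bytes equal to `needle` using the stored length, so an embedded
 * '\0' does not cut the scan short the way strlen() would. */
static size_t count_byte(struct counted_str s, char needle)
{
    assert(s.ptr != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < s.len; i++) {
        if (s.ptr[i] == needle)
            hits++;
    }
    return hits;
}

/* What a strlen()-based caller believes the length is: it stops at the
 * first '\0' and requires that one exists somewhere in the buffer. */
static size_t c_view_length(struct counted_str s)
{
    assert(s.ptr != NULL);
    return strlen(s.ptr);
}
```

For the five bytes "ab\0cd", c_view_length reports 2 while the length-aware loop still sees all five bytes. If the buffer had no terminator at all, strlen would read past its end.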

So what StringValuePtr does is return the non-zero-terminated char * to you - this is great for buffers where you have a separate length, but not what you want for e.g. strlen. Use StringValueCStr instead: it makes sure the string is zero-terminated so that it is safe to use with C functions that expect that (and it raises an ArgumentError if the string contains an embedded \0 byte). But try to avoid this wherever possible, because this modification is much less performant than retrieving the non-zero-terminated string, which does not have to be modified at all. If you keep an eye on this, it's surprising how rarely you will actually need "real" C strings.
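As a rough illustration of the extra work a "real" C string costs, the following sketch copies a length-delimited buffer into a terminated one. It is only loosely modeled on the guarantee StringValueCStr has to provide; to_c_string is a hypothetical helper, not part of Ruby's API and not how Ruby implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes into a freshly allocated,
 * '\0'-terminated buffer. Returns NULL if the data contains an embedded
 * '\0' (a C string could not represent it) or if allocation fails.
 * The caller owns the result and must free() it. */
static char *to_c_string(const char *bytes, size_t len)
{
    assert(bytes != NULL);
    if (memchr(bytes, '\0', len) != NULL)
        return NULL;
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, bytes, len);
    copy[len] = '\0';
    return copy;
}
```

The scan, the allocation and the copy are all avoided when the consuming code accepts a pointer plus a length.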

self as an implicit VALUE argument

Another reason why your current code doesn't work as expected is that every C function to be called by Ruby gets passed self as an implicit VALUE.

  • No arguments in Ruby ( e.g. obj.doit ) translates to

    VALUE doit(VALUE self)

  • Fixed amount of arguments (>0, e.g. obj.doit(a, b)) translates to

    VALUE doit(VALUE self, VALUE a, VALUE b)

  • Var args in Ruby ( e.g. obj.doit(a, b=nil)) translates to

    VALUE doit(int argc, VALUE *argv, VALUE self)

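So what you were working on in your example is not the string passed to you by Ruby but actually the current value of self, that is, the object that was the receiver when you called that function. For MyMod::test("blahblablah") the receiver is the module MyMod, which would also explain the "Can't convert Module to String" exception. A correct definition for your example would be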

static VALUE test(VALUE self, VALUE input) 

I made it static to point out another rule that you should follow in your C extensions. Make your C functions public only if you intend to share them among several source files. Since that's almost never the case for functions that you attach to a Ruby class, you should declare them static by default and only make them public if there is a good reason to do so.

What is VALUE and where does it come from?

Now to the harder part. If you dig down deeply into Ruby internals, then you will find the function rb_newobj in gc.c. Here you can see that any newly created Ruby object becomes a VALUE by being cast as one from something called the freelist. It's defined as:

#define freelist objspace->heap.freelist

You can imagine the objspace as a huge map that stores each and every object that is alive at a given point in time in your code. This is also where the garbage collector does its work, and the heap struct in particular is the place where new objects are born. The "freelist" of the heap is in turn declared as an RVALUE *. This is the C-internal representation of the Ruby built-in types. An RVALUE is defined as follows:

typedef struct RVALUE {
    union {
        struct {
            VALUE flags;        /* always 0 for freed obj */
            struct RVALUE *next;
        } free;
        struct RBasic  basic;
        struct RObject object;
        struct RClass  klass;
        struct RFloat  flonum;
        struct RString string;
        struct RArray  array;
        struct RRegexp regexp;
        struct RHash   hash;
        struct RData   data;
        struct RTypedData   typeddata;
        struct RStruct rstruct;
        struct RBignum bignum;
        struct RFile   file;
        struct RNode   node;
        struct RMatch  match;
        struct RRational rational;
        struct RComplex complex;
    } as;
#ifdef GC_DEBUG
    const char *file;
    int   line;
#endif
} RVALUE;

That is, basically a union of core data types that Ruby knows about. Missing something? Yes, Fixnums, Symbols, nil and boolean values are not included there. It's because these kinds of objects are directly represented using the unsigned long that a VALUE boils down to in the end. I think the design decision there was (besides being a cool idea) that dereferencing a pointer might be slightly less performant than the bit shifts that are currently needed when transforming the VALUE to what it actually represents. Essentially

obj = (VALUE)freelist;

says: give me whatever freelist points to currently and treat it as unsigned long. This is safe because freelist is a pointer to an RVALUE - and a pointer can also be safely interpreted as unsigned long (on platforms where the two differ in size, the other typedef variants you noticed in ruby.h pick a wide enough integer type instead). This implies that every VALUE except those carrying Fixnums, Symbols, nil or booleans is essentially a pointer to an RVALUE; the others are represented directly within the VALUE.
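The "value lives inside the VALUE" idea can be modeled in a few lines of standalone C. This sketch assumes the Fixnum scheme of shifting left by one and setting the low bit; the model_ names and the VALUE typedef are stand-ins of my own, not the real ruby.h definitions, and nil, booleans and Symbols (whose exact encodings differ between Ruby versions) are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t VALUE;      /* stand-in; not the real ruby.h typedef */

#define MODEL_FIXNUM_FLAG ((VALUE)0x01)

/* Store a small integer inside the VALUE itself: shift left, set low bit. */
static VALUE model_int2fix(intptr_t n)
{
    /* One bit is spent on the tag, so only half the intptr_t range fits. */
    assert(n >= INTPTR_MIN / 2 && n <= INTPTR_MAX / 2);
    return ((VALUE)n << 1) | MODEL_FIXNUM_FLAG;
}

/* Heap pointers are aligned, so their low bit is 0; a set low bit marks
 * an integer that is stored directly in the VALUE. */
static int model_fixnum_p(VALUE v)
{
    return (v & MODEL_FIXNUM_FLAG) != 0;
}

static intptr_t model_fix2int(VALUE v)
{
    assert(model_fixnum_p(v));
    /* v encodes 2n + 1, so (v - 1) / 2 recovers n exactly. The conversion
     * back to a signed type assumes two's-complement wraparound, which
     * mainstream compilers provide. */
    return (intptr_t)(v - 1) / 2;
}
```

In this model 21 is stored as the bit pattern 43, and no memory is ever dereferenced to get the number back, which is the trade-off the paragraph above describes.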

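As for your last question, how can you check what a VALUE stands for? You can use the TYPE(x) macro, which returns a type tag such as T_STRING or T_FIXNUM, to check whether a VALUE's type is one of the "primitive" ones. There are also predicate macros such as FIXNUM_P(x), SYMBOL_P(x) and NIL_P(x) for the directly represented values.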
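Conceptually, such a check looks at the tag bits first and only follows the pointer when the VALUE is not an immediate. Here is a simplified standalone model of that order of operations; all model_ names are hypothetical and the layout is far smaller than the real RVALUE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VALUE;                 /* stand-in for ruby.h's VALUE */

enum model_type { MODEL_FIXNUM, MODEL_STRING, MODEL_ARRAY };

/* Every heap object starts with the same header, in the spirit of RBasic. */
struct model_basic  { enum model_type type; };
struct model_string { struct model_basic basic; const char *ptr; size_t len; };
struct model_array  { struct model_basic basic; VALUE *items; size_t len; };

union model_rvalue {
    struct model_basic  basic;
    struct model_string string;
    struct model_array  array;
};

static enum model_type model_type_of(VALUE v)
{
    if (v & 1)                           /* tagged immediate: nothing to read */
        return MODEL_FIXNUM;
    const union model_rvalue *obj = (const union model_rvalue *)v;
    assert(obj != NULL);
    return obj->basic.type;              /* shared header names the variant */
}
```

Because every union member begins with the same header, the type can be read through basic no matter which variant was stored.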
