Varnish C VRT variables/functions


Question

I'm starting to pick up Varnish and have come across references to VRT functions in C code in our configuration (and in examples on the net) that I can't find documentation for, or at least none that I understand (my C knowledge is non-existent). This is the best I can find, but it's just the prototypes: http://fossies.org/dox/varnish-4.0.2/vrt__obj_8h.html#a7b48e87e48beb191015eedf37489a290

So here's an example we use (and which seems to be copypasta from the net as I've found it plenty of times):

C{
  #include <ctype.h>
  static void strtolower(char *c) {
    for (; *c; c++) {
      if (isupper(*c)) {
        *c = tolower(*c);
      }
    }
  }
}C

sub vcl_recv {
...stuff....
if (req.url ~ "<condition>" && (<another if condition>)) {
  C{
    strtolower((char *)VRT_r_req_url(sp));
  }C
}

So my questions are:

  1. What is sp here? Where does it come from? It's not defined anywhere nor can I find anything about it
  2. What does VRT_r_req_url do? Why is it VRT_ prefixed and what is the r (I see there are VRT_l_ functions too). What is this struct it gets data from?
  3. Are all of these VRT functions parallels to get variables equivalent to say req.url outside of a C block?
  4. Is there documentation somewhere that says what all of these do? For example I've seen this one a few times as well:

    sub detectmobile {
      C{
        VRT_SetHdr(sp, HDR_BEREQ, "\020X-Varnish-TeraWurfl:", "no1", vrt_magic_string_end);
      }C
    }

    So what is HDR_BEREQ and vrt_magic_string_end here?

Answer

This is going to be a pretty long answer, because there's a fair bit to say regarding your question. First, some nits about the C code in your VCL:

  1. Implementing strtolower is probably unnecessary; the standard vmod has a std.tolower function. If you are running Varnish 3, you should use this instead. (That said, the existence of this seems to imply you might be using Varnish 2, so who knows?)
  2. Your call to VRT_SetHdr seems unnecessary. I don't see any difference between that and set bereq.http.X-Varnish-TeraWurfl = "no1";

Some of my answers may not be super accurate because it's unclear what version of Varnish you're using, so I'm going to have to guess.

Now, to get at your questions:

  1. What is sp here? Where does it come from? It's not defined anywhere nor can I find anything about it

sp is idiomatic in Varnish to mean session pointer. It is of type struct sess and contains some context about an in-progress request. Depending on what version of Varnish you're using, this may have more or less context, so it's hard to really define the scope. In Varnish 2, the session contains everything from workspace to request state (and much in between). Varnish 4 has split this out considerably.

I'm guessing that you're using Varnish 2 or Varnish 3. In Varnish 4, you would be passing around something called ctx.

In any event, from the perspective of the configuration, the only thing you really need to know about sp is that it is always the first argument to any VRT function.

  2. What does VRT_r_req_url do? Why is it VRT_ prefixed and what is the r (I see there are VRT_l_ functions too). What is this struct it gets data from?

VRT stands for VCL RunTime. It is a set of functions that are implemented inside the Varnish binary itself. The function signatures and some opaque structures are exposed to VCL through a header file. The VCL compiler translates your VCL into C code, and that generated code is compiled against this header file to create a shared object that is loadable into Varnish. In addition, there is a Tcl script (it's Python in Varnish 4) that associates different VCL built-ins and variables with VRT functions.

The r and l stand for right and left and this has to do with where a variable is evaluated in an expression. Because VCL doesn't allow any kind of "complex" expressions (like addition or subtraction; it's certainly nowhere close to Turing complete unless you set max_restarts to an unbounded value), there are really only two places variables can be accessed: on the right-hand side, or the left-hand side. For instance:

set req.url = req.url + "/";

will compile to

VRT_l_req_url(sp, VRT_r_req_url(sp), "/", vrt_magic_string_end);

The access to req.url on the left-hand side causes the compiler to call VRT_l_req_url, and the access on the right-hand side causes it to use VRT_r_req_url.

An easier way to think about it might be l means "set" and r means "get" (or "read", if you prefer). But it really means left and right.
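To make the left/right split concrete, here is a minimal standalone sketch of such an accessor pair. It is not Varnish code: `struct mock_sess`, `mock_r_req_url`, `mock_l_req_url`, and `MOCK_STRING_END` are hypothetical stand-ins invented for illustration, and the fixed 256-byte URL buffer is an assumption of the sketch.

```c
#include <stdarg.h>
#include <string.h>

/* Hypothetical stand-in for the session; not the real struct sess. */
struct mock_sess {
    char url[256];
};

/* A unique address used only to mark the end of the argument list. */
static const char mock_end_marker = 0;
#define MOCK_STRING_END (&mock_end_marker)

/* "r" accessor: what a right-hand-side use of req.url would call. */
static const char *
mock_r_req_url(const struct mock_sess *sp)
{
    return sp->url;
}

/* "l" accessor: what a left-hand-side use of req.url would call.
 * Concatenates every string argument up to MOCK_STRING_END. */
static void
mock_l_req_url(struct mock_sess *sp, const char *first, ...)
{
    char tmp[sizeof sp->url];
    size_t used = 0;
    va_list ap;
    const char *p;

    tmp[0] = '\0';
    va_start(ap, first);
    for (p = first; p != MOCK_STRING_END; p = va_arg(ap, const char *)) {
        size_t n;

        if (p == NULL)
            continue;                     /* skip missing pieces */
        n = strlen(p);
        if (n > sizeof tmp - 1 - used)
            n = sizeof tmp - 1 - used;    /* truncate instead of overflowing */
        memcpy(tmp + used, p, n);
        used += n;
        tmp[used] = '\0';
    }
    va_end(ap);
    /* Build in tmp first: an argument may point into sp->url itself. */
    memcpy(sp->url, tmp, used + 1);
}
```

With a `struct mock_sess s` whose `url` holds `/foo`, the call `mock_l_req_url(&s, mock_r_req_url(&s), "/", MOCK_STRING_END);` mirrors the compiled form of the `set` statement above and leaves `/foo/` in `s.url`.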

To tie this into your code snippet:

strtolower((char *)VRT_r_req_url(sp));

VRT_r_req_url returns a const char * representing the value of req.url. This pointer is being cast to char * to remove the const qualifier. (This cast is a bug in your configuration.) The cast pointer is sent to strtolower, which then lowercases the string.

This is buggy for a few reasons. VRT_r_req_url gave you a const char * back, so you really aren't supposed to modify it. I don't think this will break anything, but it is a violation of the API contract you are given. Furthermore, the interface you are given for writing to req.url is VRT_l_req_url, not a direct write from inside your strtolower implementation. Therefore, the correct way to do this would be either to use std.tolower from the std vmod, or to make a copy of the URL in the session workspace, modify that copy, and then store it back with VRT_l_req_url.

As an aside, the strtolower implementation does not need the if (isupper(*c)) check. This check only serves to confuse the processor's branch predictor. tolower(3) in basically every implementation uses a branchless lookup table, and characters (like numbers) without a lowercase equivalent will not be converted.
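As a quick way to convince yourself that the guard is redundant, the following small sketch compares the guarded and unguarded forms over every value an unsigned char can hold. The function name `guard_is_redundant` is made up for this example.

```c
#include <ctype.h>
#include <limits.h>

/* Returns 1 if, for every unsigned char value, lowercasing behind an
 * isupper() guard gives the same result as calling tolower() directly. */
static int
guard_is_redundant(void)
{
    int ch;

    for (ch = 0; ch <= UCHAR_MAX; ch++) {
        int guarded = isupper(ch) ? tolower(ch) : ch;
        int direct = tolower(ch);

        if (guarded != direct)
            return 0;
    }
    return 1;
}
```

This only checks correctness, not speed; the branch-predictor remark above is a performance argument that this sketch does not measure.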

  3. Are all of these VRT functions parallels to get variables equivalent to say req.url outside of a C block?

Yes. All VRT functions implement either function calls or variable lookups. But I think you mean "inside of a C block".

  4. Is there documentation somewhere that says what all of these do? For example I've seen this one a few times as well:

sub detectmobile {
  C{
    VRT_SetHdr(sp, HDR_BEREQ, "\020X-Varnish-TeraWurfl:", "no1", vrt_magic_string_end);
  }C
}

So what is HDR_BEREQ and vrt_magic_string_end here?

There is some documentation, but a fair bit of it requires source diving. If you can say what version of Varnish you're using, I can point you to some files that might be helpful for understanding what's going on.

HDR_BEREQ tells VRT_SetHdr to use a particular workspace that contains the request that will be sent to the backend.

vrt_magic_string_end is a sentinel. Basically all of the functions that can take a string argument can also take a bunch of strings concatenated together. Varnish solves this problem by using varargs for these functions, passing multiple char * arguments to the function. Typically, if you have a function with a variable number of arguments that are all pointers, you'd just use a NULL pointer to signify that no more arguments are available. However, it is perfectly valid for a NULL value to be passed in to many of these functions. vrt_magic_string_end is a constant pointer value that cannot be confused for any other pointer, and therefore is a safe method for determining when no more arguments were passed to the function.

Consider a log call like:

log req.url + " " + req.http.Wookies + "ha!";

This call would be converted to:

VRT_log(sp, VRT_r_req_url(sp), " ", VRT_GetHdr(sp, HDR_REQ, "\10Wookies:"), "ha!", vrt_magic_string_end);

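If the request has no Wookies header, the VRT_GetHdr argument in the middle of that list can be NULL. If we did not use vrt_magic_string_end, and instead relied on NULL to end the list, we would stop at that argument and never figure out that "ha!" also needs printing.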
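The sentinel idea can be tried outside Varnish with a few lines of plain C. In this sketch `demo_join` and `DEMO_STRING_END` are hypothetical names, and rendering a NULL argument as `(null)` is a choice made here for visibility, not a claim about what Varnish prints.

```c
#include <stdarg.h>
#include <string.h>

/* A unique address that no real string argument can have. */
static const char demo_end_marker = 0;
#define DEMO_STRING_END (&demo_end_marker)

/* Joins the arguments into buf until DEMO_STRING_END is reached.
 * A NULL argument is a legal value here, not a terminator.
 * Returns the number of arguments consumed before the sentinel. */
static int
demo_join(char *buf, size_t size, const char *first, ...)
{
    va_list ap;
    const char *p;
    size_t used = 0;
    int count = 0;

    if (size == 0)
        return 0;
    buf[0] = '\0';
    va_start(ap, first);
    for (p = first; p != DEMO_STRING_END; p = va_arg(ap, const char *)) {
        const char *s = (p != NULL) ? p : "(null)";
        size_t n = strlen(s);

        if (n > size - 1 - used)
            n = size - 1 - used;   /* truncate rather than overflow */
        memcpy(buf + used, s, n);
        used += n;
        buf[used] = '\0';
        count++;
    }
    va_end(ap);
    return count;
}
```

Calling `demo_join(out, sizeof out, "/index", " ", (const char *)NULL, "ha!", DEMO_STRING_END)` consumes all four arguments, including the one after the NULL. A NULL-terminated design would have stopped after the second.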

Anyway, there's a lot of response here. I hope it's useful; please feel free to ask questions if you have more.

Edit: Follow-up Questions

  1. So are all operations outside a C block actually just calling the C functions under the covers, and thus are all the functions and variables in VCL matched by a VRT function?

Yes, effectively. From a technical perspective, VCL doesn't really have variables (or arguably functions either). It's not really a programming language in a strict sense. It's simply a language for tweaking the Varnish HTTP state machine.

  2. In VRT_SetHdr why do you specify a workspace but in VRT_r_req_url you don't? That is, do I call VRT_r_bereq_url to get a backend URL, or do I need to pass a workspace to get that, something like VRT_r_req_url(sp, BEREQ) (or is this just not a valid operation because you never look up a backend URL)?
  3. How do I know when I need to pass a workspace or not, and what are they all (i.e. HDR_BEREQ is obviously back end request headers, but what other workspaces are there)?

The answers to these are related, so I'll answer them both in one place.

This is because the place to resolve req.url from is embedded in the function name, and this is due to some general weirdness in how the VCL compiler does things. In HTTP, the URL isn't really part of the headers, but Varnish sort of treats it like it is. Similarly, things like beresp.ttl or req.hash_always_miss are not headers. When the bits we're looking at aren't headers, we need to implement them specially.

Indeed, finding where req.url is implemented is hard because of some rather unfortunate macro use without any comments. You're interested in cache_vrt_var.c:64-95.

Anyway, headers are dynamic, and you don't know where they'll be (if they exist at all) until you get a request. When accessing headers through any of the interfaces for various states (req.http.*, bereq.http.*, beresp.http.*, and resp.http.*), you need to resolve them for that specific state. To reduce code duplication, any header read or set via these methods goes through VRT_GetHdr or VRT_SetHdr, respectively. Because these functions are shared for all VCL states, you pass a hint to them to tell them whether you're talking about req, bereq, beresp, or resp headers. So as you can probably imagine, you have HDR_REQ, HDR_BEREQ, HDR_BERESP, and HDR_RESP.
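A rough standalone model of that shared lookup is sketched below. Everything in it is hypothetical (`demo_where`, `demo_msg`, `demo_sess`, `demo_get_hdr`); it is not the Varnish implementation. It also assumes that the leading octal escape in strings such as "\10Wookies:" is a one-byte length of the name including the colon, which is a guess based on "Wookies:" being 8 characters and \10 being octal 8. Unlike real HTTP header matching, the comparison here is case-sensitive to keep the sketch short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the HDR_* hints. */
enum demo_where { DEMO_REQ, DEMO_BEREQ, DEMO_BERESP, DEMO_RESP, DEMO_NWHERE };

#define DEMO_MAX_HDRS 8

/* One HTTP message: a small list of "Name: value" lines. */
struct demo_msg {
    const char *lines[DEMO_MAX_HDRS];
    size_t count;
};

/* One message per state, selected by the hint. */
struct demo_sess {
    struct demo_msg msgs[DEMO_NWHERE];
};

/* hdr is a length-prefixed name such as "\005Host:"; the first byte is
 * assumed to be the length of what follows, colon included.
 * Returns the value, or NULL if the header is absent or hdr is malformed. */
static const char *
demo_get_hdr(const struct demo_sess *sp, enum demo_where where, const char *hdr)
{
    size_t len = (unsigned char)hdr[0];
    const char *name = hdr + 1;
    const struct demo_msg *m = &sp->msgs[where];
    size_t i;

    if (len == 0 || strlen(name) != len || name[len - 1] != ':')
        return NULL;
    for (i = 0; i < m->count; i++) {
        if (strncmp(m->lines[i], name, len) == 0) {
            const char *v = m->lines[i] + len;

            while (*v == ' ')
                v++;               /* skip spaces after the colon */
            return v;
        }
    }
    return NULL;
}
```

The point of the sketch is the shape of the call: one function serves all four states, and the hint argument picks which message to search, which is why the header functions take an extra parameter while the per-variable accessors do not.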

  4. For the sake of learning (ignoring that there is a vmod for this), would you mind updating your post to show the best way to implement the strtolower function, avoiding both modifying a const via a dodgy cast and passing an incorrect type to the tolower function?

Honestly, you can't really do it safely because the VCL compiler is given an opaque type for struct sess. Without making a VMOD, the best you can do is:

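#include <ctype.h>

/* Lowercase the string in place. */
static void
strtolower(char *c)
{
  while (*c != '\0') {
    /* Convert first, then advance; doing both in one expression is unsequenced. */
    *c = tolower(*c);
    c++;
  }
}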

If you compile with C99 support, you could possibly do this:

C{
  #include <ctype.h>
  #include <string.h>  /* strlen */

  /* Write a lowercased copy of c into obuf, which must hold strlen(c) + 1 bytes. */
  static void
  strtolower(const char *c, char *obuf)
  {
    while (*c != '\0') {
      *obuf++ = tolower(*c++);
    }
    *obuf = '\0';
  }
}C

...

if (req.url ~ "[A-Z]") {
  C{
    const char *url = VRT_r_req_url(sp);
    size_t urllen = strlen(url) + 1;
    char obuf[urllen];  /* C99 variable-length array on the stack */

    strtolower(url, obuf);
    VRT_l_req_url(sp, obuf, vrt_magic_string_end);
  }C
}

Honestly, this implementation isn't great either. You risk blowing out the stack doing this when you get a long URL, and you don't want to malloc inside of VCL. The actual strtolower implementation doesn't do any bounds checking; it just requires you to have a buffer large enough to hold the string. These are all solvable problems, but I really don't want to spend a ton of time on it precisely because it's the wrong way to do it. This is the exact reason why VMODs were created.
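Purely as an illustration of the bounds-checking point, a copy routine that refuses to overrun its output buffer might look like the sketch below. The name `lower_copy` and its return convention are assumptions of this example; it does not touch the Varnish workspace, so it does not solve the stack-size problem, only the missing length check.

```c
#include <ctype.h>
#include <stddef.h>

/* Copies src into dst in lowercase.
 * Returns 0 on success, or -1 if dst (dstlen bytes) cannot hold the
 * result plus its terminating NUL; on failure dst is set to "". */
static int
lower_copy(const char *src, char *dst, size_t dstlen)
{
    size_t i;

    if (dstlen == 0)
        return -1;
    for (i = 0; src[i] != '\0'; i++) {
        if (i + 1 >= dstlen) {     /* no room for this byte and the NUL */
            dst[0] = '\0';
            return -1;
        }
        /* tolower expects a value representable as unsigned char. */
        dst[i] = (char)tolower((unsigned char)src[i]);
    }
    dst[i] = '\0';
    return 0;
}
```

A caller would check the return value and leave req.url untouched on failure, rather than writing a truncated URL back.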

You can see the standard strtoupper/strtolower implementation is significantly different: it reserves space from the workspace, copies to the workspace buffer, and then releases the space it didn't use.

(P.S. I got rid of the undefined behavior comments because I realized that the tolower(3) manpage I was quoting meant that the input must be representable in an unsigned char. This is because tolower(3) takes an integer argument; the value you pass could fall out of range. So that was bad information, and I've retracted that.)
