Efficient decoding of binary and text structures (packets)

Problem description

Background

There is a well-known tool called Wireshark. I've been using it for ages. It is great, but performance is a problem. A common usage scenario includes several data preparation steps to extract the data subset to be analyzed later. Without those steps, filtering takes minutes (with big traces Wireshark is next to unusable).

The actual idea is to create a better solution, fast, parallel and efficient, to be used as a data aggregator/storage.

Requirements

The actual requirement is to use all the power provided by modern hardware. There is room for different types of optimization, and I hope I did a good job on the upper layers, but technology is the main question right now. According to the current design there are several flavors of packet decoders (dissectors):

  • interactive decoders: decoding logic can be easily changed at runtime. Such an approach can be quite useful for protocol developers -- decoding speed is not that critical, but flexibility and fast results are more important
  • embeddable decoders: can be used as a library. This type is supposed to have good performance and be flexible enough to use all available CPUs and cores
  • decoders as a service: can be accessed through a clean API. This type should provide best-of-breed performance and efficiency
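To make the "use all available CPUs and cores" requirement for embeddable decoders concrete, here is a minimal sketch in Java. It assumes packets can be decoded independently of each other. The class name `ParallelDecode`, the method `lengthOf` and the packet layout (a big-endian 16-bit length at offset 2) are hypothetical and only serve as an illustration.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.stream.Collectors;

final class ParallelDecode {
    /** Hypothetical per-packet decoder: reads a big-endian u16 length at offset 2. */
    static int lengthOf(byte[] packet) {
        // ByteBuffer is big-endian by default; mask to get an unsigned value.
        return ByteBuffer.wrap(packet).getShort(2) & 0xFFFF;
    }

    /** Decodes independent packets on the common fork/join pool, keeping input order. */
    static List<Integer> decodeAll(List<byte[]> packets) {
        return packets.parallelStream()
                .map(ParallelDecode::lengthOf)
                .collect(Collectors.toList());
    }
}
```

This only pays off when the per-packet work is large enough to outweigh the scheduling overhead, and protocols with cross-packet state (reassembly, sessions) would need partitioning by flow first.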

Results

My current solution is JVM-based decoders. The idea is to reuse the code, eliminate porting, etc., but still have good efficiency.

  • Interactive decoders: implemented in Groovy
  • Embeddable decoders: implemented in Java
  • Decoders as a service: Tomcat + optimizations + embeddable decoders wrapped into a servlet (binary in, XML out)

Problems to be solved

  • Groovy provides way too much power, but lacks expressiveness in this particular case
  • Decoding a protocol into a tree structure is a dead end -- too many resources are simply wasted
  • Memory consumption is somewhat hard to control. I did several optimizations but am still not happy with the profiling results
  • Tomcat with various bells and whistles still introduces too much overhead (mainly connection handling)
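As an illustration of the tree-structure problem listed above, one alternative is to push each decoded field to a callback instead of allocating a node per field. The sketch below is only an assumption about how that could look. The `FieldSink` interface and the 8-byte header layout (type, flags, length, sequence) are made up for the example.

```java
import java.nio.ByteBuffer;

/** Receives decoded fields one at a time; no per-packet tree is built. */
interface FieldSink {
    void field(String name, long value);
}

/** Decodes a hypothetical 8-byte header: type (u8), flags (u8), length (u16), sequence (u32). */
final class FlatHeaderDecoder {
    static final int HEADER_SIZE = 8;

    /** Returns true if a full header was decoded, false if the buffer is too short. */
    static boolean decode(ByteBuffer in, FieldSink sink) {
        if (in.remaining() < HEADER_SIZE) {
            return false;
        }
        // ByteBuffer reads are big-endian by default; the masks make the values unsigned.
        sink.field("type", in.get() & 0xFF);
        sink.field("flags", in.get() & 0xFF);
        sink.field("length", in.getShort() & 0xFFFF);
        sink.field("sequence", in.getInt() & 0xFFFFFFFFL);
        return true;
    }
}
```

A consumer that only needs one field (for example, a filter on `type`) can ignore the rest without any intermediate objects being created for them.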

Am I doing the right thing by using the JVM everywhere? Do you see any other good and elegant way to achieve the initial goal: easy-to-write, highly scalable and efficient protocol decoders?

The protocol, the format of the results, etc. are not fixed.

Solution

I've found several possible improvements:

Interactive decoders

Groovy's expressiveness can be greatly improved by extending Groovy syntax using AST Transformations, so it would be possible to simplify decoder authoring while still providing good performance. An AST transformation (AST stands for Abstract Syntax Tree) is a compile-time technique.

When the Groovy compiler compiles Groovy scripts and classes, at some point in the process, the source code will end up being represented in memory in the form of a Concrete Syntax Tree, then transformed into an Abstract Syntax Tree. The purpose of AST Transformations is to let developers hook into the compilation process to be able to modify the AST before it is turned into bytecode that will be run by the JVM.

I do not want to reinvent the wheel by introducing yet another language to define/describe a protocol structure (it is enough to have ASN.1). The idea is to simplify decoder development in order to provide a fast prototyping technique. Basically, some kind of DSL is to be introduced.
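To give a rough feel for what a declarative decoder description might look like, here is a sketch written as a plain Java fluent builder. It is not a Groovy AST transformation and not the actual DSL; it only stands in for the idea of describing fields once and deriving the decoder from that description. `PacketSpec` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical declarative packet description: fields are listed once, in wire order. */
final class PacketSpec {
    private interface Reader {
        long read(ByteBuffer in);
    }

    private static final class Field {
        final String name;
        final int size;
        final Reader reader;

        Field(String name, int size, Reader reader) {
            this.name = name;
            this.size = size;
            this.reader = reader;
        }
    }

    private final List<Field> fields = new ArrayList<>();

    // Unsigned big-endian integers of 1, 2 and 4 bytes.
    PacketSpec u8(String name)  { return add(name, 1, in -> in.get() & 0xFF); }
    PacketSpec u16(String name) { return add(name, 2, in -> in.getShort() & 0xFFFF); }
    PacketSpec u32(String name) { return add(name, 4, in -> in.getInt() & 0xFFFFFFFFL); }

    private PacketSpec add(String name, int size, Reader reader) {
        fields.add(new Field(name, size, reader));
        return this;
    }

    /** Decodes the fields in declaration order; fails on a truncated buffer. */
    Map<String, Long> decode(ByteBuffer in) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Field f : fields) {
            if (in.remaining() < f.size) {
                throw new IllegalArgumentException("truncated packet at field " + f.name);
            }
            out.put(f.name, f.reader.read(in));
        }
        return out;
    }
}
```

Usage would be along the lines of `new PacketSpec().u8("type").u16("length").u32("sequence").decode(buffer)`. A Groovy DSL backed by AST transformations could offer a terser syntax and generate specialized code at compile time, which this runtime sketch does not attempt.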

Further reading

Embeddable decoders

Java can introduce some additional overhead. There are several libraries to address that issue:

Frankly speaking, I do not see any other option except Java for this layer.

Decoders as a service

No Java is needed on this layer. Finally I have a good option to go with, but the price is quite high. GWan looks really good.

Some additional porting will be required, but it is definitely worth it.
