How can you reverse engineer a binary Thrift file?


Question

I've been asked to process some files containing Thrift objects serialized as binary (unfortunately not text/JSON), but I don't have access to the program or the programmer that created the files, so I have no idea of their structure, field order, etc. Is there a way to use the Thrift libraries to open a binary file and analyze it, getting a list of the field types, values, nesting, etc.?

Recommended answer

Unfortunately, it appears that Thrift's binary protocol does very little tagging of the data; decoding seems to assume you have the .thrift file in hand, so that you know, say, that the next 4 bytes are supposed to be an integer and not the first half of a float. So it appears you are stuck with, basically, looking at the files in a hex editor (or equivalent) and trying to deduce the fields from the exact patterns you see.
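As a rough illustration of the "hex editor or equivalent" approach, the sketch below prints a byte string as offset, hex, and ASCII columns using only the Python standard library. The function name `hexdump` and the sample bytes are made up for illustration; in practice you would pass in the contents of the file you are inspecting.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset / hex / ASCII columns, like a hex editor."""
    pad = width * 3 - 1  # width of a full hex column, used to align the ASCII part
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(lines)


# Hypothetical sample: a 4-byte big-endian length (5) followed by "hello".
sample = b"\x00\x00\x00\x05hello"
print(hexdump(sample))
```

Readable text runs in the ASCII column, preceded by four bytes that look like a small integer, are a good first hint of where string fields sit.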

There are a few helpful bits:

Each file begins with a version, a protocol identifier string, and a sequence number. Maps begin with 6 bytes: the key type and value type (the first two bytes, as integer codes) followed by the number of elements as a 4-byte integer. The type codes appear to be standard (the canonical location of their definitions seems to be TProtocol.h in the Thrift sources; for instance, a boolean value is specified by type code 2, a UTF-8 string by type code 16, and so on). Strings are prefixed by a 4-byte integer length field, and lists are prefixed by the element type (1 byte) and a 4-byte length. It looks like all integer fields are saved big-endian, and floating-point values are saved in IEEE format (which should make doubles relatively easy to find, at least).
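The layouts described above can be sketched with Python's `struct` module. This is a minimal sketch, not Thrift's own API: the helper names (`read_i32`, `read_string`, and so on) are hypothetical, the sample bytes are built by hand rather than taken from a real file, and the byte layouts are simply the ones stated in the paragraph above.

```python
import struct


def read_i32(buf: bytes, pos: int):
    """Read a 4-byte big-endian signed integer; return (value, next_pos)."""
    (value,) = struct.unpack_from(">i", buf, pos)
    return value, pos + 4


def read_double(buf: bytes, pos: int):
    """Read an 8-byte big-endian IEEE 754 double; return (value, next_pos)."""
    (value,) = struct.unpack_from(">d", buf, pos)
    return value, pos + 8


def read_string(buf: bytes, pos: int):
    """Read a 4-byte length followed by that many bytes of string data."""
    length, pos = read_i32(buf, pos)
    if length < 0 or pos + length > len(buf):
        # An implausible length usually means this offset is not a string.
        raise ValueError(f"implausible string length {length}")
    return buf[pos:pos + length], pos + length


def read_list_header(buf: bytes, pos: int):
    """Read a 1-byte element type and a 4-byte element count."""
    elem_type = buf[pos]
    count, pos = read_i32(buf, pos + 1)
    return elem_type, count, pos


def read_map_header(buf: bytes, pos: int):
    """Read 1-byte key type, 1-byte value type, and a 4-byte element count."""
    key_type, value_type = buf[pos], buf[pos + 1]
    count, pos = read_i32(buf, pos + 2)
    return key_type, value_type, count, pos


# Hand-built samples (assumptions for illustration, not real Thrift output).
string_bytes = struct.pack(">i", 5) + b"hello"
map_bytes = bytes([16, 2]) + struct.pack(">i", 3)   # key type 16, value type 2
list_bytes = bytes([2]) + struct.pack(">i", 4)      # element type 2
double_bytes = struct.pack(">d", 1.5)

print(read_string(string_bytes, 0))
print(read_map_header(map_bytes, 0))
print(read_list_header(list_bytes, 0))
print(read_double(double_bytes, 0))
```

Each helper returns the next offset along with the value, so you can try a guess at a given position, check whether the result looks plausible, and continue from where it stopped.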

The TBinaryProtocol* files in the Thrift sources have a few more helpful details; on the plus side, there are a number of different implementations, so you can read the one written in the language you are most comfortable with.

Sorry, I know this probably isn't that helpful, but it really does appear that this is all the information the Thrift binary format provides; clearly the binary format was designed with the intent that you would always know the exact protocol spec already, and the goal was to minimize wire space rather than to make it at all easy to decode blindly.
