Reverse engineer a binary dictionary file to extract strings
Problem description
I have a ~600MB .DAT file that contains an Italian dictionary (accented words with their definitions).
I would like to extract all the strings from this file (a raw dump containing strings and dirty headers/binary data would be all right as long as I can read the words and definitions).
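For the "raw dump" route, the classic Unix strings utility does roughly this. A minimal Python sketch of the same idea follows. It assumes the text is stored in a single-byte Latin-1/CP1252 encoding, so it keeps runs of printable ASCII plus the Latin-1 letters that hold Italian accented vowels. If the file instead uses UTF-16, mostly-ASCII text is interleaved with zero bytes and this sketch will not see it.

```python
import mmap
import re
import sys

# Assumption: text is single-byte Latin-1/CP1252, so a "readable" byte is
# printable ASCII (0x20-0x7E) or a Latin-1 letter (0xC0-0xFF), which covers
# Italian accented vowels such as à, è, é, ì, ò, ù.
def extract_strings(data, min_len=4):
    """Yield (offset, text) for every run of at least min_len readable bytes."""
    pattern = re.compile(rb"[\x20-\x7e\xc0-\xff]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("latin-1")


def main(path):
    # mmap avoids copying the whole ~600MB file into memory up front.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for offset, text in extract_strings(mm):
            print(f"{offset:#010x}  {text}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Usage: python dump_strings.py DICT.DAT > dump.txt (the script name is arbitrary). Raising min_len cuts down on short junk runs that come from binary headers.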
So my question is: is there software that could do this automatically?
I would tell it: 'I know that this file contains the strings "TREE", "DOG", "CAT", "COLLISION"... now use some brute force, statistical analysis or whatever method to try and find how these strings are encoded'
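The known-plaintext idea above can be tried cheaply for the simplest cases. The sketch below searches for a few known words under several candidate encodings and all 256 single-byte XOR keys. Key 0 means no obfuscation. The word list is a hypothetical Italian stand-in for "TREE", "DOG", "CAT", "COLLISION", and the encoding list is an assumption. This approach will not find anything stronger than a one-byte XOR, such as real encryption, compression or a custom character table.

```python
import mmap
import sys

# Hypothetical examples: replace with words known to be in the dictionary.
KNOWN_WORDS = ["albero", "cane", "gatto", "collisione"]

# Candidate text encodings; an assumption, not an exhaustive list.
ENCODINGS = ["latin-1", "utf-8", "utf-16-le", "utf-16-be"]


def xor_bytes(data, key):
    """Apply a single-byte XOR key to every byte (key 0 leaves data unchanged)."""
    return bytes(b ^ key for b in data)


def find_encoding(data, words, limit=None):
    """Return (encoding, xor_key, word, offset) for every known word found.

    Only the first `limit` bytes are searched, to keep the brute force fast.
    """
    end = len(data) if limit is None else min(limit, len(data))
    hits = []
    for enc in ENCODINGS:
        for key in range(256):
            for word in words:
                needle = xor_bytes(word.encode(enc), key)
                offset = data.find(needle, 0, end)
                if offset != -1:
                    hits.append((enc, key, word, offset))
    return hits


def main(path):
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        hits = find_encoding(mm, KNOWN_WORDS, limit=64 * 1024 * 1024)
    if not hits:
        print("No known word found: text may be compressed, encrypted or custom-encoded.")
    for enc, key, word, offset in hits:
        print(f"{word!r}: {enc}, XOR key {key:#04x}, offset {offset:#x}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Short words such as "cane" will produce false hits under random keys in binary data. Trust a result only when several long words agree on the same encoding and key. Limiting the search to the first 64 MiB is an arbitrary speed trade-off.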
Two things I'd like to mention:
- I am a software developer, but I have absolutely no experience or knowledge of reverse engineering, hex editing, etc.
- I do not want to spend hours reading reverse engineering tutorials and doing trial and error with lots of different software. If I don't succeed in extracting what I need in a simple manner, I'll just abandon this task.
I realize this task may well be impossible to do simply (for instance, if the text is encrypted); I just want to give it a try with the best tool available.
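One quick way to check whether the text is likely encrypted or compressed is to measure byte entropy per block. The sketch below is a rough heuristic, not a guarantee. Uncompressed text in a single-byte encoding usually lands somewhere around 4 to 5 bits per byte. Compressed or properly encrypted data sits very close to 8. The 1 MiB block size is an arbitrary choice.

```python
import math
import sys
from collections import Counter


def shannon_entropy(data):
    """Order-0 Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def entropy_profile(path, block_size=1 << 20):
    """Yield (offset, entropy) for consecutive blocks of the file."""
    offset = 0
    with open(path, "rb") as f:
        while block := f.read(block_size):
            yield offset, shannon_entropy(block)
            offset += len(block)


if __name__ == "__main__" and len(sys.argv) == 2:
    for offset, h in entropy_profile(sys.argv[1]):
        print(f"{offset:#012x}  {h:.2f} bits/byte")
```

A single-byte XOR only permutes byte values, so it leaves this measure unchanged. If the blocks look text-like (low entropy) but no plain strings show up, a simple substitution is more plausible than real encryption.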
It seems that such an automated tool does not exist, or, if one did, it would only work for a very small set of input files.
I finally found a solution to my problem.
I have an EXE program that allows browsing the dictionary and displaying the definition of a word.
Using AutoHotkey, I wrote a relatively simple script that looks up the definition of every word in a 400k-word input list, copies it to the clipboard, then pastes it into an output text file.
I had to insert some Sleep statements between the keystrokes, window switching, etc. to make the script stable.
Estimated time to "parse" the whole dictionary: 20 days :)
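For scale, the figures above work out to about 4.3 seconds per word. Most of that time presumably goes to the Sleep delays and window switching rather than the lookups themselves:

```latex
t_{\text{word}} \approx \frac{20\ \text{days} \times 86\,400\ \text{s/day}}{400\,000\ \text{words}}
  = \frac{1\,728\,000\ \text{s}}{400\,000\ \text{words}} = 4.32\ \text{s/word}
```

By the same arithmetic, shaving one second off each lookup would save 400,000 s, or about 4.6 days.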