Decoding algorithm wanted

Views: 223 Published: 2017/8/17 22:58:27 Tags: algorithm encryption decode


Question


I receive encoded PDF files regularly. The encoding works like this:

the PDFs display correctly in Acrobat Reader
but when I select all and copy the text in Acrobat Reader
and paste it into a text editor,
the content turns out to be encoded

So, examples are:

13579 -> 3579;
hello -> jgnnq

It's basically an offset (or maybe a swap) of ASCII characters.

The question is: how can I find the offset automatically when I have access to only a few samples? I cannot be sure whether the encoding offset changes between files. All I know is that some text will usually (if not always) show up inside the PDF, e.g. "Name:", "Summary:", "Total:".

Thank you!

Edit: thanks for the feedback. I'll try to break the question into smaller questions:

Part 1: How to detect identical part(s) inside a string?

Solution

You need to brute-force it.

If the pattern is as simple as in your examples, where every character code is shifted by +2:

h i j
e f g
l m n
l m n
o p q

1 2 3
3 4 5
5 6 7
7 8 9
9 : ;
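
The table above can also be checked mechanically. As a minimal sketch, assuming you have at least one known plain/encoded pair (the helper name `offsets` is made up for illustration), the per-character differences show whether a single constant shift explains the sample:

```python
def offsets(plain, encoded):
    """Return the set of per-character code differences (encoded - plain)."""
    if len(plain) != len(encoded):
        raise ValueError("samples must have the same length")
    return {ord(e) - ord(p) for p, e in zip(plain, encoded)}

print(offsets("hello", "jgnnq"))  # {2}
print(offsets("13579", "3579;"))  # {2}
```

A single value in the set suggests a constant offset; several different values would point to a swap (substitution) instead.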

You could easily implement a check against known words like this:

>>> text = 'jgnnq'
>>> knowns = ['hello', '13579']
>>>
>>> for i in range(-5, 6):  # check char code offsets from -5 to +5
...     rot = ''.join(chr(ord(j) + i) for j in text)
...     for x in knowns:
...         if x in rot:
...             print(rot)
...
hello
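
The same idea can be wrapped in a function that reports the offset itself. This is only a sketch under assumptions: it supposes one constant shift applied to every character (which the question is not sure of), it uses the marker words mentioned in the question, and `find_offset` and `max_shift` are hypothetical names. Here `shift` means the amount added during encoding, so decoding subtracts it.

```python
def find_offset(encoded, known_words, max_shift=25):
    """Try every shift in [-max_shift, max_shift].

    Return (shift, decoded) for the first shift whose decoding contains
    one of the known words, or None if no shift matches.
    """
    for shift in range(-max_shift, max_shift + 1):
        chars = []
        for ch in encoded:
            code = ord(ch) - shift
            if code < 0:  # not a valid character code for this shift
                break
            chars.append(chr(code))
        else:
            decoded = "".join(chars)
            if any(word in decoded for word in known_words):
                return shift, decoded
    return None

known = ["Name:", "Summary:", "Total:"]
# Simulated sample: plain text encoded with a +2 shift (an assumption).
sample = "".join(chr(ord(c) + 2) for c in "Total: 42")
print(find_offset(sample, known))  # (2, 'Total: 42')
```

If the function returns None for a new file, the encoding is probably not a plain constant shift, or none of the marker words is present in that sample.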
