Decoding algorithm wanted

Question

I receive encoded PDF files regularly. The encoding works like this:

  • the PDFs are displayed correctly in Acrobat Reader
  • select all and copy the text via Acrobat Reader
  • paste it into a text editor
  • the pasted content turns out to be encoded

So, for example:

13579 -> 3579;
hello -> jgnnq

It's basically an offset (maybe a swap) of ASCII characters.
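A minimal sketch of the encoding as described, assuming it is a single constant offset added to every character's code point (the possible swap is not modeled). The helper name `shift_text` is hypothetical, used only for illustration.

```python
def shift_text(text: str, offset: int) -> str:
    """Add `offset` to the code point of every character in `text`.

    Assumes one constant offset for the whole string; passing the
    negative offset undoes the encoding.
    """
    return "".join(chr(ord(ch) + offset) for ch in text)


# Encoding with +2 reproduces both examples above.
print(shift_text("hello", 2))   # jgnnq
print(shift_text("13579", 2))   # 3579;

# Decoding is the same shift with the opposite sign.
print(shift_text("jgnnq", -2))  # hello
```

Note that `9` becomes `;` because the shift simply continues through the ASCII table (`9`, `:`, `;`) rather than wrapping around within the digits.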

The question is how I can find the offset automatically when I only have access to a few samples. I cannot be sure whether the encoding offset changes between files. All I know is that some text will usually (if not always) show up inside the PDF, e.g. "Name:", "Summary:", "Total:".

Thank you!

Edit: thanks for the feedback. I'll try to break the question into smaller questions:

Part 1: How can I detect identical part(s) inside a string?

Solution

You need to brute-force it.

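If the patterns are as simple as in your examples, where every character code is shifted by +2 (each row below lists an original character followed by the next two; the last one is the encoded character):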

h i j
e f g
l m n
l m n
o p q

1 2 3
3 4 5
5 6 7
7 8 9
9 : ;
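The table can also be checked mechanically: given one known plaintext/encoded pair, subtract code points position by position. A single distinct difference indicates a constant shift; several distinct values would point to something else, such as the swap mentioned in the question. `char_offsets` is a hypothetical helper name, and the sketch assumes the two strings are aligned character for character.

```python
def char_offsets(plain: str, encoded: str) -> set:
    """Return the set of per-character code-point differences between
    a known plaintext and its encoded form (assumed aligned, same length)."""
    if len(plain) != len(encoded):
        raise ValueError("plaintext and encoded text must have the same length")
    return {ord(e) - ord(p) for p, e in zip(plain, encoded)}


print(char_offsets("hello", "jgnnq"))  # {2}
print(char_offsets("13579", "3579;"))  # {2}
```

Both examples from the question yield `{2}`, which matches the +2 shift in the table.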

You can implement this easily by trying each offset in a small range and checking the result against known words:

>>> text = 'jgnnq'
>>> knowns = ['hello', '13579']
>>>
>>> for i in range(-5, 6):  # check the -5 to +5 char code range (stop is exclusive)
...     rot = ''.join(chr(ord(j) + i) for j in text)  # shift every character by i
...     for x in knowns:
...         if x in rot:
...             print(rot)
...
hello
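For Part 1, an alternative to trying every offset is sketched below, assuming the whole string uses one constant shift (no swap). A constant shift leaves the differences between neighbouring code points unchanged, so a known word such as "Total:" can be located in the encoded text by matching its difference pattern, and the offset then follows from a single subtraction. The helper names `diffs` and `find_offsets` are hypothetical; this is not part of the original answer.

```python
def diffs(s: str) -> list:
    """Differences between consecutive code points; a constant shift leaves them unchanged."""
    return [ord(b) - ord(a) for a, b in zip(s, s[1:])]


def find_offsets(encoded: str, known: str) -> list:
    """Return (position, offset) pairs where `known`, shifted by `offset`,
    appears in `encoded`. Assumes one constant shift and len(known) >= 2."""
    pattern = diffs(known)
    enc_diffs = diffs(encoded)
    n = len(known)
    hits = []
    for i in range(len(encoded) - n + 1):
        # Same difference pattern means the same word up to a constant shift.
        if enc_diffs[i:i + n - 1] == pattern:
            hits.append((i, ord(encoded[i]) - ord(known[0])))
    return hits


# Example: "Total: 42" encoded with the +2 shift from the question.
encoded = "".join(chr(ord(c) + 2) for c in "Total: 42")
print(find_offsets(encoded, "Total:"))  # [(0, 2)]
```

Short known words can match by coincidence, so longer anchors such as "Summary:" are safer. Running the search separately on each file also copes with the offset changing between files.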
