Extract SWF file from PDF


Question

I have implemented adding SWF files to a PDF using iTextSharp. My question is whether the reverse is possible: given a PDF as input, can I extract the SWF files from it, and if so, how?

Any idea of how to start would be greatly appreciated.

Kind regards,

Raghu.M

Recommended answer

This is a working example. It takes the following PDF, which has an embedded file (the first one I found):

http://www.opf-labs.org/format-corpus/pdfCabinetOfHorrors/fileAttachment.pdf

It then extracts the embedded files, in this case a KSBASE.WQ2 file.

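    // Requires: using System.IO; using iTextSharp.text.pdf;

    // Extracts every document-level attachment (Catalog -> Names -> EmbeddedFiles).
    public static void ExtractAttachments(string src, string dir)
    {
        PdfReader reader = new PdfReader(Path.Combine(dir, src));
        try
        {
            PdfDictionary root = reader.Catalog;
            PdfDictionary names = root.GetAsDict(PdfName.NAMES);
            PdfDictionary embedded = names == null ? null : names.GetAsDict(PdfName.EMBEDDEDFILES);
            // Only a flat Names array is handled; a name tree split across Kids is not.
            PdfArray filespecs = embedded == null ? null : embedded.GetAsArray(PdfName.NAMES);
            if (filespecs == null) return; // no document-level attachments found

            // The array alternates: name string, file specification dictionary, ...
            for (int i = 0; i + 1 < filespecs.Size; i += 2)
            {
                ExtractAttachment(reader, dir, filespecs.GetAsString(i), filespecs.GetAsDict(i + 1));
            }
        }
        finally
        {
            reader.Close();
        }
    }

    protected static void ExtractAttachment(PdfReader reader, string dir, PdfString name, PdfDictionary filespec)
    {
        PdfDictionary refs = filespec == null ? null : filespec.GetAsDict(PdfName.EF);
        if (refs == null) return;
        foreach (PdfName key in refs.Keys)
        {
            // Each EF key points at an embedded stream; the same key in filespec holds the file name.
            PRStream stream = PdfReader.GetPdfObject(refs.GetAsIndirectObject(key)) as PRStream;
            if (stream == null) continue;
            PdfString fileNameObj = filespec.GetAsString(key);
            // Fall back to the name tree entry when the filespec has no name under this key.
            string filename = fileNameObj != null ? fileNameObj.ToString() : name.ToString();
            // Here you can do a filename.Contains(".swf") check.
            byte[] fileBytes = PdfReader.GetStreamBytes(stream);
            File.WriteAllBytes(Path.Combine(dir, filename), fileBytes);
        }
    }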

You would call this as follows:

    var dir = "C:\\temp\\PdfExtract";
    ExtractAttachments("fileAttachment.pdf", dir);

You can simply add a filename.Contains(".swf") check on each file name before extracting it.
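As a minimal sketch of such a check: the helper below is hypothetical (the name `LooksLikeSwf` and the class `SwfDetector` are not part of iTextSharp). It compares the extension case-insensitively and, as an assumption that goes a little beyond the answer, also looks at the first three bytes, since SWF files start with the signature FWS, CWS or ZWS.

```csharp
using System;
using System.IO;

public static class SwfDetector
{
    // Hypothetical helper, not part of iTextSharp.
    public static bool LooksLikeSwf(string filename, byte[] bytes)
    {
        // Cheap check: case-insensitive comparison of the file extension.
        bool hasSwfExtension = string.Equals(
            Path.GetExtension(filename), ".swf", StringComparison.OrdinalIgnoreCase);

        // Stronger check: "FWS" (uncompressed), "CWS" (zlib) or "ZWS" (LZMA) signature.
        bool hasSwfSignature = bytes != null && bytes.Length >= 3
            && (bytes[0] == (byte)'F' || bytes[0] == (byte)'C' || bytes[0] == (byte)'Z')
            && bytes[1] == (byte)'W'
            && bytes[2] == (byte)'S';

        return hasSwfExtension || hasSwfSignature;
    }
}
```

Inside ExtractAttachment it could be called as `if (SwfDetector.LooksLikeSwf(filename, fileBytes))` just before the file is written.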

Update

OK, this is how I would figure it out if the above approach did not work.

The files must be stored somewhere else within the catalog. Without seeing the file, this is how I would approach it.

I would add a breakpoint after root is resolved, then step into it to see whether I can find where the SWF files are.

If you look into root.Keys you will see what the Catalog contains.

To retrieve any dictionary object, use the GetAsDict method, passing in the matching PdfName.
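If stepping through in the debugger is inconvenient, the same inspection can be sketched in code. The helper below is hypothetical (`CatalogInspector` and `DumpKeys` are illustrative names); it assumes iTextSharp 5.x is referenced and only uses `Keys` and `GetAsDict`, as mentioned in the answer. A depth limit is included because PDF dictionaries can refer back to their parents.

```csharp
using System;
using iTextSharp.text.pdf;

public static class CatalogInspector
{
    // Hypothetical helper: prints the keys of a dictionary and, for values that
    // are themselves dictionaries, their keys too, down to maxDepth levels.
    public static void DumpKeys(PdfDictionary dict, int depth, int maxDepth)
    {
        if (dict == null || depth > maxDepth) return;
        foreach (PdfName key in dict.Keys)
        {
            Console.WriteLine(new string(' ', depth * 2) + key);
            // GetAsDict resolves indirect references and returns null
            // when the value is not a dictionary.
            DumpKeys(dict.GetAsDict(key), depth + 1, maxDepth);
        }
    }
}
```

A call such as `CatalogInspector.DumpKeys(reader.Catalog, 0, 2);` right after the PdfReader is created lists the top levels of the catalog, which shows which PdfName values are worth passing to GetAsDict.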

Stepping down a level further, you can see that the Names dictionary contains EmbeddedFiles, and so forth.

There are several PdfName constants; there is even one for Flash.

As the structure of any document can differ, it is just a matter of investigating the structure and passing the correct parameters to GetAsDict in order to read the files.
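One place worth checking, as an assumption about how the SWF was added rather than something the question confirms, is rich media annotations on the pages. In that layout each annotation with subtype RichMedia holds a RichMediaContent dictionary whose Assets entry is a name tree of file specifications, much like EmbeddedFiles. The sketch below uses hypothetical names (`RichMediaFinder`, `FindAssetArrays`) and spells the PDF names out as strings so it does not rely on particular PdfName constants.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class RichMediaFinder
{
    // Written out explicitly; assumed to match the rich media annotation layout.
    private static readonly PdfName RichMedia = new PdfName("RichMedia");
    private static readonly PdfName RichMediaContent = new PdfName("RichMediaContent");
    private static readonly PdfName Assets = new PdfName("Assets");

    // Returns the flat Names arrays (name, filespec, name, filespec, ...)
    // of every RichMedia annotation found on any page.
    public static List<PdfArray> FindAssetArrays(PdfReader reader)
    {
        var result = new List<PdfArray>();
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            PdfArray annots = reader.GetPageN(page).GetAsArray(PdfName.ANNOTS);
            if (annots == null) continue; // page has no annotations

            for (int i = 0; i < annots.Size; i++)
            {
                PdfDictionary annot = annots.GetAsDict(i);
                if (annot == null || !RichMedia.Equals(annot.GetAsName(PdfName.SUBTYPE)))
                    continue;

                PdfDictionary content = annot.GetAsDict(RichMediaContent);
                PdfDictionary assets = content == null ? null : content.GetAsDict(Assets);
                // Only a flat Names array is handled, not a tree split across Kids.
                PdfArray names = assets == null ? null : assets.GetAsArray(PdfName.NAMES);
                if (names != null) result.Add(names);
            }
        }
        return result;
    }
}
```

Each returned array has the same alternating name and file specification layout as the EmbeddedFiles array, so its entries can be passed pairwise to ExtractAttachment from the answer, provided the call is made from the same class, since that method is declared protected.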
