How to differentiate whether NSData is xls, ppt, or doc in Objective-C

Question

I'm working on a file-handling app, and I recently hit a bug caused by links that have no file extension, like this:

https://drive.google.com/uc?export=download&id=1234567abcdefghijk

Until now I have been determining the file type from the filename at the end of the link, which is the direct link to the file.

With a redirecting link like the Google Drive link above, the data is still returned, but because there is no file extension, UIWebView doesn't render document-type files. (I use a different viewer for images, and it renders fine because you can pass the data directly to a UIImage.)

The solution I came up with was to check the file signature, which you can find in the first 1024 bytes of the data. I found the file signatures for document types at http://www.filesignatures.net/index.php.

I can tell images and PDFs apart, but the problem is xls/ppt/doc and xlsx/pptx/docx, because each group shares a single file signature: [D0 CF 11 E0 A1 B1 1A E1] and [50 4B 03 04] respectively.

What I want to know is whether there are other ways to differentiate those Microsoft Office document files.

This is the code I have so far. If you know how to improve this function, I would welcome it, along with some explanation:

typedef enum FileSignature {
    kFileSignaturePDF,
    kFileSignaturePPT_DOC_XLS,
    kFileSignaturePPTX_DOCX_XLSX,
    kFileSignaturePNG,
    kFileSignatureJPG,
    kFileSignatureBMP,
    kFileSignatureUndefined,
}FileSignature;

+ (FileSignature)getDocumentTypeOfData:(NSData *)documentData {
    // Signatures sit at offset 0, so compare the raw bytes there with memcmp.
    // Converting them to NSString would mangle non-ASCII bytes such as 0xFF.
    static const unsigned char pdfBytes[] = {0x25, 0x50, 0x44, 0x46};
    // FF D8 FF is shared by JFIF (FF E0) and Exif (FF E1) JPEG files.
    static const unsigned char jpgBytes[] = {0xFF, 0xD8, 0xFF};
    static const unsigned char pngBytes[] = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    static const unsigned char bmpBytes[] = {0x42, 0x4D};
    // pptx, xlsx, docx (ZIP container)
    static const unsigned char msOfficeXBytes[] = {0x50, 0x4B, 0x03, 0x04};
    // ppt, xls, doc (OLE2 compound file)
    static const unsigned char msOfficeBytes[] = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Each entry pairs a signature (with its true length) with its file type.
    const struct {
        const unsigned char *bytes;
        NSUInteger length;
        FileSignature type;
    } signatures[] = {
        {pdfBytes, sizeof(pdfBytes), kFileSignaturePDF},
        {jpgBytes, sizeof(jpgBytes), kFileSignatureJPG},
        {pngBytes, sizeof(pngBytes), kFileSignaturePNG},
        {bmpBytes, sizeof(bmpBytes), kFileSignatureBMP},
        {msOfficeBytes, sizeof(msOfficeBytes), kFileSignaturePPT_DOC_XLS},
        {msOfficeXBytes, sizeof(msOfficeXBytes), kFileSignaturePPTX_DOCX_XLSX},
    };

    const unsigned char *head = documentData.bytes;
    for (NSUInteger i = 0; i < sizeof(signatures) / sizeof(signatures[0]); i++) {
        // The data only needs to be as long as the signature being tested.
        if (documentData.length >= signatures[i].length &&
            memcmp(head, signatures[i].bytes, signatures[i].length) == 0) {
            return signatures[i].type;
        }
    }

    return kFileSignatureUndefined;
}
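
One possible way to split the shared [D0 CF 11 E0 A1 B1 1A E1] group further is to look inside the OLE2 compound file for the stream names each application writes. The sketch below is a heuristic, not a real compound-file parser: it assumes the whole file is in memory and simply scans for the UTF-16LE names "WordDocument", "Workbook" and "PowerPoint Document". The names `LegacyOfficeKind`, `contains_utf16le` and `guess_legacy_office_kind` are made up for this example, and it is written in plain C so it can be pasted into a .m file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { LEGACY_UNKNOWN, LEGACY_DOC, LEGACY_XLS, LEGACY_PPT } LegacyOfficeKind;

/* Returns 1 if the ASCII string `name`, encoded as UTF-16LE, occurs in buf. */
static int contains_utf16le(const unsigned char *buf, size_t len, const char *name) {
    size_t n = strlen(name);
    if (n == 0 || len < 2 * n) return 0;
    for (size_t i = 0; i + 2 * n <= len; i++) {
        size_t j = 0;
        /* Each ASCII character is stored as the character followed by 0x00. */
        while (j < n && buf[i + 2 * j] == (unsigned char)name[j] && buf[i + 2 * j + 1] == 0) {
            j++;
        }
        if (j == n) return 1;
    }
    return 0;
}

/* Heuristic guess for an OLE2 file; returns LEGACY_UNKNOWN for anything else. */
LegacyOfficeKind guess_legacy_office_kind(const unsigned char *buf, size_t len) {
    static const unsigned char ole2[] = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    assert(buf != NULL || len == 0);
    if (len < sizeof ole2 || memcmp(buf, ole2, sizeof ole2) != 0) return LEGACY_UNKNOWN;
    if (contains_utf16le(buf, len, "WordDocument")) return LEGACY_DOC;
    if (contains_utf16le(buf, len, "PowerPoint Document")) return LEGACY_PPT;
    if (contains_utf16le(buf, len, "Workbook")) return LEGACY_XLS;
    return LEGACY_UNKNOWN;
}
```

From Objective-C this could be called as guess_legacy_office_kind(documentData.bytes, documentData.length). A document with an embedded object of another type (say, a spreadsheet inside a Word file) can contain more than one of these names, so treat the result as a hint rather than proof.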


Recommended answer

It took me a while to post this, but I went with trojanfoe's idea of reading the content type from the response headers. If you are using AFNetworking 2.0, you can get it in the success block from operation.response.allHeaderFields. allHeaderFields is also a property of NSHTTPURLResponse, for those using NSURLConnection manually.

If you can improve on this, whether by optimizing it, shortening it, or adding to the list of supported documents, I suggest you post an answer.

typedef enum DocumentType {
    kDocumentTypePDF,
    kDocumentTypePPT,
    kDocumentTypeDOC,
    kDocumentTypeXLS,
    kDocumentTypePPTX,
    kDocumentTypeDOCX,
    kDocumentTypeXLSX,
    kDocumentTypePNG,
    kDocumentTypeJPG,
    kDocumentTypeBMP,
    kDocumentTypeIMG,
    kDocumentTypeUndefined,
}DocumentType;

+ (DocumentType)getDocumentTypeBasedOnContentType:(NSString *)contentType {
    // Media types are case-insensitive and may carry parameters
    // (e.g. "; charset=utf-8"), so normalize before comparing.
    NSString *type = [[contentType componentsSeparatedByString:@";"] firstObject];
    type = [[type stringByTrimmingCharactersInSet:
                [NSCharacterSet whitespaceAndNewlineCharacterSet]] lowercaseString];

    if ( [type isEqualToString:@"application/pdf"] ) {
        return kDocumentTypePDF;
    } else if ( [type isEqualToString:@"application/mspowerpoint"] ||
                [type isEqualToString:@"application/powerpoint"] ||
                [type isEqualToString:@"application/vnd.ms-powerpoint"] ||
                [type isEqualToString:@"application/x-mspowerpoint"] ) {
        return kDocumentTypePPT;
    } else if ( [type isEqualToString:@"application/msword"] ) {
        return kDocumentTypeDOC;
    } else if ( [type isEqualToString:@"application/excel"] ||
                [type isEqualToString:@"application/vnd.ms-excel"] ||
                [type isEqualToString:@"application/x-excel"] ||
                [type isEqualToString:@"application/x-msexcel"] ) {
        return kDocumentTypeXLS;
    } else if ( [type isEqualToString:@"application/vnd.openxmlformats-officedocument.wordprocessingml.document"] ) {
        return kDocumentTypeDOCX;
    } else if ( [type isEqualToString:@"application/vnd.openxmlformats-officedocument.presentationml.presentation"] ) {
        return kDocumentTypePPTX;
    } else if ( [type isEqualToString:@"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"] ) {
        return kDocumentTypeXLSX;
    } else if ( [type hasPrefix:@"image/"] ) {
        // hasPrefix: returns NO for a nil type, so a missing header is not
        // mistaken for an image.
        return kDocumentTypeIMG;
    } else {
        return kDocumentTypeUndefined;
    }
}
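
If the Content-Type turns out to be generic (for example application/octet-stream), the bytes themselves can still serve as a fallback for the ZIP-based formats. A docx, xlsx or pptx file is a ZIP archive, and ZIP stores entry names uncompressed, so a rough sketch is to scan the data for the conventional main part names. This assumes the whole file is in memory and that the document uses the usual names "word/document.xml", "xl/workbook.xml" and "ppt/presentation.xml"; `OoxmlKind`, `contains_bytes` and `guess_ooxml_kind` are names invented for this example. It is plain C, so it can live in a .m file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { OOXML_UNKNOWN, OOXML_DOCX, OOXML_XLSX, OOXML_PPTX } OoxmlKind;

/* Returns 1 if the byte string `needle` occurs anywhere in buf. */
static int contains_bytes(const unsigned char *buf, size_t len, const char *needle) {
    size_t n = strlen(needle);
    if (n == 0 || len < n) return 0;
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, needle, n) == 0) return 1;
    }
    return 0;
}

/* Heuristic guess for a ZIP-based Office file; OOXML_UNKNOWN for anything else. */
OoxmlKind guess_ooxml_kind(const unsigned char *buf, size_t len) {
    static const unsigned char zip[] = {0x50, 0x4B, 0x03, 0x04};
    assert(buf != NULL || len == 0);
    if (len < sizeof zip || memcmp(buf, zip, sizeof zip) != 0) return OOXML_UNKNOWN;
    /* Entry names appear as plain bytes in the ZIP headers. */
    if (contains_bytes(buf, len, "word/document.xml")) return OOXML_DOCX;
    if (contains_bytes(buf, len, "xl/workbook.xml")) return OOXML_XLSX;
    if (contains_bytes(buf, len, "ppt/presentation.xml")) return OOXML_PPTX;
    return OOXML_UNKNOWN;
}
```

From Objective-C this could be called as guess_ooxml_kind(data.bytes, data.length). A stricter version would read the ZIP central directory and the package's [Content_Types].xml instead of scanning, since a plain ZIP that merely contains a file with one of these names would fool the scan.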
