What is wrong with this binary file transfer (corrupting docx files)?


Problem description

I've been trying to resolve this issue for over a week and could really do with some help.

We are using an HTTP request to post files to an API. Most files come out OK, but docx files end up corrupted.

After much research I'm pretty sure that I'm doing something wrong in the binary post that is adding extra data / bytes to the file.

Streams are being closed and I think I've got the boundaries and headers right...

Are there any obvious mistakes in the code below? Or would anybody be able to point me in the right direction for a fix? Why is extra data being added to this file? Are the HTTP headers the issue, or am I reading the stream incorrectly? What is the most likely cause of my woes?

(I have tried to examine the extra data in the docx file to find out where it's coming from, but I have been unable to do so. There are many docx repair tools out there, but none I've come across give any information about the error; they just fix the file. I have tried the Open XML SDK 2.0 for Microsoft Office, but it won't open the corrupt file, so I can't compare it to a fixed one.)

Code:

Sub PostTheFile(CVFile, fullFilePath, PostToURL)

    strBoundary = "---------------------------9849436581144108930470211272"
    strRequestStart = "--" & strBoundary & vbCrlf &_
        "Content-Disposition: attachment; name=""file""; filename=""" & CVFile & """" & vbcrlf & vbcrlf
    strRequestEnd = vbCrLf & "--" & strBoundary & "--" 

    Set stream = Server.CreateObject("ADODB.Stream")
        stream.Type = adTypeBinary 
        stream.Mode = adModeReadWrite     
        stream.Open
        stream.Write StringToBinary(strRequestStart)
        stream.Write ReadBinaryFile(fullFilePath)
        stream.Write StringToBinary(strRequestEnd)
        stream.Position = 0
        BINARYPOST= stream.read
        stream.Close

    Set stream = Nothing    

    Set httpRequest = Server.CreateObject("MSXML2.ServerXMLHTTP.6.0")
        httpRequest.Open "PATCH", PostToURL, False, "username", "pw"
        httpRequest.setRequestHeader "Content-Type", "multipart/form-data; boundary=""" & strBoundary & """"
        httpRequest.Send BINARYPOST
        Response.write "httpRequest.status: " & httpRequest.status 
    Set httpRequest = Nothing   
End Sub


Function StringToBinary(input)
    dim stream
    set stream = Server.CreateObject("ADODB.Stream")
        stream.Charset = "UTF-8"
        stream.Type = adTypeText 
        stream.Mode = adModeReadWrite 
        stream.Open
        stream.WriteText input
        stream.Position = 0
        stream.Type = adTypeBinary 
        StringToBinary = stream.Read
        stream.Close
    set stream = Nothing
End Function

Function ReadBinaryFile(fullFilePath) 
    dim stream
    set stream = Server.CreateObject("ADODB.Stream")
        stream.Type = 1
        stream.Open()
        stream.LoadFromFile(fullFilePath)
        ReadBinaryFile = stream.Read()
        stream.Close
    set stream = nothing
end function  

Links to the files

Here are links to the files before and after going through the API. I kept them really simple.

http://fresherandprosper.com/cvsamples/testcv.corrupted.docx

http://fresherandprosper.com/cvsamples/testcv.notcorrupted.docx

Update

After Edi9999's fantastic help (see below) I thought my problems were over. All I had to do was figure out how I was generating the unwanted additional sequence in my code and remove it.

But I couldn't seem to nail WHAT to remove from my code. Nothing worked as expected.

Then I realised... each time I posted the file, the ending sequence came out slightly different.

0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 

And the exact same file, using the exact same code posted 30 seconds later:

0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 00

And again, a few minutes later:

0015 e88a 5060 0700 00da 3b00 000f 0000
0000 0000 0000 0000 0000 0060 1d00 0077
6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 0000 ed24
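
A docx file is a ZIP archive, and the bytes 504b 0506 in these dumps are the signature of the ZIP end-of-central-directory (EOCD) record, which is 22 bytes long plus an optional comment. The following is a minimal Python sketch, not part of the original ASP code, that reports how far a file's end is from a complete EOCD record. The helper name `eocd_tail_delta` is hypothetical, and the sketch assumes the last occurrence of the signature is the real record.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # the bytes 50 4b 05 06
EOCD_FIXED_SIZE = 22            # fixed part of the record, without the comment


def eocd_tail_delta(data: bytes) -> int:
    """Return 0 if data ends exactly where the EOCD record ends,
    a positive number of trailing bytes, or a negative number of missing bytes."""
    start = data.rfind(EOCD_SIGNATURE)
    if start < 0:
        raise ValueError("no end-of-central-directory signature found")
    available = len(data) - start
    if available < EOCD_FIXED_SIZE:
        # The record itself is cut short.
        return available - EOCD_FIXED_SIZE
    # The last two bytes of the fixed part hold the comment length (little-endian).
    (comment_length,) = struct.unpack_from("<H", data, start + 20)
    return available - (EOCD_FIXED_SIZE + comment_length)


# Tails copied from the three dumps above.
print(eocd_tail_delta(bytes.fromhex("504b 0506 0000 0000 0b00 0b00")))            # -10
print(eocd_tail_delta(bytes.fromhex("504b 0506 0000 0000 0b00 0b00 c102 00")))    # -7
print(eocd_tail_delta(bytes.fromhex("504b 0506 0000 0000 0b00 0b00 c102 0000 ed24")))  # -4
```

If the dumps show the true end of each received file, all three stop partway through the record, which would point to truncation rather than added bytes in those runs.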

Maybe this deserves a new question. But there are already about six relating to this issue, so I'm reluctant to add yet another one.

Recommended answer

Here is what I tried to do with your docx:

  • I opened them with Word; the corrupted one is indeed corrupt
  • I unzipped the files, and their contents are exactly the same

I looked at the sizes of the two docx files, and they were different.

So I looked into the binary file: The beginning of the file is identical

504b 0304 1400 0600 0800 0000 2100 ddfc
9537 6601 0000 2005 0000 1300 0802 5b43
6f6e 7465 6e74 5f54 7970 6573 5d2e 786d
6c20 a204 0228 a000 0200 0000 0000 0000

But not the end:

Non-corrupted file

6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 0000 ed24
0000 0000 

Corrupted file

6f72 642f 7374 796c 6573 2e78 6d6c 504b
0506 0000 0000 0b00 0b00 c102 0000 ed24
0000 0000 0a2d 2d2d 2d2d 2d2d 2d2d 

As you can see, there is an extra sequence at the end of the corrupted file: 0a2d 2d2d 2d2d 2d2d 2d2d. The rest of the file is identical. And when I delete this sequence, the file is not corrupted any more.
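
This kind of comparison can also be scripted rather than read off a hex dump. The following is a minimal Python sketch; the function name `compare_bytes` is hypothetical and the sample bytes stand in for the real files, which would be read with `open(path, "rb")`.

```python
def compare_bytes(good: bytes, bad: bytes):
    """Return (offset, good_rest, bad_rest), where offset is the index of the
    first differing byte and the other two values are what follows it."""
    limit = min(len(good), len(bad))
    # First index where the two inputs differ, or the shorter length if one
    # input is a prefix of the other.
    offset = next((i for i in range(limit) if good[i] != bad[i]), limit)
    return offset, good[offset:], bad[offset:]


# Stand-in data: the last bytes of the good file, and the same bytes with extras.
good = bytes.fromhex("c102 0000 ed24 0000 0000")
bad = good + bytes.fromhex("0a2d 2d2d 2d2d 2d2d 2d2d")

offset, good_rest, bad_rest = compare_bytes(good, bad)
print(offset, good_rest.hex(), bad_rest.hex())
```

An empty `good_rest` together with a non-empty `bad_rest` means the second file is the first one plus appended bytes, which is the situation described above.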

Converted into ASCII, 0a2d 2d2d 2d2d 2d2d 2d2d is \n--------- (a line feed followed by nine hyphens).
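
The conversion can be checked with a couple of lines of Python (a small sketch, independent of the ASP code in the question):

```python
# The extra bytes found at the end of the corrupted file.
extra = bytes.fromhex("0a2d 2d2d 2d2d 2d2d 2d2d")

print(repr(extra.decode("ascii")))    # '\n---------'
print(len(extra), extra.count(b"-"))  # 10 9
```

Byte 0a is a line feed and 2d is a hyphen, so the sequence is one line feed followed by nine hyphens.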

This probably comes from strRequestEnd = vbCrLf & "--" & strBoundary & "--"
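
For context, in a multipart body the file bytes are followed by a closing delimiter: CRLF, two hyphens, the boundary, and two more hyphens. The receiver has to cut the payload exactly at that delimiter, otherwise delimiter bytes end up attached to the file. The following is a minimal Python sketch of that layout. It is neither the ASP code from the question nor the API's actual parser, and the names `build_body` and `extract_payload` are hypothetical. It uses `form-data` as the disposition type, which is what RFC 7578 specifies for multipart/form-data parts (the posted code uses `attachment`); whether that matters to this particular API is not known.

```python
BOUNDARY = b"---------------------------9849436581144108930470211272"


def build_body(filename: str, payload: bytes) -> bytes:
    """Build a single-part multipart/form-data body around payload."""
    head = b"".join([
        b"--", BOUNDARY, b"\r\n",
        b'Content-Disposition: form-data; name="file"; filename="',
        filename.encode("ascii"), b'"\r\n',
        b"\r\n",  # blank line separating part headers from content
    ])
    tail = b"".join([b"\r\n--", BOUNDARY, b"--\r\n"])
    return head + payload + tail


def extract_payload(body: bytes) -> bytes:
    """Recover the file bytes from a body produced by build_body."""
    start = body.index(b"\r\n\r\n") + 4               # end of the part headers
    end = body.rindex(b"\r\n--" + BOUNDARY + b"--")   # start of closing delimiter
    return body[start:end]


payload = b"PK\x05\x06" + bytes(18)  # stand-in for the docx bytes
body = build_body("testcv.docx", payload)
print(extract_payload(body) == payload)
```

A line feed followed by a run of hyphens, as seen at the end of the corrupted file, is consistent with part of such a closing delimiter being kept as file content, though the dump alone does not prove which side introduced it.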

However, as I don't really understand exactly what happens in your code, if you want more help, please explain this portion of the code in more detail.

Hope this helps.
