Is it possible to encode byte array into readable string in case of XLSX format?


Problem description

I have a byte array that represents either an XLS or an XLSX file, and I need to find some string(s) in it using regex. It is easy to decode XLS bytes into a string and do what I need, but XLSX files are zipped XML, so decoding the raw bytes doesn't do any good: the resulting string is unreadable. Is there any way to extract/read "XLSX" byte arrays? I don't want to use Interop, save the Excel files, and open workbooks. Any ideas?

P.S. Just to make it clear: I get the byte array (file content) from InfoPath attachments.
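Because the byte array can hold either format, it may help to tell them apart before deciding how to read it. The sketch below checks the leading bytes: a ZIP container (which XLSX is) starts with `PK\x03\x04`, while the older binary XLS is an OLE2 compound file starting with `D0 CF 11 E0 A1 B1 1A E1`. The names `SpreadsheetSniffer`, `SpreadsheetKind`, and `Detect` are hypothetical, and the check only identifies the container: other ZIP-based or OLE2-based files (for example DOCX or DOC) would match as well.

```csharp
using System;

// Hypothetical names; not part of the original question or answers.
public enum SpreadsheetKind { Unknown, Xls, Xlsx }

public static class SpreadsheetSniffer
{
    // ZIP local file header signature: "PK" 0x03 0x04.
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // OLE2 compound file signature used by binary Office formats such as XLS.
    private static readonly byte[] OleMagic =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Guesses the container type from the first bytes of the file.
    public static SpreadsheetKind Detect(byte[] data)
    {
        if (StartsWith(data, ZipMagic)) return SpreadsheetKind.Xlsx;
        if (StartsWith(data, OleMagic)) return SpreadsheetKind.Xls;
        return SpreadsheetKind.Unknown;
    }

    // True when data begins with every byte of prefix.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

With this in place, the XLS branch can keep the existing decode-and-regex approach, and only the XLSX branch needs to be unzipped first.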

Recommended answer

Updated solution (after clarification by OP):

If you want to read the attachment from an InfoPath document, you can use this method: http://support.microsoft.com/kb/2517906. The code contains both an Encoder and a Decoder.

The Decoder gives you the actual byte[] for the attachment. Then go byte[] -> MemoryStream -> unzip (using DeflateStream or ZipArchive), and then you can get your XML files out of the archive.

Start by getting the attachment data from the InfoPath form as a byte array:
XPathDocument document = new XPathDocument(@"infopathform.xml");
XPathNavigator navigator = document.CreateNavigator();

//Get the my namespace URI
navigator.MoveToFollowing(XPathNodeType.Element);
string myNamespace = navigator.GetNamespacesInScope(XmlNamespaceScope.All)["my"];

//Create an XmlNamespaceManager
XmlNamespaceManager ns = new XmlNamespaceManager(new NameTable());
ns.AddNamespace("my", myNamespace);

//Create an XPathNavigator object for the attachment node
XPathNavigator xnAttNode = navigator.SelectSingleNode("//my:Attachments", ns);

//Get the encoded value as string
string encodedAttachment = xnAttNode.InnerXml;

byte[] theData = Convert.FromBase64String(encodedAttachment);



Next, get the actual attachment data, because the raw data includes filename etc.:

//Declared outside the using block so the later snippets can use them.
byte[] decodedAttachment;
string attachmentName;

using (MemoryStream ms = new MemoryStream(theData))
using (BinaryReader theReader = new BinaryReader(ms))
{
    //Skip the 16-byte fixed header to reach the file size.
    theReader.ReadBytes(16);

    int fileSize = (int)theReader.ReadUInt32();
    //The name length is counted in characters; each UTF-16 character takes 2 bytes.
    int attachmentNameLength = (int)theReader.ReadUInt32() * 2;

    byte[] fileNameBytes = theReader.ReadBytes(attachmentNameLength);
    //The file name is stored as UTF-16 (Encoding.Unicode); drop the 2-byte null terminator.
    attachmentName = Encoding.Unicode.GetString(fileNameBytes, 0, attachmentNameLength - 2);
    decodedAttachment = theReader.ReadBytes(fileSize);
}



Once you get your byte[], put it into a stream, and you can use any of the following methods to extract your XML files.

.NET 4.5 or later - reference System.IO.Compression (the assembly and namespace that contain ZipArchive):

using (MemoryStream memoryStream = new MemoryStream(decodedAttachment))
using (ZipArchive zip = new ZipArchive(memoryStream, ZipArchiveMode.Read))
{
    foreach (ZipArchiveEntry entry in zip.Entries)
    {
        //do something with each XML file
        using (StreamReader reader = new StreamReader(entry.Open()))
        {
            string xmlString = reader.ReadToEnd();
        }
    }
}
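Since the original goal was a regex search, the loop above can be extended into a small helper. This is a minimal sketch, not the answer author's code: `XlsxSearch` and `FindMatchingEntries` are hypothetical names, and it assumes only entries ending in `.xml` are of interest. Note that it matches against raw XML, so cell text appears XML-escaped (for example `&amp;` for `&`) and may be split across several elements when the cell uses rich text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class XlsxSearch
{
    // Returns the names of the XML entries (e.g. "xl/sharedStrings.xml")
    // whose raw XML text matches the given regular expression.
    public static List<string> FindMatchingEntries(byte[] xlsxBytes, string pattern)
    {
        var regex = new Regex(pattern);
        var hits = new List<string>();

        using (var stream = new MemoryStream(xlsxBytes))
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // Skip binary parts such as embedded images.
                if (!entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
                {
                    continue;
                }

                using (var reader = new StreamReader(entry.Open()))
                {
                    if (regex.IsMatch(reader.ReadToEnd()))
                    {
                        hits.Add(entry.FullName);
                    }
                }
            }
        }

        return hits;
    }
}
```

A call such as `XlsxSearch.FindMatchingEntries(decodedAttachment, @"Invoice \d+")` would then list the parts of the workbook that contain a match. In typical workbooks most cell text ends up in `xl/sharedStrings.xml`, so that is usually the entry to expect.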



Or pre-.NET 4.5 - reference WindowsBase:

using (MemoryStream memoryStream = new MemoryStream(decodedAttachment))
{
    ZipPackage p = (ZipPackage) ZipPackage.Open(memoryStream);
    foreach (var part in p.GetParts())
    {
        var s = new StreamReader(part.GetStream());
        string xmlString = s.ReadToEnd();
    }
}



Or using the OpenXML SDK:

using (MemoryStream memoryStream = new MemoryStream(decodedAttachment))
using (SpreadsheetDocument document = SpreadsheetDocument.Open(memoryStream, false))
{
    //false = open the document read-only
    foreach (var worksheetPart in document.WorkbookPart.WorksheetParts)
    {
        //do something with all worksheets
    }
}


If the XLSX files are in OpenXML format, you can use one of the following methods:

1. Use the OpenXML SDK (http://msdn.microsoft.com/en-us/library/office/bb448854(v=office.15).aspx)

2. Use an open source tool like ExcelPackage: http://excelpackage.codeplex.com/wikipage?title=Reading%20data%20from%20an%20Excel%20spreadsheet&referringTitle=Home

3. An XLSX file is a collection of XML files. With the built-in archive support in .NET (System.IO.Compression.ZipArchive), you can easily access the XML files inside it. The post "Read and write Open XML files (MS Office 2007)" contains information on how to extract the XML files from an XLSX file and read them using a simple XmlReader.
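Method 3 can be sketched as follows, combining ZipArchive with XmlReader. This is an illustration under stated assumptions rather than code from the referenced post: `SharedStringsReader` and `ReadSharedStrings` are hypothetical names, and the sketch assumes the shared string table sits at its conventional path `xl/sharedStrings.xml` (a package may place it elsewhere via its relationships). It collects the text of every `<t>` element, so a rich-text cell made of several runs yields several list items.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml;

// Hypothetical helper; names and the fixed entry path are assumptions.
public static class SharedStringsReader
{
    // Returns the decoded text of each <t> element in the shared string table.
    public static List<string> ReadSharedStrings(byte[] xlsxBytes)
    {
        var result = new List<string>();

        using (var stream = new MemoryStream(xlsxBytes))
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("xl/sharedStrings.xml");
            if (entry == null)
            {
                return result; // workbook without a shared string table
            }

            using (Stream entryStream = entry.Open())
            using (XmlReader reader = XmlReader.Create(entryStream))
            {
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "t")
                    {
                        // Reads the text and moves past the closing </t>,
                        // so no extra Read() is needed in this branch.
                        result.Add(reader.ReadElementContentAsString());
                    }
                    else
                    {
                        reader.Read();
                    }
                }
            }
        }

        return result;
    }
}
```

Compared with running a regex over the raw XML, this returns unescaped text, so a pattern can be applied to each string as the user would see it in the cell.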

