OpenXML hanging while writing elements

Problem description

I have a program which basically pulls data from a database, caches it to a file and then exports that data to multiple formats (Excel, Excel 2003, CSV). I'm using the OpenXML SDK 2.0 to do the Excel work. These export processes are run in parallel (using Parallel.ForEach), and the amount of data can be pretty large - e.g. some CSVs are 800MB. During these larger exports, I've noticed that the writing of the XML documents will hang. For instance, if I have 8 exporting in parallel, at some point they will all just "pause". They all hang around the same point:

//this.Writer is an OpenXmlWriter which was created from a WorksheetPart.
this.Writer.WriteElement(new Cell()
{
    CellValue = new CellValue(value),
    DataType = CellValues.String
});

When this happens, I pause the debugger (VS2013 in this case) and notice that all threads are blocking around the same portion of code - some are a bit deeper in the OpenXML SDK - but they all stem from the call to OpenXmlWriter.WriteElement.

I dug through the source using JustDecompile but didn't find any answers. It appears that there is an intermediary stream in use which is writing to isolated storage and this is, for some reason, blocking. The underlying stream for each of these is a FileStream.

In the debugger, all parallel tasks (8 in this case) are blocked at or inside the OpenXmlWriter.WriteElement method. (The original screenshot is not reproduced here.)

Complete Stack for one of these hung threads - with annotations.

WindowsBase.dll!MS.Internal.IO.Packaging.PackagingUtilities.CreateUserScopedIsolatedStorageFileStreamWithRandomName Normal
WindowsBase.dll!MS.Internal.IO.Packaging.PackagingUtilities.CreateUserScopedIsolatedStorageFileStreamWithRandomName(int retryCount, out string fileName)     
WindowsBase.dll!MS.Internal.IO.Packaging.SparseMemoryStream.EnsureIsolatedStoreStream()  

//---> Why are we writing to isolated storage at all?
WindowsBase.dll!MS.Internal.IO.Packaging.SparseMemoryStream.SwitchModeIfNecessary()  
WindowsBase.dll!MS.Internal.IO.Zip.ZipIOFileItemStream.Write(byte[] buffer, int offset, int count)   
System.dll!System.IO.Compression.DeflateStream.WriteDeflaterOutput(bool isAsync)     
System.dll!System.IO.Compression.DeflateStream.Write(byte[] array, int offset, int count)    
WindowsBase.dll!MS.Internal.IO.Packaging.CompressStream.Write(byte[] buffer, int offset, int count)  
WindowsBase.dll!MS.Internal.IO.Zip.ProgressiveCrcCalculatingStream.Write(byte[] buffer, int offset, int count)   
WindowsBase.dll!MS.Internal.IO.Zip.ZipIOModeEnforcingStream.Write(byte[] buffer, int offset, int count)  
System.Xml.dll!System.Xml.XmlUtf8RawTextWriter.FlushBuffer()     
System.Xml.dll!System.Xml.XmlUtf8RawTextWriter.WriteAttributeTextBlock(char* pSrc, char* pSrcEnd)    
System.Xml.dll!System.Xml.XmlUtf8RawTextWriter.WriteString(string text)  
System.Xml.dll!System.Xml.XmlWellFormedWriter.WriteString(string text)   
DocumentFormat.OpenXml.dll!DocumentFormat.OpenXml.OpenXmlElement.WriteAttributesTo(System.Xml.XmlWriter xmlWriter)   
DocumentFormat.OpenXml.dll!DocumentFormat.OpenXml.OpenXmlElement.WriteTo(System.Xml.XmlWriter xmlWriter)     
DocumentFormat.OpenXml.dll!DocumentFormat.OpenXml.OpenXmlPartWriter.WriteElement(DocumentFormat.OpenXml.OpenXmlElement elementObject)   

//---> At this point, threads seem to be blocking. 
MyProject.Common.dll!MyProject.Common.Export.ExcelWriter.WriteLine(string[] values) Line 117

One more thing worth mentioning is that while there are 8 things (in this case) being exported at once, each individual exporter is writing to many files in series. For instance, a given export may have 150 underlying files it is exporting to - the input data is segmented and only a portion is written to each file. Basically, I cache the bulk data from the database, then read a line and push it (in series - one-by-one) to the streams which should include this data. The point is that if there are 8 exporters running, there could be, maybe, 1,000 files open for writing, but only 8 are actively being written to at any given time.

Solution

I know this question is pretty old, but this is a known Microsoft issue with OpenXML and isolated storage. The workaround is described at http://support.microsoft.com/kb/951731, which explains:

The IsolatedStorageFile class is not thread safe, IsolatedStorageFile is static and shared between all PackagePart objects. So when multiple PackagePart streams using IsolatedStorageFile objects to buffer data are accessed for writing (includes flushing as well), a thread safety problem in the IsolatedStorageFile class is exposed, causing a deadlock.

The basic idea is to wrap the PackagePart stream and take a lock around every write to it. The article points to an example of such a wrapped stream. Here is an implementation:

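using System.IO;
using System.Threading;

// Wraps a PackagePart stream and serializes Write/Flush across all instances.
public class PackagePartStream : Stream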
{
    private readonly Stream _stream;

    private static readonly Mutex Mutex = new Mutex(false);

    public PackagePartStream(Stream stream)
    {
        _stream = stream;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return _stream.Seek(offset, origin);
    }

    public override void SetLength(long value)
    {
        _stream.SetLength(value);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return _stream.Read(buffer, offset, count);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // Only one part stream may write at a time; release even if Write throws.
        Mutex.WaitOne(Timeout.Infinite, false);
        try
        {
            _stream.Write(buffer, offset, count);
        }
        finally
        {
            Mutex.ReleaseMutex();
        }
    }

    public override bool CanRead
    {
        get { return _stream.CanRead; }
    }

    public override bool CanSeek
    {
        get { return _stream.CanSeek; }
    }

    public override bool CanWrite
    {
        get { return _stream.CanWrite; }
    }

    public override long Length
    {
        get { return _stream.Length; }
    }

    public override long Position
    {
        get { return _stream.Position; }
        set { _stream.Position = value; }
    }

    public override void Flush()
    {
        // Flushing also touches the shared buffer, so it takes the same mutex.
        Mutex.WaitOne(Timeout.Infinite, false);
        try
        {
            _stream.Flush();
        }
        finally
        {
            Mutex.ReleaseMutex();
        }
    }

    public override void Close()
    {
        _stream.Close();
    }

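    protected override void Dispose(bool disposing)
    {
        // Only touch the managed inner stream on an explicit dispose.
        if (disposing)
        {
            _stream.Dispose();
        }
        base.Dispose(disposing);
    }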
}

And an example of usage:

var worksheetPart = document.WorkbookPart.AddNewPart<WorksheetPart>();
var workSheetWriter = OpenXmlWriter.Create(new PackagePartStream(worksheetPart.GetStream()));
workSheetWriter.WriteStartElement(new Worksheet());
//rest of your code goes here ...
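
A possible variation on the wrapper above, offered only as a sketch: because the static Mutex is unnamed and therefore local to the process, a plain `lock` on a shared object should serialize the writes just as well with less overhead. The names `LockedPartStream` and `Gate` are hypothetical and come from neither the KB article nor the SDK. Guarding the dispose path is my own assumption (disposing may flush buffered data); the implementation above guards only Write and Flush.

```csharp
using System;
using System.IO;

// Hypothetical variant of PackagePartStream: same idea, but serialized with
// a process-wide lock object instead of a Mutex.
public sealed class LockedPartStream : Stream
{
    // One gate shared by every wrapper instance, so writes to different
    // parts never run concurrently.
    private static readonly object Gate = new object();

    private readonly Stream _inner;

    public LockedPartStream(Stream inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        _inner = inner;
    }

    public override bool CanRead { get { return _inner.CanRead; } }
    public override bool CanSeek { get { return _inner.CanSeek; } }
    public override bool CanWrite { get { return _inner.CanWrite; } }
    public override long Length { get { return _inner.Length; } }

    public override long Position
    {
        get { return _inner.Position; }
        set { _inner.Position = value; }
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return _inner.Seek(offset, origin);
    }

    public override void SetLength(long value)
    {
        _inner.SetLength(value);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return _inner.Read(buffer, offset, count);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // The lock statement releases the gate even if the write throws.
        lock (Gate) { _inner.Write(buffer, offset, count); }
    }

    public override void Flush()
    {
        lock (Gate) { _inner.Flush(); }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Assumption: disposing may flush, so it is guarded as well.
            lock (Gate) { _inner.Dispose(); }
        }
        base.Dispose(disposing);
    }
}
```

It would be used exactly like PackagePartStream, for example OpenXmlWriter.Create(new LockedPartStream(worksheetPart.GetStream())). Either way, every write from every parallel export goes through a single gate, so the exports no longer deadlock but also no longer write truly in parallel.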
