Out of memory exception, please help

Problem description

Hi All,
I am a tester trying to automate my software testing procedure. The test has to do the following:
1. Open two different XML files (each has an HTML table structure, with table, thead, tr and td tags)
2. Compare them (we are currently using a third-party text comparison tool)
3. Find the differences
The purpose of the routine below is to convert each XML file into a DataTable and then compare the two programmatically. My whole application is ready and works fine for small XML files, but not for larger ones (150 MB or more). For those it throws System.OutOfMemoryException (occurred in mscorlib.dll).
The error occurs after the first XML file has been converted to a DataTable, at the point where the next file is read into a string of about 150 MB before its own conversion.
My gut feeling is that the code below is leaking memory.
Guys, please save my life and get this exception out of it.

Private Function ConvertToDataTable(ByVal XMLString As String) As DataTable
        Dim dt As DataTable
        Dim dr As DataRow
        Dim dc As DataColumn
        Dim TableExpression As String = "<table[^>]*>(.*?)</table>"
        Dim HeaderExpression As String = "<th[^>]*>(.*?)</th>"
        Dim RowExpression As String = "<tr[^>]*>(.*?)</tr>"
        Dim ColumnExpression As String = "<td[^>]*>(.*?)</td>"
        Dim HeadersExist As Boolean = False
        Dim iCurrentColumn As Integer = 0
        Dim iCurrentRow As Integer = 0
        Dim data As String
        Dim substr1 As String
        Dim substr2 As String
        Dim substr3 As String
        Dim counter As Integer = 0
        Dim str1 As String
        Dim s As String
        Dim Tables As MatchCollection
        Dim Headers As MatchCollection
        Dim str() As Char
        Try
            ' Get a match for all the tables in the HTML
            Tables = Regex.Matches(XMLString, TableExpression, RegexOptions.Multiline Or RegexOptions.Singleline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)
            ' Loop through each table element
            For Each Table As Match In Tables
                ' Reset the current row counter and the header flag
                iCurrentRow = 0
                HeadersExist = False
                ' Add a new table to the DataSet
                dt = New DataTable
                ' Create the relevant amount of columns for this table (use the headers if they exist, otherwise use default names)
                Dim pattern As String = "*<th*"
                If Table.Value.ToString Like pattern Then
                    ' Set the HeadersExist flag
                    HeadersExist = True
                    ' Get a match for all the rows in the table
                    Headers = Regex.Matches(Table.Value, HeaderExpression, RegexOptions.Multiline Or RegexOptions.Singleline Or RegexOptions.IgnoreCase)
                    ' Loop through each header element
                    For Each Header As Match In Headers
                        If Header.Groups(1).ToString() Like "*<tr><th>*" Then
                            str1 = Header.Groups(1).ToString().Replace("<tr><th>", Nothing)
                            str1 = str1.Trim()
                            dt.Columns.Add(str1)
                            str1 = Nothing
                        ElseIf Header.Groups(1).ToString() Like "*<tr>*" And Not Header.Groups(1).ToString() Like "*<tr><th>*" Then
                            str = Header.Groups(1).ToString()
                            Dim i As Integer
                            Dim newstrindex As Integer = 0
                            Dim newstr(Header.Groups(1).ToString().Length) As Char
                            Dim c As Char
                            For i = 0 To str.Length - 1
                                c = str(i)
                                If Not Char.IsLetter(c) Then
                                Else
                                    newstr(newstrindex) = str(i)
                                    newstrindex = newstrindex + 1
                                End If
                            Next
                            s = newstr
                            s = s.Substring(4)
                            dt.Columns.Add(s)
                            str = Nothing
                            s = Nothing
                            c = Nothing
                            newstr = Nothing
                            newstrindex = Nothing
                        Else
                            dt.Columns.Add(Header.Groups(1).ToString)
                        End If
                    Next
                Else
                    For iColumns As Integer = 1 To Regex.Matches(Regex.Matches(Regex.Matches(Table.Value, TableExpression, RegexOptions.Multiline Or RegexOptions.Singleline Or RegexOptions.IgnoreCase).Item(0).ToString, RowExpression, RegexOptions.Multiline Or RegexOptions.Singleline Or RegexOptions.IgnoreCase).Item(0).ToString, ColumnExpression, RegexOptions.Multiline Or RegexOptions.Singleline Or RegexOptions.IgnoreCase).Count
                        dt.Columns.Add("Column " & iColumns)
                    Next
                End If
                ' Get a match for all the rows in the table
                Dim Rows As MatchCollection = Regex.Matches(Table.Value, RowExpression, RegexOptions.Multiline Or RegexOptions.Singleline Or RegexOptions.IgnoreCase)
                Tables = Nothing
                GC.Collect()
                ' Loop through each row element
                For Each Row As Match In Rows
                    ' Only loop through the row if it isn't a header row
                    If Not (iCurrentRow = 0 And HeadersExist = True) Then
                        ' Create a new row and reset the current column counter
                        dr = dt.NewRow
                        iCurrentColumn = 0
                        ' Get a match for all the columns in the row
                        Dim Columns As MatchCollection = Regex.Matches(Row.Value, ColumnExpression, RegexOptions.Multiline Or RegexOptions.Singleline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)
                        ' Loop through each column element
                        For Each Column As Match In Columns
                            data = Column.Groups(1).ToString()
                            'counter = 0
                            'Removing subsequent <td/> tags, as some columns may have null values
                            For Each Header As Match In Headers
                                If data.Length > 5 Then
                                    substr1 = data.Substring(0, 5)
                                    If substr1.Equals("<td/>") Then
                                        data = data.Substring(5)
                                        iCurrentColumn += 1
                                        'dt.Rows(iCurrentRow).Item(iCurrentColumn) = Nothing
                                        dr(iCurrentColumn) = Nothing
                                    Else
                                    End If
                                End If
                                If data.Length > 4 Then
                                    substr2 = data.Substring(0, 4)
                                    If substr2.Equals("<td>") Then
                                        data = data.Substring(4)
                                        iCurrentColumn += 1
                                        'dt.Rows(iCurrentRow).Item(iCurrentColumn) = Nothing
                                        dr(iCurrentColumn) = Nothing
                                    Else
                                    End If
                                End If
                                If data.Length >= 5 Then
                                    If substr2 <> "<td>" And substr1 <> "<td/>" Then
                                        dr(iCurrentColumn) = data
                                        'dt.Rows(iCurrentRow).Item(iCurrentColumn) = data
                                        iCurrentColumn += 1
                                        Exit For
                                    End If
                                Else
                                    dr(iCurrentColumn) = data
                                    iCurrentColumn += 1
                                    Exit For
                                End If
                                substr1 = Nothing
                                substr2 = Nothing
                            Next
                        Next
                        ' Add the DataRow to the DataTable
                        dt.Rows.Add(dr)
                        dr = Nothing
                        'GC.Collect()
                    End If
                    ' Increase the current row counter
                    iCurrentRow += 1
                    dr = Nothing
                    'GC.Collect()
                Next
                dr = Nothing
                Headers = Nothing
                Rows = Nothing
                Tables = Nothing
                GC.Collect()
            Next
            'dg.DataSource = dt
            Return dt
        Finally
            ' dt is the return value, so it must not be disposed here.
            GC.Collect()
        End Try
    End Function
End Class

Recommended answers

The best way to deal with this problem is to split the function into multiple smaller functions, each doing one specific part of the processing. The first benefit is that the resource-intensive objects you use are limited to the scope of such a function, which makes it much easier for the garbage collector to see when they can be collected. It will also show you much more clearly where and when the exception is raised. A true memory leak should not normally be a big problem, since garbage collection handles that. The main problem is that you are using a lot of memory that cannot be freed (or where it is not obvious enough that it can be freed). Of course, splitting the code into multiple functions will also make it more readable and maintainable.
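To make this advice concrete, here is a minimal sketch (not the poster's code) of the same regex approach split into three small functions. The module and function names are hypothetical; the row pattern is the one from the question, while the cell pattern is an assumption that also accepts <th> cells and self-closing <td/> cells. It still works on an in-memory string, so on its own it will not cure the OutOfMemoryException; it only makes each step, and what it keeps alive, easier to see.

```vbnet
Imports System.Collections.Generic
Imports System.Data
Imports System.Text.RegularExpressions

' Hypothetical helpers showing the big routine split into small, single-purpose
' functions. Header (<th>) cells are treated like ordinary cells for brevity.
Public Module TableParsing
    Private Const Opts As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase
    Private ReadOnly RowRegex As New Regex("<tr[^>]*>(.*?)</tr>", Opts)
    ' Matches <td>x</td>, <th>x</th> and self-closing <td/>; group 1 is empty for <td/>.
    Private ReadOnly CellRegex As New Regex("<t[dh]\b[^>]*?(?:/>|>(.*?)</t[dh]>)", Opts)

    ' Returns the trimmed text of every cell in one row.
    Public Function ParseCells(rowHtml As String) As List(Of String)
        Dim cells As New List(Of String)()
        For Each m As Match In CellRegex.Matches(rowHtml)
            cells.Add(m.Groups(1).Value.Trim())
        Next
        Return cells
    End Function

    ' Appends one DataRow, adding default-named columns if this row is wider.
    Public Sub AddRow(dt As DataTable, cells As List(Of String))
        While dt.Columns.Count < cells.Count
            dt.Columns.Add("Column " & (dt.Columns.Count + 1).ToString())
        End While
        Dim dr As DataRow = dt.NewRow()
        For i As Integer = 0 To cells.Count - 1
            dr(i) = cells(i)
        Next
        dt.Rows.Add(dr)
    End Sub

    ' Builds a DataTable from the rows of one table fragment.
    Public Function TableFromHtml(tableHtml As String) As DataTable
        Dim dt As New DataTable()
        For Each rowMatch As Match In RowRegex.Matches(tableHtml)
            AddRow(dt, ParseCells(rowMatch.Groups(1).Value))
        Next
        Return dt
    End Function
End Module
```

Each function now holds only the data for its own step, so a failure shows up in a specific, small function rather than somewhere inside one large routine.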

Another tip is to use Using blocks as much as possible, because they always call Dispose on the object, releasing its resources even when an exception occurs. Have a look at these links for more info:
http://msdn.microsoft.com/en-us/library/htd05whh%28v=vs.80%29.aspx

http://www.pluralsight-training.net/community/blogs/fritz/archive/2005/04/28/7834.aspx
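A minimal sketch of a Using block, assuming the goal is to read one of the input files. Using guarantees a call to Dispose, which releases unmanaged resources such as the file handle; the managed memory of any strings read is still reclaimed by the garbage collector as usual. The module and function names are hypothetical.

```vbnet
Imports System.IO

Public Module UsingExample
    ' Counts lines without loading the whole file into one string.
    ' Using guarantees reader.Dispose() runs (closing the file handle)
    ' even if ReadLine throws partway through.
    Public Function CountLines(filePath As String) As Integer
        Dim count As Integer = 0
        Using reader As New StreamReader(filePath)
            While reader.ReadLine() IsNot Nothing
                count += 1
            End While
        End Using
        Return count
    End Function
End Module
```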

Good luck!


在我看来,这就像电子表格和数据库之间的差异之一.我已经使用过很多电子表格,但是它们经常做得不好的一件事就是扩大规模,因为它们全有还是全无.一般而言,要么将所有文件都加载到内存中,要么不加载任何文件-超过一定大小后,内存就用完了.数据库可以很好地扩展,因为您一次只能加载一点,因此即使数据库很大,所需的内存量也不会发生太大变化.

我认为您可能有类似的问题.我会寻找一种编码方式,以便您一次只能处理一个标签(可以,也许几个,但理想情况下不超过XML节点树的最大深度).一次读取每个文件的一个标签,进行比较,推送任何结果,下一个标签等.否则,可能是在某些文件大小下您会遇到内存问题.

如果要将解析与比较分开,也许可以考虑在第1遍将解析后的值存储在自己的数据库中,然后在第2遍进行比较的方法.

仅仅处理150MB长的字符串会使我感到紧张.而且我不知道RegEx东西在内部是​​如何工作的,但是实际上您可能一直在复制字符串.

使用托管代码,真正的内存泄漏不应该成为问题(除非出现Microsoft的错误,尽管我没有看到会影响此的任何错误)-GC可以处理所有这些问题-并且应该能够找到可以只要GC有足够的处理器时间来处理它,就不再需要引用它,而不必考虑复杂性.
This feels to me like (one of) the differences between spreadsheets and databases. I've used spreadsheets a lot, but one of the things they often don't do well is scale up, because they are all or nothing. In general terms, either you load all of the file into memory or none of it, and above a certain size you run out of memory. Databases scale well because you only load a bit at a time, so the amount of memory you need doesn't change much even if the database is huge.

I think you may have a similar problem. I'd look for a way of coding this so that you only ever handle one tag at a time (OK, maybe a few, but ideally no more than the maximum depth of the XML node tree). Read each file one tag at a time, compare, push out any results, move to the next tag, and so on. Otherwise, the likelihood is that at some file size you'll run into memory problems.
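A minimal sketch of the "one tag at a time" approach using System.Xml.XmlReader, which reads forward through a file without loading all of it into memory. It assumes both files are well-formed XML with lowercase tag names and no DTD (XmlReader rejects DTDs by default), that rows are <tr> elements, and that cells contain plain text only (ReadElementContentAsString throws if a cell contains child elements). The module and function names are hypothetical, and joining cells with "|" for comparison is an illustrative choice.

```vbnet
Imports System.Collections.Generic
Imports System.Xml

' Sketch: compare two large table files row by row with XmlReader,
' so only one <tr> per file is held in memory at a time.
Public Module StreamingCompare
    ' Lazily yields the cell texts of each <tr> element, in document order.
    Public Iterator Function ReadRows(reader As XmlReader) As IEnumerable(Of List(Of String))
        While reader.ReadToFollowing("tr")
            Dim cells As New List(Of String)()
            Using row As XmlReader = reader.ReadSubtree()
                row.Read() ' position on the <tr> element itself
                While Not row.EOF
                    If row.NodeType = XmlNodeType.Element AndAlso
                       (row.LocalName = "td" OrElse row.LocalName = "th") Then
                        ' Reads the cell text and moves past </td>; do not call Read() here.
                        cells.Add(row.ReadElementContentAsString().Trim())
                    Else
                        row.Read()
                    End If
                End While
            End Using
            Yield cells
        End While
    End Function

    ' Returns one message per row whose cells differ (or that exists in only one file).
    Public Function CompareFiles(pathA As String, pathB As String) As List(Of String)
        Dim diffs As New List(Of String)()
        Using ra As XmlReader = XmlReader.Create(pathA), rb As XmlReader = XmlReader.Create(pathB)
            Using rowsA As IEnumerator(Of List(Of String)) = ReadRows(ra).GetEnumerator(),
                  rowsB As IEnumerator(Of List(Of String)) = ReadRows(rb).GetEnumerator()
                Dim hasA As Boolean = rowsA.MoveNext()
                Dim hasB As Boolean = rowsB.MoveNext()
                Dim rowIndex As Integer = 0
                While hasA OrElse hasB
                    Dim a As String = If(hasA, String.Join("|", rowsA.Current.ToArray()), "<missing>")
                    Dim b As String = If(hasB, String.Join("|", rowsB.Current.ToArray()), "<missing>")
                    If a <> b Then diffs.Add("Row " & rowIndex.ToString() & ": " & a & " <> " & b)
                    If hasA Then hasA = rowsA.MoveNext()
                    If hasB Then hasB = rowsB.MoveNext()
                    rowIndex += 1
                End While
            End Using
        End Using
        Return diffs
    End Function
End Module
```

Because ReadRows is an Iterator function, each file is read only as far as the current row, so peak memory depends on the widest row rather than on the file size.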

If you want to separate parsing from comparison, maybe think of a way to store parsed values etc. in your own database in pass 1 and then do the comparisons in pass 2.
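One way to sketch the two-pass idea, with a plain text file standing in for "your own database" (an assumption; a real database table would play the same role). Pass 1 writes each parsed row as one tab-separated line; pass 2 streams both saved files and reports the indices of rows that differ. The names are hypothetical, and the sketch assumes cell text contains no tabs or line breaks.

```vbnet
Imports System.Collections.Generic
Imports System.IO

' Two-pass sketch: persist parsed rows first, compare the saved copies afterwards.
Public Module TwoPassCompare
    ' Pass 1: write each parsed row as one tab-separated line so it need not stay in memory.
    Public Sub SaveRows(rows As IEnumerable(Of String()), outPath As String)
        Using writer As New StreamWriter(outPath)
            For Each cells As String() In rows
                writer.WriteLine(String.Join(vbTab, cells))
            Next
        End Using
    End Sub

    ' Pass 2: returns the 0-based indices of rows that differ between two saved files.
    Public Function DiffSavedRows(pathA As String, pathB As String) As List(Of Integer)
        Dim diffs As New List(Of Integer)()
        Using ra As New StreamReader(pathA), rb As New StreamReader(pathB)
            Dim rowIndex As Integer = 0
            Do
                Dim a As String = ra.ReadLine()
                Dim b As String = rb.ReadLine()
                If a Is Nothing AndAlso b Is Nothing Then Exit Do
                ' String.Equals rather than "=", so a missing row (Nothing) never equals an empty one ("").
                If Not String.Equals(a, b) Then diffs.Add(rowIndex)
                rowIndex += 1
            Loop
        End Using
        Return diffs
    End Function
End Module
```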

Simply handling a string 150 MB long makes me feel nervous. And I don't know how the RegEx stuff works internally, but you may in effect be replicating the strings along the way.
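On the point about regexes copying strings: each read of Match.Value (or Groups(1).ToString()) returns a new substring, and the question's code reads Table.Value several times per table. Below is a minimal sketch, assuming the same table and row patterns as the question, that counts the rows of each table by searching the original string inside an index/length window (the Regex.Match(input, beginning, length) overload) instead of copying the table text first. The module and function names are hypothetical.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Sketch: walk the rows of each table using the original string plus
' index/length, instead of Regex.Matches(Table.Value, ...), so a table that
' spans most of a large file is not copied into a second large string.
Public Module RegionScan
    Private Const Opts As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase
    Private ReadOnly TableRegex As New Regex("<table[^>]*>(.*?)</table>", Opts)
    Private ReadOnly RowRegex As New Regex("<tr[^>]*>(.*?)</tr>", Opts)

    ' Returns the number of <tr> rows in each <table>, in document order.
    Public Function CountRowsPerTable(xml As String) As List(Of Integer)
        Dim counts As New List(Of Integer)()
        Dim table As Match = TableRegex.Match(xml)
        While table.Success
            Dim body As Group = table.Groups(1)
            Dim rowCount As Integer = 0
            ' Search only the table body inside the original string; no substring is made.
            Dim row As Match = RowRegex.Match(xml, body.Index, body.Length)
            While row.Success
                rowCount += 1
                row = row.NextMatch() ' stays within the same index/length window
            End While
            counts.Add(rowCount)
            table = table.NextMatch()
        End While
        Return counts
    End Function
End Module
```

This still needs the whole file in one string, so it reduces extra copies rather than removing the underlying problem the answer describes.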

With managed code, real memory leaks shouldn't be a problem (barring bugs from Microsoft, though I haven't seen any that would affect this). The GC handles all that, and it should be able to find any objects that can no longer be referenced, regardless of complexity, as long as it has enough processor time to do its work.
