Putting a CSV file into memory

Question

I have one very large (10 MB) CSV file. I parse it and load it into memory using a generic list.

I created a class to represent each line. This class has only a few fields (of types IPAddress and String).

I thought that since the file is only 10 megabytes, I could expect a similar size in memory.

I was quite surprised to find that the method that creates the list allocates 300 MB and does not free it.

Is this normal, and what could be causing it?

Note that the CSV file has many lines (100,000+), which could be a factor.

Imports System.IO
Imports System.Net
Imports System.Web

Namespace Geo
    ' One parsed row of the CSV file.
    Public Class CountryMarker
        Public StartAddress As IPAddress
        Public EndAddress As IPAddress
        Public Country As String
        Public CountryCode As String
    End Class

    Public Class Markers
        Private Const DatabasePath As String = "~/App_Data/ip.csv" '10 MB file
        Public Shared List As List(Of CountryMarker) = LoadData()

        Shared Function LoadData() As List(Of CountryMarker)
            Dim Markers As New List(Of CountryMarker)

            Using Stream = New IO.FileStream(Hosting.HostingEnvironment.MapPath(DatabasePath), FileMode.Open)
                Dim Reader = New StreamReader(Stream)

                Do While Reader.Peek > -1
                    Dim Line = Reader.ReadLine()
                    ' Strip the quotes and materialize the fields once, rather than
                    ' indexing the lazy Select sequence on every access.
                    Dim Values = Line.Split(","c).Select(Function(i) i.Replace("""", "")).ToArray()

                    Markers.Add(New CountryMarker With {
                        .Country = Values(5),
                        .CountryCode = Values(4),
                        .StartAddress = IPAddress.Parse(Values(0)),
                        .EndAddress = IPAddress.Parse(Values(1))})
                Loop
            End Using

            Return Markers
        End Function
    End Class
End Namespace

Recommended answer

First, if the file is ASCII text, or UTF-8 with predominantly Western European characters (like English), then the in-memory size of the text will be at least double the file's size on disk. .NET stores strings as 16-bit Unicode (UTF-16) values, so "A", for example, takes one byte in the text file but two bytes in memory.
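The doubling can be checked directly. The following is a minimal sketch that compares the UTF-8 byte count of a line (its size on disk) with its UTF-16 byte count (the character data of a .NET String). The module name, the helper names, and the sample row are made up for illustration; the row is not taken from the actual file.

```vbnet
Imports System
Imports System.Text

Module EncodingSizeDemo
    ' Bytes the text occupies when saved as UTF-8 (e.g. in the CSV file).
    Function Utf8Bytes(text As String) As Integer
        Return Encoding.UTF8.GetByteCount(text)
    End Function

    ' Bytes of character data for the same text as UTF-16,
    ' which is how a .NET String stores its characters.
    Function Utf16Bytes(text As String) As Integer
        Return Encoding.Unicode.GetByteCount(text)
    End Function

    Sub Main()
        ' Hypothetical sample row, for illustration only.
        Dim sample = """1.0.0.0"",""1.0.0.255"",""AU"",""Australia"""
        Console.WriteLine("UTF-8: {0} bytes, UTF-16: {1} bytes",
                          Utf8Bytes(sample), Utf16Bytes(sample))
    End Sub
End Module
```

For pure ASCII input the second number is exactly twice the first. This counts character data only, not the per-string object overhead.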

Each class instance that you create requires at least 24 bytes (16 bytes for the allocation, plus 8 bytes for the reference). If your file has 100,000 lines, that's 2.4 megabytes, minimum. In addition, every string that you allocate requires 24 bytes of overhead, plus whatever the characters themselves require. Things add up quickly.

(Note that my 24-byte figure is for a 64-bit system. It's 16 bytes per allocation in the 32-bit runtime.)
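The arithmetic above can be turned into a rough estimator. This is only a sketch: it takes the answer's 24-byte figures at face value, and the module, function, and parameter names are invented for illustration. It ignores the list's backing array, the internals of IPAddress, and the temporary strings created while parsing, so treat the result as a loose lower bound.

```vbnet
Imports System

Module MemoryEstimate
    ' Figures quoted in the answer for a 64-bit runtime (assumed, not measured).
    Const PerObjectBytes As Long = 24
    Const PerStringOverheadBytes As Long = 24
    Const BytesPerChar As Long = 2

    ' Rough lower bound for rowCount rows, each holding objectsPerRow plain
    ' objects and stringsPerRow strings of avgCharsPerString characters.
    Function EstimateBytes(rowCount As Long, objectsPerRow As Long,
                           stringsPerRow As Long, avgCharsPerString As Long) As Long
        Dim perString = PerStringOverheadBytes + BytesPerChar * avgCharsPerString
        Return rowCount * (objectsPerRow * PerObjectBytes + stringsPerRow * perString)
    End Function

    Sub Main()
        ' Row objects alone: the "2.4 megabytes, minimum" from the answer.
        Console.WriteLine("{0:N0} bytes", EstimateBytes(100000, 1, 0, 0))
        ' Adding two strings per row with an assumed average of 10 characters.
        Console.WriteLine("{0:N0} bytes", EstimateBytes(100000, 1, 2, 10))
    End Sub
End Module
```

Even with generous inputs this kind of estimate stays far below 300 MB, which suggests that much of the observed figure is garbage from parsing that has not been collected yet rather than live data.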

As others have commented, it's impossible to give you any more detail unless you post some code, including your class definition.

As to not freeing up any memory: that's difficult to prove. Maybe the garbage collector simply hasn't gotten around to doing a collection yet. If it sees no memory pressure (i.e. there's plenty of memory available and no other process is asking for it), the GC might decide that it doesn't need to collect yet.
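One way to separate live data from uncollected garbage is to force a full collection before measuring. The sketch below uses GC.GetTotalMemory(True), which collects before reporting the managed heap size. BuildSampleList is a hypothetical stand-in for the real loader, and the numbers it produces say nothing about the actual CSV file. Keep in mind that in the question's code the list is stored in a Shared field, so the list itself remains reachable and is never eligible for collection; only the temporary objects created during parsing can be reclaimed.

```vbnet
Imports System
Imports System.Collections.Generic

Module RetainedMemoryDemo
    ' Hypothetical stand-in for the real loader: builds a list of short strings.
    Function BuildSampleList(count As Integer) As List(Of String)
        Dim rows As New List(Of String)(count)
        For i = 0 To count - 1
            rows.Add("row " & i.ToString())
        Next
        Return rows
    End Function

    Sub Main()
        ' Passing True forces a full collection before the heap size is read,
        ' so garbage left over from earlier work is not counted.
        Dim before = GC.GetTotalMemory(True)
        Dim items = BuildSampleList(100000)
        Dim after = GC.GetTotalMemory(True)

        Console.WriteLine("Retained after forced GC: {0:N0} bytes for {1:N0} items",
                          after - before, items.Count)

        ' Keep the list reachable up to this point so it is counted as live data.
        GC.KeepAlive(items)
    End Sub
End Module
```

Forcing collections like this is meant for diagnosis only; in production code the GC is normally left to schedule itself.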
