Reading very large text files, should I be incorporating async?


Problem description

I have been challenged with producing a method that will read very large text files into a program; these files can range from 2 GB to 100 GB.

The idea so far has been to read, say, a couple of thousand lines of text into the method at a time.

At the moment the program is set up using a StreamReader, reading the file line by line and processing the necessary areas of data found on each line.

using (StreamReader reader = new StreamReader("FileName"))
{
    string textline = reader.ReadLine();

    while (textline != null)
    {
        // The first three characters of each line identify the record type.
        string IDD = textline.Substring(0, 3).TrimEnd();

        Row rw = new Row();
        rw.ID = IDD;

        var property = from matchID in xmldata
                       from matching in matchID.MyProperty
                       where matchID.ID == IDD
                       select matching;

        foreach (var field in property)
        {
            Field fl = new Field();

            fl.Name = field.name;
            fl.Data = textline.Substring(field.startByte - 1, field.length).TrimEnd();
            fl.Order = order;
            fl.Show = true;

            order++;

            rw.AddField(fl);
        }
        rec.Rows.Add(rw);
        textline = reader.ReadLine();

        // A line starting with the NewPack marker (or end of file) closes the current record.
        if ((textline == null) || (NewPack == textline.Substring(0, 3).TrimEnd()))
        {
            d.ID = IDs.ToString();
            d.Records.Add(rec);
            IDs++;
            DataList.Add(d.ID, d);
            rec = new Record();

            d = new Data();
        }
    }
}

The program goes on further and populates a class. (I've just decided not to post the rest.)

I know that once the program is shown an extremely large file, out-of-memory exceptions will occur.

So that is my current problem. So far I have been googling several approaches, with many people just answering "use a StreamReader and reader.ReadToEnd()"; I know ReadToEnd() won't work for me, as I will get those memory errors.

Finally, I have been looking into async as a way of creating a method that will read a certain number of lines and wait for a call before processing the next batch.
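For illustration, here is a rough sketch of the kind of batched reading I have in mind, using StreamReader.ReadLineAsync. The batch size and the ProcessBatch step are placeholders, not my real parsing code:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

class BatchReader
{
    // Reads the file `batchSize` lines at a time; only one batch is ever
    // held in memory. Returns the total number of lines processed.
    public static async Task<int> ProcessInBatchesAsync(string path, int batchSize)
    {
        int total = 0;
        using (var reader = new StreamReader(path))
        {
            var batch = new List<string>(batchSize);
            string line;
            while ((line = await reader.ReadLineAsync()) != null)
            {
                batch.Add(line);
                if (batch.Count == batchSize)
                {
                    total += ProcessBatch(batch);
                    batch.Clear(); // release these lines before reading more
                }
            }
            if (batch.Count > 0)
                total += ProcessBatch(batch); // final partial batch
        }
        return total;
    }

    // Placeholder for the real per-line parsing.
    static int ProcessBatch(List<string> lines) => lines.Count;
}
```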

This brings me to my problem: I am struggling to understand async, and I can't seem to find any material that will help me learn. I was hoping someone here could help me find a way to understand it.

Of course, if anyone knows of a better way to solve this problem, I am all ears.

EDIT: Added the remainder of the code to put an end to any confusion.

Answer

Your problem isn't synchronous vs. asynchronous; it's that you're reading the entire file and storing parts of it in memory before you do something with that data.

If you read each line, process it, and write the result to another file/database, then StreamReader will let you process multi-GB (or TB) files.
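As a minimal sketch of that streaming pattern (the file paths and the Transform step are placeholders for your own processing): one line comes in, one result goes out, so memory use stays constant no matter how big the file is.

```csharp
using System.IO;

class StreamingPipeline
{
    // Processes a file of any size in roughly constant memory:
    // read one line, transform it, write it, and never keep the rest around.
    public static void Run(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(Transform(line));
            }
        }
    }

    // Placeholder for whatever per-line processing you actually need.
    static string Transform(string line) => line.Trim();
}
```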

There's only a problem if you store portions of the file until you finish reading it; then you can run into memory issues (but you'd be surprised how large you can let Lists and Dictionaries get before you run out of memory).

What you need to do is save your processed data as soon as you can, and not keep it in memory (or keep as little in memory as possible).

With files that large, you may need to keep your working set (your processed data) in a database; possibly something like SQL Server Express or SQLite would do (but again, it depends on how large your working set gets).

Hope this helps. Don't hesitate to ask further questions in the comments, or edit your original question; I'll update this answer if I can help in any way.

UPDATE - Paging / chunking

You need to read the text file in chunks of one page and allow the user to scroll through the "pages" in the file. As the user scrolls, you read in and present them with the next page.

Now, there are a couple of things you can do to help yourself. Always keep about 10 pages in memory; this keeps your app responsive if the user pages up/down a couple of pages very quickly. In the application's idle time (the Application.Idle event) you can read in the next few pages; again, you throw away pages that are more than five pages before or after the current page.
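One way that sliding window might look as code (a sketch only; the class and member names are mine, and loading/rendering of pages is left out):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class PageCache
{
    // Keeps a window of pages around the one being viewed, evicting anything
    // more than `radius` pages away, as described above (radius 5 => ~10 pages).
    private readonly Dictionary<int, string[]> _pages = new Dictionary<int, string[]>();
    private readonly int _radius;

    public PageCache(int radius = 5) { _radius = radius; }

    public void Store(int pageNumber, string[] lines) => _pages[pageNumber] = lines;

    public bool TryGet(int pageNumber, out string[] lines) =>
        _pages.TryGetValue(pageNumber, out lines);

    // Call whenever the current page changes (e.g. on scroll, or in idle time
    // after reading ahead) to drop pages that are now too far away.
    public void Trim(int currentPage)
    {
        var stale = _pages.Keys.Where(p => Math.Abs(p - currentPage) > _radius).ToList();
        foreach (var p in stale)
            _pages.Remove(p);
    }

    public int Count => _pages.Count;
}
```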

Paging backwards is a problem, because you don't know where each line begins or ends in the file, and therefore you don't know where each page begins or ends. So, for paging backwards, as you read down through the file, keep a list of offsets to the start of each page (Stream.Position); then you can quickly Seek to a given position and read the page in from there.
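A sketch of that offset index (my own helper names; one caveat is that StreamReader buffers ahead, so its BaseStream.Position won't give you exact line offsets - here the index is built at the byte level instead):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

class PageIndex
{
    // Builds a list of byte offsets, one per page of `linesPerPage` lines,
    // by counting newline bytes directly so the offsets are exact.
    public static List<long> Build(string path, int linesPerPage)
    {
        var offsets = new List<long> { 0 };
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            long pos = 0;
            int lineCount = 0;
            int b;
            while ((b = fs.ReadByte()) != -1)
            {
                pos++;
                if (b == '\n')
                {
                    lineCount++;
                    if (lineCount % linesPerPage == 0 && fs.Position < fs.Length)
                        offsets.Add(pos); // start of the next page
                }
            }
        }
        return offsets;
    }

    // Seeks straight to a recorded offset and reads one page from there,
    // so paging backwards is as cheap as paging forwards.
    public static List<string> ReadPage(string path, long offset, int linesPerPage)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            using (var reader = new StreamReader(fs, Encoding.UTF8))
            {
                var lines = new List<string>();
                string line;
                while (lines.Count < linesPerPage && (line = reader.ReadLine()) != null)
                    lines.Add(line);
                return lines;
            }
        }
    }
}
```

(Reading byte by byte is slow for an index pass over a 100 GB file; a buffered scan would be the production version, but the idea is the same.)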

If you need to allow the user to search through the file, then you pretty much read through the file line by line (remembering the page offsets as you go) looking for the text; then, when you find something, read in and present them with that page.

You can speed everything up by pre-processing the file into a database; there are grid controls that will work off a dynamic dataset (they will do the paging for you), and you get the benefit of built-in searches/filters.

So, from a certain point of view, this is reading the file asynchronously, but only from the user's point of view. From a technical point of view, we tend to mean something else when we talk about doing something asynchronously in programming.
