How to code a program to decrease a text file size?


Problem description

I have a very large text file whose size I want to decrease, so I want to write a program that does this by deleting the data I don't need from the file.
Here is a small sample of that file.

START
POINT 1000 5356 4720.589395 33044.474616 111.699997 10005356
2.197266 1554.908813
2.278646 1309.400635
5.615234 572.443115
5.696615 572.070190
6.510417 616.282471
6.591797 611.210938
6.673177 615.655396
POINT 1050 5360 4770.576031 33044.253728 112.699997 10005360
2.197266 883.810486
2.278646 1237.972656
2.360026 1187.120972
2.522787 922.997620
2.604167 868.807739
2.685547 810.683044
2.766927 794.258240
2.929688 706.232666

The program should ask for the destination of the file. The first line, "START", is not repeated and should be left alone (kept, not deleted). The program should then group the lines: all data between two "POINT" headers should form a group of its own.
So for example, this would be considered as a group:
POINT 1000 5356 4720.589395 33044.474616 111.699997 10005356
2.197266 1554.908813
2.278646 1309.400635
5.615234 572.443115
5.696615 572.070190
6.510417 616.282471
6.591797 611.210938
6.673177 615.655396
And so on....

Then it would delete the groups according to their heading "POINT ......"
I want the program to ask me a couple of questions, such as:
Start point (for example, from POINT 1000 ......)
End point (until POINT 3521 .....)
Increment (delete every 5th point; for example, delete POINT 10, then POINT 15, then POINT 20, and so on until the end point)
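
The filtering described above does not require loading the whole file: it can be done in one pass, line by line, so 9 million lines are not an obstacle in any of the languages mentioned. Below is a minimal sketch in Python. It rests on assumptions that the question does not confirm: the first number after POINT is treated as the point id, and a group is deleted when its id lies between the start and end points and is a whole number of increments away from the start point. The names `drop_point_groups` and `filter_file` are hypothetical.

```python
from typing import Iterable, Iterator


def drop_point_groups(lines: Iterable[str], start: int, end: int,
                      step: int) -> Iterator[str]:
    """Yield every line except those belonging to a deleted POINT group.

    Assumption: the point id is the first integer after the word POINT.
    A group is deleted when start <= id <= end and (id - start) is a
    multiple of step.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    dropping = False  # lines before the first POINT (e.g. START) are kept
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "POINT":
            point_id = int(fields[1])
            dropping = (start <= point_id <= end
                        and (point_id - start) % step == 0)
        if not dropping:
            yield line


def filter_file(src_path: str, dst_path: str, start: int, end: int,
                step: int) -> None:
    """Stream src_path into dst_path, skipping the deleted groups."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(drop_point_groups(src, start, end, step))
```

Because the input is consumed lazily and each kept line is written out immediately, memory use stays small regardless of file size. The start, end and increment values could be read with `input()` before calling `filter_file`.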

I hope you understood me. I would prefer that it be done in VB, but I guess that won't work, as the file is 9 million lines. If not, please tell me whether it could be done in C++ or C#, and please point me to a method or tutorial(s) that could help me.
Thanks in advance

Recommended answer

Please see my comments on the question. Again, you are approaching it in the wrong way.

When I replied that using big text files is a bad idea and asked about your goals, you did not really explain them, but you mentioned "some sort of software", which probably doesn't give you a choice. But then, why ask about "decreasing the size"? Who is going to decrease it? That isn't logical.

And still, we don't know the essential information: the structure and semantics of the file. Okay, this is one possible approach:

You can index the file so that it can be read in smaller chunks. Let's assume the file has some shallow structure; in particular, that it can be decomposed into smaller logical chunks, which we shall call "records". A record can be a single line, but it can also be a group of lines, like the group shown in your example. The only problem is that the groups differ in size, and so do the lines themselves, so you don't know the location of each record until you have read the whole file.

So, on the first run, you can read the whole file line by line and create another, smaller file: the index file. In the index file, you write the location of each record as a file position. It is better to make the index file binary, so that it can be navigated faster. You can have more than one index file, each sorted by a different criterion (for example, one sorted by record number in the order defined in the original file, another sorted by some kind of keyword). Then you can hold the index file in memory; if even the index files are too big for that, keep only an index of the index file in memory and read the index files on request.
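
As an illustration of this first pass, here is a minimal Python sketch that scans the file once and records the byte offset of every POINT header, then stores the result as a binary index. The function names, the choice of the first number after POINT as the key, and the index layout (two little-endian 64-bit integers per record) are assumptions made for this example, not something the answer prescribes.

```python
import struct

# Hypothetical index layout: (point_id, byte_offset), both signed 64-bit.
RECORD = struct.Struct("<qq")


def build_index(data_path):
    """Scan the text file once; return [(point_id, byte_offset), ...]."""
    index = []
    offset = 0
    # Binary mode keeps offsets exact (no newline translation).
    with open(data_path, "rb") as f:
        for line in f:
            if line.startswith(b"POINT"):
                point_id = int(line.split()[1])
                index.append((point_id, offset))
            offset += len(line)
    return index


def save_index(index, index_path):
    """Write the index as fixed-size binary records."""
    with open(index_path, "wb") as f:
        for point_id, offset in index:
            f.write(RECORD.pack(point_id, offset))


def load_index(index_path):
    """Read the binary index back into a list of tuples."""
    with open(index_path, "rb") as f:
        data = f.read()
    return list(RECORD.iter_unpack(data))
```

Fixed-size records are what make the binary index quick to navigate: entry number n starts at byte `n * RECORD.size`, so it can be reached with a single seek.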

Now, on a request or query, you get the information about a record from one of the index files (taken from memory or read from the index file). From the index information, get the position in the original big file and seek to that position in the file stream (open the file once and keep it open for the whole lifetime of the application). Then read your record from the original file.
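
The lookup step might look like the following sketch, which assumes the record offset has already been obtained from an index and that a record runs from its POINT header up to the next POINT header or the end of the file. The function name `read_record` is hypothetical, and the file object is expected to be opened in binary mode.

```python
def read_record(f, offset):
    """Return the lines of one group: the POINT header found at `offset`
    followed by its data lines, stopping before the next POINT header."""
    f.seek(offset)
    lines = [f.readline()]  # the POINT header itself
    while True:
        line = f.readline()
        if not line or line.startswith(b"POINT"):
            break  # end of file, or the next group begins
        lines.append(line)
    return lines
```

The same open file object can be reused for any number of lookups, since each call begins with its own seek.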

One slightly different alternative: do everything as described above, but, on the first run, completely rewrite the original text file into something more convenient for navigation, which could be a much shorter binary file. In that binary file, don't store numbers as strings; this will save you a lot of space and, more importantly, greatly improve performance.
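
To make the binary alternative concrete, the sketch below packs the two-number data lines of a group as pairs of 64-bit doubles. The layout and the names `pack_samples` and `unpack_samples` are assumptions for illustration only. With doubles, each pair takes 16 bytes, against roughly 21 bytes for a text line like those in the sample; 32-bit floats would halve that again but would not preserve all the digits shown.

```python
import struct

# Hypothetical layout: each data line becomes two little-endian doubles.
PAIR = struct.Struct("<dd")


def pack_samples(text_lines):
    """Convert lines such as '2.197266 1554.908813' to packed binary."""
    out = bytearray()
    for line in text_lines:
        x, y = map(float, line.split())
        out += PAIR.pack(x, y)
    return bytes(out)


def unpack_samples(blob):
    """Inverse of pack_samples: return a list of (x, y) float tuples."""
    return list(PAIR.iter_unpack(blob))
```

Because every pair has the same size, the n-th sample of a group sits at a predictable offset, and reading it back involves no string parsing.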

—SA
