Read a large text file to a list


Problem description



Hello,

I've searched around but not found an answer to this. Perhaps someone can help?

How come:

Dim PointArray() As String
PointArray = System.IO.File.ReadAllLines(MyFileName)


works well and is really fast, but I can't find a similar method for doing the same thing with a List, as below:

Dim PointList As New List(Of String)
PointList = System.IO.File.ReadAllLines(MyFileName)


I'm trying to use a List rather than an array because I believe it's faster than an array when handling data, and MyFileName contains around 300,000 lines.
Thanks

Solution

This is the wrong idea.

First of all, System.IO.File.ReadAllLines most likely uses a list (or something similar) internally and then makes an array out of it, for generality and the user's convenience. Why? Because the total number of elements is not known in advance, before reading: resizing an array as you go would be too expensive and plain stupid, and reading the file twice just to count the lines would be just as stupid. So a variable-length collection is used to read the file first. Why convert to an array? Because an array represents a fixed-length structure, which is the most adequate representation of a file that has already been read.

Moreover, you only need a non-fixed-size (non-array) collection if you plan to change its size: add, insert or remove elements. If not, using one is just a waste. By the way, if your purpose were to remove some unwanted elements from the data, that would be a really bad approach anyway; instead, you could filter the data while reading the file line by line, yes, into a list. You are doing something illogical. If you want to modify the data set later on, the most efficient way would be to read line by line, as Richard Deeming advised; a minimal sketch of that follows. If not, the array would be the best for you.
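
A minimal sketch of that line-by-line filtering, assuming System.IO.File.ReadLines is available (.NET 4.0 or later), MyFileName holds the path, and with a trivial placeholder test (skip blank lines) standing in for the real filter:

' Stream the file and keep only the lines that pass the filter;
' the full array of all lines is never materialised.
Dim PointList As New List(Of String)
For Each line As String In System.IO.File.ReadLines(MyFileName)
    If line.Trim().Length > 0 Then   ' placeholder filter - replace with the real test
        PointList.Add(line)
    End If
Next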

Your performance considerations are not based on anything rational.

And finally, consider the case where your file is too big to even fit in memory. For such cases a cunning solution exists: you keep the file open and build a small digest of it that you do keep in memory. Say, it can record the file position of every line, or of some other unit. On top of that, you organize reading of pieces of the file on demand, implementing an array-like interface through a class's indexed (default) property.
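
A minimal sketch of that scheme, assuming a seekable file with LF or CR/LF line endings and .NET 4.5 or later (for the StreamReader overload used); the class name LineIndexedFile and all of its details are illustrative only:

' Build an index of line start offsets once; read individual lines on demand.
' Not thread-safe, no caching, UTF-8/ASCII text assumed.
Public Class LineIndexedFile
    Private ReadOnly _stream As System.IO.FileStream
    Private ReadOnly _offsets As New List(Of Long)

    Public Sub New(fileName As String)
        _stream = System.IO.File.OpenRead(fileName)
        _offsets.Add(0)
        Dim pos As Long = 0
        Dim b As Integer = _stream.ReadByte()
        While b <> -1
            pos += 1
            If b = 10 Then _offsets.Add(pos) ' byte after LF = start of the next line
            b = _stream.ReadByte()
        End While
        ' Drop the spurious entry produced by a trailing newline, if any.
        If _offsets.Count > 1 AndAlso _offsets(_offsets.Count - 1) = _stream.Length Then
            _offsets.RemoveAt(_offsets.Count - 1)
        End If
    End Sub

    Public ReadOnly Property Count As Integer
        Get
            Return _offsets.Count
        End Get
    End Property

    ' Array-like access: lines(i) returns line i without loading the whole file.
    Default Public ReadOnly Property Item(index As Integer) As String
        Get
            _stream.Seek(_offsets(index), System.IO.SeekOrigin.Begin)
            Dim sr As New System.IO.StreamReader(_stream, System.Text.Encoding.UTF8,
                                                 False, 1024, leaveOpen:=True)
            Return sr.ReadLine()
        End Get
    End Property
End Class

Usage is then array-like, e.g. Dim lines As New LineIndexedFile(MyFileName) followed by lines(12345), without ever holding more than one line of text in memory.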

—SA


Sorry, I just discovered the .ToList option

PointList = System.IO.File.ReadAllLines(MyFileName).ToList


works fine....
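
Likely equivalent, and it skips the intermediate array entirely (a sketch, assuming .NET 4.0 or later where System.IO.File.ReadLines exists):

' ReadLines streams the file; ToList fills the list directly, with no temporary array.
PointList = System.IO.File.ReadLines(MyFileName).ToList()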


Quote:

Philippe,

Here's what I have:

A text file containing a (huge) list of 3D points (X,Y,Z), which I call T1, T2, T3 etc....:

-3153.61743,-6115.18164,-3289.31226
-3256.75195,-6075.70801,-3550.83057
-3775.00977,-5870.36035,-3817.91870
-3890.03223,-5819.92871,-3797.58984
-4134.60156,-5724.16309,-3651.39648
....



For each point (called T) I need to calculate whether any of the other points (called x1, x2, etc.) are
within a cone drawn along the vector of T out to an infinite point on that vector; in its 'shadow', if you like.
At the end I need a list of all such points.

Here's what I do:
1) Read the X,Y,Z into three lists (of integers - that's close enough)
2) Calculate the range of each point and its two spherical bearings
3) Sort the list by range so that the closest to origin are at the top of the list
4) Create separate lists of the X, Y, Z, Range, Bearing#1, Bearing#2 value of every point
5) Run a loop a bit like this:

For every T
For every X (which is T+1)
Calculate vectors from range, bearing#1, bearing#2 for X to see if it's in the cone vector
If yes, add to a List of Y
Next X
Next T
Write the Y list to file.


Here are a few things I've tried to speed things up:
1) I determine which 3D quadrant each X is in - if it's not in the same one as the current T, I don't run the vector calcs
2) I tried writing the Y to disk using a StreamWriter every 10, 100 or 1,000 points of Y, but this was very slow.
3) I tried decreasing the size of the lists using List.RemoveAt(0) each time the T loop completed; it made no difference to speed.
4) To be able to monitor progress (albeit slow!), the loop lives in a BackgroundWorker with a progress bar and a % completed label.

I'm sure there's a clever/fast way to do this, but it escapes me!
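
Purely for orientation, a rough sketch of the nested loop described above; Points, SameQuadrant and IsInCone are hypothetical placeholders for the poster's own data structure, quadrant check and cone test, and Points is assumed to be pre-sorted by range:

' Points is assumed sorted by ascending range from the origin.
Dim shadowed As New List(Of Integer)
For t As Integer = 0 To Points.Count - 1
    For x As Integer = t + 1 To Points.Count - 1
        ' Cheap quadrant rejection first, expensive cone test second.
        If SameQuadrant(Points(t), Points(x)) AndAlso IsInCone(Points(t), Points(x)) Then
            shadowed.Add(x)   ' remember the index of every shadowed point
        End If
    Next
Next
' Write the collected results once, after both loops finish, rather than inside the loop.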




You're doing it wrong!

You have to read such data using OleDb!
The idea is:
1) load the data into a DataTable[^] using an OleDbDataReader[^]
2) use LINQ to DataSet[^] to do the calculations (a rough sketch follows)
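
A minimal sketch of that approach, assuming the 32-bit Jet 4.0 text driver is installed, the file sits in C:\Data as points.csv, and there is no header row (so the driver names the columns F1, F2, F3); a Schema.ini next to the file can pin down the delimiter and column types if type inference guesses wrong. The sketch fills the DataTable through an OleDbDataAdapter, which drives the reader internally:

' Load the comma-delimited text file into a DataTable through the Jet text driver,
' then query it with LINQ to DataSet (requires a reference to System.Data.DataSetExtensions).
Dim connStr As String =
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data;" &
    "Extended Properties=""text;HDR=No;FMT=Delimited"""

Dim table As New System.Data.DataTable()
Using adapter As New System.Data.OleDb.OleDbDataAdapter("SELECT * FROM [points.csv]", connStr)
    adapter.Fill(table)
End Using

' Example LINQ to DataSet query: every point with a negative X (column F1).
Dim negativeX = From row In table.AsEnumerable()
                Where row.Field(Of Double)("F1") < 0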

Follow this link[^] to see my past answers.

For further information, please see:
Much ADO About Text Files (https://msdn.microsoft.com/en-us/library/ms974559.aspx)[^]
Textfile connection strings[^]
HOW TO: Use Jet OLE DB Provider 4.0 to Connect to ISAM Databases[^]
Schema.ini File (Text File Driver)[^]
Using OleDb to Import Text Files (tab, CSV, custom)[^]
How To Open Delimited Text Files Using the Jet Provider's Text IIsam[^]
LINQ to DataSet Examples[^]
Queries in LINQ to DataSet[^]
Querying DataSets (LINQ to DataSet)[^]

