What is the optimal way to merge a few lines or a few words into a large file using NodeJS?

Question

I would appreciate insight from anyone who can suggest the best, or a better, solution for editing large files, ranging from 1MB to 200MB, using Node.js.

Our process needs to merge lines into an existing file in the filesystem. We receive the changed data in the following format, and it needs to be merged into the file at the positions defined in the change details.

[
{"range":{"startLineNumber":3,"startColumn":3,"endLineNumber":3,"endColumn":3},"rangeLength":0,"text":"\n","rangeOffset":4,"forceMoveMarkers":false},
{"range":{"startLineNumber":4,"startColumn":1,"endLineNumber":4,"endColumn":1},"rangeLength":0,"text":"\n","rangeOffset":5,"forceMoveMarkers":false},
{"range":{"startLineNumber":5,"startColumn":1,"endLineNumber":5,"endColumn":1},"rangeLength":0,"text":"\n","rangeOffset":6,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":1,"endLineNumber":6,"endColumn":1},"rangeLength":0,"text":"f","rangeOffset":7,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":2,"endLineNumber":6,"endColumn":2},"rangeLength":0,"text":"a","rangeOffset":8,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":3,"endLineNumber":6,"endColumn":3},"rangeLength":0,"text":"s","rangeOffset":9,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":4,"endLineNumber":6,"endColumn":4},"rangeLength":0,"text":"d","rangeOffset":10,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":5,"endLineNumber":6,"endColumn":5},"rangeLength":0,"text":"f","rangeOffset":11,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":6,"endLineNumber":6,"endColumn":6},"rangeLength":0,"text":"a","rangeOffset":12,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":7,"endLineNumber":6,"endColumn":7},"rangeLength":0,"text":"s","rangeOffset":13,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":8,"endLineNumber":6,"endColumn":8},"rangeLength":0,"text":"f","rangeOffset":14,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":9,"endLineNumber":6,"endColumn":9},"rangeLength":0,"text":"s","rangeOffset":15,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":10,"endLineNumber":6,"endColumn":10},"rangeLength":0,"text":"a","rangeOffset":16,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":11,"endLineNumber":6,"endColumn":11},"rangeLength":0,"text":"f","rangeOffset":17,"forceMoveMarkers":false},
{"range":{"startLineNumber":6,"startColumn":12,"endLineNumber":6,"endColumn":12},"rangeLength":0,"text":"s","rangeOffset":18,"forceMoveMarkers":false}
]
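As a baseline, applying such records once the file's contents are in memory is straightforward. The sketch below is a minimal illustration, not the asker's code: `applyChanges` is a hypothetical name, it ignores the line/column `range` and uses only `rangeOffset`/`rangeLength`/`text`, and it assumes the records are applied in array order, with each offset counting characters (UTF-16 code units, like JavaScript string indices) in the text as it stands after the previous record.

```javascript
// Hypothetical helper: apply editor-style change records to an in-memory string.
// Assumption: records are applied in array order, and each rangeOffset refers to
// the text as modified by all earlier records.
function applyChanges(text, changes) {
  for (const { rangeOffset, rangeLength, text: inserted } of changes) {
    if (rangeOffset < 0 || rangeOffset + rangeLength > text.length) {
      throw new RangeError(`change at offset ${rangeOffset} is out of bounds`);
    }
    // Replace `rangeLength` characters starting at `rangeOffset` with `inserted`.
    text = text.slice(0, rangeOffset) + inserted + text.slice(rangeOffset + rangeLength);
  }
  return text;
}
```

This is exactly the "open the full file and merge" approach: it works, but the whole file has to be read into memory and written back out for every batch of changes.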

Simply opening the full file and merging those details would work, but it would break down if we receive too many of these change sets too frequently: opening the whole file each time can cause out-of-memory issues, and it is also very inefficient.

There is a similar question aimed specifically at C# here. If we open the file in stream mode, is there a similar example in Node.js?

Answer

I would appreciate insight from anyone who can suggest the best, or a better, solution for editing large files, ranging from 1MB to 200MB, using Node.js.

Our process needs to merge lines into an existing file in the filesystem. We receive the changed data in the following format, and it needs to be merged into the file at the positions defined in the change details.

General OS file systems do not directly support the concept of inserting info into a file. So, if you have a flat file and you want to insert data into it starting at a particular line number, you have to do the following steps:

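  1. Open the file and start reading it from the beginning.
  2. As you read data from the file, count lines until you reach the desired line number.
  3. Then, since you are inserting new data, read some more and buffer into memory an amount of data equal to the size of what you intend to insert.
  4. Then write the data to be inserted into the file at the insertion position.
  5. Now, using another buffer the same size as the data you inserted, take turns reading into one buffer and writing out the previous one.
  6. Continue until you reach the end of the file and all data has been written back to the file (after the newly inserted data).
  7. This has the effect of rewriting all the data after the insertion point back to the file, so that it now sits correctly at its new location in the file.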
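The steps above could be sketched with Node's synchronous `fs` calls roughly as follows. This is an illustration under stated assumptions, not a drop-in implementation: `byteOffsetOfLine` and `insertAt` are hypothetical names, lines are assumed to end in `"\n"`, and offsets are in bytes. One deliberate deviation from step 5: instead of a read buffer exactly the size of the inserted data, it reads fixed-size chunks (`chunkSize`, an assumed tuning parameter) and carries the last `data.length` bytes forward, so a one-character insert does not turn into millions of one-byte reads.

```javascript
const fs = require("fs");

// Steps 1-2: scan for newline bytes until the requested (1-based) line starts,
// and return that line's byte offset. Assumes "\n" line endings.
function byteOffsetOfLine(path, lineNumber, chunkSize = 64 * 1024) {
  if (lineNumber === 1) return 0;
  const fd = fs.openSync(path, "r");
  try {
    const buf = Buffer.alloc(chunkSize);
    let line = 1;
    let base = 0; // file offset of buf[0]
    let n;
    while ((n = fs.readSync(fd, buf, 0, chunkSize, base)) > 0) {
      for (let i = 0; i < n; i++) {
        if (buf[i] === 0x0a && ++line === lineNumber) return base + i + 1;
      }
      base += n;
    }
    throw new RangeError(`file has fewer than ${lineNumber} lines`);
  } finally {
    fs.closeSync(fd);
  }
}

// Steps 3-7: write `data` at byte `offset` and push every later byte forward
// by data.length. Original bytes are always read before they are overwritten.
function insertAt(path, offset, data, chunkSize = 64 * 1024) {
  let pending = Buffer.from(data); // bytes still waiting to be written
  if (pending.length === 0) return;
  const fd = fs.openSync(path, "r+");
  try {
    const chunk = Buffer.alloc(chunkSize);
    let pos = offset; // next byte to read == next byte to write
    let bytesRead;
    while ((bytesRead = fs.readSync(fd, chunk, 0, chunkSize, pos)) > 0) {
      // Buffer.concat copies, so `chunk` can safely be reused on the next read.
      const combined = Buffer.concat([pending, chunk.subarray(0, bytesRead)]);
      // Only the region that was just read is safe to overwrite.
      fs.writeSync(fd, combined, 0, bytesRead, pos);
      pending = combined.subarray(bytesRead); // always data.length bytes
      pos += bytesRead;
    }
    // End of file reached: the remaining pending bytes extend the file.
    fs.writeSync(fd, pending, 0, pending.length, pos);
  } finally {
    fs.closeSync(fd);
  }
}
```

For example, `insertAt(file, byteOffsetOfLine(file, 2), "NEW\n")` turns `"line1\nline2\n"` into `"line1\nNEW\nline2\n"`. The work done is proportional to everything after the insertion point, no matter how small the insert is.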

As you can tell, this is not efficient at all for large files, since you have to read the entire remainder of the file one buffer at a time and write both the inserted data and everything after the insertion point.

In node.js, you can use features in the fs module to carry out all these steps, but you have to write the logic to connect them all together as there is no built-in feature to insert new data into a file while pushing the existing data after it.

There is a similar question aimed specifically at C# here. If we open the file in stream mode, is there a similar example in Node.js?

The C# example you reference appears to just be appending new data onto the end of the file. That's trivial to do in pretty much any file system library. In node.js, you can do that with fs.appendFile() or you can open any file handle in append mode and then write to it.
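For comparison, a minimal sketch of both append options mentioned above, using the synchronous variants (`fs.appendFileSync`, and `fs.openSync` with the `"a"` flag) for brevity; the wrapper names `appendWithHelper` and `appendWithHandle` are hypothetical. Appending is cheap because nothing already in the file has to move.

```javascript
const fs = require("fs");

// Option 1: the one-call helper. Creates the file if it does not exist.
function appendWithHelper(path, text) {
  fs.appendFileSync(path, text);
}

// Option 2: open a handle in append mode ("a") and write to it.
function appendWithHandle(path, text) {
  const fd = fs.openSync(path, "a");
  try {
    fs.writeSync(fd, text); // in append mode, writes go to the end of the file
  } finally {
    fs.closeSync(fd);
  }
}
```

The asynchronous `fs.appendFile()` behaves the same way, just with a callback.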

To insert data into a file more efficiently, you would need to use a more efficient storage system than a single flat file for all the data. For example, if you stored the file in pieces in approximately 100 line blocks, then to insert data you'd only have to rewrite a portion of one block of data and then perhaps have some cleanup process that rebalances the block boundaries if a block gets way too big or too small.

For efficient line management, you would need to maintain an accurate index of how many lines each file piece contains and, obviously, what order the pieces should be in. This would allow you to insert data at a roughly fixed cost no matter how big the entire file is, since the most you would need to do is rewrite one or two blocks of data, even if the entire content were hundreds of GB in size.
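To make the block idea concrete, here is a hypothetical in-memory model of it (`BlockedText` is an illustrative name, not an existing library). Assumptions: each array in `blocks` stands in for one file or database row, persistence is omitted, the index is simply each block's line count, and the rebalance rule (split a block once it exceeds twice `blockSize`) is one arbitrary choice; merging blocks that get too small is not shown.

```javascript
// Hypothetical in-memory model of "blocks of ~100 lines + an index".
// Each inner array stands in for one block file; persistence is omitted.
class BlockedText {
  constructor(lines, blockSize = 100) {
    this.blockSize = blockSize; // target lines per block (assumed)
    this.blocks = [];
    for (let i = 0; i < lines.length; i += blockSize) {
      this.blocks.push(lines.slice(i, i + blockSize));
    }
    if (this.blocks.length === 0) this.blocks.push([]);
  }

  // Use the index (each block's line count) to map a 1-based line number
  // to a block and a position inside it, without reading other blocks.
  locate(lineNumber) {
    let first = 1; // line number of the current block's first line
    for (let b = 0; b < this.blocks.length; b++) {
      const len = this.blocks[b].length;
      if (lineNumber < first + len || b === this.blocks.length - 1) {
        return { block: b, index: lineNumber - first };
      }
      first += len;
    }
  }

  // Insert lines before `lineNumber` (total + 1 appends). Returns the indexes
  // of the blocks that would have to be rewritten on disk.
  insertLines(lineNumber, newLines) {
    const { block, index } = this.locate(lineNumber);
    const target = this.blocks[block];
    if (index < 0 || index > target.length) {
      throw new RangeError(`line ${lineNumber} is out of range`);
    }
    target.splice(index, 0, ...newLines);
    // Rebalance: split a block that has grown past twice the target size.
    if (target.length > 2 * this.blockSize) {
      const tail = target.splice(Math.ceil(target.length / 2));
      this.blocks.splice(block + 1, 0, tail);
      return [block, block + 1];
    }
    return [block];
  }

  toLines() {
    return this.blocks.flat();
  }
}
```

Whatever the total size, an insert touches one block, or two when a split happens. A real system would likely give blocks stable IDs instead of array positions, and might keep cumulative line counts so that `locate` does not scan every block.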

Note, you would essentially be building a new file system on top of the OS file system in order to give yourself more efficient inserts or deletions within the overall data. Obviously, the chunks of data could also be stored in a database too and managed there.

Note, if this project is really an editor, text editing a line-based structure is a very well studied problem and you could also study the architectures used in previous projects for further ideas. It's a bit beyond the scope of a typical answer here to study the pros and cons of various architectures. If your system is also a client/server editor where the change instructions are being sent from a client to a server, that also affects some of the desired tradeoffs in the design since you may desire differing tradeoffs in terms of the number of transactions or the amount of data to be sent between client and server.

If some other language uses an optimal way, then I think it would be better to find that option, as you're saying Node.js might not have that option.

This doesn't really have anything to do with the language you choose. This is about how modern and typical operating systems store data in files.
