Node.js v0.10: Replace certain bytes in file without reading whole file


Question

I am making a text editor, and for editing a file I really need some way to read only certain bytes from a file, which I've achieved using fs.createReadStream with the start and end options.

I also need to replace certain bytes in the file, and I am not sure how this can be done. So far the best solution I've come up with is to read the file using a stream and write it to a new file; when I come across the bytes I'm looking for, I write my new content instead, thus replacing the old stuff with the new stuff.

This is not the best way, as you probably know. To edit 4 bytes I am reading a huge 2GB file and writing 2GB back out (assuming I'm editing a 2GB file), which is not efficient in the least.

What is the best way to achieve this? I've spent two weeks on this, and I've also thought of using Buffers, but that would mean loading the entire file into memory, which again is inefficient if it's a 2GB file.

How would you replace certain bytes in a file without reading the entire file and without installing some npm package that has C++ code? I don't want my editor to have to compile C++ code.

If doing that is not straightforward, how about deleting certain bytes from a file without reading the entire file? If I can do that, then I can delete the bytes to be replaced and use something like fs.write() to add the ones I want in their place.

Edit #1:

After playing around, I've found that if I open a file with fs.open using the r+ flag and then call fs.write, the write replaces existing bytes. So if the text is "Lorem ipsum" and I fs.write "!!!!", the result will be "!!!!m ipsum".
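A minimal sketch of the behavior described above, using the synchronous fs calls for brevity. It assumes a current Node.js release (Buffer.from and fs.mkdtempSync do not exist in v0.10), and the temporary file name is purely illustrative.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway file so the demo does not touch real data.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'overwrite-demo-'));
const demoFile = path.join(dir, 'lorem.txt');
fs.writeFileSync(demoFile, 'Lorem ipsum');

// 'r+' opens for reading and writing without truncating the file.
const fd = fs.openSync(demoFile, 'r+');
try {
  const patch = Buffer.from('!!!!');
  // Arguments: fd, buffer, offset in buffer, byte count, position in file.
  fs.writeSync(fd, patch, 0, patch.length, 0);
} finally {
  fs.closeSync(fd);
}

const overwritten = fs.readFileSync(demoFile, 'utf8');
console.log(overwritten); // !!!!m ipsum
```

Only the four patched bytes are written; the rest of the file is never read or rewritten.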

This would work fine if only everything I was going to write were exactly the right length. :/

I know what needs to happen when the new content isn't the perfect length, but I don't know how to do it. :/ Maybe if there was some sort of "empty byte"...

Edit #2:

So, as said above, fs.open (with the r+ flag) plus fs.write allows me to overwrite content in a file without reading the entire file, which is terrific. Now I am running into a new problem. Let's take the following file:

one\n
two\n
three\n

If I fs.open the file and then fs.write "yes" at byte 0, I end up with:

yes\n
two\n
three\n

If I do the same but fs.write "niet" instead, I end up with:

niettwo\n
three\n

Notice how the \n character was replaced by the "t". This is because fs.write replaces bytes in place when the file was opened with r+ in fs.open. This is the problem I am trying to solve right now.

How would one go about doing something like "from this byte to this byte, replace it with these other bytes"? My function could be something like function replaceBytes(filePath, newBytes, startByte, endByte), and it would replace only the bytes from startByte to endByte, no matter how long newBytes is, whether shorter or longer than endByte - startByte.

Edit #3:

OK, I figured out the case where the new content is shorter than the old content that is being replaced. Thanks to \x00, I've been able to figure it out. The case where the new and the old content are the same length is not hard, as there's nothing extra to do there.

But the case where the new content is longer than the old content is still unresolved.

For those curious, this is the working code for old content longer than new content: https://github.com/noedit/file/blob/592a35134440a03d3e3c3e366f6cda7f565c11aa/lib/replaceBytes.js#L27-L34

It does put null bytes in there, though, which, depending on the editor, may show up as characters and look weird. :/

Recommended answer

As you've discovered, fs.write on a file opened in r+ mode allows you to overwrite bytes. This suffices for the case where the added and deleted pieces are exactly the same length.

When the added text is shorter than the deleted text, I advise that you not fill in with \x00 bytes, as you suggest in one of your edits. Null bytes are not empty placeholders; they are real characters that remain part of the file's content (in source code, they will usually cause the compiler/interpreter to throw an error).
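To make the point concrete, here is a small sketch (assuming a current Node.js release and an illustrative temporary file) that pads a shorter replacement with a null byte and then inspects the result. The file keeps its original length, and the null byte is still there as data.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'nul-padding-'));
const paddedFile = path.join(dir, 'numbers.txt');
fs.writeFileSync(paddedFile, 'one\ntwo\n');

// Replace the 3 bytes of "one" with "no" plus one NUL byte of padding.
const fd = fs.openSync(paddedFile, 'r+');
try {
  const padded = Buffer.from('no\x00');
  fs.writeSync(fd, padded, 0, padded.length, 0);
} finally {
  fs.closeSync(fd);
}

const bytes = fs.readFileSync(paddedFile);
console.log(bytes.length);      // 8, same size as before
console.log(bytes.includes(0)); // true, the NUL byte is stored in the file
```

Anything that later reads the file (another editor, a compiler, a parser) will see that byte.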

The short answer is that this is not generally possible. This is not an abstraction issue; at the file system level, files are stored in chunks of contiguous bytes. There is no generic way to insert into or remove from the middle of a file.

The correct way to do this is to seek to the first byte you need to change, and then write the rest of the file from there (unless you get to a point at which you've added and deleted the same number of bytes, in which case you can stop writing).
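One way to sketch this in Node, under a few assumptions: a current Node.js release, synchronous fs calls, newBytes passed as a Buffer, and endByte treated as exclusive. The helper names (readFully, writeFully, shiftTail) and the optional chunkSize parameter are hypothetical, introduced only for this illustration. Bytes before startByte are never read or written; only the tail after endByte is shifted, one chunk at a time.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Read exactly `length` bytes at `position` (readSync may return fewer).
function readFully(fd, buf, length, position) {
  let got = 0;
  while (got < length) {
    const n = fs.readSync(fd, buf, got, length - got, position + got);
    if (n === 0) throw new Error('unexpected end of file');
    got += n;
  }
}

// Write exactly `length` bytes at `position` (writeSync may write fewer).
function writeFully(fd, buf, length, position) {
  let put = 0;
  while (put < length) {
    put += fs.writeSync(fd, buf, put, length - put, position + put);
  }
}

// Move the bytes in [tailStart, fileSize) by `delta` positions.
function shiftTail(fd, tailStart, fileSize, delta, chunkSize) {
  const buf = Buffer.alloc(chunkSize);
  const tailLen = fileSize - tailStart;
  if (delta < 0) {
    // Shrinking: copy front to back so unread bytes are never clobbered.
    for (let done = 0; done < tailLen; ) {
      const n = Math.min(chunkSize, tailLen - done);
      readFully(fd, buf, n, tailStart + done);
      writeFully(fd, buf, n, tailStart + done + delta);
      done += n;
    }
  } else if (delta > 0) {
    // Growing: copy back to front for the same reason.
    for (let left = tailLen; left > 0; ) {
      const n = Math.min(chunkSize, left);
      const pos = tailStart + left - n;
      readFully(fd, buf, n, pos);
      writeFully(fd, buf, n, pos + delta);
      left -= n;
    }
  }
}

// Replace bytes [startByte, endByte) with newBytes (a Buffer of any length).
function replaceBytes(filePath, newBytes, startByte, endByte, chunkSize = 64 * 1024) {
  const fd = fs.openSync(filePath, 'r+');
  try {
    const size = fs.fstatSync(fd).size;
    if (startByte < 0 || endByte < startByte || endByte > size) {
      throw new RangeError('invalid byte range');
    }
    const delta = newBytes.length - (endByte - startByte);
    shiftTail(fd, endByte, size, delta, chunkSize);
    writeFully(fd, newBytes, newBytes.length, startByte);
    if (delta < 0) fs.ftruncateSync(fd, size + delta); // drop leftover bytes
  } finally {
    fs.closeSync(fd);
  }
}

// Demo on a throwaway file, with a tiny chunk size to exercise the loops.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'replace-bytes-'));
const demoPath = path.join(demoDir, 'numbers.txt');
fs.writeFileSync(demoPath, 'one\ntwo\nthree\n');

replaceBytes(demoPath, Buffer.from('niet'), 0, 3, 4);
const grown = fs.readFileSync(demoPath, 'utf8');  // "niet\ntwo\nthree\n"

replaceBytes(demoPath, Buffer.from('yes'), 0, 4, 4);
const shrunk = fs.readFileSync(demoPath, 'utf8'); // "yes\ntwo\nthree\n"
```

When the lengths match, delta is 0, no shifting happens, and the call reduces to the plain overwrite. Note that this edits the file in place, so a crash partway through the shift can leave it in an inconsistent state.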

To avoid problems if the process crashes during a long write or something like that, it is common to write to a temporary file location and then mv the temporary file in place of the actual file you wish to save.
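A rough sketch of the temporary-file approach, assuming a current Node.js release and using fs.renameSync as the equivalent of mv. The function name replaceBytesViaTemp, the helper names, and the temporary-file naming scheme are hypothetical, and endByte is again treated as exclusive. The temporary file is created next to the original so that the rename stays on one file system.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Append `length` bytes from buf to dstFd at its current position.
function writeAll(dstFd, buf, length) {
  let put = 0;
  while (put < length) put += fs.writeSync(dstFd, buf, put, length - put);
}

// Append the bytes in [from, to) of srcFd to dstFd, one chunk at a time.
function copyRange(srcFd, dstFd, from, to) {
  const buf = Buffer.alloc(64 * 1024);
  let pos = from;
  while (pos < to) {
    const n = fs.readSync(srcFd, buf, 0, Math.min(buf.length, to - pos), pos);
    if (n === 0) break; // source ended early
    writeAll(dstFd, buf, n);
    pos += n;
  }
}

// Build the edited copy in a temp file, then rename it over the original.
function replaceBytesViaTemp(filePath, newBytes, startByte, endByte) {
  const tmpPath = filePath + '.tmp-' + process.pid; // hypothetical naming
  const srcFd = fs.openSync(filePath, 'r');
  const dstFd = fs.openSync(tmpPath, 'w');
  try {
    const size = fs.fstatSync(srcFd).size;
    copyRange(srcFd, dstFd, 0, startByte);     // untouched head
    writeAll(dstFd, newBytes, newBytes.length); // replacement bytes
    copyRange(srcFd, dstFd, endByte, size);    // untouched tail
    fs.fsyncSync(dstFd);                       // flush before the swap
  } finally {
    fs.closeSync(srcFd);
    fs.closeSync(dstFd);
  }
  // The original stays intact until this point. A failure above leaves
  // the temp file behind; cleanup is omitted to keep the sketch short.
  fs.renameSync(tmpPath, filePath);
}

// Demo on a throwaway file.
const safeDir = fs.mkdtempSync(path.join(os.tmpdir(), 'replace-temp-'));
const safePath = path.join(safeDir, 'numbers.txt');
fs.writeFileSync(safePath, 'one\ntwo\nthree\n');

replaceBytesViaTemp(safePath, Buffer.from('niet'), 0, 3);
const safeResult = fs.readFileSync(safePath, 'utf8'); // "niet\ntwo\nthree\n"
```

The trade-off is that this copies the whole file, so it gives up the "don't touch the rest" goal in exchange for never leaving a half-written original behind.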
