Diff/compare two files by file descriptor (fd) instead of file name

Views: 103 Published: 2020/4/23 11:28:41 c linux diff patch mmap

Problem description

Is there any way in Linux, using C, to generate a diff/patch of two files stored in memory, using a common format (i.e. unified diff, as produced by the command-line diff utility)?

I'm working on a system where I generate two text files in memory, and no external storage is available or desired. I need to create a line-by-line diff of the two files, and since they are mmap'ed, they don't have file names, which prevents me from simply calling system("diff file1.txt file2.txt").

I have file descriptors (fds) available for use, and that's my only entry point to the data. Is there any way to generate a diff/patch by comparing the two open files? If the implementation is MIT/BSD licensed (i.e. non-GPL), so much the better.

Thanks.

Recommended answer

Considering the requirements, the best option would be to implement your own in-memory diff -au. You could perhaps adapt the relevant parts of OpenBSD's diff to your needs.

Here's an outline of how you can use the /usr/bin/diff command via pipes to obtain the unified diff between two strings stored in memory:

Create three pipes: I1, I2, and O.

Fork a child process.

In the child process:

Move the read ends of pipes I1 and I2 to descriptors 3 and 4, and the write end of pipe O to descriptor 1.

Close the other ends of those pipes in the child process. Open descriptor 0 for reading from /dev/null, and descriptor 2 for writing to /dev/null.

Execute execl("/usr/bin/diff", "diff", "-au", "/proc/self/fd/3", "/proc/self/fd/4", NULL);

This executes the diff binary in the child process. It will read the inputs from the two pipes, I1 and I2, and output the differences to pipe O.

The parent process closes the read ends of the I1 and I2 pipes, and the write end of the O pipe.

The parent process writes the comparison data to the write ends of the I1 and I2 pipes, and reads the differences from the read end of the O pipe.

Note that the parent process must use select() or poll() or a similar method (preferably with nonblocking descriptors) to avoid deadlock. (Deadlock occurs if both parent and child try to read at the same time, or write at the same time.) Typically, the parent process must avoid blocking at all costs, because that is likely to lead to a deadlock.

When the input data has been completely written, the parent process must close the respective write end of the pipe, so that the child process detects the end of input. (Unless an error occurs, both write ends must be closed before the child process closes its end of the O pipe.)

When the parent process notices that no more data is available in the O pipe (read() returning 0), either it has already closed the write ends of the I1 and I2 pipes, or there was an error. If there is no error, the data transfer is complete, and the child process can be reaped.
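The parent's write/read loop described above might look like the following sketch. The function name pump_diff and its signature are hypothetical. It assumes the three descriptors are the parent's pipe ends, takes ownership of them (all are closed before it returns), and assumes the caller has arranged for SIGPIPE to be ignored, since a child that dies between poll() and write() would otherwise kill the parent.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: sends a[0..alen) to in1_wr and b[0..blen) to in2_wr
 * while collecting everything readable from out_rd, never blocking on a
 * single descriptor. Returns 0 and stores a malloc()ed buffer in *out
 * (NULL if the output is empty; caller frees) and its length in *outlen,
 * or returns -1 on error. */
int pump_diff(int in1_wr, int in2_wr, int out_rd,
              const char *a, size_t alen, const char *b, size_t blen,
              char **out, size_t *outlen)
{
    struct pollfd pfd[3] = {
        { .fd = in1_wr, .events = POLLOUT },
        { .fd = in2_wr, .events = POLLOUT },
        { .fd = out_rd, .events = POLLIN },
    };
    const char *src[2] = { a, b };
    size_t left[2] = { alen, blen };
    char *buf = NULL;
    size_t len = 0, cap = 0;
    int ok = 0;

    for (int i = 0; i < 3; i++)
        fcntl(pfd[i].fd, F_SETFL, fcntl(pfd[i].fd, F_GETFL) | O_NONBLOCK);

    for (;;) {
        /* Close each input as soon as it is fully written: that is the
         * child's end-of-input signal. */
        for (int i = 0; i < 2; i++)
            if (pfd[i].fd != -1 && left[i] == 0) {
                close(pfd[i].fd);
                pfd[i].fd = -1;            /* poll() ignores negative fds */
            }

        if (poll(pfd, 3, -1) == -1) {
            if (errno == EINTR)
                continue;
            goto done;
        }

        for (int i = 0; i < 2; i++) {
            if (pfd[i].fd == -1 || pfd[i].revents == 0)
                continue;
            if (!(pfd[i].revents & POLLOUT) || (pfd[i].revents & POLLERR))
                goto done;                 /* the reader went away */
            ssize_t n = write(pfd[i].fd, src[i], left[i]);
            if (n > 0) {
                src[i] += n;
                left[i] -= (size_t)n;
            } else if (n == -1 && errno != EAGAIN && errno != EINTR) {
                goto done;
            }
        }

        if (pfd[2].revents & (POLLIN | POLLHUP | POLLERR)) {
            if (len == cap) {              /* grow the output buffer */
                size_t ncap = cap ? 2 * cap : 4096;
                char *tmp = realloc(buf, ncap);
                if (tmp == NULL)
                    goto done;
                buf = tmp;
                cap = ncap;
            }
            ssize_t n = read(out_rd, buf + len, cap - len);
            if (n > 0) {
                len += (size_t)n;
            } else if (n == 0) {
                /* End of output: success only if both inputs were sent. */
                ok = (left[0] == 0 && left[1] == 0);
                goto done;
            } else if (errno != EAGAIN && errno != EINTR) {
                goto done;
            }
        }
    }

done:
    for (int i = 0; i < 3; i++)
        if (pfd[i].fd != -1)
            close(pfd[i].fd);
    if (!ok) {
        free(buf);
        return -1;
    }
    *out = buf;
    *outlen = len;
    return 0;
}
```

Because every descriptor is non-blocking and poll() is the only place the parent waits, a full input pipe cannot stall the reading of diff's output, which is the deadlock the answer warns about.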

The parent process reaps the child using e.g. waitpid(). Note that if there were any differences, diff returns with exit status 1.
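Reaping the child and interpreting its exit status could be sketched like this. The helper name reap_diff and its return convention are hypothetical; the mapping relies on diff's documented exit statuses (0 for no differences, 1 for differences, anything greater for trouble).

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: reaps the diff child and maps its exit status to
 *    0  inputs are identical
 *    1  inputs differ
 *   -1  diff reported trouble, was killed by a signal, or waitpid() failed */
int reap_diff(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);      /* retry if interrupted by a signal */
    } while (r == -1 && errno == EINTR);

    if (r != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0 || WEXITSTATUS(status) == 1)
        return WEXITSTATUS(status);
    return -1;
}
```

Treating exit status 1 as a normal result matters here: code that considers every nonzero status a failure would reject every diff that actually found differences.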

You can use a fourth pipe to receive the standard error stream from the child process; diff does not normally output anything to standard error.

You can use a fifth pipe, with its write end marked close-on-exec in the child using fcntl() (the FD_CLOEXEC flag; O_CLOEXEC is its open()/pipe2() counterpart), to detect execl() errors. The close-on-exec flag means the descriptor is closed when another binary is executed, so the parent process can detect that the diff command started successfully by seeing end-of-data on the read end (read() returning 0). If execl() fails, the child can, for example, write the errno value (as a decimal number, or as an int) to this pipe, so that the parent process can read the exact cause of the failure.
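The exec-error pipe can be illustrated in isolation with a small sketch. The helper name spawn_checked is hypothetical, it uses execv() with an argument vector instead of execl() so that it can be tried with any program, and it passes errno through the pipe as a raw int.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork + execv with exec-failure reporting. Returns the
 * child's PID once the exec has succeeded. On failure returns -1 with errno
 * set to the reason, including the errno of a failed execv() in the child. */
pid_t spawn_checked(const char *path, char *const argv[])
{
    int ep[2];                             /* the "fifth pipe" */
    if (pipe(ep) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        int saved = errno;
        close(ep[0]);
        close(ep[1]);
        errno = saved;
        return -1;
    }

    if (pid == 0) {
        close(ep[0]);
        /* Close-on-exec: a successful exec closes the write end by itself. */
        if (fcntl(ep[1], F_SETFD, FD_CLOEXEC) != -1)
            execv(path, argv);
        int err = errno;                   /* fcntl() or execv() failed */
        ssize_t w = write(ep[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    close(ep[1]);
    int err = 0;
    ssize_t n;
    do {
        n = read(ep[0], &err, sizeof err);
    } while (n == -1 && errno == EINTR);
    close(ep[0]);

    if (n == 0)
        return pid;                        /* end-of-data: exec succeeded */

    /* The exec failed (or the read did): reap the child and report why. */
    int saved = (n == (ssize_t)sizeof err) ? err : EIO;
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;
    errno = saved;
    return -1;
}
```

For example, calling it with a path that does not exist should return -1 with errno set to ENOENT, whereas a path such as /bin/sh should return a PID that the caller later reaps as usual.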

In all, the complete method (one that both records standard error and detects exec errors) uses five pipes, that is, 10 descriptors. This should not be an issue in a normal application, but it may matter where descriptors are scarce: consider, for example, an internet-facing server whose descriptors are also used by incoming connections.
