Capturing binary output from Process.StandardOutput

Question

In C# (.NET 4.0 running under Mono 2.8 on SuSE) I would like to run an external batch command and capture its output in binary form. The external tool I use is called 'samtools' (samtools.sourceforge.net), and among other things it can return records from an indexed binary file format called BAM.

I use Process.Start to run the external command, and I know that I can capture its output by redirecting Process.StandardOutput. The problem is that Process.StandardOutput is a text stream with an encoding, so it doesn't give me access to the raw bytes of the output. The almost-working solution I found is to access the underlying stream.

Here is my code:

        Process cmdProcess = new Process();
        ProcessStartInfo cmdStartInfo = new ProcessStartInfo();
        cmdStartInfo.FileName = "samtools";

        cmdStartInfo.RedirectStandardError = true;
        cmdStartInfo.RedirectStandardOutput = true;
        cmdStartInfo.RedirectStandardInput = false;
        cmdStartInfo.UseShellExecute = false;
        cmdStartInfo.CreateNoWindow = true;

        cmdStartInfo.Arguments = "view -u " + BamFileName + " " + chromosome + ":" + start + "-" + end;

        cmdProcess.EnableRaisingEvents = true;
        cmdProcess.StartInfo = cmdStartInfo;
        cmdProcess.Start();

        // Prepare to read each alignment (binary)
        var br = new BinaryReader(cmdProcess.StandardOutput.BaseStream);

        while (!cmdProcess.StandardOutput.EndOfStream)
        {
            // Consume the initial, undocumented BAM data 
            br.ReadBytes(23);

            // ... more parsing follows

But when I run this, the first 23 bytes that I read are not the first 23 bytes of the output, but bytes from several hundred or several thousand bytes downstream. I assume that StreamReader does some buffering, so the underlying stream has already advanced, say, 4K into the output. The underlying stream does not support seeking back to the start.

And I'm stuck here. Does anyone have a working solution for running an external command and capturing its stdout in binary form? The output may be very large, so I would like to stream it.

Any help is appreciated.

By the way, my current workaround is to have samtools return the records in text format and then parse those, but this is pretty slow, and I'm hoping to speed things up by using the binary format directly.

Recommended answer

Using StandardOutput.BaseStream is the correct approach, but you must not use any other property or method of cmdProcess.StandardOutput. For example, accessing cmdProcess.StandardOutput.EndOfStream causes the StreamReader behind StandardOutput to read ahead and buffer part of the stream, consuming the very bytes you want to read from BaseStream.

Instead, simply read and parse the data from br (assuming you know how to parse the data and won't read past the end of the stream, or are willing to catch an EndOfStreamException). Alternatively, if you don't know how big the data is, use Stream.CopyTo to copy the entire standard output stream to a new file or memory stream.
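As a rough illustration of both options, here is a minimal sketch rather than a tested samtools integration. The class name RawStdout and the helpers ReadFully and CopyStdout are hypothetical names introduced for this example; only Process, ProcessStartInfo, Stream.Read, Stream.CopyTo and EndOfStreamException come from the framework. The sketch leaves standard error unredirected, on the assumption that nothing will read it: a redirected pipe that is never drained can fill up and block the child process.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class RawStdout
{
    // Hypothetical helper: reads exactly `count` bytes from `stream`,
    // or throws EndOfStreamException if the stream ends first.
    public static byte[] ReadFully(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            // Stream.Read may return fewer bytes than requested, so loop.
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }

    // Hypothetical helper: runs a command and copies its raw stdout bytes
    // into `destination` (a FileStream, MemoryStream, ...).
    // Returns the exit code of the process.
    public static int CopyStdout(string fileName, string arguments, Stream destination)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = fileName,
            Arguments = arguments,
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Touch only BaseStream. Calling EndOfStream, Peek, ReadLine, etc.
            // on StandardOutput would let the StreamReader buffer bytes
            // that BaseStream then no longer returns.
            Stream stdout = process.StandardOutput.BaseStream;
            stdout.CopyTo(destination);
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

For streaming instead of copying, the same rule applies: take process.StandardOutput.BaseStream once, then call something like ReadFully(stdout, 23) in a loop and treat EndOfStreamException as the end of the data, rather than polling StandardOutput.EndOfStream.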
