Conduit - Multiple output file within the pipeline


Problem description


I'm writing a programme where an input file is split into multiple files (Shamir's Secret Sharing Scheme).

Here's the pipeline I'm imagining:

  • source: use Conduit.Binary.sourceFile to read from the input
  • conduit: Takes a ByteString, produces [ByteString]
  • sink: Takes [ByteString] from the conduit and writes each ByteString in the list to its corresponding file. (Say our input [ByteString] is called bsl; then bsl !! 0 will be written to file 0, bsl !! 1 to file 1, and so on.)

I found a question regarding multiple input files here, but in their case the whole pipeline is run once for each input file, whereas for my programme I'm writing to multiple output files within the pipeline.

I'm also looking through the Conduit source code here to see if I can implement a multiSinkFile myself, but I'm slightly confused by the Consumer type of sinkFile, and more so if I try to dig deeper... (I'm still a beginner)

So, the question is, how should I go about implementing a function like multiSinkFile which allows multiple files to be written as part of a sink?

Any tips are appreciated!

Clarification

Let's say we want to do Shamir's Secret Sharing on a file containing the binary value of "ABCDEF", split into 3 parts.

(So we have our input file srcFile and our output files outFile0, outFile1 and outFile2.)

We first read "ABC" from the file and process it, which gives us a list of, say, ["133", "426", "765"]. So "133" will be written to outFile0, "426" to outFile1 and "765" to outFile2. Then we read "DEF" from srcFile, process it, and write the corresponding outputs to each output file.
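
The routing rule described above (share i of every chunk ends up in output file i) can be sketched in plain Haskell, without any streaming library. This is only an illustration of the intended semantics, not of the conduit solution: the name sharesPerFile is hypothetical, and the shares listed for "DEF" are made-up placeholders, since the question only gives example values for "ABC".

```haskell
import Data.List (transpose)

-- A share is kept as a String purely for illustration.
type Share = String

-- | Given one list of shares per input chunk (in chunk order), regroup
-- them into one list of shares per output file (in file order).
-- Share i of every chunk goes to file i, which is a transpose.
sharesPerFile :: [[Share]] -> [[Share]]
sharesPerFile = transpose

main :: IO ()
main = do
    let perChunk =
            [ ["133", "426", "765"]  -- shares for "ABC" (from the question)
            , ["208", "519", "340"]  -- shares for "DEF" (placeholder values)
            ]
        outFiles = ["outFile0", "outFile1", "outFile2"]
    -- Show which shares each output file would receive, in write order.
    mapM_ print (zip outFiles (sharesPerFile perChunk))
```

Here outFile0 would receive "133" and then "208", outFile1 "426" and then "519", and so on. A streaming sink has to produce the same result without holding all chunks in memory at once.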

EDIT:

Thank you for your answers. I took some time to understand what's going on with ZipSink etc., and I've written a simple test program which takes the source file's input and simply writes it to 3 output files. Hopefully this will help others in the future.

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit 
import Safe (atMay)
import Text.Printf
import Filesystem.Path.CurrentOS (decodeString, encodeString)
import Control.Monad.Trans.Resource (runResourceT, ResourceT(..))

-- get the output file name given the base (file) path and the split number
getFileName :: FilePath -> Int -> FilePath
getFileName basePath splitNumber = decodeString $ encodeString basePath ++ "." ++ printf "%03d" splitNumber

-- Get the sink file, given a filepath generator (that takes an Int) and the split number
idxSinkFile :: MonadResource m
            => (Int -> FilePath)
            -> Int
            -> Consumer [ByteString] m ()
idxSinkFile mkFP splitNumber =
    concatMapC (flip atMay splitNumber) =$= sinkFile (mkFP splitNumber)

sinkMultiFiles :: MonadResource m
               => (Int -> FilePath)
               -> [Int]
               -> Sink [ByteString] m ()
sinkMultiFiles mkFP splitNumbers = getZipSink $ otraverse_ (ZipSink . idxSinkFile mkFP) splitNumbers

simpleConduit :: Int -> Conduit ByteString (ResourceT IO) [ByteString]
simpleConduit num = mapC (replicate num)

main :: IO ()
main = do
    let mkFP = getFileName "test.txt"
        splitNumbers = [0..2]
    runResourceT $ sourceFile "test.txt" $$ simpleConduit (length splitNumbers) =$ sinkMultiFiles mkFP splitNumbers

Solution

There are a number of ways to do it, depending on whether you want to dynamically grow the number of files you're writing to, or just keep a fixed number. Here's one example with a fixed list of files:

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ViewPatterns      #-}
import           ClassyPrelude.Conduit
import           Safe                  (atMay)

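-- Sink for a single output file: from each incoming list, pick the element
-- at position idx (if there is one) and write it to the file mkFP idx.
idxSinkFile :: MonadResource m
            => (Int -> FilePath)
            -> Int
            -> Consumer [ByteString] m ()
idxSinkFile mkFP idx =
    -- atMay yields Nothing for a list that is too short; concatMapC skips it
    concatMapC (flip atMay idx) =$= sinkFile fp
  where
    fp = mkFP idx

-- Combine one sink per index with ZipSink, so that every incoming list is
-- fed to all of the file sinks.
sinkMultiFiles :: MonadResource m
               => (Int -> FilePath)
               -> [Int]
               -> Sink [ByteString] m ()
sinkMultiFiles mkFP indices = getZipSink $ otraverse_ (ZipSink . idxSinkFile mkFP) indices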

someFunc :: ByteString -> [ByteString]
someFunc (decodeUtf8 -> x) = map encodeUtf8 [x, toUpper x, toLower x]

mkFP :: Int -> FilePath
mkFP 0 = "file0.txt"
mkFP 1 = "file1.txt"
mkFP 2 = "file2.txt"

src :: Monad m => Producer m ByteString
src = yieldMany $ map encodeUtf8 $ words "Hello There World!"

main :: IO ()
main = do
    let indices = [0..2]
    runResourceT $ src $$ mapC someFunc =$ sinkMultiFiles mkFP indices
    forM_ indices $ \idx -> do
        let fp = mkFP idx
        bs <- readFile fp
        print (fp, bs :: ByteString)

You can try this online with FP School of Haskell.
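
For readers on a newer conduit release: the operators $$, =$= and =$ and the Consumer, Sink, Conduit and Producer synonyms used above are deprecated as of conduit 1.3. Below is a minimal sketch of the same ZipSink approach in the newer style. It assumes conduit 1.3 or later and uses only the conduit and bytestring packages rather than ClassyPrelude. fakeShares, runDemo and the share*.bin file names are hypothetical stand-ins, and fakeShares is not a real secret-sharing computation.

```haskell
import           Conduit
import           Data.ByteString       (ByteString)
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as BC
import           Data.Foldable         (traverse_)
import           Data.Maybe            (listToMaybe)
import           Data.Void             (Void)

-- Sink for one output file: keep only element idx of each incoming list
-- (lists that are too short are skipped) and write it to mkFP idx.
idxSinkFile :: MonadResource m
            => (Int -> FilePath) -> Int -> ConduitT [ByteString] o m ()
idxSinkFile mkFP idx =
    concatMapC (listToMaybe . drop idx) .| sinkFile (mkFP idx)

-- One sink per index, combined with ZipSink so that every incoming list
-- is fed to all of the file sinks.
sinkMultiFiles :: MonadResource m
               => (Int -> FilePath) -> [Int] -> ConduitT [ByteString] Void m ()
sinkMultiFiles mkFP = getZipSink . traverse_ (ZipSink . idxSinkFile mkFP)

-- Hypothetical stand-in for the real share computation (NOT Shamir).
fakeShares :: ByteString -> [ByteString]
fakeShares bs = [bs, B.reverse bs, bs <> bs]

-- Hypothetical output naming scheme: share0.bin, share1.bin, ...
mkFP :: Int -> FilePath
mkFP idx = "share" ++ show idx ++ ".bin"

-- Run the pipeline on two in-memory chunks, then read the files back.
runDemo :: IO [ByteString]
runDemo = do
    let indices = [0 .. 2]
    runConduitRes $
        yieldMany (map BC.pack ["ABC", "DEF"])
            .| mapC fakeShares
            .| sinkMultiFiles mkFP indices
    mapM (B.readFile . mkFP) indices

main :: IO ()
main = runDemo >>= mapM_ print
```

With a real input file, the yieldMany source would be replaced by sourceFile and a step that regroups the stream into fixed-size blocks, since sourceFile gives no guarantee about where its chunks are split.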
