Reading a long data structure in Haskell

Question

I have to read a data structure from a text file (space separated), one data item per line. My first attempt would be:

data Person = Person {name :: String, surname :: String, age :: Int, ... dozens of other fields} deriving (Show,...)

main = do
  string <- readFile "filename.txt"
  let people = readPeople string
  do_something people

readPeople s = map (readPerson.words) (lines s)

readPerson row = Person (read(row!!0)) (read(row!!1)) (read(row!!2)) (read(row!!3)) ... (read(row!!dozens))

This code works, but the code for readPerson is terrible: I have to copy-paste read (row !! n) for every field in my data structure!

So, as a second attempt, I thought I might exploit currying of the Person constructor and pass it the arguments one at a time.

Uhm, there must be something in Hoogle, but I cannot figure out the type signature ... Never mind, it looks simple enough and I can write it myself:

readPerson row = readFields Person row

readFields f [x] = (f x)
readFields f (x:xs) = readFields (f (read x)) xs

Ahh, looks much better coding style!

But, it does not compile! Occurs check: cannot construct the infinite type: t ~ String -> t

Indeed, the function f I am passing to readFields has a different type in each invocation; that's why I could not figure out its type signature ...

So, my question is: what is the simplest and most elegant way to read a data structure with many fields?

Solution

First, it's always good practice to include type signatures for all top-level declarations. It makes the code better structured and much more readable.

One simple way to achieve this is to take advantage of applicative functors. During parsing, you have an "effectful" computation where the effect is consuming part of the input and the result is one parsed piece. We can use the State monad to track the remaining input, and create a polymorphic function that consumes one element of the input and reads it:

import Control.Applicative
import Control.Monad.State

data Person = Person { name :: String, surname :: String, age :: Int }
    deriving (Eq, Ord, Show, Read)

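-- Consume one token and parse it with read.
-- Partial: fails on an empty token list or on a token that does not parse.
readField :: (Read a) => State [String] a
readField = state $ \(x : xs) -> (read x, xs)

To parse many such fields we use the <$> and <*> combinators, which let us sequence operations as follows: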

readPerson :: [String] -> Person
readPerson = evalState $ Person <$> readField <*> readField <*> readField

The expression Person <$> ... has type State [String] Person, and we run evalState on the given input to run the stateful computation and extract the result. We still need as many readField calls as there are fields, but without having to use indices or explicit type annotations.

For a real program you'd probably include some error handling, as read fails with an exception on malformed input, and so does the pattern (x : xs) if the input list is too short. Note also that read at type String expects a quoted Haskell string literal, so unquoted names need a field reader that returns the token unchanged. Using a full-fledged parser such as parsec or attoparsec allows you to use the same notation and to have proper error handling, customize parsing of individual fields, etc.
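
To make the error-handling point concrete, here is a minimal sketch that keeps the same Person <$> ... <*> ... notation but returns Nothing instead of throwing. It uses only base; the names Fields, token and field are made up for this illustration and do not come from any library. It also assumes names appear unquoted in the file, so they are taken as raw tokens rather than parsed with read.

```haskell
import Text.Read (readMaybe)

-- Hypothetical mini-parser over a list of tokens; Nothing signals failure.
newtype Fields a = Fields { runFields :: [String] -> Maybe (a, [String]) }

instance Functor Fields where
  fmap f (Fields p) = Fields $ \ws -> case p ws of
    Nothing       -> Nothing
    Just (a, ws') -> Just (f a, ws')

instance Applicative Fields where
  pure a = Fields $ \ws -> Just (a, ws)
  Fields pf <*> Fields pa = Fields $ \ws -> case pf ws of
    Nothing       -> Nothing
    Just (f, ws') -> case pa ws' of
      Nothing        -> Nothing
      Just (a, ws'') -> Just (f a, ws'')

-- Consume one token unchanged (assumption: text fields are unquoted).
token :: Fields String
token = Fields $ \ws -> case ws of
  []         -> Nothing
  (w : rest) -> Just (w, rest)

-- Consume one token and parse it with readMaybe; fails on a bad parse.
field :: Read a => Fields a
field = Fields $ \ws -> case ws of
  []         -> Nothing
  (w : rest) -> fmap (\a -> (a, rest)) (readMaybe w)

data Person = Person { name :: String, surname :: String, age :: Int }
  deriving (Eq, Show)

-- Succeeds only if every field parses and no tokens are left over.
readPerson :: [String] -> Maybe Person
readPerson ws = case runFields (Person <$> token <*> token <*> field) ws of
  Just (p, []) -> Just p
  _            -> Nothing
```

With this, map (readPerson . words) (lines s) yields one Maybe Person per line, so malformed lines can be reported or skipped instead of crashing the program. A real parser library would additionally tell you where and why a line failed.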


An even more general approach is to automate wrapping and unwrapping fields into lists using generics. Then you just derive Generic. If you're interested, I can give an example.

Or you could use an existing serialization package, either a binary one such as cereal or binary, or a text-based one such as aeson or yaml. These usually let you either derive (de)serialization automatically from Generic or provide your own custom implementation.
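
As a rough idea of what the generics approach could look like, here is a sketch built on GHC.Generics from base. The class GFields and the function readRecord are hypothetical names invented for this illustration, not part of any library. Every field is parsed with readMaybe, so in this sketch String fields must appear as quoted literals in the input.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, TypeOperators #-}
import GHC.Generics
import Text.Read (readMaybe)

-- Hypothetical class: fill a generic representation from a list of tokens,
-- returning the tokens that were not consumed.
class GFields f where
  gfields :: [String] -> Maybe (f p, [String])

-- A constructor without fields consumes nothing.
instance GFields U1 where
  gfields ws = Just (U1, ws)

-- A single field consumes one token and parses it via its Read instance.
instance Read a => GFields (K1 i a) where
  gfields (w : ws) = (\x -> (K1 x, ws)) <$> readMaybe w
  gfields []       = Nothing

-- A product of fields is filled left to right.
instance (GFields f, GFields g) => GFields (f :*: g) where
  gfields ws = do
    (l, ws')  <- gfields ws
    (r, ws'') <- gfields ws'
    Just (l :*: r, ws'')

-- Metadata wrappers (datatype, constructor, selector) are passed through.
instance GFields f => GFields (M1 i c f) where
  gfields ws = do
    (x, ws') <- gfields ws
    Just (M1 x, ws')

-- Works for any single-constructor record whose fields all have Read instances.
-- Succeeds only if all tokens are consumed.
readRecord :: (Generic a, GFields (Rep a)) => [String] -> Maybe a
readRecord ws = case gfields ws of
  Just (r, []) -> Just (to r)
  _            -> Nothing

data Person = Person { name :: String, surname :: String, age :: Int }
  deriving (Eq, Show, Generic)
```

The benefit is that readRecord needs no per-field code: adding a field to Person changes nothing else, as long as the new field's type has a Read instance. Sum types are deliberately not handled, since there is no instance for :+: in this sketch.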
