Safe modelling of relational data in Haskell


Problem description

I find it very common to want to model relational data in my functional programs. For example, when developing a web-site I may want to have the following data structure to store info about my users:

data User = User 
  { name :: String
  , birthDate :: Date
  }

Next, I want to store data about the messages users post on my site:

data Message = Message
  { user :: User
  , timestamp :: Date
  , content :: String
  }

There are multiple problems associated with this data structure:

  • We don't have any way of distinguishing users with similar names and birth dates.
  • The user data will be duplicated on serialisation/deserialisation
  • Comparing the users requires comparing their data which may be a costly operation.
  • Updates to the fields of User are fragile -- you can forget to update all the occurrences of User in your data structure.

These problems are manageable while our data can be represented as a tree. For example, you can refactor like this:

data User = User
  { name :: String
  , birthDate :: Date
  , messages :: [(String, Date)] -- you get the idea
  }

However, it is possible to have your data shaped as a DAG (imagine any many-to-many relation), or even as a general graph (OK, maybe not). In this case, I tend to simulate the relational database by storing my data in Maps:

newtype Id a = Id Integer
type Table a = Map (Id a) a

This kind of works, but is unsafe and ugly for multiple reasons:

  • You are just an Id constructor call away from nonsensical lookups.
  • On lookup you get Maybe a, but often the database structurally ensures that there is a value.
  • It is clumsy.
  • It is hard to ensure referential integrity of your data.
  • Managing indices (which are very much necessary for performance) and ensuring their integrity is even harder and clumsier.
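
To make the Map-based approach concrete, here is a minimal sketch that uses only the containers package. The names Table, emptyTable, insertRow, lookupRow and the author field are hypothetical, chosen for illustration. The phantom type parameter on Id stops a message key from being used against the user table, but lookups still return Maybe and nothing here enforces referential integrity.

```haskell
import qualified Data.Map.Strict as Map

-- Phantom-typed key: the parameter records which table the key belongs to.
newtype Id a = Id Integer
  deriving (Eq, Ord, Show)

-- A table together with the next unused key (hypothetical layout).
data Table a = Table
  { nextId :: Integer
  , rows   :: Map.Map (Id a) a
  }

emptyTable :: Table a
emptyTable = Table 0 Map.empty

-- Inserting is the intended way to obtain an Id. In a real module the
-- Id constructor would not be exported, so keys could not be forged.
insertRow :: a -> Table a -> (Id a, Table a)
insertRow x (Table n m) = (Id n, Table (n + 1) (Map.insert (Id n) x m))

-- Lookup still yields Maybe: the type system does not know the key is live.
lookupRow :: Id a -> Table a -> Maybe a
lookupRow k = Map.lookup k . rows

data User = User { name :: String }
  deriving (Eq, Show)

-- A message refers to its author by key instead of embedding the User.
data Message = Message { author :: Id User, content :: String }
  deriving (Eq, Show)
```

With these definitions, passing an Id Message to a lookup on a Table User is rejected at compile time, which removes one source of nonsensical lookups; the remaining points in the list above still apply.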

Is there existing work on overcoming these problems?

It looks like Template Haskell could solve them (as it usually does), but I would like not to reinvent the wheel.

Solution

The ixset library will help you with this. It's the library that backs the relational part of acid-state, which also handles versioned serialization of your data and/or concurrency guarantees, in case you need it.

The thing about ixset is that it manages "keys" for your data entries automatically.

For your example, one would create one-to-many relationships for your data types like this:

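-- Requires DeriveDataTypeable; Date needs the same instances.
-- Ord requires Eq, and ixGen requires Data.
data User =
  User
  { name :: String
  , birthDate :: Date
  } deriving (Eq, Ord, Data, Typeable)

data Message =
  Message
  { user :: User
  , timestamp :: Date
  , content :: String
  } deriving (Eq, Ord, Data, Typeable)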

instance Indexable Message where
  empty = ixSet [ ixGen (Proxy :: Proxy User) ]

You can then find the messages of a particular user. If you have built up an IxSet like this:

user1 = User "John Doe" undefined
user2 = User "John Smith" undefined

messageSet =
  foldr insert empty
  [ Message user1 undefined "bla"
  , Message user2 undefined "blu"
  ]

... you can then find messages by user1 with:

user1Messages = toList $ messageSet @= user1

If you need to find the user of a message, just use the user function like normal. This models a one-to-many relationship.

Now, for many-to-many relations, with a situation like this:

data User =
  User
  { name :: String
  , birthDate :: Date
  , messages :: [Message]
  } deriving (Eq, Ord, Typeable)  -- Ord requires Eq

data Message =
  Message
  { users :: [User]
  , timestamp :: Date
  , content :: String
  } deriving (Eq, Ord, Typeable)  -- Ord requires Eq

... you create an index with ixFun, which can be used with lists of indexes. Like so:

instance Indexable Message where
  empty = ixSet [ ixFun users ]

instance Indexable User where
  empty = ixSet [ ixFun messages ]

To find all the messages by a user, you still use the same function:

user1Messages = toList $ messageSet @= user1

Additionally, provided that you have an index of users:

userSet =
  foldr insert empty
  [ User "John Doe" undefined [ messageFoo, messageBar ]
  , User "John Smith" undefined [ messageBar ]
  ]

... you can find all the users for a message:

messageFooUsers = toList $ userSet @= messageFoo

If you don't want to have to update the users of a message or the messages of a user when adding a new user/message, you should instead create an intermediary data type that models the relation between users and messages, just like in SQL (and remove the users and messages fields):

data UserMessage = UserMessage { umUser :: User, umMessage :: Message }
  deriving (Eq, Ord, Data, Typeable)  -- needed to store it in an IxSet and index it with ixGen

instance Indexable UserMessage where
  empty = ixSet [ ixGen (Proxy :: Proxy User), ixGen (Proxy :: Proxy Message) ]

Creating a set of these relations would then let you query for users by messages and messages for users without having to update anything.
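
The idea of a separate relation can be sketched without ixset at all. The following uses a plain Data.Set of key pairs as a stand-in for the IxSet of UserMessage values; UserId, MessageId, link, messagesOf and usersOf are hypothetical names, and the queries are linear scans, whereas ixset maintains indexes for you.

```haskell
import qualified Data.Set as Set

newtype UserId = UserId Int
  deriving (Eq, Ord, Show)

newtype MessageId = MessageId Int
  deriving (Eq, Ord, Show)

-- One element per (user, message) link, like a SQL join table.
type UserMessages = Set.Set (UserId, MessageId)

-- Record that a user is associated with a message.
link :: UserId -> MessageId -> UserMessages -> UserMessages
link u m = Set.insert (u, m)

-- All messages linked to a user (linear scan over the relation).
messagesOf :: UserId -> UserMessages -> [MessageId]
messagesOf u rel = [ m | (u', m) <- Set.toList rel, u' == u ]

-- All users linked to a message (linear scan over the relation).
usersOf :: MessageId -> UserMessages -> [UserId]
usersOf m rel = [ u | (u, m') <- Set.toList rel, m' == m ]
```

Adding a link touches only the relation, so neither the user nor the message value has to be updated.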

The library has a very simple interface considering what it does!

EDIT: Regarding your "costly data that needs to be compared": ixset only compares the fields that you specify in your index (so to find all the messages by a user in the first example, it compares "the whole user").

You regulate which parts of the indexed field it compares by altering the Ord instance. So, if comparing users is costly for you, you can add a userId field and modify the instance Ord User to only compare this field, for example.
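
A minimal sketch of such an instance, assuming userId is an Int and that ids are unique (both are assumptions for illustration, and birthDate is left out to keep the example self-contained):

```haskell
import Data.Ord (comparing)

data User = User
  { userId :: Int     -- assumed to be unique per user
  , name   :: String
  }

-- Equality and ordering look at the id only, so comparisons stay cheap
-- no matter how large the rest of the record grows.
instance Eq User where
  a == b = userId a == userId b

instance Ord User where
  compare = comparing userId
```

The trade-off is that two User values with the same id but different fields count as equal, so this only makes sense if ids really are unique.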

This can also be used to solve the chicken-and-egg problem: what if you have an id, but neither a User, nor a Message?

You could then simply create an explicit index for the id, find the user by that id (with something like userSet @= (Id 12423 :: Id User)) and then do the search.
