当被多个数据结构引用时更新记录 [英] Updating record when referenced by multiple data structures

查看:108
本文介绍了当被多个数据结构引用时更新记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有一个记录,例如 Person ,我希望能够通过多个数据结构查看这个人。也许有一个按姓名的索引,另一个由该人的邮政编码索引,以及该人目前的经度和纬度的另一个索引。也许还有更多的数据结构。所有这些都是因为我需要高效地查找一个或多个具有不同标准的人员。

如果我只需要阅读个人的属性,这没有问题。但是现在假设我需要使用这些数据结构之一来查找一个人,然后更新个人的数据。



在OOP语言中,每个数据结构都会指向内存中的同一个人。所以当你更新一个时,你也隐式地更新了其他数据结构的参照物。这几乎是副作用和杂质的定义。我知道这与Haskell范式完全相反,我不希望Haskell以这种方式工作。

那么,Haskell的方法是什么?要清楚的是,问题是这样的:我通过一个数据结构查找一个人,并且将该人(也可能是其他任意数据)传递给 ArbitraryData - >类型的函数。人 - >人。我如何在所有各种查找结构中传播这种变化?作为Haskell的新手,我的第一本能是用新更新的人重建每个查找结构,每次我更新一个人。但是,这似乎是一个很大的仪式,我有很多机会让GHC无法察觉,而且一点都不优雅。哈斯克尔以其优雅着称,我无法想象它对这样一个常见和基本问题缺乏优雅的解决方案。所以我想我错过了一些东西。



作为参考,这个问题扩展了我在以下问题中讨论的一些问题:



相同数据的多种查找结构:内存重复?



Haskell中模拟对象的身份



编辑

解决方案只是在我的脑海中:不要在主状态下维护每个查找结构的副本。只保留所有人员的单一列表,这是我们更新人员时唯一需要更新的内容。每次您需要通过邮政编码查找时,将所有人的列表传递到生成高效的邮政编码数据结构的函数中。然后执行查找结果。



我不知道这是否有效。如果它导致CPU在每次使用时实际重新计算查找结构,这是不可接受的。但我知道Haskell有时可以避免重新评估相同的表达式。不幸的是,当是这种情况时,我仍然没有弄清楚。所以我不知道这种方法是否可行。



换句话说:我可以编写我的函数,就好像它们在计算每次查找时,实际上GHC都会将其优化掉,以防底层数据没有改变的情况?因为这对我上面提到的问题来说是一个很好的解决方案。

既然我回答了这个问题,Freenode的#haskell中的一些人推荐了可选的预制解决方案: /hackage.haskell.org/package/ixset/docs/Data-IxSet.htmlrel =nofollow> Data.IxSet

  • donri建议 Data.Store ,据说这是提供镜片的原因。





    你可以创建一个包含查询表的数据结构,以及一个 > Vector 实际 Person s。查找表将为您提供 Int Int s的列表(而不是 Person Person s)列表,这是 Vector Person 的索引。例如:

      data PersonStuff = PersonStuff {
    persons :: Vector Person,
    firstNameLookupTable :: LookupTable名称,
    ...
    }

    数据LookupTable a = LookupTable {
    table :: Map a Int,
    update :: Person - >人 - >映射Int - > Map a Int

    update 函数被赋予旧的 Person ,更新后的 Person ,并且只有在相关细节有改变。当你通过方便的 PersonStuff 函数修改 Person 时,这些函数会更新所有查找表你用所有相关数据返回一个新的 PersonStuff 。这可以为快速查找纯数据结构。



    您可以使用 updatePeopleWithFirstName :: Name - > (人物 - >人物) - > PersonStuff - > PersonStuff 将会得到所有名字的人,应用 Person - > Person 给它们中的每一个,修改它们在 Vector 中的条目,并使用 update 函数来更新所有lookupTables。


    Suppose I have a record, e.g. Person, and I want to be able to look this person up through multiple data structures. Maybe there's an index by name, another index by the person's zip code, and another index by the person's current latitude and longitude. And maybe many more data structures. All of which exist because I need to efficiently look up a person or persons with different criteria.

    If I just need to read a person's attributes, this is no problem. But now suppose I need to look up a person using one of these data structures and then update the person's data.

    In an OOP language, each data structure would point to the same person in memory. So when you update one, you're implicitly updating the referents of the other data structures as well. This is pretty much the definition of side effects and impurity. I know it's totally counter to the Haskell paradigm, and I'm not expecting Haskell to work this way.

    So, what is the Haskell-ish way to do it? To be clear, the problem is this: I look up a person by one data structure, and I pass that person (and maybe some other arbitrary data) into a function of type ArbitraryData -> Person -> Person. How do I propagate this change across all the various lookup structures?

    As a relative newcomer to Haskell, my first instinct is to reconstruct every lookup structure with the newly updated person, every time I update a person. But that seems like a lot of ceremony, with plenty of opportunity for me to screw up in a way GHC can't detect, and not at all elegant. Haskell is known for its elegance, and I can't imagine it lacks an elegant solution to such a common and basic problem. So I think I'm missing something.

    For reference, this question expands on some of the issues I was discussing in the following questions:

    Multiple lookup structures for same data: Memory duplication?

    Identity of simulation objects in Haskell

    Edit

    One solution that just crossed my mind: Don't maintain a copy of each lookup structure in your main state. Just keep one single list of all persons in existence, and that's the only thing we need to update when we update a person. Every time you need to lookup by, say, zip code, pass the list of all persons into a function that generates the efficient by-zip-code data structure. Then perform the lookup on the result.

    I don't know if this would be efficient. If it results in the CPU actually recomputing the lookup structure on each use, it's unacceptable. But I know Haskell sometimes can avoid reevaluating identical expressions. Unfortunately, I still haven't figured out when this is the case. So I don't know if this approach is viable.

    So in other words: Can I write my functions as if they're computing the lookup each time, when in fact GHC will optimize it away for cases where the underlying data hasn't changed? Because that would be a very elegant solution to the problem I've identified above.

    解决方案

    Since I answered this, a few people in #haskell on Freenode recommended alternative, premade solutions:


    You can make a data structure that contains your lookup tables, as well as a Vector of actual Persons. The lookup tables will give you an Int or a list of Ints (rather than a Person or a list of Persons) which is the index into the Vector Person. For example:

    data PersonStuff = PersonStuff {
                                     persons              :: Vector Person,
                                     firstNameLookupTable :: LookupTable Name,
                                     ...
                                   }
    
    data LookupTable a = LookupTable {
                                       table  :: Map a Int,
                                       update :: Person -> Person -> Map a Int -> Map a Int
                                     }
    

    The update function is given the old Person, the updated Person, and will update the table only if the relevant details have changed. When a Person is modified through the convenient PersonStuff functions you'll write, those functions will update all the lookup tables for you, returning a new PersonStuff with all associated data. This makes for a pure data structure with quick lookups.

    You can make functions like updatePeopleWithFirstName :: Name -> (Person -> Person) -> PersonStuff -> PersonStuff that will get all the people with a first name, apply a Person -> Person to each of them, modify their entries in the Vector, and use the update functions to update all of the lookupTables.

    这篇关于当被多个数据结构引用时更新记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

  • 查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆