Datomic query performance improvements


Problem description

I have a schema that looks similar to this in a Datomic database:

; --- tenant
{:db/id                 #db/id[:db.part/db]
 :db/ident              :tenant/guid
 :db/unique             :db.unique/identity
 :db/valueType          :db.type/string
 :db/cardinality        :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id                 #db/id[:db.part/db]
 :db/ident              :tenant/name
 :db/valueType          :db.type/string
 :db/cardinality        :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id                 #db/id[:db.part/db]
 :db/ident              :tenant/tasks
 :db/valueType          :db.type/ref
 :db/cardinality        :db.cardinality/many
 :db.install/_attribute :db.part/db}

; --- task
{:db/id                 #db/id[:db.part/db]
 :db/ident              :task/guid
 :db/unique             :db.unique/identity
 :db/valueType          :db.type/string
 :db/cardinality        :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id                 #db/id[:db.part/db]
 :db/ident              :task/createdAt
 :db/valueType          :db.type/instant
 :db/cardinality        :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id                 #db/id[:db.part/db]
 :db/ident              :task/name
 :db/valueType          :db.type/string
 :db/cardinality        :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id                 #db/id[:db.part/db]
 :db/ident              :task/subtasks
 :db/valueType          :db.type/ref
 :db/cardinality        :db.cardinality/many
 :db.install/_attribute :db.part/db}

; --- subtask
{:db/id                 #db/id[:db.part/db]
 :db/ident              :subtask/guid
 :db/valueType          :db.type/string
 :db/cardinality        :db.cardinality/one
 :db/unique             :db.unique/identity
 :db.install/_attribute :db.part/db}
{:db/id                 #db/id[:db.part/db]
 :db/ident              :subtask/type
 :db/valueType          :db.type/string
 :db/cardinality        :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id                 #db/id[:db.part/db]
 :db/ident              :subtask/startedAt
 :db/valueType          :db.type/instant
 :db/cardinality        :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id                 #db/id[:db.part/db]
 :db/ident              :subtask/completedAt
 :db/valueType          :db.type/instant
 :db/cardinality        :db.cardinality/one
 :db.install/_attribute :db.part/db}
{:db/id                 #db/id[:db.part/db]
 :db/ident              :subtask/participants
 :db/valueType          :db.type/ref
 :db/cardinality        :db.cardinality/many
 :db.install/_attribute :db.part/db}

 ; --- participant
{:db/id                 #db/id[:db.part/db]
 :db/ident              :participant/guid
 :db/valueType          :db.type/string
 :db/cardinality        :db.cardinality/one
 :db/unique             :db.unique/identity
 :db.install/_attribute :db.part/db}
{:db/id                 #db/id[:db.part/db]
 :db/ident              :participant/name
 :db/valueType          :db.type/string
 :db/cardinality        :db.cardinality/one
 :db.install/_attribute :db.part/db}     
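
For context, a minimal sketch (not part of the question) of how attribute maps like these are typically installed; the file name and helper function are illustrative assumptions, and datomic.Util/readAll takes care of the #db/id reader literals:

; Illustrative only: assumes the attribute maps above are stored as top-level
; forms in resources/schema.edn and that `conn` is an open Datomic connection.
(require '[clojure.java.io :as io]
         '[datomic.api :as d])

(defn install-schema!
  [conn]
  (with-open [rdr (io/reader (io/resource "schema.edn"))]
    ;; datomic.Util/readAll reads every form, expanding the #db/id literals
    @(d/transact conn (datomic.Util/readAll rdr))))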

The tasks are pretty static over time, but subtasks are added and removed on average about once per 5 minutes per task. I would say that each task has about 40 subtasks at any given time, each (almost always, with a few exceptions) containing one participant. My sole purpose in using Datomic is to be able to see how tasks have evolved over time, i.e. I'd like to see what a task looked like at a given time. To achieve this I'm currently doing something similar to this:

(defn find-tasks-by-tenant-at-time
  [conn tenant-guid ^long time-epoch]
  ;; assumes (:require [datomic.api :as d]) and (:import (java.util Date))
  (let [db-conn       (-> conn d/db (d/as-of (Date. time-epoch)))
        task-ids      (->> (d/q '[:find ?taskIds
                                  :in $ ?tenantGuid
                                  :where
                                  [?tenantId :tenant/guid ?tenantGuid]
                                  [?tenantId :tenant/tasks ?taskIds]]
                                db-conn tenant-guid)
                           vec flatten)
        task-entities (map #(d/entity db-conn %) task-ids)
        dtos          (map (fn [task]
                             (letfn [(participant-dto [participant]
                                       {:id   (:participant/guid participant)
                                        :name (:participant/name participant)})
                                     (subtask-dto [subtask]
                                       {:id           (:subtask/guid subtask)
                                        :type         (:subtask/type subtask)
                                        :participants (map participant-dto (:subtask/participants subtask))})]
                               {:id       (:task/guid task)
                                :name     (:task/name task)
                                :subtasks (map subtask-dto (:task/subtasks task))}))
                           task-entities)]
    dtos))

Unfortunately this is extremely slow. It can take almost 60 seconds to return from this function if there are many tasks for a tenant (say 20) each containing roughly 40 subtasks. Am I doing something obviously wrong here? Is it possible to speed this up?

Update: The entire dataset is roughly 2 GB. The peer has 3.5 GB of memory (though it doesn't seem to make any difference if I decrease it to, say, 1.5 GB) and the transactor has 1 GB. I'm using Datomic Free.

Recommended answer

Before you start profiling etc. you could replace

[:find ?taskIds ...]

by

[:find (pull ?task-entity [*]) ...]

to reduce the number of round-trips to the peer and thus get rid of the map statement for task-entities. In a second step replace [*] with the appropriate set of keys you really want to pull for each entity.
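
Applied to the query from the question, that could look roughly like the sketch below (the variable names and the nested pull pattern are my own illustration, not taken from the answer; note that the pulled maps keep the raw attribute keys rather than the :id/:name DTO keys built by the original function):

; Sketch only: a single query with a nested pull pattern replaces the
; per-task d/entity calls and the hand-built DTO maps.
(defn find-tasks-by-tenant-at-time
  [conn tenant-guid ^long time-epoch]
  (let [db (d/as-of (d/db conn) (java.util.Date. time-epoch))]
    (map first
         (d/q '[:find (pull ?task [:task/guid
                                   :task/name
                                   {:task/subtasks
                                    [:subtask/guid
                                     :subtask/type
                                     {:subtask/participants
                                      [:participant/guid
                                       :participant/name]}]}])
                :in $ ?tenantGuid
                :where
                [?tenant :tenant/guid ?tenantGuid]
                [?tenant :tenant/tasks ?task]]
              db tenant-guid))))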
