使用Java和SQLite递归的数据处理性能 [英] Recursive data processing performance using Java and SQLite

查看:250
本文介绍了使用Java和SQLite递归的数据处理性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果你有一个答案,它不是Java / SQLite的关系,我会很高兴地阅读。

予储存到用下面的方案的数据库项目:

I store items in a database with the following scheme :

###################
#       Item      #    
###################
#      _id        #    This is the primary key
#    parent_id    #    If set, it the ID of the item containing this item
#      date       #    An ordinary date
#  geocontext_id  #    Foreign key to a pair of named coordinates
###################

###################
#   Geocontext    #    
###################
#       _id       #    This is the primary key
#       name      #    Way for the user to label a pair of coordinates (e.g : "home", "work")
#         x       #    One of the coordinate
#         y       #    The other one
###################

问题

我必须根据geocontext和日期进行过滤的项目。这将是一个容易的工作,如果项目都在同一水平,但诀窍是它的递归。例如:

The problem

I must filter the items according to the geocontext and the date. It would be a easy job if items were all at the same level, but the trick is it's recursive. E.G :

root
      |_item 1
      |_item 2 
      |      |_item 4
      |      |_item 5
      |             |_item 6
      |_item 3
      |      |_item 8
      |             |_item 10
      |_item 11
      |       |_item 12
      |_item 7

没有明确的限制递归深度。

There is no explicit limit to the recursive depth.

现在,如果我们在同日4月1日任何节点和过滤器,我们必须看到的不仅是直接包含在匹配的日期,该节点的项目,但我们必须看到,所包含的项项目配套的日期,以及

Now, if we are in any node and filter with the date "1st of April", we must see not only the items directly contained in the node that match the date, but we must see the items that contain items matching the date as well.

EG:我们在项目2,如果第6项的日期一致,则认为项目5的日期也相匹配,我们必须保持它。如果我们的根,则第2项必须显示。

E.G : We are in "Items 2", if "item 6" matches the date, then we consider "item 5" matches the date too and we must keep it. If we are at the root, then item 2 must be displayed.

同样适用于geocontext,但它甚至更难,因为:

Same goes for the geocontext, but it's even harder because  :

  • 在它存储在其它表中。
  • 匹配的情况下是一个昂贵的数学计算。

当然,蛮力匹配会导致该软件是缓慢和非常差的用户体验。

Of course, brute forcing the matching would result in the software to be slow and a very poor user experience.

注:我不需要显示树。我从树上显示过滤后的数据列表。我们必须看到顶部的元素只是一个平面列表。面临的挑战是决定是否显示每个元素与否,根据所有的孩子层次结构

NOTE : I don't need to display a tree. I display a list of filtered data from a tree. We must see only a flat list of the top elements. The challenge is to decide whether to display each element or not, according to all the children hierachy.

我想我可以通过使用多个表来缓存平面数据缓和了一下这个问题:

I thought I could ease a bit the problem by using more tables to cache flat data :

###################
# Geocontex_cache #    
###################
#     item_id     #     I can Join the items table on this field
#     child_id    #     I can delete / update a child, and so delete / update the cache
#  geocontext_id  #     I can delete / update a geocontext, and so delete / update the cache
#        x        #      Here, I can brute force :-)
#        y        # 
###################

###################
#    Date_cache   #    
###################
#     item_id     #     
#     child_id    #    
#       date      #    
###################

这似乎是合理的,但我还没有尝试过呢。然而,它应该以下缺点:

This seems reasonable, but I haven't tried it yet. Nevertheless, it should the following drawbacks :

  • 我感动了昂贵的过程给该get /设置/创建/删除的方法 必须管理缓存的日期。 这将是一个棘手的code到 编写和维护。阿五的深度 级别项目将TRIGER一个过程, 会打递归5父母。

  • I moved the costly process to the get / set / create / delete methods that will have to manage the cached date. That will be a troublesome code to write and maintain. A five depth level item will triger a process that will hit recursively five parents.

尺寸加时赛数据库可能 成为巨大的。一个五德的深度级别 项目存储缓存数据五 父母。不知道这是否是相关的, 因为这是一个带有一个单用户应用程序 手工输入。我不认为任何人 将插入更多的腋臭1000项 与深度超过10级。

The size ot the data base could become HUGE. A five de depth level item store cached data for five parents. Don't know if it's relevant, since this a a mono-user app with a manual input. I don't think anybody would insert more thatn 1000 items with more than 10 level of depth.

现在的好消息是我们去从    金字塔底部到顶部,不    另一种方式,所以它不是有    可怕的,因为它似乎。当我将    要处理的父项    删除,这将是另一个不错    头痛,但我保存它的另一个    问题; - 。)

Now the good news is we go from the bottom of the pyramid to the top, not the other way, so it's not has horrible as it seems. When I will have to deal with parent item deletion, it will be another nice headache, but I save it for another question ;-).

你将如何存储数据和处理过滤INT最优化的方式?

How would you store the data and process the filtering int the most optimal way ?

可选:

我应该定义一个明确的递归深度的限制? 我应该执行使用SQL或Java过滤? SQL肯定会更快,但配套的geocontext要容易得多做Java的。

Should I define an explicit recursive depth limit ? Should I perform the filtering using SQL or Java ? SQL surely will be faster, but matching the geocontext is far easier to do in Java.

由于我的工作在Android平台上,我有以下限制:

As I am working on the Android platform, I have the following constraints :

  • Java是唯一的语言, 而不是与整个标准库。

  • Java is the only language available, and not with the entire standard lib.

SQLite是唯一可用的数据库管理系统。

SQLite is the only DBMS available.

性能和内存是很重要的 的问题。如果你要选择, 电池寿命和因此 性能是首要问题。

Performance and memory are important issues. In case you have to choose, battery life and therefore performance is the priority.

异国外部库可能无法 使用

Exotics external libs may not be able to be used.

PS:我挖成SO,发现了一些有趣的作品信息(espacially <一个href="http://stackoverflow.com/questions/192220/what-is-the-most-efficientelegant-way-to-parse-a-flat-table-into-a-tree">http://stackoverflow.com/questions/192220/what-is-the-most-efficientelegant-way-to-parse-a-flat-table-into-a-tree).这是一个暗示,但不是一个解决问题的能手。

P.S : I dug into SO and found some interesting pieces of information (espacially http://stackoverflow.com/questions/192220/what-is-the-most-efficientelegant-way-to-parse-a-flat-table-into-a-tree). It's a hint, but not a problem solver.

推荐答案

1)首先,让我们来看看简单地把一切都在内存中。这是很简单的,灵活的,以及最重要的是,快速,溶液。缺点包括事实,你必须阅读一切到内存在启动时(给用户一个pretty的加载条,他们甚至不会注意到),也许必须做一些额外的工作,以确保一切都体现到磁盘时用户认为它是,使得数据不会丢失。

1) First, let's look at simply putting everything in memory. This is simple, flexible, and above all, fast, solution. Drawbacks include the fact that you'll have to read everything into memory at startup (give the user a pretty loading bar and they won't even notice), and perhaps have to do a little extra work to ensure everything is reflected to disk when the user thinks it is, so that data isn't lost.

在这个分析我正在做关于Android / Dalvik的一些通用的假设,我真的不知道很多有关,所以希望它有点准确:)还记得G1拥有192MB的RAM。此外,你的假设之上是一个最大约1000项。

In this analysis I'm making some generic assumptions about Android/Dalvik I don't really know that much about, so hopefully it's somewhat accurate :) Remember the G1 has 192MB of RAM. Also, your assumption above was a max around 1000 items.

Object superclass ~ 8 bytes
parent/child pointer ~ 4 bytes
date (long) ~ 8 bytes
name (non interned string avg 32 chars) ~ 64 bytes
x point (int) ~ 4 bytes
y point (int) ~ 4 bytes

Total = 92 bytes + possible memory alignment + fudge factor = 128 bytes
1000 items = 125kB
10000 items = 1.22MB

注:我认识到,而孩子只能有一个指针,家长可以有多个孩子。然而,父母与>子球的数量(元素 - 1),因此父母与>子指针的平均成本(元 - 1)/元〜1元或4个字节。这假定不分配未使用的存储器,例如一个链表(而不是一个ArrayList)的子结构

Note: I realize that while a child can only have one pointer, a parent can have multiple children. However, the number of parent->child pointers is (elements - 1), so the average cost of parent->child pointer is (elements - 1)/elements ~ 1 element or 4 bytes. This assumes a child structure that doesn't allocate unused memory, such as a LinkedList (as opposed to an ArrayList)

2)在我的书呆子说,这将是一个有趣的地方来分析一个B +树,但我认为这是矫枉过正,你想要的那一刻:)不过,不管解决方案,你到底是什么采用了,如果你不将一切记忆,你一定会想缓存尽可能多的在内存中的树的顶端水平,你可以。这可以减少磁盘大幅度活动量。

2) The nerd in me says that this would be a fun place to profile a B+ tree, but I think that's overkill for what you want at the moment :) However, whatever solution you end up adopting, if you are not holding everything in memory, you will definitely want to cache as much of the top levels of the tree in memory as you can. This may cut down on the amount of disk activity drastically.

3)如果你不想去的所有内存,另一种可能的解决办法如下。比尔Karwin提出了一个相当完美的<一href="http://stackoverflow.com/questions/192220/what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree/192462#192462">RDBMS结构称为封闭表用于优化基于树的读取,同时使写入更复杂。与顶尖级高速缓存相结合,这可能会给你的性能优势,但我会考虑我的话就可以了之前测试:

3) If you don't want to go all memory, another possible solution might be as follows. Bill Karwin suggests a rather elegant RDBMS structure called a Closure Table for optimizing tree based reads, while making writes more complex. Combining this with top level cache may give you performance benefits, although I would test this before taking my word on it:

当评估一个视图,使用任何你在内存评估尽可能多的孩子,你可以。对于那些孩子不匹配,使用封表和平面表与适当的之间的SQL连接WHERE子句,看看是否有任何匹配的儿童。如果是这样,你会显示你的结果列表中的节点。

When evaluating a view, use whatever you have in memory to evaluate as many children as you can. For those children that do not match, use an SQL join between the closure table and the flat table with an appropriate where clause to find out if there are any matching children. If so, you'll be displaying that node on your result list.

希望这一切是有意义的,并似乎将工作的你所需要的。

Hope this all makes sense and seems like it would work for what you need.

这篇关于使用Java和SQLite递归的数据处理性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆