智能(?)数据库缓存 [英] Smart (?) Database Cache

查看:157
本文介绍了智能(?)数据库缓存的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我看过几个数据库缓存引擎,所有这些引擎都很蠢(即保持此查询缓存X分钟),并要求您手动删除整个在 INSERT / UPDATE / DELETE 已经执行。

I've seen several database cache engines, all of them are pretty dumb (i.e.: keep this query cached for X minutes) and require that you manually delete the whole cache repository after a INSERT / UPDATE / DELETE query has been executed.

大约2或3年前,我为一个项目开发了一个替代的数据库缓存系统,我的想法是基本上使用正则表达式来找到包含在特定SQL查询中的表:

About 2 or 3 years ago I developed an alternative DB cache system for a project I was working on, the idea was basically to use regular expressions to find the table(s) involved in a particular SQL query:

$query_patterns = array
(
    'INSERT' => '/INTO\s+(\w+)\s+/i',
    'SELECT' => '/FROM\s+((?:[\w]|,\s*)+)(?:\s+(?:[LEFT|RIGHT|OUTER|INNER|NATURAL|CROSS]\s*)*JOIN\s+((?:[\w]|,\s*)+)\s*)*/i',
    'UPDATE' => '/UPDATE\s+(\w+)\s+SET/i',
    'DELETE' => '/FROM\s+((?:[\w]|,\s*)+)/i',
    'REPLACE' => '/INTO\s+(\w+)\s+/i',
    'TRUNCATE' => '/TRUNCATE\s+(\w+)/i',
    'LOAD' => '/INTO\s+TABLE\s+(\w+)/i',
);

我知道这些正则表达式可能有一些缺陷不匹配嵌套查询,但是因为我从来不使用它们对我来说不是一个问题。

I know that these regexs probably have some flaws (my regex skills were pretty green back then) and obviously don't match nested queries, but since I never use them that isn't a problem for me.

无论如何,在找到涉及的表后,我将按字母顺序排序它们,使用以下命名约定在缓存存储库中创建一个新文件夹:

Anyway, after finding the involved tables I would alphabetically sort them and create a new folder in the cache repository with the following naming convention:

+table_a+table_b+table_c+table_...+

如果 SELECT 将从数据库获取结果, serialize()并将它们存储在适当的缓存文件夹中,例如以下查询的结果:

In case of a SELECT query, I would fetch the results from the database, serialize() them and store them in the appropriate cache folder, so for instance the results of the following query:

SELECT `table_a`.`title`, `table_b`.`description` FROM `table_a`, `table_b` WHERE `table_a`.`id` <= 10 ORDER BY `table_a`.`id` ASC;

将存储在:

/cache/+table_a+table_b+/079138e64d88039ab9cb2eab3b6bdb7b.md5

MD5作为查询本身。

The MD5 being the query itself. Upon a consequent SELECT query the results would be trivial to fetch.

对于任何其他类型的写入查询( INSERT REPLACE UPDATE DELETE )我会 glob()所有在其名称中有 + matched_table(s)+ 的文件夹中删除所有文件内容。这样,不必删除整个缓存,只需删除受影响的表和相关表使用的缓存。

In case of any other type of write query (INSERT, REPLACE, UPDATE, DELETE and so on) I would glob() all the folders that had +matched_table(s)+ in their name all delete all the file contents. This way it wouldn't be necessary to delete the whole cache, just the cache used by the affected and related tables.

系统运行良好,性能差异是可见的 - 尽管该项目具有比写入查询更多的读取查询。从那时起,我开始使用事务,FK CASCADE UPDATES / DELETES ,从来没有时间完善系统使用这些功能。

The system worked pretty well and the difference of performance was visible - although the project had many more read queries than write queries. Since then I started using transactions, FK CASCADE UPDATES / DELETES and never had the time to perfect the system to make it work with these features.

我使用了 MySQL查询缓存过去,但我必须说性能甚至不比较。

I've used MySQL Query Cache in the past but I must say the performance doesn't even compare.

想知道:我是唯一一个在这个系统中看到美丽的人吗?有没有瓶颈我可能不知道?为什么像 CodeIgniter Kohana (我不知道 Zend Framework )有这种基本的数据库缓存系统吗?

I'm wondering: am I the only one who sees beauty in this system? Is there any bottlenecks I may not be aware of? Why do popular frameworks like CodeIgniter and Kohana (I'm not aware of Zend Framework) have such rudimentary DB cache systems?

更重要的是,你认为这是一个值得追求的特性吗?如果是,是否有任何我可以做/用来使它更快(我的主要问题是磁盘I / O和(解)序列化的查询结果)?

More importantly, do you see this as a feature worth pursuing? If yes, is there anything I could do / use to make it even faster (my main concerns are disk I/O and (de)serialization of query results)?

感谢所有输入,谢谢。

推荐答案

美在这个解决方案,然而,我相信它只适用于一个非常具体的应用程序集。不适用的情况包括:

I can see the beauty in this solution, however, I belive it only works for a very specific set of applications. Scenarios where it is not applicable include:


  • 使用级联删除/更新或任何类型触发器的数据库。例如,您对表A的DELETE可能会导致表B中的DELETE。正则表达式永远不会捕捉到这个。

  • Databases which utilize cascading deletes/updates or any kind of triggers. E.g., your DELETE to table A may cause a DELETE from table B. The regex will never catch this.

从不经过的点访问数据库你缓存失效方案,例如crontab脚本等。如果你决定在机器之间实现复制(引入只读从属),它也可能会扰乱缓存(因为它不通过缓存失效等)。

Accessing the database from points which do not go through you cache invalidation scheme, e.g. crontab scripts etc. If you ever decide to implement replication across machines (introduce read-only slaves), it may also disturb the cache (because it does not go through cache invalidation etc.)

即使这些情况对于你的情况不现实,它仍然回答为什么框架不实现这种缓存的问题。

Even if these scenarios are not realistic for your case it does still answer the question of why frameworks do not implement this kind of cache.

关于这是否值得追求,这一切都取决于你的申请。也许你在乎提供更多信息?

Regarding if this is worth pursuing, it all depends on your application. Maybe you care to supply more information?

这篇关于智能(?)数据库缓存的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆