Need MySQL optimization for complex search on EAV structured data


Problem description

I have a large database with EAV structured data that has to be searchable and pageable. I tried every trick in my book to get it fast enough, but under certain circumstances, it still fails to complete in a reasonable time.

This is my table structure (relevant parts only, ask away if you need more):

CREATE TABLE IF NOT EXISTS `object` (
  `object_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `oid` varchar(32) CHARACTER SET utf8 NOT NULL,
  `status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
  `created` datetime NOT NULL,
  `updated` datetime NOT NULL,
  PRIMARY KEY (`object_id`),
  UNIQUE KEY `oid` (`oid`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `version` (
  `version_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `type_id` bigint(20) NOT NULL,
  `object_id` bigint(20) NOT NULL,
  `created` datetime NOT NULL,
  `status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`version_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `value` (
  `value_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `object_id` int(11) NOT NULL,
  `attribute_id` int(11) NOT NULL,
  `version_id` bigint(20) NOT NULL,
  `type_id` bigint(20) NOT NULL,
  `value` text NOT NULL,
  PRIMARY KEY (`value_id`),
  KEY `field_id` (`attribute_id`),
  KEY `action_id` (`version_id`),
  KEY `form_id` (`type_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

This is a sample object. I have around 1 million of those in my database. Each object may have a different number of attributes with different attribute_ids.

INSERT INTO `object` (`object_id`, `oid`, `status`, `created`, `updated`) VALUES (1, 'cwnzrdxs4dzxns47xs4tx', 'Green', NOW(), NOW());
INSERT INTO `version` (`version_id`, `type_id`, `object_id`, `created`, `status`) VALUES (1, 1, 1, NOW(), NOW());
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (1, 1, 1, 1, 1, 'Munich');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (2, 1, 2, 1, 1, 'Germany');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (3, 1, 3, 1, 1, '123');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (4, 1, 4, 1, 1, '2012-01-13');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (5, 1, 5, 1, 1, 'A cake!');


Now on to my current mechanism. My first try was the typical MySQL approach: one huge SQL query with loads of joins on anything I require. Complete disaster! It took way too long to load and even crashed the PHP and MySQL servers due to exhausted RAM.

So I split my queries up into several steps:

1 Determine all needed attribute_ids.

I can look them up in another table that references the type_id of an object. The result is a list of attribute_ids. (This table is not very relevant to the performance, so it's not included in my sample.)

:type_id contains all type_ids from any objects I want to include in my search. I already got this information in my application. So this is inexpensive.

SELECT * FROM attribute WHERE form_id IN (:type_id)

The result is an array of attribute_id integers.

2 Search for matching objects

A big SQL query is compiled that adds one INNER JOIN for each and every condition I want. This sounds horrible, but in the end, it was the fastest method :(

A typical generated query might look like this. Sadly, the LIMIT is necessary; otherwise I could get so many IDs that the resulting array makes PHP explode or breaks the IN statement in the next query:

SELECT DISTINCT `version`.object_id FROM `version`
INNER JOIN `version` AS condition1 
        ON `version`.version_id = condition1.version_id 
       AND condition1.created = '2012-03-04' -- Filter by version date
INNER JOIN `value` AS condition2 
        ON `version`.version_id = condition2.version_id
       AND condition2.type_id IN (:type_id) -- try to limit joins to object types we need
       AND condition2.attribute_id = :field_id2 -- searching for a value in a specific attribute
       AND condition2.value = 'Munich' -- searching for the value 'Munich'
INNER JOIN `value` AS condition3 
        ON `version`.version_id = condition3.version_id
       AND condition3.type_id IN (:type_id) -- try to limit joins to object types we need
       AND condition3.attribute_id = :field_id3 -- searching for a value in a specific attribute
       AND condition3.value = 'Green' -- searching for the value 'Green'
WHERE `version`.type_id IN (:type_id) ORDER BY `version`.version_id DESC LIMIT 10000

The result will contain all object_ids from any object I might need. I am selecting object_ids and not version_ids as I need to have all versions of matching objects, regardless of which version matched.

3 Sort and page results

Next I will create a query that sorts the objects by a certain attribute and then pages the resulting array.

SELECT DISTINCT object_id
FROM value
WHERE object_id IN (:foundObjects)
AND attribute_id = :attribute_id_to_sort
AND value > ''
ORDER BY value ASC LIMIT :limit OFFSET :offset

The result is a sorted and paged list of object ids from the former search.

4 Get our complete objects, versions and attributes

In the last step, I will select all values for any objects and versions the former queries found.

SELECT `value`.*, `object`.*, `version`.*, `type`.*,
`object`.status AS `object.status`,
`object`.flag AS `object.flag`,
`version`.created AS `version.created`,
`version`.status AS `version.status`
FROM version
INNER JOIN `type` ON `version`.type_id = `type`.type_id
INNER JOIN `object` ON `version`.object_id = `object`.object_id
LEFT JOIN value ON `version`.version_id = `value`.version_id
WHERE version.object_id IN (:sortedObjectIds) AND `version`.type_id IN (:typeIds)
ORDER BY version.created DESC

The result will then be compiled via PHP into nice object->version->value array structures.


Now the question:

  • Can this whole mess be accelerated in any way?
  • Can I somehow remove the LIMIT 10000 restriction from my search query?

If all else fails, maybe switch database technology? See my other question: Database optimized for searching in large number of objects with different attributes


Real-life samples

Table sizes: object - 193801 rows, version - 193841 rows, value - 1053928 rows

SELECT * FROM attribute WHERE attribute_id IN (30)

SELECT DISTINCT `version`.object_id
FROM version  
INNER JOIN value AS condition_d4e328e33813 
     ON version.version_id = condition_d4e328e33813.version_id
    AND condition_d4e328e33813.type_id IN (30)
    AND condition_d4e328e33813.attribute_id IN (377) 
    AND condition_d4e328e33813.value LIKE '%e%'  
INNER JOIN value AS condition_2c870b0a429f 
     ON version.version_id = condition_2c870b0a429f.version_id
    AND condition_2c870b0a429f.type_id IN (30)
    AND condition_2c870b0a429f.attribute_id IN (376) 
    AND condition_2c870b0a429f.value LIKE '%s%' 
WHERE version.type_id IN (30) 
ORDER BY version.version_id DESC LIMIT 10000 -- limit to 10000 or it breaks!

Explain:

id  select_type  table                   type      possible_keys                key         key_len ref                               rows      Extra   
1   SIMPLE       condition_2c870b0a429f  ref       field_id,action_id,form_id   field_id    4       const                             178639    Using where; Using temporary; Using filesort
1   SIMPLE       action                  eq_ref    PRIMARY                      PRIMARY     8       condition_2c870b0a429f.action_id  1         Using where
1   SIMPLE       condition_d4e328e33813  ref       field_id,action_id,form_id   action_id   8       action.action_id                  11        Using where; Distinct

objects search completed (Peak RAM: 5.91MB, Time: 4.64s)

SELECT DISTINCT object_id
FROM version
WHERE object_id IN (193793,193789, ... ,135326,135324) -- 10000 ids in here!
ORDER BY created ASC
LIMIT 50 OFFSET 0                                                  

objects sort completed (Peak RAM: 6.68MB, Time: 0.352s)

SELECT `value`.*, object.*, version.*, type.*,
    object.status AS `object.status`,
    object.flag AS `object.flag`,
    version.created AS `version.created`,
    version.status AS `version.status`,
    version.flag AS `version.flag`
FROM version
INNER JOIN type ON version.type_id = type.type_id
INNER JOIN object ON version.object_id = object.object_id
LEFT JOIN value ON version.version_id = `value`.version_id
WHERE version.object_id IN (135324,135326,...,135658,135661) AND version.type_id IN (30)
ORDER BY quality DESC, version.created DESC 

objects load query completed (Peak RAM: 6.68MB, Time: 0.083s)
objects compilation into arrays completed (Peak RAM: 6.68MB, Time: 0.007s)

Solution

I would start by trying covering indexes (i.e., indexes that contain every column your criteria match on, and even the columns you pull out as the result). This way the engine does not have to go back to the raw page data.

You need the "object_id" from the version table, and you use "version_id" as the join basis to the other tables. Your query also has a WHERE clause on the version table's TYPE_ID, so I would have an index on

version table -- (object_id, version_id, type_id)

For your "value" table, build an index that matches the criteria there too

value table -- ( version_id, type_id, attribute_id, value, created )
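
As a rough sketch, the two suggested indexes could be created as shown below. The index names and the prefix length of 64 are assumptions chosen for illustration, not part of the answer. Two deviations from the column lists above are deliberate: the posted `value` table has no `created` column, so it is left out, and `value` is a TEXT column, which MySQL only accepts in an index with an explicit prefix length.

```sql
-- Hypothetical index names; adjust them to your own conventions.

-- Covers the columns the search query reads from `version`.
ALTER TABLE `version`
  ADD INDEX idx_version_cover (object_id, version_id, type_id);

-- `value` is TEXT, so a prefix length is mandatory; 64 is an assumed value.
-- `created` is omitted because the posted `value` table has no such column.
ALTER TABLE `value`
  ADD INDEX idx_value_cover (version_id, type_id, attribute_id, `value`(64));

-- Compare the plan before and after. "Using index" in the Extra column
-- means the query was answered from the index alone.
EXPLAIN
SELECT object_id, version_id
FROM `version`
WHERE type_id IN (30);
```

As far as I know, a prefix index cannot act as a covering index for the prefixed column, so conditions such as LIKE '%e%' on `value` will most likely still read the row data. Whether the optimizer picks the new indexes at all is best confirmed with EXPLAIN on the real search query.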
