Advice needed to properly index a table with many fields to be searched on


Question

I have a user table with many columns. It looks roughly like this:

dname:             { type: string(255), notnull: true }
email:             { type: string(255), notnull: true, unique: true }
email_code:        { type: string(255) }
email_confirmed:   { type: boolean, default: false }
profile_filled:    { type: boolean, default: false }
password:          { type: string(255), notnull: true }
image_id:          { type: integer }
gender:            { type: enum, values: [male, female] }
description:       { type: string }
dob:               { type: date }
height:            { type: integer(3) }
looks:             { type: enum, values: [thin, average, athletic, heavy] }
looking_for:       { type: enum, values: [marriage, dating, friends] }
looking_for_age1:  { type: integer }
looking_for_age2:  { type: integer }
color_hair:        { type: enum, values: [black, brown, blond, red] }
color_eyes:        { type: enum, values: [black, brown, blue, green, grey] }
marital_status:    { type: enum, values: [single, married, divorced, widowed] }
smokes:            { type: enum, values: [no, yes, sometimes] }
drinks:            { type: enum, values: [no, yes, sometimes] }
has_children:      { type: enum, values: [no, yes] }
wants_children:    { type: enum, values: [no, yes] }
education:         { type: enum, values: [school, college, university, masters, phd] }
occupation:        { type: enum, values: [no, yes] }
country_id:        { type: integer }
city_id:           { type: integer }
lastlogin_at:      { type: timestamp }
deleted_at:        { type: timestamp }

I have created a form that contains most of these fields (the enums, country, and city), which allows the user to generate a WHERE clause based on the fields they select. So if someone selects smokes: no and country_id: 7, the SQL statement could look like this:

SELECT id 
FROM user u 
WHERE u.deleted_at IS NULL AND u.profile_filled IS NOT NULL AND smokes = 'no' AND country_id = 7;
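
As a rough illustration of how the form's selections could be turned into a statement like the one above, here is a minimal Python sketch. The names `FILTERABLE` and `build_query` are made up for this example, and the `%s` placeholder assumes a MySQL driver that uses that parameter style (SQLite, for instance, uses `?` instead). The point is only that column names come from a fixed whitelist and values are passed as parameters rather than pasted into the SQL.

```python
# Hypothetical whitelist: only these columns may appear in the WHERE clause.
FILTERABLE = {"gender", "smokes", "drinks", "country_id", "city_id"}


def build_query(filters):
    """Return (sql, params) for the selected filters, using placeholders."""
    # Conditions that always apply, taken from the query in the question.
    clauses = ["u.deleted_at IS NULL", "u.profile_filled IS NOT NULL"]
    params = []
    for column, value in filters.items():
        if column not in FILTERABLE:
            raise ValueError(f"cannot filter on {column!r}")
        clauses.append(f"u.{column} = %s")  # value is bound later by the driver
        params.append(value)
    sql = "SELECT u.id FROM user u WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query({"smokes": "no", "country_id": 7})
print(sql)
print(params)  # ['no', 7]
```

Because any subset of the whitelisted columns can end up in the WHERE clause, the number of distinct query shapes grows quickly, which is what makes the indexing question hard.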

Because users can select any combination of fields to filter by, I'm not sure how to go about indexing this table. Should I just create a single-column index on every field that can be filtered? What would you advise?

Recommended answer

I have a table at work with the same sort of thing: lots of columns and 1000 different ways to select from it. It's a nightmare. I did find, however, that certain combinations of filters are used often. Those are the ones I would create indexes for, leaving the rarely used combinations to run slowly. In MSSQL, I can run a query that shows me the most expensive queries that have been run against the database; MySQL should have something similar. Once I have them, I create an index that covers the columns involved to speed them up. Eventually, you'll have it 90 percent covered.

I personally would never design a table like that again unless I had an AK47 pointed at me. (My indexes are 3 times larger than the data in the table, which is very uncool if you need to add a bunch of records.) I'm not sure how I would redesign the table, though. My first thought would be to split the table into two, but that would add to headaches elsewhere.
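
To make the "index the combinations that are used often" idea a bit more concrete, here is a small sketch in Python. Everything in it is an assumption for illustration: `logged_searches` is invented sample data standing in for whatever your application or the database's query log records, and `suggest_indexes` is a hypothetical helper, not part of any library. It simply counts which column combinations occur most often and prints candidate composite indexes for them.

```python
from collections import Counter

# Hypothetical sample: each entry lists the columns one search filtered on.
logged_searches = [
    ("country_id", "gender"),
    ("gender", "country_id"),
    ("smokes",),
    ("country_id", "gender", "smokes"),
    ("country_id", "gender"),
    ("smokes",),
]


def suggest_indexes(searches, top_n=2):
    """Return CREATE INDEX statements for the most common column combinations."""
    # Sort each combination so that column order in the log does not matter.
    counts = Counter(tuple(sorted(columns)) for columns in searches)
    statements = []
    for columns, _count in counts.most_common(top_n):
        name = "idx_user_" + "_".join(columns)
        # Alphabetical column order is arbitrary here; in a real index the
        # order matters and should be chosen by looking at the actual queries.
        statements.append(f"CREATE INDEX {name} ON user ({', '.join(columns)});")
    return statements


for statement in suggest_indexes(logged_searches):
    print(statement)
# CREATE INDEX idx_user_country_id_gender ON user (country_id, gender);
# CREATE INDEX idx_user_smokes ON user (smokes);
```

Treat the output as a list of candidates to test against the real queries, not as indexes to create blindly. Every extra index adds to the storage and insert cost mentioned above.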

User Table (UserID, Name)

1, Lisa
2, Jane
3, John

User Attribute Table (UserID, AttributeName, AttributeValue)

1, EYES, Brown
1, GENDER, Female
2, EYES, Blue
2, GENDER, Female
3, EYES, Blue
3, GENDER, Male

This would make identifying attributes faster, but your queries would not be as straightforward to write.

-- One row per user, counting how many of the requested attributes matched
SELECT UserID, COUNT(*) AS MatchingAttributes
FROM   UserAttributes
WHERE  (UserAttributes.AttributeName = 'EYES' AND UserAttributes.AttributeValue = 'Blue') OR
       (UserAttributes.AttributeName = 'GENDER' AND UserAttributes.AttributeValue = 'Female')
GROUP BY UserID

This should return the following:

UserID, MatchingAttributes
1, 1
2, 2
3, 1

All you need to do then is add HAVING COUNT(*) = 2 to the query to select only the IDs that match every filter. It's a bit more involved to select from, but it also gives you a neat feature. Say you filter on 10 attributes and return all users that match all 10. Cool, but say none matched 100%. You could then say: hey, I found none that matched exactly, but these matched 9 out of 10, a 90% match. (Just make sure that if I search for a blue-eyed blonde female, I don't get a message saying that none were found but here are the next closest matches: blue-eyed blonde blokes with a matching score of 60%. That would be very uncool.)
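
Here is a runnable sketch of both variants, exact match and ranked partial match, using the sample rows above. It uses Python's built-in `sqlite3` module purely as a stand-in so the example is self-contained; the SQL itself is plain enough that it should carry over to MySQL with little or no change. It assumes each user has at most one row per attribute name, otherwise COUNT(*) would overcount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserAttributes "
    "(UserID INTEGER, AttributeName TEXT, AttributeValue TEXT)"
)
conn.executemany(
    "INSERT INTO UserAttributes VALUES (?, ?, ?)",
    [
        (1, "EYES", "Brown"), (1, "GENDER", "Female"),
        (2, "EYES", "Blue"), (2, "GENDER", "Female"),
        (3, "EYES", "Blue"), (3, "GENDER", "Male"),
    ],
)

# The attributes being searched for: blue eyes AND female.
wanted = [("EYES", "Blue"), ("GENDER", "Female")]
condition = " OR ".join(["(AttributeName = ? AND AttributeValue = ?)"] * len(wanted))
params = [value for pair in wanted for value in pair]

base = (
    "SELECT UserID, COUNT(*) AS MatchingAttributes "
    "FROM UserAttributes WHERE " + condition + " GROUP BY UserID "
)

# Exact matches: every requested attribute must match.
exact = conn.execute(
    base + "HAVING COUNT(*) = ? ORDER BY UserID", params + [len(wanted)]
).fetchall()

# Ranked matches: best score first, as a percentage of requested attributes.
ranked = [
    (user_id, 100 * matched // len(wanted))
    for user_id, matched in conn.execute(
        base + "ORDER BY MatchingAttributes DESC, UserID", params
    )
]

print(exact)   # [(2, 2)]
print(ranked)  # [(2, 100), (1, 50), (3, 50)]
```

If you show partial matches, you would probably want to treat some attributes (gender, in the example above) as mandatory and only rank on the rest.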

There are more things that would need consideration if you chose to split the table, such as how to store numbers, dates, and text attributes in a single column. Or would these go in separate tables, or separate columns? There is no easy answer either way, wide table or split tables.
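
One of the options mentioned, separate columns per value type, could look something like the sketch below. This is only one possible layout, not a recommendation. The column names `TextValue`, `NumberValue` and `DateValue` and the sample rows are invented for the example, and `sqlite3` is again used only so that it runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE UserAttributes (
        UserID        INTEGER NOT NULL,
        AttributeName TEXT    NOT NULL,
        TextValue     TEXT,
        NumberValue   INTEGER,
        DateValue     TEXT,   -- ISO 8601 date string, e.g. '1985-04-12'
        PRIMARY KEY (UserID, AttributeName),
        -- exactly one of the three typed columns must be filled in
        CHECK ((TextValue IS NOT NULL) + (NumberValue IS NOT NULL)
               + (DateValue IS NOT NULL) = 1)
    )
""")
conn.executemany(
    "INSERT INTO UserAttributes VALUES (?, ?, ?, ?, ?)",
    [
        (1, "EYES", "Brown", None, None),
        (1, "HEIGHT", None, 165, None),
        (2, "HEIGHT", None, 180, None),
        (2, "DOB", None, None, "1985-04-12"),
    ],
)

# Numeric attributes can be range-filtered without casting text to numbers.
tall = conn.execute(
    "SELECT UserID FROM UserAttributes "
    "WHERE AttributeName = 'HEIGHT' AND NumberValue >= ? ORDER BY UserID",
    (170,),
).fetchall()
print(tall)  # [(2,)]
```

The trade-off is that every query has to know which typed column an attribute lives in, which is part of the extra headache of splitting the table.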
