PostgreSQL index on a string column
Question
Say I have a table ResidentInfo with a unique constraint on the column HomeAddress, which is of type VARCHAR. For future queries, I am going to add an index on this column. The queries will only use the = operator, and I'll use a B-tree index, since hash indexes are currently not recommended.
Question: From an efficiency point of view, using a B-tree, should I add a new column with the numbers 1, 2, 3, ..., N corresponding to the different home addresses, and put the index on that number column instead of on HomeAddress?
I ask this question because I don't know how indexes work.
Recommended answer
For simple equality checks (=), a B-tree index on a varchar or text column is simple and the best choice. It certainly helps performance a lot.
Of course, a B-tree index on a simple integer performs better. For starters, comparing simple integer values is a bit faster. But more importantly, performance is also a function of the size of the index. A bigger column means fewer rows per data page, which means more pages have to be read.
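To see the size effect on your own system, you could build both kinds of index on the same data and compare them. The following is a minimal sketch, not a benchmark: the table idx_size_demo, its columns, and the generated address strings are assumptions made up for illustration. Actual sizes depend on the data and the Postgres version.

```sql
-- Hypothetical demo table (not from the original question):
-- 100000 rows with an integer id and a moderately long address string.
CREATE TEMP TABLE idx_size_demo AS
SELECT g AS id,
       'Some Street ' || g || ', 12345 Example City' AS address
FROM   generate_series(1, 100000) AS g;

-- One B-tree index per column
CREATE INDEX idx_size_demo_id_idx      ON idx_size_demo (id);
CREATE INDEX idx_size_demo_address_idx ON idx_size_demo (address);

-- Compare the on-disk size of the two indexes
SELECT pg_size_pretty(pg_relation_size('idx_size_demo_id_idx'))      AS integer_index_size,
       pg_size_pretty(pg_relation_size('idx_size_demo_address_idx')) AS text_index_size;
```

The text index should come out larger, since each entry stores the whole string rather than a 4-byte integer.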
Since HomeAddress is hardly unique anyway, it's not a good natural primary key. I would strongly suggest using a surrogate primary key instead. A serial column is the obvious choice for that. Its only purpose is to provide a simple, fast primary key to work with.
If you have other tables referencing this table, this becomes even more efficient. Instead of duplicating a lengthy string in the foreign key column, you only need the 4 bytes of an integer column. And you don't need to cascade updates as much, since an address is bound to change, while a surrogate primary key can stay the same (but doesn't have to, of course).
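As a rough illustration of the foreign key point, a referencing table could carry only the integer key. The tables household and occupant and all of their columns are hypothetical names chosen for this sketch. They are not part of the original question.

```sql
-- Hypothetical parent table with a surrogate integer key
CREATE TABLE household (
    household_id serial PRIMARY KEY,
    address      text NOT NULL
);

-- Hypothetical referencing table: it stores the 4-byte integer key,
-- not a copy of the address string
CREATE TABLE occupant (
    occupant_id  serial PRIMARY KEY,
    household_id integer NOT NULL REFERENCES household (household_id),
    name         text NOT NULL
);

-- A changed address is updated in one place; rows in occupant are untouched
UPDATE household
SET    address = '2 New Street'
WHERE  household_id = 1;
```

Had address been the primary key, the same change would have to be propagated to every referencing row, for example with ON UPDATE CASCADE.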
Your table could look like this:
CREATE TABLE resident (
    resident_id serial PRIMARY KEY,
    address     text NOT NULL
    -- more columns
);
CREATE INDEX resident_adr_idx ON resident (address);
This results in two B-tree indexes: a unique index on resident_id and a plain index on address.
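An equality lookup that can use the address index might then look like the sketch below. It assumes the resident table and the resident_adr_idx index from the definition above. The sample row is made up for illustration.

```sql
-- Hypothetical sample row
INSERT INTO resident (address) VALUES ('1 Example Street');

-- Show the plan the planner chooses for a simple equality check
EXPLAIN
SELECT resident_id
FROM   resident
WHERE  address = '1 Example Street';
```

On a very small table the planner may still pick a sequential scan because that is cheaper there. With a realistically sized table and current statistics (ANALYZE resident), you would expect the plan to use resident_adr_idx.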
More about indexes in the manual.
Postgres offers a lot of options, but you don't need any more for this simple case.