How much more performant is Postgres than MySQL on fulltext search?


Problem description

I've been a MySQL user and have never tried Postgres.

But MySQL hits a bottleneck on fulltext search when the data set is huge.

Solution

I ran benchmarks a few years ago on large datasets and found that:

• MySQL FULLTEXT

Is pretty slow. Another drawback is that it forced MyISAM on you at the time (InnoDB only gained FULLTEXT indexes in MySQL 5.6), which brings a lot of problems. Index updates are also quite slow once the index reaches a certain size: when you insert a new row, a substantial part of the index is regenerated, and sometimes a few hundred megabytes of index are rewritten just to insert one forum post. In other words, it's OK for a small forum with a few megabytes of posts, but there is a reason Wikipedia doesn't use it...

• PostgreSQL fulltext

Is about 10-100x faster than MySQL fulltext and a lot more powerful. GiST is fast on inserts/updates, and there is no problem with locks. In other words, it's a totally decent solution.

However, searches get slow when the data set is larger than RAM: because of MVCC, Postgres needs to check the visibility of rows by hitting the heap. Note that this may change in a future version. If your query returns 10 rows, no problem. However, if you want to SELECT WHERE (fulltext query) ORDER BY date LIMIT 10 and the fulltext query matches 10,000 rows, it can get pretty slow. Still faster than MySQL, but not the performance you'd want.
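For reference, the "match, sort by date, keep the newest 10" query described above might look roughly like the sketch below. The table `posts` and its columns (`id`, `title`, `body`, `created_at`) are hypothetical names, and the `%s` placeholders assume a DB-API driver such as psycopg. The function only builds the SQL text and parameters, so it runs without a database.

```python
# Hypothetical schema: posts(id, title, body, created_at). Names are assumptions.

# One-time setup, run separately: an expression index over the text column.
# GiST is used because the answer mentions it; "USING gin" is the other option.
CREATE_INDEX_SQL = """
CREATE INDEX posts_body_fts_idx
    ON posts
 USING gist (to_tsvector('english', body));
"""


def build_search_query(terms, limit=10):
    """Return (sql, params) for a fulltext match sorted by date, newest first."""
    sql = (
        "SELECT id, title, created_at "
        "FROM posts "
        "WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s) "
        "ORDER BY created_at DESC "
        "LIMIT %s"
    )
    return sql, (terms, limit)


sql, params = build_search_query("index rebuild")
print(sql)
print(params)
```

With a real connection, the pair would be passed to something like `cursor.execute(sql, params)`. Postgres still has to find and sort every matching row before it can apply the LIMIT, which is where the slowdown described above comes from.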

• Xapian: I tested this one; there are also Lucene and Sphinx, which have a good reputation.

Xapian does not have to conform to the same restrictions as a database, so it can make a lot more optimizations. For instance, it uses a single-writer, multiple-reader concurrency model, so you'll need some sort of update queue to update your index in the background. It also has its own on-disk format. The result is that it is incredibly fast, even when the dataset is much larger than RAM, and especially on complicated queries that match lots of rows, sort them, and return only the most relevant ones.
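The update queue mentioned above could be sketched along these lines. This is a minimal illustration of the single-writer pattern only: `ToyIndex` is a made-up in-memory stand-in, not the Xapian API, and its lock is a simplification of how a real engine isolates readers from the writer.

```python
import queue
import threading


class ToyIndex:
    """In-memory stand-in for a search index; NOT the Xapian API."""

    def __init__(self):
        self._postings = {}              # term -> set of document ids
        self._lock = threading.Lock()    # toy substitute for reader isolation

    def add_document(self, doc_id, text):
        with self._lock:
            for term in text.lower().split():
                self._postings.setdefault(term, set()).add(doc_id)

    def search(self, term):
        with self._lock:
            return sorted(self._postings.get(term.lower(), ()))


def writer_loop(index, updates):
    """The only code path that ever modifies the index."""
    while True:
        item = updates.get()
        try:
            if item is None:             # sentinel: stop the writer
                return
            doc_id, text = item
            index.add_document(doc_id, text)
        finally:
            updates.task_done()


index = ToyIndex()
updates = queue.Queue()
writer = threading.Thread(target=writer_loop, args=(index, updates), daemon=True)
writer.start()

# Request handlers only enqueue work; they never write to the index themselves.
updates.put((1, "Postgres fulltext search"))
updates.put((2, "Xapian fulltext index"))
updates.join()                           # wait until both updates are applied

print(index.search("fulltext"))          # prints [1, 2]

updates.put(None)                        # shut the writer down
writer.join()
```

The trade-off is that new documents become searchable only after the writer has drained the queue, so searches can briefly lag behind inserts.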

The index is huge too; it probably contains lots of duplicated data. The consequence is that it doesn't need to seek elsewhere on disk to retrieve what a query needs.
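Why duplicating data in the index can avoid seeks is easiest to see in a toy model. The sketch below is not how Xapian or Postgres actually lay out their data (the answer itself only guesses at Xapian's internals). It simply counts row fetches for a "newest 10 of 10,000 matches" query, once with postings that hold only ids and once with postings that also carry the sort key. All names are made up for illustration.

```python
import heapq


class RowStore:
    """Stand-in for table storage; counts how many rows are fetched."""

    def __init__(self, rows):
        self._rows = rows                # doc_id -> (date, text)
        self.fetches = 0

    def fetch(self, doc_id):
        self.fetches += 1
        return self._rows[doc_id]


# 10,000 synthetic rows that all match the search term; the date is an integer.
rows = {doc_id: (doc_id, "search post") for doc_id in range(10_000)}

# Lean index: postings hold ids only, so every match must be fetched to get its date.
lean_postings = list(rows)
store = RowStore(rows)
dated = [(store.fetch(doc_id)[0], doc_id) for doc_id in lean_postings]
newest = heapq.nlargest(10, dated)
lean_fetches = store.fetches

# Fat index: postings duplicate the date, so the top 10 come from the index alone.
fat_postings = [(date, doc_id) for doc_id, (date, _) in rows.items()]
store = RowStore(rows)
top = heapq.nlargest(10, fat_postings)
results = [store.fetch(doc_id) for _, doc_id in top]
fat_fetches = store.fetches

print(lean_fetches, fat_fetches)         # prints 10000 10
```

Both variants return the same ten documents; the difference is how many rows have to be touched to find them, and on a dataset larger than RAM each touched row can mean a disk seek.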

Basically, by the time Postgres started to hit the IO-seek wall, MySQL was long dead, while Xapian stayed blazing fast.

But Xapian is not as nicely integrated with the database, so it is more work to use. It is only worth it if you have a huge dataset. If that is your case, try it, it's amazing. If your dataset fits in RAM, Postgres will just work with a lot less hassle. Also, if you want to combine fulltext and database queries, integration becomes important.
