Should I just query the database or use a proper search engine solution?


Problem description

I have a news site that will eventually hold a lot of articles. I need to implement search functionality, and I know that Solr is one of the most popular software solutions for implementing this today.

The site might or might not get heavy traffic, but I have to implement search functionality that is designed for a heavy-traffic site.

What are the benefits of using a search engine like Solr instead of just querying the database (MySQL) for the content and displaying it to the user? Is it just that search engine products like Solr have superior search performance, in addition to (according to what I have read) more flexibility when it comes to searching? I'm not looking for answers like "use Solr"; I'm looking for an explanation of why not to use a database.

Solution

They solve different problems. Applications designed for search have a different core feature set than traditional databases (both SQL and NoSQL variants), since the requirements are different and their usage differs.

There is some overlap with database capabilities relating to search these days, but if we take standard database interactions as a starting point, writing "find articles with these three words present" is a task that takes manual processing to solve. Add all the other things you usually want in order to make search perform well and return relevant results for your users, and you have a very different problem from the one regular databases try to solve.
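To make the "manual processing" point concrete, here is a minimal sketch of the "find articles with these three words present" query done by hand against a plain SQL table. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the `articles` table, its columns and the sample rows are assumptions made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical `articles` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles (id, title, body) VALUES (?, ?, ?)",
        [
            (1, "Election night", "The mayor won the city election by a wide margin."),
            (2, "Transit plan", "The city council delayed the transit vote."),
            (3, "Mayor profile", "A profile of the mayor before the election in the city."),
        ],
    )
    return conn


def find_articles_with_all_words(conn, words):
    """Return ids of articles whose body contains every word.

    This is a naive substring match: one LIKE clause per word, ANDed
    together. There is no ranking, no stemming and no index support, and
    "%" or "_" inside a word would act as a wildcard.
    """
    if not words:
        return []
    where = " AND ".join("LOWER(body) LIKE ?" for _ in words)
    params = [f"%{word.lower()}%" for word in words]
    rows = conn.execute(
        f"SELECT id FROM articles WHERE {where} ORDER BY id", params
    )
    return [row[0] for row in rows]
```

Calling `find_articles_with_all_words(build_demo_db(), ["mayor", "city", "election"])` returns the two matching ids, but searching for `"cit"` matches every row, which hints at how much is still missing before this behaves like real search.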

A few features that search-oriented services do better:

Term and field weights: If you have a match in "title", it should be weighted more heavily than a hit in "text". But you might also have an "oldness" factor affect the score, so depending on the use case, all these weights between fields and features can be tuned to solve almost any issue you have.
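As a rough illustration of field weights combined with an "oldness" factor, the sketch below scores a document by counting term hits per field and then decaying the score with age. The weights, the 30-day half-life and the function name are assumptions for this example, not how Solr computes relevance.

```python
# Hypothetical tuning knobs: a title hit counts three times as much as a text hit.
FIELD_WEIGHTS = {"title": 3.0, "text": 1.0}
HALF_LIFE_DAYS = 30.0  # assumed: the score halves every 30 days


def score(doc, query_terms, age_days):
    """Weighted term-count score multiplied by an exponential age decay."""
    base = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = doc.get(field, "").lower().split()
        for term in query_terms:
            base += weight * tokens.count(term.lower())
    return base * 0.5 ** (age_days / HALF_LIFE_DAYS)
```

With these numbers a fresh article that mentions the term only in its title outranks one that mentions it only in its text, and the same article loses half its score after 30 days.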

Text normalisation and processing: You might want to expand synonyms while indexing. Searching for ipod and i-pod should probably give the same result. Windows and window as well. These operations are fundamental to most document search engines. You might want to allow a field to perform phonetic matches (the pronunciation of words and not their written form), and you might want to score that differently from exact matches. Solr's list of analyzers, tokenizers and filters may give you an idea of some of the available features for text processing.
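The kind of processing described here can be sketched as a tiny analysis chain that is applied both when indexing and when querying. Everything in it is a simplification made up for illustration: the plural stripping is deliberately naive, and the synonym table is hypothetical.

```python
import re

# Hypothetical synonym table: map variants to one canonical term.
SYNONYMS = {"notebook": "laptop"}


def analyze(text):
    """Lowercase, drop hyphens, tokenize, strip a naive plural "s", map synonyms."""
    text = text.lower().replace("-", "")  # "i-pod" becomes "ipod"
    tokens = re.findall(r"[a-z0-9]+", text)
    result = []
    for token in tokens:
        # Very rough plural handling: "windows" -> "window", but keep "glass".
        if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        result.append(SYNONYMS.get(token, token))
    return result
```

Because the same function runs on documents and on queries, "ipod" and "i-pod" end up as the same token, as do "Windows" and "window". A real engine ships far more careful stemmers and filters than this.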

Faceting / Navigators: How many of the documents in my search have different values in the field xyz, and what are their counts? You've probably seen this feature on many sites, such as "filter by file type" or "only show hits for the last 7 days, last 31 days, last 365 days", together with a count of documents for each bin.
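Computed by hand over a list of search hits, facet counts look roughly like the sketch below. The hit structure (a dict with `category` and `published` keys) and the function names are assumptions for illustration; a search engine does this inside the query instead of in application code.

```python
from collections import Counter
from datetime import date


def facet_counts(hits, field):
    """Count how many hits have each distinct value in `field`."""
    return Counter(hit[field] for hit in hits)


def date_range_counts(hits, today, windows=(7, 31, 365)):
    """Count hits published within the last N days for each window."""
    counts = {}
    for days in windows:
        counts[days] = sum(
            1 for hit in hits if 0 <= (today - hit["published"]).days < days
        )
    return counts


# Hypothetical search hits.
HITS = [
    {"category": "politics", "published": date(2024, 1, 30)},
    {"category": "sports", "published": date(2024, 1, 10)},
    {"category": "politics", "published": date(2023, 6, 1)},
]
```

For example, `facet_counts(HITS, "category")` gives the per-category counts shown next to a filter list, and `date_range_counts(HITS, date(2024, 1, 31))` gives the counts for the "last 7 / 31 / 365 days" bins.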

Highlighting: Which part of the text matched, and how do I extract a proper snippet that I can give back to the end user? You see this feature each time you do a Google search: the text below each hit shows the actual content from the web page where your query was found.
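A bare-bones version of highlighting can be sketched as: find the first match, cut a window of context around it, and wrap the match in markers. The function name, the `<em>` markers and the context size are assumptions; the sketch does no HTML escaping and handles only a single term.

```python
import re


def highlight(text, term, context=30, pre="<em>", post="</em>"):
    """Return a snippet around the first case-insensitive match, or None."""
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - context)
    end = min(len(text), match.end() + context)
    snippet = (
        text[start:match.start()]
        + pre + match.group(0) + post
        + text[match.end():end]
    )
    # Mark truncation on either side with an ellipsis.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + snippet + suffix
```

For instance, `highlight("Solr is a search platform.", "search", context=5)` yields `...is a <em>search</em> plat...`. Picking the best snippet across several terms and fields is the part a search engine handles for you.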

... and these are just a few of the features that people who work with search are considering each day. I'm not saying that these aren't solvable with more traditional DB functionality, but they require you to implement code, keep things in sync and, in general, write a whole lot of code to get something you'd get for free with technology already made to solve the problem.

Performance depends on a lot of factors, but it'll probably do better than OK. You can scale most solutions horizontally, so you can add servers as needed while growing. But you probably won't have to do that for a while, so don't worry about it. Premature optimization, etc.
