How to implement an Enterprise Search


Problem description


We are searching disparate data sources in our company. We have information in multiple databases that needs to be searched from our Intranet. Initial experiments with Full Text Search (FTS) proved disappointing. We've implemented a custom search engine that works very well for our purposes. However, we want to make sure we are doing "the right thing" and aren't missing any great tools that would make our job easier.

What we need:

  1. Column search
    • ability to search by column
    • we flag which columns in a table are searchable
  2. Keep some relation between db column and data
    • we provide advanced filtering on the results
    • facilitates (amazon style) filtering
    • filter provided by grouping of results and allowing user to filter them via a checkbox
    • this is a great feature, users like it very much
  3. Partial Word Match
    • we have a lot of unique identifiers (product id, etc).
    • the unique IDs can have sub-parts with meaning (location, etc)
    • or only a portion may be available (when the user is searching)
    • or (by a decidedly poor design decision) there may be white space in the id
    • this is a major feature that we've now implemented via CHARINDEX (MSSQL) and INSTR (Oracle)
    • on MSSQL, the char index functions turned out to perform about the same (+/-) as full text
    • we didn't run that comparison on Oracle
    • however searches against both types of db are very fast
    • We take advantage of Indexed (MSSQL) and Materialized (Oracle) views to increase speed
      • this is a huge win, Oracle Materialized views are better than MSSQL Indexed views
      • both provide speedups in read-only join situations (like a search combining company and product)
    • A search that matches user expectations of the paradigm CTRL-f -> enter text -> find matches
    • Full Text Search is not the best in this area (slow and inconsistent matching)
    • partial matching (see "Partial Word Match")
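
The partial-match and filtering requirements above can be sketched roughly as follows. This is only an illustration, not the code described in the question: it uses Python's built-in `sqlite3` module and SQLite's `instr()` as a stand-in for CHARINDEX (MSSQL) and INSTR (Oracle), and the `product` table, its columns, the sample rows, and the `SEARCHABLE_COLUMNS` flag list are all hypothetical. Whitespace is stripped from both the stored value and the search term so that IDs containing spaces still match.

```python
import sqlite3

# Hypothetical flag list: the columns of the (made-up) product table
# that are marked as searchable.
SEARCHABLE_COLUMNS = ("product_id", "name")


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (product_id TEXT, name TEXT, location TEXT)"
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [
            ("NY 1042-A", "Hex bolt", "New York"),  # ID with white space
            ("NY1043-B", "Hex nut", "New York"),
            ("TX2042-A", "Washer", "Texas"),
        ],
    )
    return conn


def partial_search(conn, term):
    """Return rows where any flagged column contains the term.

    Matching ignores case and spaces, similar to a Ctrl+F style search.
    """
    needle = term.replace(" ", "").lower()
    # Column names come from the fixed tuple above, never from user input.
    where = " OR ".join(
        f"instr(lower(replace({col}, ' ', '')), ?) > 0"
        for col in SEARCHABLE_COLUMNS
    )
    sql = (
        "SELECT product_id, name, location FROM product "
        f"WHERE {where} ORDER BY product_id"
    )
    return conn.execute(sql, [needle] * len(SEARCHABLE_COLUMNS)).fetchall()


def facet_counts(rows, column_index):
    """Group result rows by one column to drive checkbox-style filters."""
    counts = {}
    for row in rows:
        counts[row[column_index]] = counts.get(row[column_index], 0) + 1
    return counts
```

For example, `partial_search(build_demo_db(), "042")` finds both `NY 1042-A` and `TX2042-A`, and passing those rows to `facet_counts(rows, 2)` gives one count per location for the filter checkboxes. On a real MSSQL or Oracle database the same idea would be written with CHARINDEX or INSTR directly.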

Nice to have:

  1. Search database in real time
    • skip the indexing step; this is not a hard requirement
  2. Spelling suggestion

What we don't need:

  1. We don't need to index documents
    • at this point, searching our data sources is the most important thing
    • even when we do search documents, we will be looking for partial word matching, etc
  2. Ranking
    • Our own simple ranking algorithm has proven much better than an FTS equivalent.
    • Users understand it, we understand it, it's almost always relevant.
  3. Stemming
    • Just don't need to get [run|ran|running]
  4. Advanced search operators
    • phrase matching, or/and, etc
    • according to Jakob Nielsen http://www.useit.com/alertbox/20010513.html
      • most users are using simple search phrases
      • very few use advanced searches (when it's available)
      • also in Information Architecture 3rd edition Page 185
      • "few users take advantage of them [advanced search functions]"
      • http://oreilly.com/catalog/9780596000356
      • our Amazon like filtering allows better filtering anyway (via user testing)
  5. Full Text Search
    • We've found that results don't always "make sense" to the user
    • Searching with FTS is hard to tune (which set of operators matches the users' expectations?)
    • Advanced search operators are a no go
    • we don't need them because
    • users don't understand them
    • Performance has been very close (+/-) to the char index functions
    • but the results are sometimes just "weird"

The question: Is there a solution that allows us to keep the key value pair "filtering feature", offers the column specific matching, partial word matching and the rest of the features, without the pain of full text search?

I'm open to any suggestion. I've wondered if a document/hash table nosql data store (MongoDB, et al) might be of use? ( http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo ). Any experience with these is appreciated.

Again, just making sure we aren't missing something with our in-house customized version. If there is something "off the shelf" I would be interested in it. Or if you've built something from some components, what components (search engines, data stores, etc) did you use and why?

You can also make your point for FTS. Just make sure it meets the requirements above before you say "just use Full Text Search because that's the only tool we have."

Solution

I ended up coding my own.

The results are fantastic. Users like it, and it works well with our existing technologies.

It really wasn't that hard. Just took some time.

Features:

  • Faceted search (amazon, walmart, etc)
  • Partial word search (the real stuff not full text)
  • Search databases (oracle, sql server, etc) and non database sources
  • Integrates well with our existing environment
  • Maintains relations, so I can have an n-to-n search and display
  • --> this means I can display child records of a master record in search results
  • --> also I can search any child field and return the master record

It's really amazing what you can do with dictionaries and a lot of memory.
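
The answer does not show any code, so the following is only a guess at what a dictionary-based approach could look like, not the author's implementation. It is a minimal sketch in which the `COMPANIES` data, the field names, and the helpers `build_index`, `search`, and `facets` are all hypothetical. Every field value of a master record and of its child records is normalized and mapped to the master record's ID, so a partial match on any child field returns the master record.

```python
from collections import defaultdict

# Hypothetical master records (companies), each with child records (products).
COMPANIES = {
    1: {
        "name": "Acme Fasteners",
        "region": "East",
        "products": [
            {"product_id": "NY 1042-A", "name": "Hex bolt"},
            {"product_id": "NY1043-B", "name": "Hex nut"},
        ],
    },
    2: {
        "name": "Lone Star Supply",
        "region": "South",
        "products": [{"product_id": "TX2042-A", "name": "Washer"}],
    },
}


def normalize(text):
    """Lower-case and drop spaces so 'NY 1042' and 'ny1042' compare equal."""
    return text.replace(" ", "").lower()


def build_index(companies):
    """Map each normalized field value to the master IDs it belongs to."""
    index = defaultdict(set)
    for company_id, company in companies.items():
        index[normalize(company["name"])].add(company_id)
        for product in company["products"]:
            for value in product.values():
                index[normalize(value)].add(company_id)
    return index


def search(index, term):
    """Return sorted master IDs where any indexed value contains the term."""
    needle = normalize(term)
    hits = set()
    for value, ids in index.items():  # linear scan over distinct values
        if needle in value:
            hits |= ids
    return sorted(hits)


def facets(companies, ids, field):
    """Count matching master records per value of one field."""
    counts = defaultdict(int)
    for company_id in ids:
        counts[companies[company_id][field]] += 1
    return dict(counts)
```

With this data, `search(build_index(COMPANIES), "hex nut")` returns the ID of the company that owns that product, and `facets(COMPANIES, ids, "region")` gives the per-region counts for the filter checkboxes. The scan is linear in the number of distinct values, which is the trade-off for true substring matching in this sketch.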
