What's the best approach for using SOLR with web projects?


Question


OK, I'm totally new to SOLR and Lucene, but I've got Solr running out of the box under Tomcat 6.x and have just gone over some of the basic wiki entries.

I have a few questions, and require some suggestions too.

  1. Solr can index data in files (XML, CSV) and it can also index DBs. Can you also just point it to a URI/domain and have it index a website the way Google would?

  2. If I have a website with "Pages" data (e.g. "Page Name", "Page Content") and "Products" data (e.g. "Product Name", "SKU"), do I need two different schema.xml files? And if so, does that mean two different instances of Solr?

Finally, if you have a project with a large relational and normalized database, which of the three options below would you say is the best approach?

  1. Have a middleware service running in the background, which mines the DB and manually creates the relevant XML files to then send to SOLR

  2. Have SOLR index the DB directly. In this case, would it be best to just point SOLR to views, which would abstract all the table relationships?

  3. Any other options I'm unaware of?
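To make option 2 concrete: the point of a view is to flatten the normalized tables into one row per searchable item, which is the shape an indexer (for example Solr's DataImportHandler) reads. A minimal sketch, assuming a hypothetical `products`/`categories` schema, with Python's built-in `sqlite3` standing in for SQL Server purely for illustration:

```python
import sqlite3

# In-memory stand-in for the real (SQL Server) database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized schema: products reference a categories table.
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    sku TEXT NOT NULL,
    name TEXT NOT NULL,
    category_id INTEGER REFERENCES categories(id)
);
-- The view hides the join: one flat row per product, one column per Solr field.
CREATE VIEW product_search AS
SELECT p.id AS id, p.sku AS sku, p.name AS product_name, c.name AS category
FROM products p
LEFT JOIN categories c ON c.id = p.category_id;
""")
conn.execute("INSERT INTO categories VALUES (1, 'Cameras')")
conn.execute("INSERT INTO products VALUES (10, 'CAM-001', 'Compact Camera', 1)")

# The indexer only ever queries the view, never the underlying tables.
rows = conn.execute("SELECT id, sku, product_name, category FROM product_search").fetchall()
print(rows)  # [(10, 'CAM-001', 'Compact Camera', 'Cameras')]
```

A schema change in the underlying tables then only requires updating the view, not the indexing configuration.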

Context: We're running in a Windows Server 2003 environment, .NET 3.5, SQL Server 2005/2008

cheers!

Solution

  1. No, you need a crawler for that, e.g. Nutch.
  2. Yes, you want two separate indexes (= two schema.xml files), since the datasets don't seem to be related. This doesn't mean two instances of Solr: you can manage both indexes in a single instance using cores.
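With cores, each index has its own update endpoint under the same Solr instance, so the indexing code only needs to decide which core a record belongs to. A minimal sketch, assuming Solr is deployed in Tomcat at `http://localhost:8080/solr` and two hypothetical cores named `pages` and `products`:

```python
# Assumed Solr URL: Tomcat's default port with the webapp deployed at /solr.
SOLR_BASE = "http://localhost:8080/solr"

# Hypothetical core names, one per unrelated dataset (each core has its own schema.xml).
CORES = {"page": "pages", "product": "products"}


def route(records):
    """Group records by the update URL of the core that should index them."""
    batches = {}
    for rec in records:
        url = "%s/%s/update" % (SOLR_BASE, CORES[rec["type"]])
        # The "type" key only selects the core; it is not sent as a Solr field.
        doc = {k: v for k, v in rec.items() if k != "type"}
        batches.setdefault(url, []).append(doc)
    return batches


batches = route([
    {"type": "page", "id": "home", "page_name": "Home", "page_content": "Welcome"},
    {"type": "product", "id": "p1", "product_name": "Compact Camera", "sku": "CAM-001"},
])
for url, docs in batches.items():
    print(url, len(docs))
```

Because each core has its own schema, page fields such as `page_content` never need to exist in the products schema, and vice versa.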

As for populating the Solr index, it depends on your particular project: for example, can it tolerate stale data, or does it have to be absolutely fresh?
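If some staleness is acceptable, a background job (option 1 in the question) can poll the database on a schedule and push only what changed since the last run; the polling interval is then roughly how stale the index can get. A minimal sketch, assuming a hypothetical `products` table with a monotonically increasing `version` column (SQL Server's rowversion/timestamp type is one way to get one), `sqlite3` as a stand-in database, and a `send` callback in place of the actual HTTP POST to Solr:

```python
import sqlite3


def sync_once(conn, last_version, send):
    """Index rows changed since last_version and return the new high-water mark.

    Assumes `version` strictly increases on every insert/update."""
    rows = conn.execute(
        "SELECT id, sku, name, version FROM products WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    if rows:
        send([{"id": r[0], "sku": r[1], "name": r[2]} for r in rows])
        last_version = rows[-1][3]  # highest version seen, thanks to ORDER BY
    return last_version


# In-memory stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, sku TEXT, name TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [(1, "A1", "Lens", 5), (2, "B2", "Tripod", 9)],
)

sent = []  # collects batches instead of POSTing them to Solr
mark = sync_once(conn, 0, sent.append)     # first run picks up both rows
mark = sync_once(conn, mark, sent.append)  # nothing changed, so nothing is sent
```

Deleted rows never appear in this query, so deletions need separate handling (for example a table that records deleted IDs).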

Other options to index data include:

  • Database triggers
  • If you're using some sort of ORM, use its interception capabilities. For example, you can use NHibernate events to update the index on update, insert or delete. If you use NHibernate and SolrNet, this is taken care of automatically.
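The interception approach boils down to mapping persistence events onto Solr update messages. The sketch below is a hypothetical listener written in Python for illustration only; it is not NHibernate's or SolrNet's API (in NHibernate the corresponding hooks are the post-insert, post-update and post-delete event listeners). A plain `post` callback stands in for sending the XML body to the core's `/update` URL:

```python
from xml.sax.saxutils import escape, quoteattr


class SolrIndexListener:
    """Hypothetical listener that turns ORM save/delete events into Solr XML messages."""

    def __init__(self, post):
        self.post = post  # callable that would POST the XML body to Solr's /update handler

    def on_saved(self, entity):
        # An <add> whose uniqueKey already exists replaces the old document,
        # so inserts and updates can share one code path.
        fields = "".join(
            "<field name=%s>%s</field>" % (quoteattr(k), escape(str(v)))
            for k, v in entity.items()
        )
        self.post("<add><doc>%s</doc></add>" % fields)

    def on_deleted(self, entity_id):
        self.post("<delete><id>%s</id></delete>" % escape(str(entity_id)))


messages = []
listener = SolrIndexListener(messages.append)
listener.on_saved({"id": 7, "name": "Tripod"})
listener.on_deleted(7)
print(messages)
```

Changes only become visible to searches after a `<commit/>` is sent, so a real listener would either commit periodically or batch commits rather than commit on every event.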
