Can I predict how large my Zend Framework index will be? (and some quick Q:s)


Question

I have around 100,000 rows in a MySQL table, where each row has about 8 fields.

I have finally got the hang of using Zend Lucene to index and search data from a MySQL table.

Before I fully implement this functionality on my website, I have some questions:

1- Is it possible to determine the size of an index in advance? I ask because the Zend manual says the maximum size of an index is 2 GB, and my first thought is that this isn't enough for my table!

2- I have read posts saying that Zend Lucene search is very slow on large indexes, taking up to minutes! Is it faster to use MySQL commands directly (SELECT, LIKE, etc.) instead of Zend?

3- Are there any other solutions to my problem, which is to create a search engine for classifieds that has at least these functions and doesn't require full-text MySQL indexes (fields)?

Thanks

Answer

SOLR is basically a Java web application, typically run in a servlet container such as Apache Tomcat, that exposes a REST-style HTTP interface for querying an Apache Lucene index. Yes, you need to be able to run a Java application on your web server. This is an issue for you to work out with your hosting provider.

Clients using your web app don't need to run Java. Your PHP app could make a REST query to the SOLR service and format the results in HTML. A client sees only the HTML output; it never needs to know that the data came from a service implemented in Java.
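
As a rough sketch of what such a REST query looks like, the snippet below only builds the request URL and does not contact any server. It is written in Python rather than PHP for brevity. The host, port, and the core name `classifieds` are hypothetical, and the `/solr/<core>/select` path with the `q`, `rows`, and `wt` parameters assumes a reasonably recent SOLR version; older installations may use a different path.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, query, rows=10):
    """Build a URL for the SOLR select handler (no request is sent)."""
    # q = the Lucene query string, rows = max results, wt = response format
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{core}/select?{params}"


# Hypothetical host and core name, for illustration only.
url = build_solr_query_url("http://localhost:8983", "classifieds", "title:bicycle")
print(url)
```

The web app would fetch this URL server-side, decode the JSON response, and render the hits as HTML, so the browser never talks to SOLR directly.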

Zend_Search_Lucene is a pure-PHP implementation that is supposed to work identically to Apache Lucene. The Zend solution even uses an identical index file format. So storage-wise they should be equal.

I used Java Lucene to index the Stack Overflow data dump (October 2009). I indexed 1.5 million rows, including about 1 GB of text data. The Lucene index was 1323 MB, whereas the MySQL FULLTEXT index of the same data was only 466 MB.

Using SQL LIKE predicates in lieu of any fulltext indexing solution requires no extra space, of course, because LIKE cannot make use of a conventional index anyway. But in my tests, using LIKE was about 200 times slower than Java Lucene, which was in turn about 40% slower than a MySQL FULLTEXT index on the same data.

See my recent presentation about fulltext indexing solutions with MySQL:

http://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql

It's not surprising that Zend_Search_Lucene can't match the performance and scalability of the Java Lucene technology. PHP's advantage as a language is development efficiency, not runtime efficiency.

Update: I just tried creating an index using Zend_Search_Lucene. Creating an index is far slower with PHP than with the Java Lucene technology, so I only indexed 10,000 documents. This took almost 15 minutes, which means indexing the whole collection would take about 36 hours. Compare this to Java Lucene, which in my test indexed the full collection of 1.5 million documents in under 7 minutes.
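
The time estimate is a simple linear extrapolation, sketched below in Python. The function name is made up for illustration, and the sketch assumes indexing time grows in proportion to document count, which may not hold exactly for a real index. Using 15 minutes as an upper bound for the sample gives 37.5 hours, consistent with the roughly 36 hours quoted for a sample that took slightly less.

```python
def extrapolate_indexing_hours(sample_docs, sample_minutes, total_docs):
    """Linearly scale a sample's indexing time to the full collection."""
    # Assumption: time per document stays constant as the index grows.
    total_minutes = sample_minutes * total_docs / sample_docs
    return total_minutes / 60


# 10,000 documents in (at most) 15 minutes, scaled to 1.5 million documents.
hours = extrapolate_indexing_hours(10_000, 15, 1_500_000)
print(f"{hours:.1f} hours")
```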

The size of the index I created with Zend_Search_Lucene is 8.75 MB. Extrapolating this by a factor of 150, I estimate the full index would be 1312.5 MB. So I conclude that Zend_Search_Lucene creates an index of about the same size as the index produced by Java Lucene. This is as expected.
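
The same approach answers the original question about predicting index size: index a sample, measure it, and scale up. The Python sketch below reproduces the arithmetic above and compares the result with the 2 GB limit quoted in the question. The function name is hypothetical, and the sketch assumes index size grows linearly with document count.

```python
def extrapolate_index_mb(sample_docs, sample_index_mb, total_docs):
    """Linearly scale a sample index size to the full collection."""
    # Assumption: average index bytes per document stay roughly constant.
    return sample_index_mb * total_docs / sample_docs


# 10,000 documents produced an 8.75 MB index; scale to 1.5 million documents.
full_mb = extrapolate_index_mb(10_000, 8.75, 1_500_000)
limit_mb = 2 * 1024  # the 2 GB maximum index size quoted in the question
print(f"estimated index: {full_mb} MB, under 2 GB: {full_mb < limit_mb}")
```

For the table in the question, the same function could be fed the measured size of an index built from a few thousand of its own rows, since the size depends heavily on how much text each row holds.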
