Looking for a dataset to test FULLTEXT-style searches on
Question
I am looking for a corpus of text to run some trial fulltext-style searches across: either something I can download or a system that generates it. Something a bit more random would be better, e.g. 1,000,000 Wikipedia articles in a format that is easy to insert into a two-column database table (id, text).
Any ideas or suggestions?
Recommended answer
I'll throw this out there since I'm familiar with it - Prosper.com makes their member loan listings available for analysis through an XML export. The export would have about 50,000 loan requests with descriptions and over 1,000,000 member profiles (although many of those are empty).
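If no download fits, the "system that generates it" option from the question can be sketched in a few lines. The following is a minimal illustration only, assuming Python with its bundled sqlite3 module; the table name `docs`, the tiny `WORDS` vocabulary, and the helper names are all hypothetical, and the `LIKE` query is a plain substring scan standing in for whatever real fulltext index you intend to test.

```python
import random
import sqlite3

# Hypothetical vocabulary; swap in a real word list for more realistic text.
WORDS = ["loan", "search", "index", "query", "random",
         "corpus", "text", "member", "profile", "article"]


def make_document(rng, min_words=20, max_words=200):
    """Return one pseudo-random document as a space-separated string."""
    length = rng.randint(min_words, max_words)
    return " ".join(rng.choice(WORDS) for _ in range(length))


def build_corpus(conn, n_docs, seed=0):
    """Create the two-column (id, text) table and fill it with n_docs rows."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO docs (id, text) VALUES (?, ?)",
        ((i, make_document(rng)) for i in range(1, n_docs + 1)),
    )
    conn.commit()


def count_matches(conn, term):
    """Count rows containing term; a substring scan, not a fulltext index."""
    row = conn.execute(
        "SELECT COUNT(*) FROM docs WHERE text LIKE ?", (f"%{term}%",)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
build_corpus(conn, 1000)
```

Raising `n_docs` and pointing `sqlite3.connect` at a file instead of `:memory:` gives a larger on-disk corpus; the same (id, text) rows can then be exported to the database whose fulltext search is actually under test.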