Best way to search the contents of millions of Excel files?

Problem description

I want to search for a word or string across millions of Excel files, and I want the search to be fast!

How can this be done, and which tools should I use?

The search will be global.
I have no experience with this kind of indexing and full-text searching.

Please advise me on the best options available for adding this functionality.

Thanks in advance.

I don't want to use SQL Server.

Technology/language used: .NET/C#

What I have tried:

I have searched the web for my problem and found some tools such as Apache Solr, Lucene, Tika and Toxy, but they seem very confusing to me. I can't work out the right way to solve this problem.

Recommended answers

This is not an easy problem to solve. You mention that you don't want to use SQL, so we can assume you want to look at the actual contents of the Excel files. If you are talking about opening and scanning the contents of millions of Excel files every time you search, you are going to have to wait a long time. There is no rapid way to do this.
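To make that cost concrete, here is a minimal Python sketch of searching the files directly, with no index. It assumes the newer .xlsx format (a ZIP archive of XML parts, with shared text conventionally stored in `xl/sharedStrings.xml`) and reads only text cells; legacy .xls files, numbers and formula results are out of scope. The helpers `xlsx_text` and `scan` are hypothetical names for illustration. Python is used for brevity; the same steps map onto .NET's `System.IO.Compression.ZipArchive` and `System.Xml.Linq`.

```python
import pathlib
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used inside .xlsx parts
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def xlsx_text(path):
    """Return the text of an .xlsx file's string cells (hypothetical helper).

    Reads the shared-string table (assumed to be at xl/sharedStrings.xml)
    and any inline strings in the worksheets. Numbers, dates and formula
    results are ignored in this sketch.
    """
    parts = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            is_strings = name == "xl/sharedStrings.xml"
            is_sheet = name.startswith("xl/worksheets/") and name.endswith(".xml")
            if is_strings or is_sheet:
                root = ET.fromstring(zf.read(name))
                # <t> elements hold the actual cell text
                parts.extend(t.text for t in root.iter(NS + "t") if t.text)
    return "\n".join(parts)


def scan(root_dir, needle):
    """Brute-force search: opens and parses every .xlsx under root_dir on every query."""
    needle = needle.casefold()
    hits = []
    for path in sorted(pathlib.Path(root_dir).rglob("*.xlsx")):
        try:
            if needle in xlsx_text(path).casefold():
                hits.append(path)
        except (zipfile.BadZipFile, ET.ParseError):
            continue  # skip corrupt files and files that are not really .xlsx
    return hits
```

Each call to `scan` re-opens and re-parses every workbook, so query time grows linearly with the total size of the collection.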

There are document management systems that allow this to be done, but most are complex, expensive, and require a lot of hardware to run on.

You mention Solr, which is a great tool for indexing and searching information. Using Solr Cell, you can index the contents of files such as Word documents, Excel spreadsheets, and PDFs.
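Solr Cell hands each file to Apache Tika to extract its text, and Solr then indexes that text with Lucene. What makes searching fast is the index itself: an inverted index maps each term to the set of documents that contain it, so a query becomes a few dictionary lookups instead of a pass over millions of files. Below is a minimal Python sketch of that data structure, assuming the cell text has already been extracted; the `InvertedIndex` class and the file names are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Split text into case-folded word tokens."""
    return [tok.casefold() for tok in TOKEN.findall(text)]


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        # The indexing cost is paid once per document, not once per query.
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (AND), sorted."""
        terms = tokenize(query)
        if not terms:
            return []
        # Copy the first posting set so the intersection never mutates the index.
        result = set(self._postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)


idx = InvertedIndex()
idx.add("q1.xlsx", "Quarterly budget 2019")
idx.add("hr.xlsx", "Holiday budget plan")
print(idx.search("budget"))          # ['hr.xlsx', 'q1.xlsx']
print(idx.search("holiday BUDGET"))  # ['hr.xlsx']
```

This matches whole tokens rather than arbitrary substrings. Real engines such as Lucene build analyzers (stemming, stop words), relevance ranking and compact on-disk storage on top of this same idea.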

If you are confused, the best advice I can give is to pick a tool like Solr and start learning. There are several great resources for Solr in particular. It took me a couple of days to learn the basics and get a server set up, but once that was done I was able to move through the topics pretty quickly. We now use Solr to handle data indexing for about 95% of our applications, and we use that index for searching.
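Once a Solr server is running, indexing and querying are plain HTTP calls. Below is a minimal Python sketch using only the standard library. It assumes a local Solr on the default port 8983 and a hypothetical core named `excelfiles` that has the Solr Cell `/update/extract` handler enabled and a schema where extracted text is searchable without naming a field (as in the default schemaless setup). `literal.id` sets the document's unique key, and `commit=true` makes changes visible to searches.

```python
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"  # assumption: local Solr on its default port
CORE = "excelfiles"                  # hypothetical core name
XLSX_TYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def extract_url(doc_id, commit=False):
    """URL for Solr Cell's extracting handler; literal.id fills the document's id field."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return f"{SOLR}/{CORE}/update/extract?{urllib.parse.urlencode(params)}"


def select_url(query, rows=10):
    """URL for a standard Solr query returning JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR}/{CORE}/select?{urllib.parse.urlencode(params)}"


def index_file(path, doc_id, commit=False):
    """POST the raw .xlsx bytes; Solr Cell (via Apache Tika) extracts the text."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        extract_url(doc_id, commit),
        data=body,
        headers={"Content-Type": XLSX_TYPE},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def search(query, rows=10):
    """Return the ids of the matching documents."""
    with urllib.request.urlopen(select_url(query, rows)) as resp:
        result = json.load(resp)
    return [doc["id"] for doc in result["response"]["docs"]]
```

For millions of files you would normally leave `commit=False` and commit once per batch (or rely on Solr's autoCommit settings) rather than committing after every file.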


You've stated that you don't want to use SQL Server, but a database would certainly be a good way forward. PostgreSQL might be your best option; see PostgreSQL: The world's most advanced open source database.
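The database route follows the same pattern: extract the cell text once, store it in a table with a full-text index, and query the index. In PostgreSQL that means a `tsvector` column (built with `to_tsvector`), a GIN index on it, and queries using the `@@` operator with `to_tsquery`. To keep the sketch runnable with only the Python standard library, it swaps in SQLite's FTS5 full-text extension in place of PostgreSQL, assuming your SQLite build has FTS5 enabled. The table and column names are hypothetical.

```python
import sqlite3


def build_index(rows):
    """rows: iterable of (path, sheet, cell_text) tuples extracted from the workbooks."""
    con = sqlite3.connect(":memory:")  # use a file path for a persistent index
    # path and sheet are stored but not tokenized; only cell_text is full-text indexed
    con.execute(
        "CREATE VIRTUAL TABLE cells USING fts5(path UNINDEXED, sheet UNINDEXED, cell_text)"
    )
    con.executemany("INSERT INTO cells (path, sheet, cell_text) VALUES (?, ?, ?)", rows)
    con.commit()
    return con


def search(con, query):
    """Return the files with at least one row matching the FTS5 query."""
    cur = con.execute(
        "SELECT DISTINCT path FROM cells WHERE cells MATCH ? ORDER BY path", (query,)
    )
    return [row[0] for row in cur]


con = build_index([
    ("a.xlsx", "Sheet1", "Quarterly budget"),
    ("b.xlsx", "Sheet1", "Holiday budget plan"),
    ("b.xlsx", "Sheet2", "Budget"),
])
print(search(con, "budget"))          # ['a.xlsx', 'b.xlsx']
print(search(con, "holiday budget"))  # ['b.xlsx']
```

Queries use FTS5's own syntax (space-separated terms are ANDed), so user input containing quotes or operators may need escaping before it is passed to `MATCH`.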

Alternatively, you could use a text search tool that scans the files directly; there are some suggestions in this Stack Overflow question: Tools to search for strings inside files without indexing.

