Full-text search for static HTML files on CD-Rom via javascript


Problem description

    I will be delivering a set of static HTML pages on CD-Rom; these pages need to be fully viewable with no Internet access whatsoever.

    I'd like to provide a full-text search (Lucene-like) for the content of those pages, which should "just work" from the CD-Rom with no software installation on the client machine.

    A search engine implementation in javascript would be the perfect solution, but I have trouble finding any that looks solid / current / popular...?

    I did find these:

    + jsFind
    + js-search

    but both projects seem rather inactive?

    Another solution, besides a specific search engine in javascript, would be the ability to access local Lucene indices from javascript: the indices themselves would be built with Lucene and copied to the CD-Rom along with the HTML files.

    Edit: built it myself (see below).

    Solution

    Well in fact I built it myself.

    The existing solutions (that I could find) were unconvincing.

    I wanted to be able to search a very long tree (ul/li/ul...) that is displayed as one page; it contains 5000+ items.

    It sounds a little weird to display such a long tree on one page but in fact with collapse / expand it's much more intuitive than separate pages, and since we're offline, download times are not a problem (parsing times are, though, but Chrome is amazing ;-)

    The "search" function provided with modern browsers (FF and Chrome anyway) has two big problems: it only searches visible items on the page, and it can't match non-consecutive words.

    I want to be able to search collapsed items (not visible on the screen); I want to find "one two three" when searching "one three" (just like with Google / Lucene); and I want to open just the branches of the tree containing found items.

    So, what I did was:

    1. create an inverted index mapping words to the ids of the list items that contain them (via xslt; approx. 4500 unique words in the document)
    2. convert this index to a bunch of javascript arrays (one word = one array, containing ids)
    3. when searching, intersect the arrays represented by the search words
    4. step 3 returns an array of ids that I can then open / highlight
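
Steps 1 to 3 above could be sketched roughly as follows. This is only an illustration, not the author's code: the answer builds its index offline with xslt, whereas here the index is built in JavaScript as a stand-in, and the sample `items`, the simple ASCII `tokenize`, and the names `buildIndex` and `search` are all assumptions made for the example.

```javascript
// Hypothetical sample data: item id -> item text.
const items = {
  1: "one two three",
  2: "one three five",
  3: "two four",
};

// Split text into lowercase words (simple ASCII tokenizer, an assumption).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Steps 1-2: inverted index, one sorted array of item ids per word.
// (Stand-in for the offline xslt step described in the answer.)
function buildIndex(items) {
  const index = new Map();
  const ids = Object.keys(items).map(Number).sort((a, b) => a - b);
  for (const id of ids) {
    // A Set avoids listing the same id twice when a word repeats in an item.
    for (const word of new Set(tokenize(items[id]))) {
      if (!index.has(word)) index.set(word, []);
      index.get(word).push(id);
    }
  }
  return index;
}

// Step 3: intersect the id arrays of all query words (AND semantics),
// so word order and adjacency in the item text do not matter.
function search(index, query) {
  const words = tokenize(query);
  if (words.length === 0) return [];
  let result = (index.get(words[0]) || []).slice();
  for (const word of words.slice(1)) {
    const ids = new Set(index.get(word) || []);
    result = result.filter((id) => ids.has(id));
  }
  return result;
}

const index = buildIndex(items);
console.log(search(index, "one three")); // ids of items containing both words
```

Because the lookup touches only the id arrays, searching "one three" matches an item whose text is "one two three" without ever scanning the item text itself.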

    It does exactly what I needed and it's really fast. Better yet, since it searches from an independent "index" (arrays of ids) it can search when the list is not even loaded in the browser!
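
For step 4, one possible way to open only the branches that contain hits is to collect the ancestors of every matched id. The sketch below assumes a hypothetical `parentOf` map (item id to parent id) shipped alongside the index; the answer does not say how the tree structure was stored, so both the map and the `branchesToOpen` name are illustrative.

```javascript
// Hypothetical parent map: item id -> id of its parent item (null for roots).
const parentOf = { 1: null, 2: 1, 3: 2, 4: 1, 5: null, 6: 5 };

// Collect every ancestor of the matched ids; these are the branches to expand.
function branchesToOpen(matchedIds, parentOf) {
  const open = new Set();
  for (const id of matchedIds) {
    let parent = parentOf[id];
    // Walk up to the root, stopping early if this path is already recorded.
    while (parent != null && !open.has(parent)) {
      open.add(parent);
      parent = parentOf[parent];
    }
  }
  return [...open].sort((a, b) => a - b);
}

console.log(branchesToOpen([3, 6], parentOf)); // ancestors of items 3 and 6
```

In the page itself, each returned id would then be expanded, for example by toggling a CSS class on the matching `li`; that part depends on the markup and is left out here.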
