Apache Lucene indexing of a large XML file


Question

I am new to Lucene and want to index large XML files (15 GB) that contain plain text as well as attributes and a great many XML tags. How do I parse and index such a file with Lucene (a sample would help), and does using Lucene require a database?

How can I parse and index a huge XML file using Lucene? Any sample or link would help me understand the process. Also, if I use Lucene, will I need a database? So far I have only seen and done indexing on top of a database.

Recommended answer

You would build the index much as you would with a database: iterate over all the data you want to index and write it to the index. Use the XmlReader class to parse the XML in a forward-only fashion, so the file never has to be loaded into memory as a whole. Just as with a database, you need to index some kind of primary key so that you know what each search result represents.
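As a rough illustration of the forward-only parsing described above: XmlReader is a .NET class, and a comparable streaming API in Java (the language Lucene itself is written in) is StAX, in the javax.xml.stream package. The sketch below assumes a hypothetical layout in which each item is a record element with an id attribute and simple child elements. The element names, the XmlRecordStreamer class, and the stream method are illustrative only and do not come from the original post.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class XmlRecordStreamer {

    /**
     * Reads the XML forward-only and hands each <record> to the sink as soon as
     * its end tag is seen, so memory use depends on one record, not on the file.
     * Assumption: fields are simple child elements without nested markup.
     */
    static void stream(Reader xml, Consumer<Map<String, String>> sink) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Disable DTDs and external entities, a cautious default for untrusted input.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
            XMLStreamReader reader = factory.createXMLStreamReader(xml);

            Map<String, String> current = null;      // record being assembled, if any
            StringBuilder text = new StringBuilder(); // text of the innermost open element

            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if ("record".equals(reader.getLocalName())) {
                            current = new LinkedHashMap<>();
                            // The id attribute plays the role of the primary key.
                            current.put("id", reader.getAttributeValue(null, "id"));
                        }
                        text.setLength(0);
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (current == null) {
                            break; // outside any record, e.g. the root element
                        }
                        if ("record".equals(reader.getLocalName())) {
                            sink.accept(current);
                            current = null;
                        } else {
                            current.put(reader.getLocalName(), text.toString().trim());
                        }
                        break;
                    default:
                        break;
                }
            }
            reader.close(); // closes the parser only; the caller owns the Reader
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<records>"
            + "<record id=\"1\"><title>Hello</title><body>first text</body></record>"
            + "<record id=\"2\"><title>World</title><body>second text</body></record>"
            + "</records>";
        // A real indexer would turn each record into an index document here.
        stream(new StringReader(sample), record -> System.out.println(record));
    }
}
```

For a multi-gigabyte file you would pass a buffered file reader (for example from Files.newBufferedReader) instead of a StringReader. The callback would then typically build a Lucene Document, with the id kept as a stored key field, and hand it to IndexWriter.addDocument. That part is left out so the sketch runs on the JDK alone.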

A database helps when you need to look up the indexed data by its primary key. Without one, fetching the data for a key means scanning the 15 GiB XML file on every request, which is slow and messy.

A database is not required, but it helps a lot. I would build this as an import tool that reads your XML and dumps it into your database, and then reuse the "normal" database indexing code you have already built.
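The division of labour described in this answer can be sketched roughly as follows: the index maps terms to primary keys only, and the full record is fetched by key from a separate store. This is a toy under stated assumptions, not the answerer's code. A LinkedHashMap stands in for the database table, a small term-to-keys map stands in for the Lucene index, and the ImportAndLookupSketch class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ImportAndLookupSketch {
    // Stand-in for a database table keyed by primary key (use a real database in practice).
    private final Map<String, String> recordsById = new LinkedHashMap<>();
    // Stand-in for the search index: term -> primary keys of the matching records.
    private final Map<String, Set<String>> idsByTerm = new HashMap<>();

    /** Import step: store one parsed XML record under its primary key. */
    void importRecord(String id, String text) {
        recordsById.put(id, text);
    }

    /** Indexing step: walk the stored records and keep only the key per term. */
    void rebuildIndex() {
        idsByTerm.clear();
        for (Map.Entry<String, String> entry : recordsById.entrySet()) {
            for (String term : entry.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    idsByTerm.computeIfAbsent(term, t -> new TreeSet<>()).add(entry.getKey());
                }
            }
        }
    }

    /** Search yields primary keys; the text is then fetched by key, not by re-reading the XML. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        Set<String> ids = idsByTerm.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
        for (String id : ids) {
            hits.add(id + ": " + recordsById.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        ImportAndLookupSketch tool = new ImportAndLookupSketch();
        tool.importRecord("1", "Lucene indexes plain text");
        tool.importRecord("2", "A database stores the full record");
        tool.rebuildIndex();
        System.out.println(tool.search("database"));
    }
}
```

The point of the split is that the 15 GiB file is read once, at import time. Every later request touches only the index and a keyed lookup.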
