Parse Solr XML files to SolrInputDocument
Question
If I have individual files in the expected Solr format (with just ONE doc per file):
<add>
  <doc>
    <field name="id">GB18030TEST</field>
    <field name="name">Test with some GB18030 encoded characters</field>
    <field name="features">No accents here</field>
    <field name="features">ÕâÊÇÒ»¸ö¹¦ÄÜ</field>
    <field name="price">0</field>
  </doc>
</add>
Isn't there a way to easily marshal that file into a SolrInputDocument? Do I have to do the parsing myself?
I need it as a Java POJO because I want to modify some fields before indexing it with SolrJ.
Recommended answer
This is best done programmatically. I know you're looking for a Java solution, but I'd personally recommend Groovy.
The following script processes the XML files found in the current directory: it builds one SolrInputDocument per <doc> element, adds it to the server, and commits once at the end.
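//
// Dependencies
// ============
import org.apache.solr.client.solrj.SolrServer
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
import org.apache.solr.common.SolrInputDocument

// Grape resolves the SolrJ dependency when the script runs
@Grapes([
    @Grab(group='org.apache.solr', module='solr-solrj', version='3.5.0'),
])

//
// Main
// =====
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/")

// Process every file in the current directory whose name ends in .xml
// (the dot is escaped so it matches a literal '.')
new File(".").eachFileMatch(~/.*\.xml/) { file ->
    file.withReader { reader ->
        def xml = new XmlSlurper().parse(reader)

        // One SolrInputDocument per <doc> element
        xml.doc.each { docNode ->
            SolrInputDocument doc = new SolrInputDocument()

            // Repeated field names accumulate as multi-valued fields
            docNode.field.each { field ->
                doc.addField(field.@name.text(), field.text())
            }

            server.add(doc)
        }
    }
}

server.commit()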
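
Since the question asks for Java, the parsing step of the script could be sketched in plain Java roughly as follows. This is only an illustration under some assumptions: the class name `SolrXmlFields` and the method `parseFields` are made up for this example, it relies on the one-doc-per-file layout from the question, and it collects the fields into a `Map` rather than a `SolrInputDocument` so that it runs with the JDK alone. With SolrJ on the classpath, each name/value pair would instead be passed to `SolrInputDocument.addField(name, value)`, the same call the Groovy script uses.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of SolrJ.
public class SolrXmlFields {

    /**
     * Reads a Solr <add><doc>...</doc></add> stream and returns the fields in
     * document order. Repeated field names keep all of their values.
     * Assumes a single <doc> per stream, as in the question.
     */
    public static Map<String, List<String>> parseFields(InputStream in) throws Exception {
        Document xml = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        Map<String, List<String>> fields = new LinkedHashMap<>();
        NodeList nodes = xml.getElementsByTagName("field");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element field = (Element) nodes.item(i);
            fields.computeIfAbsent(field.getAttribute("name"), k -> new ArrayList<>())
                  .add(field.getTextContent());
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Inline sample so the sketch runs without any files on disk
        String sample =
            "<add><doc>"
            + "<field name=\"id\">GB18030TEST</field>"
            + "<field name=\"features\">No accents here</field>"
            + "<field name=\"features\">Second feature</field>"
            + "<field name=\"price\">0</field>"
            + "</doc></add>";

        Map<String, List<String>> fields =
            parseFields(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));

        // Modify a field before indexing, which is the use case from the question
        fields.get("price").set(0, "10");

        // With SolrJ available, each (name, value) pair would go to doc.addField(name, value)
        System.out.println(fields);
    }
}
```

Keeping the values in a list per field name mirrors how repeated `<field name="features">` entries become a multi-valued field once they are added to the document.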