How to treat JSON objects as separate documents while indexing using Lucene


Question

I have a few JSON files that look like the one below. I want to treat each JSON object in each file as one document (with "user_id" as a unique identifier). My code treats the entire JSON file as one document. How can I fix this?

[
{
"user_id": "john_doeee",
"lon": 204.0,
"lat": 101.0,
"stored" : true,
"hashtag" : "ucriverside"
},
{
"user_id": "carlos_baby",
"lon": 204.0,
"lat": 101.0,
"stored" : true,
"hashtag" : "UCR"
},
{
"user_id": "emmanuel_",
"lon": 204.0,
"lat": 101.0,
"stored" : false,
"hashtag": "riverside"
}
]

I think it has something to do with the Document method? Here's what I have:

static void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOException
{
try (InputStream stream = Files.newInputStream(file))
{
     //Create lucene Document
     Document doc = new Document();

     doc.add(new StringField("path", file.toString(), Field.Store.YES));
     doc.add(new LongPoint("modified", lastModified));
     doc.add(new TextField("contents", new String(Files.readAllBytes(file)), Store.YES));

     writer.updateDocument(new Term("path", file.toString()), doc);
}
}

Recommended answer

No, it has nothing to do with the Document method. Lucene has no built-in way of knowing that this is a JSON file that should be split into several Lucene documents. You need to do that yourself, using a Java JSON library.

One of many possibilities is to use the https://github.com/stleary/JSON-java library with code like this:

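JSONArray arr = new JSONArray(" .... ");
for (int i = 0; i < arr.length(); i++) {
    JSONObject obj = arr.getJSONObject(i);
    // Build one Lucene Document per JSON object, keyed by its user_id
    Document doc = new Document();
    doc.add(new StringField("user_id", obj.getString("user_id"), Field.Store.YES));
    doc.add(new TextField("contents", obj.toString(), Field.Store.YES));
    // writer is the IndexWriter passed to indexDoc in the question
    writer.updateDocument(new Term("user_id", obj.getString("user_id")), doc);
}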

Of course, you're free to use any other JSON library, such as Jackson or GSON.
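
To make the splitting step concrete without depending on any particular library, here is a minimal standard-library sketch that cuts a JSON array into the raw text of each object by tracking brace depth. The class name JsonArraySplitter and the method splitTopLevelObjects are hypothetical names chosen for illustration; they are not part of Lucene or of any JSON library. It assumes well-formed input and does no validation, so a real JSON library remains the safer choice for production code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene or any JSON library.
final class JsonArraySplitter {

    /**
     * Returns the raw text of each outermost {...} object found in the input.
     * Assumes well-formed JSON; braces inside string values are ignored.
     */
    static List<String> splitTopLevelObjects(String json) {
        List<String> objects = new ArrayList<>();
        int depth = 0;             // current nesting depth of { }
        int start = -1;            // index where the current outermost object began
        boolean inString = false;  // true while scanning inside a "..." literal
        boolean escaped = false;   // true right after a backslash inside a string

        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0 && start >= 0) {
                    objects.add(json.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return objects;
    }

    public static void main(String[] args) {
        String json = "[{\"user_id\":\"john_doeee\",\"hashtag\":\"ucriverside\"},"
                    + "{\"user_id\":\"carlos_baby\",\"hashtag\":\"UCR\"}]";
        // Each printed line is the text that would go into one Lucene Document.
        for (String obj : splitTopLevelObjects(json)) {
            System.out.println(obj);
        }
    }
}
```

Each returned string corresponds to one object from the array, so each one can be indexed as its own Lucene document instead of indexing the whole file as a single "contents" field.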
