Storing non-indexed binary data in Lucene

Question

How do you store a non-indexed array of bytes in a Lucene document?

I tried these:

     doc.add(new Field("bin1", new InputStreamReader(new ByteArrayInputStream(new byte [100000]))));
     doc.add(new BinaryDocValuesField("bin2", new BytesRef(new byte [100000])));

Neither worked: the fields were not stored, so I could not retrieve them when querying.

Test code:

    String index = "dms1";

    Directory indexDirectory = FSDirectory.open(Paths.get(index));
    StandardAnalyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
    // create the indexer
    IndexWriter iw = new IndexWriter(indexDirectory, iwc);

    {
        Document doc = new Document();

        doc.add(new TextField("id", "1", Field.Store.YES));
        doc.add(new Field("bin1", new InputStreamReader(new ByteArrayInputStream(new byte[100000]))));
        doc.add(new BinaryDocValuesField("bin2", new BytesRef(new byte[100000])));

        iw.addDocument(doc);
        iw.commit();
    }

    DirectoryReader ir = DirectoryReader.open(indexDirectory);
    IndexSearcher is = new IndexSearcher(ir);

    QueryParser qp = new QueryParser("", analyzer);
    Query q = qp.parse(
            // "content1:hp"
            "*:*");
    TopDocs hits = is.search(q, 10);

    for (ScoreDoc scoreDoc : hits.scoreDocs) {
        Document doc = is.doc(scoreDoc.doc);

        System.out.println(doc);

        System.out.println("doc.getBinaryValue(bin1):" + doc.getBinaryValue("bin1"));
        System.out.println("doc.getBinaryValues(bin1):" + doc.getBinaryValues("bin1"));
        System.out.println("doc.getBinaryValues(bin1).length:" + doc.getBinaryValues("bin1").length);
        System.out.println("doc.get(bin1):" + doc.get("bin1"));

        System.out.println("doc.getBinaryValue(bin2):" + doc.getBinaryValue("bin2"));
        System.out.println("doc.getBinaryValues(bin2):" + doc.getBinaryValues("bin2"));
        System.out.println("doc.getBinaryValues(bin2).length:" + doc.getBinaryValues("bin2").length);
        System.out.println("doc.get(bin2):" + doc.get("bin2"));
    }

Output:

    Document<stored,indexed,tokenized<id:1>>
    doc.getBinaryValue(bin1):null
    doc.getBinaryValues(bin1):[Lorg.apache.lucene.util.BytesRef;@899e53
    doc.getBinaryValues(bin1).length:0
    doc.get(bin1):null
    doc.getBinaryValue(bin2):null
    doc.getBinaryValues(bin2):[Lorg.apache.lucene.util.BytesRef;@f98160
    doc.getBinaryValues(bin2).length:0
    doc.get(bin2):null

Could anyone shed some light on how to store the bytes and how to retrieve the values again?

I know of other solutions, such as converting the bytes to text with Base64 (or another encoding) or storing a link to a file, but I am looking for something more efficient. Since the Lucene API has "binary" methods, I assumed those are the correct way to do it.
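
To put a rough number on the Base64 overhead mentioned above: Base64 emits 4 ASCII characters for every started group of 3 input bytes, so the 100,000-byte array from the question becomes 4 × 33,334 = 133,336 characters. That is about a third larger before any stored-field compression, plus the cost of encoding and decoding on every read and write. The sketch below checks that arithmetic with the JDK's `java.util.Base64` (Java 8+); the class and method names are invented for illustration.

```java
import java.util.Base64;

public class Base64Overhead {

    // Length of standard, padded Base64 output for n input bytes:
    // every started 3-byte group becomes 4 characters.
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[100000]; // same size as the array in the question
        String encoded = Base64.getEncoder().encodeToString(raw);
        System.out.println("raw bytes:    " + raw.length);
        System.out.println("base64 chars: " + encoded.length());
        System.out.println("matches formula: " + (encoded.length() == encodedLength(raw.length)));
    }
}
```

Running `main` should report 100000 raw bytes against 133336 Base64 characters.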

Lucene version: 5.3.1

Recommended answer

Use a StoredField. You can pass either a BytesRef or the byte array itself to the field:

byte[] myByteArray = new byte[100000];
document.add(new StoredField("bin1", myByteArray));

As for retrieving the value, you were already on the right track. Something like:

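Document resultDoc = searcher.doc(docno);
BytesRef bin1ref = resultDoc.getBinaryValue("bin1");
// a BytesRef is (bytes, offset, length): copy only the valid slice (needs java.util.Arrays)
byte[] bin1bytes = Arrays.copyOfRange(bin1ref.bytes, bin1ref.offset, bin1ref.offset + bin1ref.length);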
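
Putting the storing and retrieving snippets together gives a minimal, self-contained round trip. This is a sketch written against the Lucene 5.3.1 API named in the question, and it assumes lucene-core and lucene-analyzers-common 5.3.1 on the classpath. The class name, the in-memory `RAMDirectory` and the `MatchAllDocsQuery` are illustrative choices, not part of the original answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

public class StoredBinaryRoundTrip {

    // Indexes a single document whose "bin1" field holds `payload` as a
    // stored-only binary value, searches for it, and returns the bytes read back.
    static byte[] roundTrip(byte[] payload) {
        try (StandardAnalyzer analyzer = new StandardAnalyzer();
             Directory dir = new RAMDirectory()) {

            try (IndexWriter iw = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new StringField("id", "1", Field.Store.YES));
                // Stored but not indexed: the bytes are kept verbatim, never analyzed
                doc.add(new StoredField("bin1", payload));
                iw.addDocument(doc);
            } // close() commits pending changes by default in Lucene 5.x

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.search(new MatchAllDocsQuery(), 10);
                Document resultDoc = searcher.doc(hits.scoreDocs[0].doc);
                BytesRef ref = resultDoc.getBinaryValue("bin1");
                // Copy the (bytes, offset, length) slice rather than trusting ref.bytes
                return Arrays.copyOfRange(ref.bytes, ref.offset, ref.offset + ref.length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100000]; // same size as in the question
        payload[0] = 42;
        payload[payload.length - 1] = 7;
        byte[] back = roundTrip(payload);
        System.out.println("length=" + back.length + " equal=" + Arrays.equals(payload, back));
    }
}
```

`RAMDirectory` only keeps the sketch self-contained; the asker's `FSDirectory` should behave the same way. Because the field is a `StoredField`, `getBinaryValue("bin1")` returns the bytes instead of `null`, while the content stays out of the inverted index.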


By the way, here is what went wrong with the two fields you tried:

  • bin1: When you pass a Reader to the Field constructor, Lucene treats the field like a TextField, which is indexed but not stored: effectively the opposite of what you want. That constructor is deprecated anyway; use TextField directly instead.

    Had you passed the byte[] directly instead of the Reader, it would actually have worked, because that constructor creates a stored-only field, just like the StoredField shown above (although that constructor is deprecated as well).

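  • bin2: DocValues fields work differently. They are not stored fields, so IndexSearcher.doc() never returns them; they are read through the doc-values API instead. You can read up a bit on that here, if you are curious.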
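
If you do want `bin2` as a doc-values field, it has to be read back through the doc-values API rather than `searcher.doc()`. The sketch below assumes the Lucene 5.x API, where `BinaryDocValues.get(int docID)` is a random-access lookup; Lucene 7 switched doc values to an iterator-style API, so this exact call does not carry over to newer versions. The class and method names are hypothetical, and lucene-core plus lucene-analyzers-common 5.x are assumed on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiDocValues;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

public class BinaryDocValuesRead {

    // Writes one document with a "bin2" doc-values field, then reads the value
    // back through the doc-values API (stored-field loading never sees it).
    static byte[] readBin2(byte[] payload) {
        try (StandardAnalyzer analyzer = new StandardAnalyzer();
             Directory dir = new RAMDirectory()) {

            try (IndexWriter iw = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new BinaryDocValuesField("bin2", new BytesRef(payload)));
                iw.addDocument(doc);
            }

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                // One view over all segments, addressed by top-level doc IDs
                BinaryDocValues values = MultiDocValues.getBinaryValues(reader, "bin2");
                // Lucene 5.x random-access call; the only document has docID 0
                BytesRef ref = values.get(0);
                // The returned BytesRef may be reused by the next get(), so copy it
                return Arrays.copyOfRange(ref.bytes, ref.offset, ref.offset + ref.length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        System.out.println(Arrays.toString(readBin2(payload)));
    }
}
```

Running `main` should print `[1, 2, 3, 4]`. Whether stored fields or doc values fit better depends on the access pattern: stored fields load whole documents for display, while doc values are a per-field column meant for per-document lookups during search.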
