Storing non-indexed binary data in Lucene

Views: 55  Published: 2021/5/30 21:43:42  lucene


Problem description

How do you store a non-indexed array of bytes in a Lucene document?

I tried these:

     doc.add(new Field("bin1", new InputStreamReader(new ByteArrayInputStream(new byte [100000]))));
     doc.add(new BinaryDocValuesField("bin2", new BytesRef(new byte [100000])));

Neither worked (the field is not stored, so it cannot be retrieved when querying).

Test code:

  String index = "dms1";

  Directory indexDirectory = FSDirectory.open(Paths.get(index));
  StandardAnalyzer analyzer = new StandardAnalyzer();
  IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
  iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
  // create the indexer
  IndexWriter iw = new IndexWriter(indexDirectory, iwc);

  {
     Document doc = new Document();

     doc.add(new TextField("id", "1", Field.Store.YES));
     doc.add(new Field("bin1", new InputStreamReader(new ByteArrayInputStream(new byte [100000]))));
     doc.add(new BinaryDocValuesField("bin2", new BytesRef(new byte [100000])));

     iw.addDocument(doc);
     iw.commit();
  }

  DirectoryReader ir = DirectoryReader.open(indexDirectory);

  IndexSearcher is = new IndexSearcher(ir);


  QueryParser qp = new QueryParser(
          "",
          analyzer);
  Query q = qp.parse(
          //"content1:hp"
          "*:*"
  );
  TopDocs hits = is.search(q, 10);

  for (ScoreDoc scoreDoc : hits.scoreDocs) {
     Document doc = is.doc(scoreDoc.doc);

     System.out.println(doc);

     System.out.println("doc.getBinaryValue(bin1):" + doc.getBinaryValue("bin1"));
     System.out.println("doc.getBinaryValues(bin1):" + doc.getBinaryValues("bin1"));
     System.out.println("doc.getBinaryValues(bin1).length:" + doc.getBinaryValues("bin1").length);
     System.out.println("doc.get(bin1):" + doc.get("bin1"));

     System.out.println("doc.getBinaryValue(bin2):" + doc.getBinaryValue("bin2"));
     System.out.println("doc.getBinaryValues(bin2):" + doc.getBinaryValues("bin2"));
     System.out.println("doc.getBinaryValues(bin2).length:" + doc.getBinaryValues("bin2").length);
     System.out.println("doc.get(bin2):" + doc.get("bin2"));
  }

Output:

    Document<stored,indexed,tokenized<id:1>>
    doc.getBinaryValue(bin1):null
    doc.getBinaryValues(bin1):[Lorg.apache.lucene.util.BytesRef;@899e53
    doc.getBinaryValues(bin1).length:0
    doc.get(bin1):null
    doc.getBinaryValue(bin2):null
    doc.getBinaryValues(bin2):[Lorg.apache.lucene.util.BytesRef;@f98160
    doc.getBinaryValues(bin2).length:0
    doc.get(bin2):null

Could anyone shed light on how to store the bytes and how to retrieve the values again?

I know of other solutions, such as using Base64 or another encoding to convert the bytes to text, or storing the data as file links. What I am looking for is a more efficient way to do this: the Lucene API has "binary" methods, so I assume that should be the correct way to do it.
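For a sense of the overhead a text encoding adds, here is a small sketch that uses only the JDK's java.util.Base64. The class name Base64Overhead and the helper encodedLength are illustrative and do not come from the original post. Padded Base64 produces 4 characters for every 3 input bytes, so the 100000-byte payload from the test code grows by about a third before Lucene even sees it.

```java
import java.util.Base64;

public class Base64Overhead {

    // Number of characters in the padded Base64 text for rawLength input bytes.
    static int encodedLength(int rawLength) {
        return Base64.getEncoder().encodeToString(new byte[rawLength]).length();
    }

    public static void main(String[] args) {
        int raw = 100000; // same payload size as in the test code above
        int encoded = encodedLength(raw);
        // 4 * ceil(100000 / 3) = 133336
        System.out.println(raw + " raw bytes -> " + encoded + " Base64 characters");
    }
}
```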

Lucene version: 5.3.1

Recommended answer

Use a StoredField. You can pass either a BytesRef or the byte array itself to the field:

    byte[] myByteArray = new byte[100000];
    document.add(new StoredField("bin1", myByteArray));

As for retrieving the value, you are already on about the right track. Something like:

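    Document resultDoc = searcher.doc(docno);
    BytesRef bin1ref = resultDoc.getBinaryValue("bin1");
    // A BytesRef is a slice (bytes, offset, length) of a possibly larger array,
    // so copy out only the valid range (requires java.util.Arrays).
    byte[] bin1bytes = Arrays.copyOfRange(bin1ref.bytes, bin1ref.offset, bin1ref.offset + bin1ref.length);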
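Putting the store and retrieve steps together, a minimal round-trip sketch might look like the following. It assumes a Lucene 5.x-era classpath (lucene-core plus lucene-analyzers-common), matching the 5.3.1 version in the question. The class name StoredBinaryRoundTrip, the helper roundTrip, and the temporary index directory are illustrative choices, not part of the original answer. Later Lucene releases deprecate or remove some of these calls (for example IndexSearcher.doc), so treat it as a sketch for that version range rather than a definitive implementation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class StoredBinaryRoundTrip {

    // Writes payload as a stored-only field, reads it back, and returns a copy.
    static byte[] roundTrip(byte[] payload) throws Exception {
        // Illustrative location; the temporary directory is not cleaned up here.
        Path indexPath = Files.createTempDirectory("stored-binary-demo");
        try (Directory dir = FSDirectory.open(indexPath)) {
            IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
            try (IndexWriter writer = new IndexWriter(dir, iwc)) {
                Document doc = new Document();
                doc.add(new StringField("id", "1", Field.Store.YES));
                doc.add(new StoredField("bin1", payload)); // stored, not indexed
                writer.addDocument(doc);
            } // closing the writer commits the pending document

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                ScoreDoc[] hits = searcher.search(new MatchAllDocsQuery(), 1).scoreDocs;
                Document result = searcher.doc(hits[0].doc);
                BytesRef ref = result.getBinaryValue("bin1");
                // Copy only the valid slice of the backing array.
                return Arrays.copyOfRange(ref.bytes, ref.offset, ref.offset + ref.length);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[100000];
        payload[0] = 42; // mark the data so a mismatch would be visible
        byte[] restored = roundTrip(payload);
        System.out.println("round trip ok: " + Arrays.equals(payload, restored));
    }
}
```

Because the field is stored only, it cannot be searched on; the sketch finds the document through MatchAllDocsQuery, and in practice you would look it up through an indexed field such as id.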

By the way, here is the problem with the two fields you tried:

bin1: When you pass a Reader into the Field constructor, it treats the field as a TextField, which is indexed and not stored: effectively the opposite of what you are looking for. That constructor is deprecated anyway, in favor of just using TextField.

If you had passed in the byte[] instead of the Reader, it actually would have worked, since that would have acted as a StoredField (as shown above), though that constructor is also deprecated.

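bin2: DocValues fields work differently. They are kept in a separate per-document column structure rather than with the stored fields, so they are not part of the Document returned by IndexSearcher.doc(). You can read up on doc values in the Lucene documentation if you are curious.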
