Storing non-indexed binary data in Lucene
Question
How do you store a non-indexed array of bytes in a Lucene document?
I tried these:
doc.add(new Field("bin1", new InputStreamReader(new ByteArrayInputStream(new byte [100000]))));
doc.add(new BinaryDocValuesField("bin2", new BytesRef(new byte [100000])));
Neither worked (the field is not stored, so it cannot be retrieved when querying).
Test code:
String index="dms1";
Directory indexDirectory = FSDirectory.open(Paths.get(index));
StandardAnalyzer analyzer = new StandardAnalyzer();
IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
//create the indexer
IndexWriter iw = new IndexWriter(indexDirectory, iwc);
{
Document doc = new Document();
doc.add(new TextField("id", "1", Field.Store.YES));
doc.add(new Field("bin1", new InputStreamReader(new ByteArrayInputStream(new byte [100000]))));
doc.add(new BinaryDocValuesField("bin2", new BytesRef(new byte [100000])));
iw.addDocument(doc);
iw.commit();
}
DirectoryReader ir = DirectoryReader.open(indexDirectory);
IndexSearcher is = new IndexSearcher(ir);
QueryParser qp = new QueryParser("", analyzer);
Query q = qp.parse("*:*"); // previously tried: "content1:hp"
TopDocs hits = is.search(q, 10);
for (ScoreDoc scoreDoc : hits.scoreDocs) {
Document doc = is.doc(scoreDoc.doc);
System.out.println(doc);
System.out.println("doc.getBinaryValue(bin1):" + doc.getBinaryValue("bin1"));
System.out.println("doc.getBinaryValues(bin1):" + doc.getBinaryValues("bin1"));
System.out.println("doc.getBinaryValues(bin1).length:" + doc.getBinaryValues("bin1").length);
System.out.println("doc.get(bin1):" + doc.get("bin1"));
System.out.println("doc.getBinaryValue(bin2):" + doc.getBinaryValue("bin2"));
System.out.println("doc.getBinaryValues(bin2):" + doc.getBinaryValues("bin2"));
System.out.println("doc.getBinaryValues(bin2).length:" + doc.getBinaryValues("bin2").length);
System.out.println("doc.get(bin2):" + doc.get("bin2"));
}
Output:
Document<stored,indexed,tokenized<id:1>>
doc.getBinaryValue(bin1):null
doc.getBinaryValues(bin1):[Lorg.apache.lucene.util.BytesRef;@899e53
doc.getBinaryValues(bin1).length:0
doc.get(bin1):null
doc.getBinaryValue(bin2):null
doc.getBinaryValues(bin2):[Lorg.apache.lucene.util.BytesRef;@f98160
doc.getBinaryValues(bin2).length:0
doc.get(bin2):null
Could anyone shed some light on how to store the bytes and how to retrieve the values again?
I know of other solutions, such as using base64 or another encoding to convert the bytes to text, or storing the data as file links, but I am looking for a more efficient way to do this. Since the Lucene API has "binary" methods, I assumed that would be the correct way to do it.
Lucene version: 5.3.1
Recommended answer
Use a StoredField. You can pass either a BytesRef or the byte array itself into the field:
byte[] myByteArray = new byte[100000];
document.add(new StoredField("bin1", myByteArray));
As for retrieving the value, you are already on about the right track. Something like:
Document resultDoc = searcher.doc(docno);
BytesRef bin1ref = resultDoc.getBinaryValue("bin1");
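// bin1ref.bytes may be a larger backing buffer, so copy only the valid range
byte[] bin1bytes = Arrays.copyOfRange(bin1ref.bytes, bin1ref.offset, bin1ref.offset + bin1ref.length);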
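To tie the two snippets together, here is a minimal round-trip sketch. It assumes a Lucene 5.x classpath, as in the question. The class name StoredBinaryRoundTrip, the helper roundTrip, and the temporary index directory are made up for illustration. Newer Lucene releases have changed some of these calls (for example, how stored documents are loaded), so treat it as a starting point rather than a definitive implementation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

// Hypothetical demo class: stores a byte[] as a stored-only field and reads it back.
public class StoredBinaryRoundTrip {

    static byte[] roundTrip(byte[] payload) throws Exception {
        // Throwaway index location (assumption for the demo; not cleaned up here).
        Path dir = Files.createTempDirectory("stored-binary-demo");
        try (Directory directory = FSDirectory.open(dir)) {
            // Closing the writer commits the pending document.
            try (IndexWriter writer = new IndexWriter(
                    directory, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                // StoredField with a byte[] is stored but not indexed.
                doc.add(new StoredField("bin1", payload));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(directory)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // The field is not indexed, so match every document instead of querying it.
                TopDocs hits = searcher.search(new MatchAllDocsQuery(), 1);
                Document stored = searcher.doc(hits.scoreDocs[0].doc);
                BytesRef ref = stored.getBinaryValue("bin1");
                // Copy only the valid slice of the backing array.
                return Arrays.copyOfRange(ref.bytes, ref.offset, ref.offset + ref.length);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[100000];
        payload[0] = 42;
        byte[] restored = roundTrip(payload);
        System.out.println("round trip ok: " + Arrays.equals(payload, restored));
    }
}
```

Because the field is stored only, it cannot be searched on. The sketch finds the document with a match-all query, whereas a real index would normally look it up through some other indexed field, such as the id field in the question.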
By the way, here is the problem with the two fields you tried:
- bin1: When you pass a Reader into the Field constructor, the field is treated as a TextField, which is indexed but not stored, effectively the opposite of what you are looking for. That constructor is deprecated anyway, in favor of using TextField directly.
If you had passed in the byte[] instead of the Reader, it would actually have worked, since the field would then have acted as a StoredField (as shown above), though that constructor is also deprecated.
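- bin2: DocValues fields work differently. They are kept in a separate per-document column structure rather than among the stored fields, so they do not appear in the Document returned by a search. The Lucene documentation on DocValues covers this, if you are curious.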