Elastic search With MongoDB : Searching PDFs

Problem description

I am saving my PDF file in MongoDB's GridFS and then trying to search inside that PDF using Elasticsearch. I did the following:

1) Mongo DB Side:

mongod --port 27017 --replSet rs0 --dbpath "D:\Mongo-DB\mongodb-win32-i386-2.0.7\data17"
mongod --port 27018 --replSet rs0 --dbpath "D:\Mongo-DB\mongodb-win32-i386-2.0.7\data18"
mongod --port 27019 --replSet rs0 --dbpath "D:\Mongo-DB\mongodb-win32-i386-2.0.7\data19"

mongo localhost:27017
rs.initiate()
rs.add("hostname:27018")
rs.add("hostname:27019")

mongofiles -hlocalhost:27017 --db testmongo --collection files --type application/pdf put D:\Sherlock-Holmes.pdf

2) Elastic Search Side (Installed Plugins : bigdesk/head/mapper-attachments/river-mongodb)

-> Using Elasticsearch Head, I sent the following request from the "Any request" tab:

URL : http://localhost:9200/_river/mongodb/_meta
Method : PUT

{
  "type": "mongodb",
  "mongodb": {
    "db": "testmongo",
    "collection": "fs.files",
    "gridfs": true,
    "contentType": "",
    "content": "base64 /path/filename | perl -pe 's/\n/\\n/g'"
  },
  "index": {
    "name": "testmongo",
    "type": "files",
    "content_type": "application/pdf"
  }
}

Now I access the following URL:

http://localhost:9200/testmongo/files/508e82e21e43def09b5e1602?pretty=true

I get the following response (which I believe is as expected):

{
  "_index" : "testmongo",
  "_type" : "files",
  "_id" : "508e82e21e43def09b5e1602",
  "_version" : 1,
  "exists" : true, "_source" : {"_id":"508e82e21e43def09b5e1602","filename":"D:\\Sherlock-Holmes.pdf","chunkSize":262144,"uploadDate":"2012-10-29T13:21:38.969Z","md5":"025fa2046f9254d2aecb9e52ae851065","length":98272,"contentType":"application/pdf"}
}

But when I try to search this PDF using the following URL:

http://localhost:9200/testmongo/files/_search?q=Albers&pretty=true

it gives me the following result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

It shows no hits, even though the word "Albers" is present in this PDF. Please help. Thanks in advance.

Solution

I think you have to specify the property to be searched:

http://localhost:9200/testmongo/files/_search?q=<PROPERTYNAME>:Albers&pretty=true
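
As a rough sketch of the URL above, the snippet below builds a field-scoped query string in Python. The helper name `build_search_url` is made up for illustration, and `filename` is used only because it appears in the `_source` shown in the question; which field actually holds the PDF text depends on your mapping.

```python
from urllib.parse import urlencode


def build_search_url(base, index, doc_type, field, term):
    """Return a URI-search URL that restricts the query to one field.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    # Query-string syntax for a single field is <field>:<term>.
    params = urlencode({"q": f"{field}:{term}", "pretty": "true"})
    return f"{base}/{index}/{doc_type}/_search?{params}"


# "filename" is an assumed field name taken from the _source in the question.
url = build_search_url("http://localhost:9200", "testmongo", "files",
                       "filename", "Albers")
print(url)
```

`urlencode` percent-encodes the colon, which is still decoded by the server as `filename:Albers`.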

or, for more complex searches, send the query in the request body:

$ curl -XPOST 'http://localhost:9200/testmongo/files/_search' -d '{
    "<PROPERTYNAME>" : "value",
    "<PROPERTYNAME>" : {
        "<PROPERTYNAME>" : "value",
        "<PROPERTYNAME>" : "value"
    }
}
'
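
One possible way to fill in the placeholder body is a `query_string` query with a `default_field`. The sketch below only prepares the request and does not send it; `build_search_request` is a hypothetical helper, and the field name `filename` is an assumption borrowed from the `_source` in the question.

```python
import json
from urllib.request import Request


def build_search_request(base, index, doc_type, field, term):
    """Prepare (but do not send) a POST search request with a JSON body."""
    body = {
        "query": {
            "query_string": {
                "default_field": field,  # assumed field name, adjust to your mapping
                "query": term,
            }
        }
    }
    return Request(
        f"{base}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:9200", "testmongo", "files",
                           "filename", "Albers")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it, assuming a node is listening on that address.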

But as far as I know, you can only search on properties that were defined and actually indexed with your data.
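
To illustrate that point with the data from the question: the `_source` returned for the document contains only GridFS file metadata, and none of those fields carries the PDF's text. The sketch below uses a naive substring check, which is not how Elasticsearch analyzes text; `fields_containing` is a hypothetical helper, and this is an observation from the response shown above rather than a confirmed diagnosis.

```python
# _source copied from the GET response shown in the question.
source = {
    "_id": "508e82e21e43def09b5e1602",
    "filename": "D:\\Sherlock-Holmes.pdf",
    "chunkSize": 262144,
    "uploadDate": "2012-10-29T13:21:38.969Z",
    "md5": "025fa2046f9254d2aecb9e52ae851065",
    "length": 98272,
    "contentType": "application/pdf",
}


def fields_containing(doc, term):
    """Return the names of fields whose value contains term (case-insensitive)."""
    needle = term.lower()
    return [name for name, value in doc.items() if needle in str(value).lower()]


print(sorted(source))                         # every field present in _source
print(fields_containing(source, "Albers"))    # fields mentioning the search term
print(fields_containing(source, "Sherlock"))  # a term that does occur in the metadata
```

If no field in the indexed document holds the extracted PDF content, a search for a word from inside the PDF has nothing to match against.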
