Index PDF documents in Solr from a C# client

Published: 2016/9/18 12:02:02 · Tags: c# pdf tomcat solr solrnet

Problem description

Basically, I'm trying to index Word or PDF documents in Solr and found the ExtractingRequestHandler, but I can't figure out how to write C# code that performs the HTTP POST request shown in the Solr wiki: http://wiki.apache.org/solr/ExtractingRequestHandler.

I've installed Solr 3.4 on Tomcat 7 (7.0.22) using the files from the example/solr directory in the Solr zip and I haven't altered anything. The ExtractingRequestHandler should be configured out of the box in the solrconfig.xml and ready to use, right?

Can anyone give a C# (HttpWebRequest) example of how to make the HTTP POST request and upload a PDF file, the way it is done with curl in the Solr wiki?

I've looked all over this site and many others for an example or a tutorial on how this is done, but haven't found anything.

Edit:

I finally managed to get it to work using SolrNet!

For it to work, you need to copy the following from the Solr zip into a lib folder in your Solr installation directory:

The apache-solr-cell-3.4.0.jar file from the dist folder
The contents of the contrib\extraction\lib directory

With SolrNet 0.4.0 beta 2, this code does the job:

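using System.IO;
using Microsoft.Practices.ServiceLocation;
using SolrNet;

// IndexDocument is your own document class mapped to the Solr schema.
Startup.Init<IndexDocument>("YOUR-SOLR-SERVICE-PATH");
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<IndexDocument>>();

// Send the file to the ExtractingRequestHandler; "doc1" is the document id.
using (FileStream fileStream = File.OpenRead("FILE-PATH-FOR-THE-FILE-TO-BE-INDEXED"))
{
    var response =
        solr.Extract(
            new ExtractParameters(fileStream, "doc1")
            {
                ExtractFormat = ExtractFormat.Text,
                // false = index the extracted content instead of only returning it
                ExtractOnly = false
            });
}

// Make the newly indexed document visible to searches.
solr.Commit();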

Sorry for the trouble. I hope however that others will find this useful.
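
For anyone who still wants the plain HttpWebRequest route the question asked about, here is a minimal sketch of what the wiki's curl call might look like in C#. It assumes the handler is mapped at /update/extract as in the example solrconfig.xml, and that posting the file as the raw request body (like curl's --data-binary) is acceptable. The class name SolrExtractClient, its method names, and the base URL http://localhost:8080/solr are hypothetical, not something from the setup above, and I have not tested it against Solr 3.4.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper: not part of Solr or SolrNet.
public static class SolrExtractClient
{
    // Builds the ExtractingRequestHandler URL, for example
    // http://localhost:8080/solr/update/extract?literal.id=doc1&commit=true
    public static Uri BuildExtractUri(string solrBaseUrl, string documentId, bool commit)
    {
        string query = "literal.id=" + Uri.EscapeDataString(documentId)
                     + "&commit=" + (commit ? "true" : "false");
        return new Uri(solrBaseUrl.TrimEnd('/') + "/update/extract?" + query);
    }

    // Streams the file as the raw POST body and returns Solr's response text.
    public static string PostFile(Uri extractUri, string filePath, string contentType)
    {
        var request = (HttpWebRequest)WebRequest.Create(extractUri);
        request.Method = "POST";
        request.ContentType = contentType; // e.g. "application/pdf"

        using (FileStream file = File.OpenRead(filePath))
        {
            request.ContentLength = file.Length;
            using (Stream body = request.GetRequestStream())
            {
                file.CopyTo(body);
            }
        }

        // GetResponse throws a WebException on non-2xx status codes.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be something like SolrExtractClient.PostFile(SolrExtractClient.BuildExtractUri("http://localhost:8080/solr", "doc1", true), "FILE-PATH-FOR-THE-FILE-TO-BE-INDEXED", "application/pdf"). Passing commit=true in the query string plays the role of the separate solr.Commit() call in the SolrNet version.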
