Solr delta-import not working


Problem description

Full import and deletedPkQuery work. I've traced the database server, and both the deltaQuery and deletedPkQuery are executed.

I've executed these queries manually many times and they do indeed return row(s), but the delta-import does not fetch any rows. The last thing I did was to output FILE_ID as id in all the queries. It still doesn't work.

<dataConfig>

<dataSource name="db" type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost:1433;databaseName=norway_operations;responseBuffering=adaptive;selectMethod=cursor" user="noropuser" password="noropuser" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED"holdability="CLOSE_CURSORS_AT_COMMIT"/>
<dataSource name="bin" type="BinFileDataSource"  basePath="D:OPG_FILESTORE"/>

<document>

    <entity name="file" dataSource="db" pk="id" query="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS"
            deltaQuery="select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > '${dataimporter.last_index_time}'" 
            deltaImportQuery="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS where FILE_ID = '${dih.delta.id}'" 
            deletedPkQuery="delete from PK_DELETE_HISTORY output DELETED.PK AS id where PK_NAME = 'FILE_ID'" >

        <field column="id" name="id" />
        <field column="CATEGORY_ID" name="categoryId" />
        <field column="CATEGORY_NAME" name="category" />
        <field column="FILENAME" name="filename" />
        <field column="FILE_MIME_TYPE" name="content_type" />
        <field column="last_modified" name="last_modified" />

        <entity name="tika" processor="TikaEntityProcessor" url="${file.PATH}" parser="org.apache.tika.parser.AutoDetectParser" format="text" dataSource="bin" onError="continue">                
            <field column="text" name="content" />
            <field column="title" name="title"/>
            <field column="subject" name="subject"/>
            <field column="description" name="description"/>
            <field column="comments" name="comments"/>
            <field column="author" name="author"/>
            <field column="keywords" name="keywords"/>
            <field column="url"  name="url"/>
            <field column="content_type" name="content_type" />                
            <field column="links"  name="links" />                
        </entity>            
    </entity>        
</document>
</dataConfig>

Trace

declare @p1 int
set @p1=180150003
declare @p5 int
set @p5=-1
exec sp_cursoropen @p1 output,N'select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > ''2014-02-06 15:02:40''',16,8193,@p5 output
select @p1, @p5

When I run this manually it returns 1 row.

Response:

    <?xml version="1.0" encoding="UTF-8" ?> 
- <response>
- <lst name="responseHeader">
  <int name="status">0</int> 
  <int name="QTime">31</int> 
  </lst>
- <lst name="initArgs">
- <lst name="defaults">
  <str name="config">db-data-config.xml</str> 
  <int name="rows">0</int> 
  <int name="start">0</int> 
  </lst>
  </lst>
  <str name="command">delta-import</str> 
  <str name="mode">debug</str> 
  <arr name="documents" /> 
  <lst name="verbose-output" /> 
  <str name="status">idle</str> 
  <str name="importResponse" /> 
- <lst name="statusMessages">
  <str name="Total Requests made to DataSource">2</str> 
  <str name="Total Rows Fetched">0</str> 
  <str name="Total Documents Skipped">0</str> 
  <str name="Delta Dump started">2014-02-06 15:32:20</str> 
  <str name="Identifying Delta">2014-02-06 15:32:20</str> 
  <str name="Deltas Obtained">2014-02-06 15:32:20</str> 
  <str name="Building documents">2014-02-06 15:32:20</str> 
  <str name="Total Changed Documents">0</str> 
  <str name="Total Documents Processed">0</str> 
  <str name="Time taken">0:0:0.16</str> 
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str> 
  </response>

Answer

Some things possibly worth checking:

1. Timestamp saved in the dataimport.properties config file

This happened to me:

Running delta-import (successfully) will update {dataimporter.last_index_time} in the conf/dataimport.properties file. The next time, your query runs against the new timestamp, which may return zero rows unless you have updated the database since.
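
For reference, conf/dataimport.properties is a plain Java properties file that the DataImportHandler rewrites after each run; for an entity named file it would look roughly like this (timestamps are illustrative, and the colons are escaped because it is a properties file):

#Thu Feb 06 15:32:20 CET 2014
last_index_time=2014-02-06 15\:32\:20
file.last_index_time=2014-02-06 15\:32\:20

One quick way to test is to set the timestamp in this file back to an older date and re-run the delta-import, so the deltaQuery looks further back than the last successful run.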

2. dataimporter.delta.id and dataimporter.last_index_time

dataimporter.delta.id should be dih.delta.id.

last_index_time remains in the dataimporter namespace. **dataimporter.last_index_time** works at least in Solr 4.2.0. dih.last_index_time might work too, as mentioned in the Solr wiki, but I haven't tested it.

3. The timestamp may need to be converted to the proper datetime data type, depending on the database.

In the case of SQL Server:

LAST_MODIFIED_DATETIME > convert(datetime,'${dataimporter.last_index_time}')
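
Applied to the config above, the deltaQuery would then look roughly like this (a sketch assuming LAST_MODIFIED in the poster's DOCUMENTS table is a SQL Server datetime column):

deltaQuery="select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > convert(datetime, '${dataimporter.last_index_time}')"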
