Solr delta-import not working
Problem description
Full import and deletedPkQuery both work. I traced the database server, and both the deltaQuery and the deletedPkQuery are executed.
I have run these queries manually many times and they do return rows, but the delta-import does not fetch any rows. The last thing I tried was outputting FILE_ID as id in all the queries. It still doesn't work.
<dataConfig>
<dataSource name="db" type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost:1433;databaseName=norway_operations;responseBuffering=adaptive;selectMethod=cursor" user="noropuser" password="noropuser" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT"/>
<dataSource name="bin" type="BinFileDataSource" basePath="D:OPG_FILESTORE"/>
<document>
<entity name="file" dataSource="db" pk="id" query="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS"
deltaQuery="select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > '${dataimporter.last_index_time}'"
deltaImportQuery="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS where FILE_ID = '${dih.delta.id}'"
deletedPkQuery="delete from PK_DELETE_HISTORY output DELETED.PK AS id where PK_NAME = 'FILE_ID'" >
<field column="id" name="id" />
<field column="CATEGORY_ID" name="categoryId" />
<field column="CATEGORY_NAME" name="category" />
<field column="FILENAME" name="filename" />
<field column="FILE_MIME_TYPE" name="content_type" />
<field column="last_modified" name="last_modified" />
<entity name="tika" processor="TikaEntityProcessor" url="${file.PATH}" parser="org.apache.tika.parser.AutoDetectParser" format="text" dataSource="bin" onError="continue">
<field column="text" name="content" />
<field column="title" name="title"/>
<field column="subject" name="subject"/>
<field column="description" name="description"/>
<field column="comments" name="comments"/>
<field column="author" name="author"/>
<field column="keywords" name="keywords"/>
<field column="url" name="url"/>
<field column="content_type" name="content_type" />
<field column="links" name="links" />
</entity>
</entity>
</document>
</dataConfig>
Trace
declare @p1 int
set @p1=180150003
declare @p5 int
set @p5=-1
exec sp_cursoropen @p1 output,N'select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > ''2014-02-06 15:02:40''',16,8193,@p5 output
select @p1, @p5
When I run this manually, it returns 1 row.
Response:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">31</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">db-data-config.xml</str>
<int name="rows">0</int>
<int name="start">0</int>
</lst>
</lst>
<str name="command">delta-import</str>
<str name="mode">debug</str>
<arr name="documents" />
<lst name="verbose-output" />
<str name="status">idle</str>
<str name="importResponse" />
<lst name="statusMessages">
<str name="Total Requests made to DataSource">2</str>
<str name="Total Rows Fetched">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2014-02-06 15:32:20</str>
<str name="Identifying Delta">2014-02-06 15:32:20</str>
<str name="Deltas Obtained">2014-02-06 15:32:20</str>
<str name="Building documents">2014-02-06 15:32:20</str>
<str name="Total Changed Documents">0</str>
<str name="Total Documents Processed">0</str>
<str name="Time taken">0:0:0.16</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
Recommended answer
A few things worth checking:
1. The timestamp saved in the dataimport.properties config file
This happened to me. Running delta-import successfully updates the timestamp behind ${dataimporter.last_index_time} in the conf/dataimport.properties file. The next run then queries against the new timestamp, which may return zero rows unless the database has been updated in the meantime.
2. dataimporter.delta.id and dataimporter.last_index_time
dataimporter.delta.id should be dih.delta.id.
last_index_time remains in the dataimporter namespace. **dataimporter.last_index_time** works at least in Solr 4.2.0. dih.last_index_time might work too, as mentioned in the Solr wiki, but I haven't tested it.
3. The timestamp needs to be converted to the correct datetime data type, which depends on the database.
In the case of SQL Server:
LAST_MODIFIED_DATETIME > convert(datetime,'${dataimporter.last_index_time}')
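Applied to the entity from the question, points 2 and 3 might look like the sketch below. It assumes the column is LAST_MODIFIED, as in the question's config (the answer's LAST_MODIFIED_DATETIME looks like a column from a different schema), and it has not been tested against the asker's database.

```xml
<!-- Sketch only: the question's entity with the timestamp conversion applied. -->
<!-- Assumption: LAST_MODIFIED is a SQL Server datetime column. -->
<entity name="file" dataSource="db" pk="id"
        query="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS"
        deltaQuery="select FILE_ID as id from DOCUMENTS where LAST_MODIFIED > convert(datetime, '${dataimporter.last_index_time}')"
        deltaImportQuery="select FILE_ID as id, CATEGORY_ID, CATEGORY_NAME, FILENAME, FILE_MIME_TYPE, PATH, LAST_MODIFIED as last_modified from DOCUMENTS where FILE_ID = '${dih.delta.id}'"
        deletedPkQuery="delete from PK_DELETE_HISTORY output DELETED.PK AS id where PK_NAME = 'FILE_ID'">
  <!-- Field mappings and the nested tika entity stay as in the original config. -->
</entity>
```

The delta id stays in the dih namespace and the timestamp stays in the dataimporter namespace, matching point 2. Before re-running delta-import, it is worth confirming that at least one row has a LAST_MODIFIED value later than the timestamp currently stored in conf/dataimport.properties, as described in point 1.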