Can't delete Solr keys


Problem description


I'm having trouble deleting the keys for files from a Solr collection.

I update the Solr collection with this:

<cfoutput query="fileQuery">
  <cfset theFile = defaultpath & "#fileID#.pdf" />

  <cfif fileExists(theFile)>
    <cfindex
      action="update"
      collection="file_vault_solr"
      type="file"
      key="#theFile#"
      title="#documentName#"
      body="fileNumber,documentName"
      custom1="/filevault/#filealias#"
      custom2="#fileNumber#"
      custom3="#documentName#"
    >
  </cfif>
</cfoutput>

However, when I try to delete a key from the collection, it simply doesn't work. Here's the code being used to (try to) delete the keys:

<cfoutput query="deletedFile">
  <cfset theFile = defaultpath & "#fileID#.pdf" />

  <!--- Remove the deleted file from the collection. --->
  <cfindex
    collection="file_vault_solr"
    type="file"
    action="Delete"
    key="#theFile#"
  >
</cfoutput>

The key is not deleted, however. The only thing that has worked has been to purge the whole catalog and re-index all of the documents.

Any insights?

Solution

After a lot of debugging, I found the cause.

The reason for this behavior is a very… uh… unfortunate uhm… "design decision" Adobe took when implementing the interface between ColdFusion and Solr.

So you have a Solr collection of indexed files and want to selectively purge the ones that no longer exist on disk. I'm pretty sure that's the exact situation you've been in.

Let's assume:

  • there is a file called /path/to/file on your system and
  • it is indexed in the Solr collection foo.

When you issue a <cfindex collection="foo" action="delete" key="/path/to/file">, ColdFusion sends the following HTTP request to Solr:

POST /solr/foo/update?wt=xml&version=2.2 (application/xml; charset=UTF-8)
<delete><id>1247603285</id></delete>

This is a perfectly reasonable request that Solr will happily fulfill. The only strange thing is the number in the <id>. In any case, the file will be gone from the index after this operation.

Re-index the file and delete it from disk. Now:

  • there is no longer a file called /path/to/file on your system, but
  • it is still indexed in the Solr collection foo.

Let's do the same <cfindex action="delete"> operation again.

POST /solr/foo/update?wt=xml&version=2.2 (application/xml; charset=UTF-8)
<delete><id>/path/to/file</id></delete>

Huh? Shouldn't there be a number in the ID?

As it turns out, someone at Adobe thought it would be a jolly smart idea to use numbers for unique IDs of indexed files, to, uhhh, save space, I assume.

However, for some inexplicable reason, this only happens when the file in question still exists. If it does not exist anymore, ColdFusion will notice and pass the path instead.

Inspecting the number reveals that it would fit into a 32-bit signed integer value. (I've checked, there are plenty of negative values in the uid field of the collection.)

So this looks as if they use some kind of hashing algorithm that returns 32 bits and chuck that in an int. CRC32 springs to mind, but that's not it. Also, java.util.zip.CRC32 returns a long, so there wouldn't be any negative values in the first place.
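To make the CRC32 argument concrete, here is a minimal Java sketch. The class name Crc32VsInt is invented for illustration. It shows that CRC32.getValue() hands back the checksum as a non-negative long, so negative values in the uid field cannot come from it.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class, not part of ColdFusion or Solr.
final class Crc32VsInt {

    /** CRC32 of the UTF-8 bytes of s, as computed by java.util.zip.CRC32. */
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // a long in the range 0 .. 2^32 - 1, never negative
    }

    public static void main(String[] args) {
        String path = "/path/to/file";
        System.out.println("CRC32 as long: " + crc32(path));
        // Only when the value is forced into an int can the sign bit show up.
        System.out.println("CRC32 cast to int: " + (int) crc32(path));
    }
}
```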

The other readily available 32-bit hash in Java is ... java.lang.Object.hashCode(), which java.lang.String overrides.

Bingo.

"/path/to/file".hashCode() // -> 1247603285
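For anyone who wants to check this outside ColdFusion, the following Java sketch re-implements the documented String.hashCode() formula (s[0]*31^(n-1) + ... + s[n-1], in int arithmetic) and compares it with the built-in method. The class and method names are made up for this example.

```java
// Hypothetical demo class, not part of ColdFusion or Solr.
final class PathHash {

    /** The documented String.hashCode() formula, with normal int overflow. */
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // wraps around on overflow, so results can be negative
        }
        return h;
    }

    public static void main(String[] args) {
        String path = "/path/to/file";
        System.out.println(manualHash(path));   // 1247603285
        System.out.println(path.hashCode());    // same value
    }
}
```

Because the overflow simply wraps, long paths produce negative values about as often as positive ones, which fits the negative entries seen in the uid field.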

So the solution is to never delete a file by its path, but always like this:

<cfindex collection="foo" action="delete" key="#path.hashCode()#">

For files that no longer exist this does the right thing.

More importantly: For files that still exist this does the right thing as well - ColdFusion would have sent the hash code anyway.
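The behavior seen in the two HTTP traces can be modeled in a few lines of Java. This is only a sketch of what the traces suggest, not Adobe's actual code; ObservedDeleteId, idSent and deleteBody are names invented for illustration.

```java
// Hypothetical model of the observed behavior, not ColdFusion source code.
final class ObservedDeleteId {

    /** The <id> ColdFusion appears to send for a given cfindex key. */
    static String idSent(String key, boolean fileExists) {
        // Existing file: the path's hash code. Missing file: the key, unchanged.
        return fileExists ? Integer.toString(key.hashCode()) : key;
    }

    /** Request body as seen in the traces (no XML escaping, illustration only). */
    static String deleteBody(String id) {
        return "<delete><id>" + id + "</id></delete>";
    }

    public static void main(String[] args) {
        String path = "/path/to/file";
        System.out.println(deleteBody(idSent(path, true)));   // hash code in <id>
        System.out.println(deleteBody(idSent(path, false)));  // raw path in <id>, matches nothing

        // The work-around: pass the hash code as the key yourself.
        String key = Integer.toString(path.hashCode());
        System.out.println(deleteBody(idSent(key, false)));   // hash code in <id> again
    }
}
```

Passing the hash code as the key works in the missing-file case precisely because a key that is not an existing file goes through unchanged.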

Until Adobe fixes this problem, this is a safe and easy work-around.

Note that the file path is case sensitive and must match exactly with the one stored in the index.
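A short Java sketch of why case matters: changing a single letter's case changes the hash code, so the computed key no longer matches the stored id. The class name and the second path are made up for the example.

```java
// Hypothetical demo class, not part of ColdFusion or Solr.
final class CaseSensitiveKey {

    /** The key to hand to cfindex action="delete": the path's hash code as text. */
    static String deleteKey(String path) {
        return Integer.toString(path.hashCode());
    }

    public static void main(String[] args) {
        // Same file on a case-insensitive file system, two different keys.
        System.out.println(deleteKey("/path/to/file"));
        System.out.println(deleteKey("/path/to/File"));
    }
}
```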

A quick

<cfsearch collection="foo" name="foo">

without any criteria will return all index entries, so retrieving the exact path of orphaned entries is not a big problem.
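As a rough sketch of that clean-up step in Java: assuming the indexed paths from the search result have already been collected into a list (how you do that is up to you), the following picks out the entries whose file is gone and computes the key to delete each one with. OrphanKeys and orphanDeleteKeys are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ColdFusion or Solr.
final class OrphanKeys {

    /** Delete keys (hash codes as text) for indexed paths that no longer exist on disk. */
    static List<String> orphanDeleteKeys(List<String> indexedPaths) {
        List<String> keys = new ArrayList<>();
        for (String path : indexedPaths) {
            if (!Files.exists(Paths.get(path))) {
                // Use the path exactly as stored in the index, including its case.
                keys.add(Integer.toString(path.hashCode()));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // "." exists; the second path is assumed not to exist on this machine.
        List<String> indexed = Arrays.asList(".", "/no/such/dir/orphan.pdf");
        System.out.println(orphanDeleteKeys(indexed));
    }
}
```

Each returned key can then be passed to a cfindex delete call like the one shown earlier.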


Eric Lippert explains object hash codes and why it is a bad idea to use them for anything "practical" in an application. It's a .NET article, but it applies to Java just as well.

It boils down to: Adobe should store the actual path in the Solr collection and leave the performance optimization they seem to have attempted to Solr.


I've filed Bug 3589991 against Adobe's ColdFusion bug database.
