The right record access implementation


Problem description

I am looking into indexing engines, specifically Apache Lucene/Solr. We would like to use it for our searches, but one of the problems our framework's search currently solves is row-level access.

Solr does not provide record access out of the box:

<...> Solr does not concern itself with security either at the document level or the communication level.

And in the section about document level security: http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security

There are a few suggestions - either use ManifoldCF (which is poorly documented and seems to be at a very pre-beta stage) or write your own request handler/search component (that part is marked as a stub) - and I guess the latter would have a bigger impact on performance.

So I assume not much is being done in this field.

In the recently released Solr 4.0, they introduced joins between two indexed entities. Joining seems like a nice idea, since our framework also does a join to find out whether a record is accessible to the user. The problem is that sometimes we do an inner join and sometimes an outer join, depending on the security setting in the scope: optimistic (everything that is not forbidden is allowed) or pessimistic (everything is forbidden except what is explicitly allowed).

To give a better understanding of what our structure looks like:

Documents

DocumentNr | Name
------------------
1          | Foo
2          | Bar

DocumentRecordAccess

DocumentNr | UserNr | AllowRead | AllowUpdate | AllowDelete
------------------------------------------------------------
1          | 1      | 1         | 1           | 0

So, for example, the generated query for Documents in the pessimistic security setting would be:

SELECT * FROM Documents AS d 
INNER JOIN DocumentRecordAccess AS dra ON dra.DocumentNr=d.DocumentNr AND dra.AllowRead=1 AND dra.UserNr=1

This would return only Foo, but not Bar. And in the optimistic setting:

SELECT * FROM Documents AS d 
LEFT JOIN DocumentRecordAccess AS dra ON dra.DocumentNr=d.DocumentNr AND dra.AllowRead=1 AND dra.UserNr=1

This returns both Foo and Bar.
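
The two join modes above can be reproduced with a small in-memory sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for the real database, and the `visible_names` helper is hypothetical; only the table names, column names and sample rows come from the question.

```python
import sqlite3

# In-memory stand-in for the real database (assumption: SQLite semantics
# are close enough to illustrate the INNER vs LEFT JOIN difference).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (DocumentNr INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DocumentRecordAccess (
        DocumentNr INTEGER, UserNr INTEGER,
        AllowRead INTEGER, AllowUpdate INTEGER, AllowDelete INTEGER
    );
    INSERT INTO Documents VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO DocumentRecordAccess VALUES (1, 1, 1, 1, 0);
""")


def visible_names(join_type, user_nr):
    """Hypothetical helper: run the question's query with the given join type."""
    if join_type not in ("INNER", "LEFT"):
        raise ValueError("join_type must be 'INNER' or 'LEFT'")
    sql = (
        "SELECT d.Name FROM Documents AS d "
        f"{join_type} JOIN DocumentRecordAccess AS dra "
        "ON dra.DocumentNr = d.DocumentNr "
        "AND dra.AllowRead = 1 AND dra.UserNr = ? "
        "ORDER BY d.DocumentNr"
    )
    return [row[0] for row in conn.execute(sql, (user_nr,))]


pessimistic = visible_names("INNER", 1)  # only explicitly allowed documents
optimistic = visible_names("LEFT", 1)    # every document, ACL row optional

print(pessimistic)  # ['Foo']
print(optimistic)   # ['Foo', 'Bar']
```

A search index has no direct equivalent of the outer join, which is why both modes have to be expressed some other way on the Solr side.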

Coming back to my question - has anyone already done this, and could you share your insight and experience?

Solution

I am afraid there's no easy solution here. You will have to sacrifice something to get ACLs working together with the search.

  1. If your corpus is small (I'd say up to 10K documents), you could create a cached bit set of forbidden (or allowed, whichever is less verbose) documents and send the corresponding filter query (+*:* -DocumentNr:1 ... -DocumentNr:X). Needless to say, this doesn't scale. Sending large queries will make the search a bit slower, but this is manageable (up to a point, of course). Query parsing is cheap.

  2. If you can somehow group these documents and apply ACLs to document groups, that would cut down the query length, and the above approach would fit perfectly. This is pretty much what we are using - our solution implements a taxonomy and enforces taxonomy permissions via an fq (filter query).

  3. If you don't need to show the overall result set count, you can run your query and filter the result set on the client side. Again, this is not perfect.

  4. You can also denormalize your data structures and store both tables flattened into a single document, like this:

    DocumentNr: 1
    Name: Foo
    Allowed_users: u1, u2, u3 (or Forbidden_users: ...)

    The rest is as easy as sending the user ID with your query.

    This is only viable if the ACLs rarely change and you can afford to reindex the entire corpus when they do.

  5. You could write a custom query filter that keeps cached BitSets of allowed or forbidden documents per user (or group?), retrieved from the database. This would require not only giving the Solr webapp access to the DB but also extending/repackaging the .war that ships with Solr. While this is relatively easy, the harder part is cache invalidation: the main app has to signal the Solr app somehow whenever ACL data changes.
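
The filter queries behind options 1, 2 and 4 can be sketched as plain string builders. This is only an illustration under assumptions: all function names are hypothetical, the field names `DocumentNr` and `Allowed_users` are taken from the examples above, and user IDs are assumed to be plain alphanumeric tokens that need no query escaping. The resulting string would be sent as the fq parameter alongside the user's actual query.

```python
def exclusion_fq(forbidden_doc_nrs):
    """Option 1: match everything, then subtract each forbidden document."""
    clauses = ["+*:*"] + [f"-DocumentNr:{nr}" for nr in sorted(forbidden_doc_nrs)]
    return " ".join(clauses)


def inclusion_fq(field, allowed_values):
    """Options 1 and 2: match only the listed documents (or document groups)."""
    values = sorted(allowed_values)
    if not values:
        # Nothing is allowed, so the caller should skip the search entirely.
        raise ValueError("no allowed values")
    return f"{field}:(" + " OR ".join(str(v) for v in values) + ")"


def shortest_fq(all_doc_nrs, allowed_doc_nrs):
    """Pick whichever list (forbidden or allowed) gives the shorter filter."""
    allowed = set(allowed_doc_nrs)
    forbidden = set(all_doc_nrs) - allowed
    if not allowed or len(forbidden) <= len(allowed):
        return exclusion_fq(forbidden)
    return inclusion_fq("DocumentNr", allowed)


def user_fq(user_id):
    """Option 4: filter on the denormalized Allowed_users field."""
    if not user_id.isalnum():
        raise ValueError("user_id must be alphanumeric in this sketch")
    return f"Allowed_users:{user_id}"


print(exclusion_fq({3, 1}))                  # +*:* -DocumentNr:1 -DocumentNr:3
print(shortest_fq(range(1, 6), {1}))         # DocumentNr:(1)
print(shortest_fq(range(1, 6), {1, 2, 3, 4}))  # +*:* -DocumentNr:5
print(user_fq("u1"))                         # Allowed_users:u1
```

For option 2, the same `inclusion_fq` could be called with a group field instead of `DocumentNr`; the query then grows with the number of groups a user can see rather than with the number of documents.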

Options 1 and 2 are probably more reasonable if you can run Solr and your app in the same JVM and use the javabin driver.

It's hard to advise further without knowing the specifics of the corpus/ACLs.
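
The cache invalidation problem from option 5, together with the client-side filtering from option 3, can be sketched in a few lines. This is not a Solr plugin: `AclCache` and its methods are hypothetical names, the loader callback stands in for the database lookup, the policy shown is the pessimistic one, and thread safety is left out.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional


class AclCache:
    """Hypothetical per-user cache of readable DocumentNr values."""

    def __init__(self, load_allowed: Callable[[int], Iterable[int]]):
        self._load_allowed = load_allowed  # stands in for the DB query
        self._allowed_by_user: Dict[int, FrozenSet[int]] = {}
        self.loads = 0  # counts how often the loader was actually called

    def allowed_docs(self, user_nr: int) -> FrozenSet[int]:
        if user_nr not in self._allowed_by_user:
            self._allowed_by_user[user_nr] = frozenset(self._load_allowed(user_nr))
            self.loads += 1
        return self._allowed_by_user[user_nr]

    def invalidate(self, user_nr: Optional[int] = None) -> None:
        """Called by the main app when ACL data changes (one user or all)."""
        if user_nr is None:
            self._allowed_by_user.clear()
        else:
            self._allowed_by_user.pop(user_nr, None)

    def filter_hits(self, user_nr: int, hit_doc_nrs: Iterable[int]) -> List[int]:
        """Pessimistic policy: keep only hits the user may explicitly read."""
        allowed = self.allowed_docs(user_nr)
        return [nr for nr in hit_doc_nrs if nr in allowed]


# In-memory stand-in for DocumentRecordAccess: UserNr -> readable DocumentNr.
acl_table = {1: {1}}
cache = AclCache(lambda user_nr: acl_table.get(user_nr, set()))

before = cache.filter_hits(1, [1, 2])  # user 1 can read document 1 only
acl_table[1] = {1, 2}                  # ACL changes in the "database"
stale = cache.filter_hits(1, [1, 2])   # cache has not been told yet
cache.invalidate(1)                    # main app signals the change
fresh = cache.filter_hits(1, [1, 2])   # reloaded from the "database"

print(before, stale, fresh)  # [1] [1] [1, 2]
```

The `stale` result is the failure mode to watch for: until the invalidation signal arrives, the cache keeps serving the old permissions.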
