The right record access implementation


Problem description


I am looking into indexing engines, specifically Apache Lucene Solr. We would like to use it for our searches, yet one of the problems solved by our framework's search is row-level access.

Solr does not provide record access out of the box:

<...> Solr does not concern itself with security either at the document level or the communication level.

And in the section about document level security: http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security

There are a few suggestions: either use Manifold CF (which is poorly documented and seems to be at a very pre-beta stage) or write your own request handler/search component (that part is marked as a stub). I guess the latter would have a bigger impact on performance.

So I assume not much is being done in this field.

In the recently released 4.0 version of Solr, they have introduced joining two indexed entities. Joining might seem like a nice idea, since our framework also does a join to determine whether a record is accessible to the user. The problem is that sometimes we do an inner join and sometimes an outer join, depending on the security setting in the scope: optimistic (everything that is not forbidden is allowed) or pessimistic (everything is forbidden except what is explicitly allowed).

To give a better understanding of what our structure looks like:

Documents

DocumentNr | Name
------------------
1          | Foo
2          | Bar

DocumentRecordAccess

DocumentNr | UserNr | AllowRead | AllowUpdate | AllowDelete
------------------------------------------------------------
1          | 1      | 1         | 1           | 0

So, for example, the generated query for Documents in the pessimistic security setting would be:

SELECT * FROM Documents AS d 
INNER JOIN DocumentRecordAccess AS dra ON dra.DocumentNr=d.DocumentNr AND dra.AllowRead=1 AND dra.UserNr=1

This would return only Foo, not Bar. In the optimistic setting:

SELECT * FROM Documents AS d 
LEFT JOIN DocumentRecordAccess AS dra ON dra.DocumentNr=d.DocumentNr AND dra.AllowRead=1 AND dra.UserNr=1

This returns both Foo and Bar.
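As a minimal sketch of the two modes (not our framework's actual code), the two queries above can be run against an in-memory SQLite database holding the sample rows. The helper names `make_sample_db` and `visible_documents` are made up for illustration; only the table and column names come from the structure shown earlier.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory database with the two sample tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Documents (DocumentNr INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE DocumentRecordAccess (
            DocumentNr INTEGER, UserNr INTEGER,
            AllowRead INTEGER, AllowUpdate INTEGER, AllowDelete INTEGER);
        INSERT INTO Documents VALUES (1, 'Foo'), (2, 'Bar');
        INSERT INTO DocumentRecordAccess VALUES (1, 1, 1, 1, 0);
    """)
    return conn


def visible_documents(conn, user_nr, pessimistic):
    """Return document names for the user: INNER JOIN if pessimistic, LEFT JOIN otherwise."""
    join = "INNER JOIN" if pessimistic else "LEFT JOIN"
    sql = f"""
        SELECT d.Name FROM Documents AS d
        {join} DocumentRecordAccess AS dra
          ON dra.DocumentNr = d.DocumentNr
         AND dra.AllowRead = 1
         AND dra.UserNr = ?
        ORDER BY d.DocumentNr
    """
    return [row[0] for row in conn.execute(sql, (user_nr,))]


if __name__ == "__main__":
    conn = make_sample_db()
    print(visible_documents(conn, 1, pessimistic=True))   # pessimistic: inner join
    print(visible_documents(conn, 1, pessimistic=False))  # optimistic: left join
```

The only difference between the two modes is the join type, which is exactly the part that is hard to express with a single Solr join.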

Coming back to my question: maybe someone has already done this and can share their insight and experience?

Solution

I am afraid there's no easy solution here. You will have to sacrifice something to get ACLs working together with the search.

  1. If your corpus size is small (I'd say up to 10K documents), you could create a cached bit set of forbidden (or allowed, whichever is less verbose) documents and send the relevant filter query (+*:* -DocumentNr:1 ... -DocumentNr:X). Needless to say, this doesn't scale. Sending large queries will make the search a bit slower, but this is manageable (up to a point, of course). Query parsing is cheap.

  2. If you can somehow group these documents and apply ACLs to document groups, this would cut down on query length and the above approach would fit perfectly. This is pretty much what we are using: our solution implements a taxonomy and has taxonomy permissions done via an fq query.

  3. If you don't need to show the overall result set count, you can run your query and filter the result set on the client side. Again, not perfect.

  4. You can also denormalize your data structures and store both tables flattened in a single document like this:

    DocumentNr: 1
    Name: Foo
    Allowed_users: u1, u2, u3 (or Forbidden_users: ...)

    The rest is as easy as sending user id with your query.

    The above is only viable if the ACLs rarely change and you can afford to reindex the entire corpus when they do.

  5. You could write a custom query filter which would have cached BitSets of allowed or forbidden documents by user (group?) retrieved from the database. This would require not only providing DB access for the Solr webapp but also extending/repackaging the .war which comes with Solr. While this is relatively easy, the harder part would be cache invalidation: the main app should somehow signal the Solr app when ACL data changes.
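To make options 1 and 4 a bit more concrete, here is a rough sketch of how the filter query strings could be built on the application side. It only assembles strings for Solr's standard `q` and `fq` parameters and does not talk to Solr. The function names are hypothetical, and option 4 assumes `Allowed_users` is indexed as a multi-valued field holding one user id per value, which is an assumption on top of the flattened layout shown above.

```python
def exclusion_fq(forbidden_doc_nrs):
    """Option 1: match everything, then subtract each forbidden DocumentNr."""
    clauses = ["+*:*"]
    # int() guards against anything other than a numeric id ending up in the query
    clauses += [f"-DocumentNr:{int(nr)}" for nr in sorted(forbidden_doc_nrs)]
    return " ".join(clauses)


def allowed_user_fq(user_id):
    """Option 4: match documents whose Allowed_users field contains this user."""
    # Quote the value and escape backslashes/quotes so it is parsed as one term
    escaped = str(user_id).replace("\\", "\\\\").replace('"', '\\"')
    return f'Allowed_users:"{escaped}"'


def search_params(user_query, fq):
    """Request parameters for a select call; q and fq are standard Solr parameters."""
    return {"q": user_query, "fq": fq}


if __name__ == "__main__":
    print(search_params("Name:Foo", exclusion_fq({7, 3})))
    print(search_params("Name:Foo", allowed_user_fq("u1")))
```

Keeping the ACL restriction in `fq` rather than in `q` means it does not affect scoring, and Solr can cache the filter separately from the main query.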

Options 1 and 2 are probably more reasonable if you can put Solr and your app onto the same JVM and use the javabin driver.
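The cached sets mentioned in options 1 and 5 need some way to be dropped when ACL data changes. A minimal sketch of that idea follows, assuming a hypothetical `loader` callback that reads the forbidden document numbers for a user from the database; the class and method names are made up, and how the main app delivers the invalidation signal is left open.

```python
import threading


class AclCache:
    """Per-user cache of forbidden document numbers with explicit invalidation."""

    def __init__(self, loader):
        self._loader = loader          # hypothetical callback: user_nr -> iterable of DocumentNr
        self._lock = threading.Lock()
        self._by_user = {}

    def forbidden_for(self, user_nr):
        """Return the cached set for the user, loading it on first use."""
        with self._lock:
            cached = self._by_user.get(user_nr)
            if cached is None:
                cached = frozenset(self._loader(user_nr))
                self._by_user[user_nr] = cached
            return cached

    def invalidate(self, user_nr=None):
        """Drop one user's entry, or the whole cache when no user is given."""
        with self._lock:
            if user_nr is None:
                self._by_user.clear()
            else:
                self._by_user.pop(user_nr, None)


if __name__ == "__main__":
    cache = AclCache(lambda user_nr: {2})   # stand-in for a database lookup
    print(cache.forbidden_for(1))
    cache.invalidate(1)                     # call this when the user's ACL rows change
```

The main app would call `invalidate` whenever it writes to the ACL table, so the next search for that user reloads fresh data.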

It's hard to advise more without knowing the specifics of the corpus/ACLs.
