Spark SQL security considerations


Question

What are the security considerations when accepting and executing arbitrary Spark SQL queries?

Imagine the following setup:

Two files on HDFS are registered as tables a_secrets and b_secrets:

# must only be accessed by clients with access to all of customer a's data
spark.read.csv("/customer_a/secrets.csv").createTempView("a_secrets")

# must only be accessed by clients with access to all of customer b's data
spark.read.csv("/customer_b/secrets.csv").createTempView("b_secrets")

I could secure these two views using simple HDFS file permissions. But say I have the following logical views of these tables that I'd like to expose:

# only access for clients with access to customer a's account no 1
spark.sql("SELECT * FROM a_secrets WHERE account = 1").createTempView("a1_secrets")

# only access for clients with access to customer a's account no 2
spark.sql("SELECT * FROM a_secrets WHERE account = 2").createTempView("a2_secrets")


# only access for clients with access to customer b's account no 1
spark.sql("SELECT * FROM b_secrets WHERE account = 1").createTempView("b1_secrets")

# only access for clients with access to customer b's account no 2
spark.sql("SELECT * FROM b_secrets WHERE account = 2").createTempView("b2_secrets")

Now assume I receive an arbitrary (user, password, query) tuple. I get a list of the accounts the user can access:

groups = get_groups(user, password)

and extract the logical query plan of the user's query:

spark.sql(query).explain(True)

giving me a query plan along the lines of the following (this exact query plan is made up):

== Analyzed Logical Plan ==
account: int, ... more fields
Project [account#0 ... more fields]
+- SubqueryAlias a1_secrets
   +- Relation [... more fields]
      +- Join Inner, (some_col#0 = another_col#67)
         :- SubqueryAlias a2_secrets
         :  +- Relation[... more fields] csv
== Physical Plan ==
... InputPaths: hdfs:/customer_a/secrets.csv ...
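
A plan in this shape could be scanned for table aliases and input paths roughly as follows. This is only a minimal sketch: the plan text is the made-up one from the question, the helper names (referenced_aliases, input_paths, is_authorised) are hypothetical, and the exact wording of real explain output differs between Spark versions, so the patterns are assumptions rather than a stable format.

```python
import re

# Made-up plan text in the shape shown above; real output from
# df.explain(True) varies between Spark versions.
PLAN_TEXT = """\
== Analyzed Logical Plan ==
Project [account#0]
+- SubqueryAlias a1_secrets
   +- Join Inner, (some_col#0 = another_col#67)
      :- SubqueryAlias a2_secrets
      :  +- Relation[account#0] csv
== Physical Plan ==
... InputPaths: hdfs:/customer_a/secrets.csv ...
"""

# Assumed patterns: an alias name follows "SubqueryAlias", and the first
# path follows an "InputPaths:" marker.
ALIAS_RE = re.compile(r"SubqueryAlias\s+([\w.`]+)")
PATH_RE = re.compile(r"InputPaths:\s*(\S+)")


def referenced_aliases(plan_text):
    """Return the set of SubqueryAlias names found in the plan text."""
    return set(ALIAS_RE.findall(plan_text))


def input_paths(plan_text):
    """Return the first path listed after each 'InputPaths:' marker."""
    return {path.rstrip(",") for path in PATH_RE.findall(plan_text)}


def is_authorised(plan_text, allowed_views):
    """True only if every alias in the plan is in the caller's allowed set."""
    return referenced_aliases(plan_text) <= set(allowed_views)
```

Note that a plan containing no alias at all passes is_authorised trivially, which is one reason a text scan like this cannot be the only control.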

Assuming I can parse a logical query plan to determine exactly which tables and files are being accessed, is it safe to grant access to the data produced by the query? I'm thinking of potential problems like:

  • Are there ways to access registered tables without them showing up in a logical query plan?
  • Are there ways to load new data and register it as tables through pure Spark SQL (the input to spark.sql(1))?
  • Do users have access to any SQL functions with side effects (that modify or access unauthorized data)?
  • Are there ways to register UDFs or execute arbitrary code purely through spark.sql(1)?

To summarise: can I safely accept arbitrary SQL, register it with df = spark.sql(1), analyse data access using df.explain(True), and then return results using e.g. df.collect()?

Edit (23 Jan, 15:29): edited to …

Recommended answer

TL;DR You should never execute any untrusted code on your Spark cluster.

Are there ways to load new data and register it as tables through pure Spark SQL?

Yes. CREATE TABLE can be executed through the sql method, so as long as users have permission to access the filesystem, they can create tables.
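
As an illustration of that point, the statements below are the kind of plain SQL text a caller could submit. This is a sketch only: the table name leaked and the two helper functions are hypothetical, the path is the one from the question, and whether such statements succeed depends on the catalog configuration and on the filesystem permissions of the Spark process.

```python
def create_table_over_path(table, path, fmt="csv"):
    """Build a data source CREATE TABLE statement over an existing path."""
    return f"CREATE TABLE {table} USING {fmt} OPTIONS (path '{path}')"


def select_from_path(path, fmt="csv"):
    """Build a query that reads a file directly, with no registered table."""
    return f"SELECT * FROM {fmt}.`{path}`"


# 'leaked' is a hypothetical table name chosen for this example.
statements = [
    create_table_over_path("leaked", "/customer_b/secrets.csv"),
    select_from_path("/customer_b/secrets.csv"),
]

for statement in statements:
    # Each of these would be passed as the query string to spark.sql.
    print(statement)
```

The second form never goes through a_secrets or b_secrets at all, so any check keyed on registered view names would not see it.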

Are there ways to register UDFs or execute arbitrary code purely through spark.sql(1)?

Yes, as long as they can control the classpath, which can be modified with SQL:

spark.sql("""add jar URI""")
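
To make the consequence concrete, here is a small sketch of what a sequence of submitted statements might look like. The jar URI, the function name run_it, the class name com.example.Evil, and the helper leading_keyword are all hypothetical, and whether the function registration succeeds depends on the deployment. The helper only shows that not everything handed to spark.sql is a query; it is not meant as a filter.

```python
def leading_keyword(statement):
    """Return the first word of a SQL statement, upper-cased ('' if blank)."""
    words = statement.strip().split(None, 1)
    return words[0].upper() if words else ""


# Hypothetical statements a caller could submit one after another.
submitted = [
    "add jar hdfs:///tmp/evil.jar",
    "CREATE TEMPORARY FUNCTION run_it AS 'com.example.Evil'",
    "SELECT run_it(account) FROM a1_secrets",
]

kinds = [leading_keyword(statement) for statement in submitted]
print(kinds)
```

Only the last statement is a query, and its plan would mention just a1_secrets, even though the function it calls runs code the caller supplied.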

Do users have access to any SQL functions with side effects (that modify or access unauthorized data)?

Effectively yes (by extension of the previous point).

Can I safely accept arbitrary SQL?

No.
