Hadoop connectivity with SAS


Question


I want to use the SAS/ACCESS 9.3M2 interface to connect SAS to my Hive instance. My question is: does SAS import the Hive cubes into the SAS environment and query them there? Or does it hit Hive again for each report, so that it runs MapReduce jobs and degrades my reporting performance to more than 2-4 seconds?

If it imports the Hive tables into its own environment, how does its performance compare with normal SQL cubes?

I am totally new to SAS. I want my reports generated within 2-4 seconds. My aggregated data is in Hive tables, and I have created cube dimensions over them.

Thanks...

Answer

SAS/ACCESS serves to:
- give you the ability to read data from and write data to a data source, taking care of data type conversions
- provide metadata about a data store (list of tables, fields, data types)
- provide a means to translate SAS code, at least partially, into data-source-specific code (implicit pass-through), usually an SQL variant
- provide a means for you to write data-source-specific code and send it from SAS for execution in the data source
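The roles listed above could look roughly like this in SAS code. This is a minimal sketch rather than a tested configuration: the server name, credentials, the table `sales_agg`, and its columns `region` and `amount` are all placeholders, and the connection options you actually need depend on your SAS release and cluster setup.

```sas
/* Placeholder connection details: replace with your own Hive server. */
libname hdp hadoop server="hive.example.com" port=10000
        user=myuser password=mypass schema=default;

/* Metadata: list the Hive tables visible through the libref. */
proc datasets library=hdp;
quit;

/* Read: a Hive table is referenced like a SAS data set. */
proc print data=hdp.sales_agg (obs=10);
run;

/* Explicit pass-through: the inner query is HiveQL, sent to Hive as written. */
proc sql;
  connect to hadoop (server="hive.example.com" port=10000
                     user=myuser password=mypass);
  select * from connection to hadoop
    (select region, sum(amount) as total
       from sales_agg
      group by region);
  disconnect from hadoop;
quit;
```

Every step that reads `hdp.sales_agg` goes back to Hive, so whether a report meets a 2-4 second target depends on how quickly Hive can answer that query.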

I'm totally new to Hadoop :-) so I'll just guess that SAS/ACCESS to Hadoop (via the LIBNAME statement) reads relational data from Hadoop. The documentation mentions JDBC, so I guess that's what is used for data access. I doubt SAS/ACCESS is able to query the cubes from Hadoop (is that your question? By "I have created cube dimensions over that", do you mean in Hadoop?).

Generally, SAS/ACCESS tries to minimize data transfers from data sources and to push the processing to the data source.
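One way to see this push-down in practice is to turn on SQL tracing and run an ordinary PROC SQL query against the libref. The sketch below assumes the same placeholder server, credentials, and `sales_agg` table (with hypothetical columns `region` and `amount`); whether a particular query is actually passed to Hive depends on the query and the release, so the log is what tells you.

```sas
/* Placeholder connection details: replace with your own Hive server. */
libname hdp hadoop server="hive.example.com" port=10000
        user=myuser password=mypass schema=default;

/* Write the SQL that SAS sends to the data source into the SAS log. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Implicit pass-through: plain PROC SQL against the libref.
   SAS tries to hand the aggregation to Hive instead of pulling all rows. */
proc sql;
  create table work.region_totals as
  select region, sum(amount) as total
    from hdp.sales_agg
   group by region;
quit;

/* work.region_totals is a local SAS data set, so this step reads it
   from SAS storage and does not contact Hive. */
proc print data=work.region_totals;
run;
```

The last step illustrates the trade-off in the question: a result copied into a SAS library is read locally on later use, while anything referenced through `hdp` is fetched from Hive each time.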

From http://blog.cloudera.com/blog/2013/05/how-the-sas-and-cloudera-platforms-work-together:

SAS/ACCESS to Hadoop

SAS/ACCESS provides the ability to access data sets stored in Hadoop in SAS natively. With SAS/Access to Hadoop:

- LIBNAME statements can be used to make Hive tables look like SAS data sets on top of which SAS Procedures and SAS DATA steps can interact.
- PROC SQL commands provide the ability to execute direct Hive SQL commands on Hadoop.
- PROC HADOOP provides the ability to directly submit MapReduce, Apache Pig, and HDFS commands from the SAS execution environment to your CDH cluster.

The SAS/ACCESS interface is available from the SAS 9.3M2 release and supports CDH 3U2 as well as CDH 4.01 and higher.

The PROC HADOOP documentation might also be helpful: http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p1esotuxnkbuepn1w443ueufw8in.htm
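For orientation, a PROC HADOOP step that submits HDFS commands might look like the sketch below. It follows the general form used in the SAS 9.3 documentation, but every path, the configuration file name, and the credentials are hypothetical, and option names may differ in other releases.

```sas
/* Hypothetical path to a Hadoop configuration file for the cluster. */
filename cfg "/home/myuser/hadoop/cluster-config.xml";

proc hadoop options=cfg username="myuser" password="mypass" verbose;
  /* Create a directory in HDFS. */
  hdfs mkdir="/user/myuser/reports";
  /* Copy a local file into that HDFS directory. */
  hdfs copyfromlocal="/home/myuser/data/sales.txt"
       out="/user/myuser/reports/sales.txt";
run;
```

This procedure manages files and jobs on the cluster; it does not query Hive tables, which is what the LIBNAME and PROC SQL routes are for.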
