How to access Azure Data Lake using the WebHDFS API

Question

We're just getting started evaluating the Data Lake service in Azure. We created our lake, and in the portal we can see two public URLs for the service: one uses an https:// scheme, the other an adl:// scheme.

The Data Lake documentation states that there are indeed two interfaces: the WebHDFS REST API and ADL. So I am assuming the https:// scheme gets me the WebHDFS interface. However, I can't find any further information from Azure about using this interface.

I tried poking at the given https:// URL with a web browser and with curl. The service responds, and the replies are JSON, which is as expected since Data Lake exposes an HDFS-compatible interface. However, I cannot get access to my files (which I uploaded into our lake via the portal).

If I do a GET to "/foo.txt", for example, the reply is a ResourceNotFound error.

If I do a GET using the typical Hadoop HDFS syntax, "/webhdfs/v1/foo.txt", the reply is an AuthenticationFailed error, and the additional text indicates a missing access token. This seems more promising. However, I can't find anything about how to generate such an access token.
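
The difference between the two GETs is the path prefix: WebHDFS resources live under /webhdfs/v1, and the operation is chosen with an ?op= query parameter. The minimal Python sketch below only builds such URLs; it assumes the https://<account>.azuredatalakestore.net host format used in the answer's script, and "mylake" and webhdfs_url are hypothetical names. A request to the URL still needs a bearer token, which is why the prefixed GET fails with AuthenticationFailed rather than ResourceNotFound.

```python
from urllib.parse import quote


def webhdfs_url(account: str, path: str, op: str) -> str:
    """Build a WebHDFS REST URL for a Data Lake Store account (hypothetical helper)."""
    # Normalise to a single leading slash, then percent-encode the path
    # (quote keeps "/" as-is by default).
    clean = "/" + path.lstrip("/")
    # Every WebHDFS call sits under /webhdfs/v1; ?op= selects the operation.
    return f"https://{account}.azuredatalakestore.net/webhdfs/v1{quote(clean)}?op={op}"


print(webhdfs_url("mylake", "foo.txt", "OPEN"))
# https://mylake.azuredatalakestore.net/webhdfs/v1/foo.txt?op=OPEN
```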

There is some documentation on using the ADL interface with .NET and Visual Studio, but that is not what I want initially.

Any help much appreciated!

Answer

I am indebted to this forum post by Matthew Hicks (https://social.msdn.microsoft.com/Forums/azure/en-US/cd7dee04-19a4-4304-8e2c-20c70bc8a5b9/access-to-adl-store-by-webhdfs?forum=AzureDataLake), which outlined how to do this with curl. I wrapped it in PowerShell. I'm sure there are many ways to accomplish this, but here's one that works.

First, set up an Azure Active Directory (AAD) application so that you can fill in the client_id and client_secret used below. (This assumes you want to automate the process rather than log in interactively. If you want an interactive login, the forum post above links to that approach.)
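
The authentication step in the script is an OAuth 2.0 client-credentials request to the AAD token endpoint. As a minimal Python sketch of the same request, assuming the endpoint and resource value copied from the script below: it sends the fields with standard form URL-encoding instead of the multipart encoding that curl -F produces, and token_request and fetch_access_token are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/token"
# Resource value copied from the PowerShell script in this answer.
DEFAULT_RESOURCE = "https://management.core.windows.net/"


def token_request(tenant: str, client_id: str, client_secret: str,
                  resource: str = DEFAULT_RESOURCE) -> Request:
    """Build (but do not send) an AAD client-credentials token request."""
    body = urlencode({
        "grant_type": "client_credentials",
        "resource": resource,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    # Form URL-encoded body; curl -F in the script sends multipart/form-data instead.
    return Request(TOKEN_URL.format(tenant=tenant), data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def fetch_access_token(req: Request) -> str:
    """Send the request and return the access_token field of the JSON reply."""
    with urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

The returned token goes into an "Authorization: Bearer <token>" header on every WebHDFS call, exactly as the script does with $accessToken.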

Then fill in the settings in the first five lines and run the following PowerShell script:

$client_id = "<client id>";
$client_secret = "<secret>";
$tenant = "<tenant>";
$adlsAccount = "<account>";
cd D:\path\to\curl

# authenticate: request an AAD access token using the client-credentials grant
$cmd = { .\curl.exe -X POST https://login.microsoftonline.com/$tenant/oauth2/token `
    -F grant_type=client_credentials `
    -F resource=https://management.core.windows.net/ `
    -F client_id=$client_id `
    -F client_secret=$client_secret };
$responseToken = Invoke-Command -scriptblock $cmd;
$accessToken = (ConvertFrom-Json $responseToken).access_token;

#list root folders
$cmd = {.\curl.exe -X GET -H "Authorization: Bearer $accessToken" https://$adlsAccount.azuredatalakestore.net/webhdfs/v1/?op=LISTSTATUS };
$foldersResponse = Invoke-Command -scriptblock $cmd;
# print the name of each root folder
(ConvertFrom-Json $foldersResponse).FileStatuses.FileStatus | ForEach-Object { $_.pathSuffix }

#list files in one folder
$cmd = {.\curl.exe -X GET -H "Authorization: Bearer $accessToken" https://$adlsAccount.azuredatalakestore.net/webhdfs/v1/weather/?op=LISTSTATUS };
$weatherResponse = Invoke-Command -scriptblock $cmd;
(ConvertFrom-Json $weatherResponse).FileStatuses.FileStatus | ForEach-Object { $_.pathSuffix }

#download one file
$cmd = {.\curl.exe -L "https://$adlsAccount.azuredatalakestore.net/webhdfs/v1/weather/2007small.csv?op=OPEN" -H "Authorization: Bearer $accessToken" -o d:\temp\curl\2007small.csv };
Invoke-Command -scriptblock $cmd;


#upload one file
$cmd = {.\curl.exe -i -X PUT -L "https://$adlsAccount.azuredatalakestore.net/webhdfs/v1/weather/new2007small.csv?op=CREATE" -T "D:\temp\weather\smallcsv\new2007small.csv" -H "Authorization: Bearer $accessToken" };
Invoke-Command -scriptblock $cmd;
