Using Azure Functions to call REST API and save results in Azure Data Lake Gen2
Question
I want to call a REST API and save the results as a CSV or JSON file in Azure Data Lake Gen2. Based on what I have read, Azure Functions is the way to go.
The web service returns data in the following format:
"ID","ProductName","Company"
"1","Apples","Alfreds futterkiste"
"2","Oranges","Alfreds futterkiste"
"3","Bananas","Alfreds futterkiste"
"4","Salad","Alfreds futterkiste"
...next rows
I have written a console app in C# which at the moment outputs the data to the console. The web service uses pagination and returns 1000 rows (determined by the &number-parameter, with a max of 1000). After the first request I can use the &next-parameter to fetch the next 1000 rows based on ID. For instance the URL
http://testWebservice123.com/Example.csv?auth=abc&number=1000&next=1000
will get me the rows with IDs 1001 to 2000. (In reality the API call and the pagination are a bit more complex, so I cannot use, for instance, Azure Data Factory v2 to do the load to Azure Data Lake - which is why I think I need Azure Functions, unless I have overlooked another service. So the following is just a demo to learn how to write to Azure Data Lake.)
I have the following C#:
static void Main(string[] args)
{
    string startUrl = "http://testWebservice123.com/Example.csv?auth=abc&number=1000";
    string url = "";
    string deltaRequestParameter = "";
    string lastLine;
    int numberOfLines = 0;

    do
    {
        url = startUrl + deltaRequestParameter;

        using (WebClient myWebClient = new WebClient())
        using (Stream myStream = myWebClient.OpenRead(url))
        using (StreamReader sr = new StreamReader(myStream))
        {
            numberOfLines = 0;
            while (!sr.EndOfStream)
            {
                var row = sr.ReadLine();
                var values = row.Split(',');
                // do whatever with the rows for now - i.e. write to console
                Console.WriteLine(values[0] + " " + values[1]);
                lastLine = values[0].Replace("\"", ""); // last line in the loop - get the last ID
                numberOfLines++;
                deltaRequestParameter = "&next=" + lastLine;
            }
        }
    } while (numberOfLines == 1001); // since the header is returned each time, the number of rows will be 1001 until we get to the last request
}
I want to write the data as a CSV file to the data lake in the most efficient way. How would I rewrite the above code to work in an Azure Function and save to a CSV in Azure Data Lake Gen2?
Answer
Here are the steps you need to take to achieve the result:
1) Create an Azure Function with a trigger; you can make it an HTTPTrigger or a TimerTrigger, as per your need.
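Step 1 can be sketched as a timer-triggered function, assuming the in-process Azure Functions model (`Microsoft.NET.Sdk.Functions`); the class name and CRON schedule here are illustrative, not part of the original answer:

```csharp
using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class FetchAndStoreFunction
{
    // Runs at the top of every hour; change the CRON expression,
    // or use an [HttpTrigger] instead, depending on how you want
    // to start the job.
    [FunctionName("FetchAndStore")]
    public static void Run([TimerTrigger("0 0 * * * *")] TimerInfo timer, ILogger log)
    {
        log.LogInformation($"Fetch started at {DateTime.UtcNow:O}");
        // Inside the function body you would:
        // 1) call the paginated REST API in a loop (as in the question),
        // 2) buffer the rows in memory,
        // 3) write them to Azure Data Lake (steps 2 and 3 of this answer).
    }
}
```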
2) I am assuming you already have the code that calls the API in a loop until it gives you the desired result.
3) Once you have the data in memory, you have to write the following code to write it to Azure Data Lake.
Prerequisites for accessing ADLS from C# code:
1) Register an app in Azure AD
2) Grant it permission in the Data Lake Store
Below is the code for creating the ADLS client.
// ADLS connection
var adlCreds = GetCreds_SPI_SecretKey(tenantId, ADL_TOKEN_AUDIENCE, serviceAppIDADLS, servicePrincipalSecretADLS);
var adlsClient = AdlsClient.CreateClient(adlsName, adlCreds);

private static ServiceClientCredentials GetCreds_SPI_SecretKey(string tenant, Uri tokenAudience, string clientId, string secretKey)
{
    SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());

    var serviceSettings = ActiveDirectoryServiceSettings.Azure;
    serviceSettings.TokenAudience = tokenAudience;

    var creds = ApplicationTokenProvider.LoginSilentAsync(tenant, clientId, secretKey, serviceSettings).GetAwaiter().GetResult();
    return creds;
}
Finally, write the implementation that saves the file in Azure Data Lake:
const string delim = ",";
static string adlsInputPath = ConfigurationManager.AppSettings.Get("AdlsInputPath");

public static void ProcessUserProfile(this SampleProfile socialProfile, AdlsClient adlsClient, string fileNameExtension = "")
{
    using (MemoryStream memStreamProfile = new MemoryStream())
    {
        using (TextWriter textWriter = new StreamWriter(memStreamProfile))
        {
            string profile;
            string header = Helper.GetHeader(delim, Entities.FBEnitities.Profile);
            string fileName = adlsInputPath + fileNameExtension + "/profile.csv";
            adlsClient.DataLakeFileHandler(textWriter, header, fileName);

            profile = socialProfile.UserID
                      + delim + socialProfile.Profile.First_Name
                      + delim + socialProfile.Profile.Last_Name
                      + delim + socialProfile.Profile.Name
                      + delim + socialProfile.Profile.Age_Range_Min
                      + delim + socialProfile.Profile.Age_Range_Max
                      + delim + socialProfile.Profile.Birthday
                      ;

            textWriter.WriteLine(profile);
            textWriter.Flush();
            memStreamProfile.Flush();
            adlsClient.DataLakeUpdateHandler(fileName, memStreamProfile);
        }
    }
}
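Applied to the paginated CSV scenario from the question, the same idea can be sketched as below. `adlsClient` is the client created earlier; `CreateFile`/`IfExists.Overwrite` come from the Microsoft.Azure.DataLake.Store SDK, and the method itself is an illustrative, untested sketch rather than a drop-in implementation:

```csharp
using System.IO;
using System.Net;
using System.Text;
using Microsoft.Azure.DataLake.Store;

static void DownloadCsvToAdls(AdlsClient adlsClient, string startUrl, string adlsFilePath)
{
    string deltaRequestParameter = "";
    int numberOfLines;

    // Create (or overwrite) the target file once and stream every page into it.
    using (var adlsStream = adlsClient.CreateFile(adlsFilePath, IfExists.Overwrite))
    using (var writer = new StreamWriter(adlsStream, Encoding.UTF8))
    {
        bool headerWritten = false;
        do
        {
            numberOfLines = 0;
            using (var webClient = new WebClient())
            using (var reader = new StreamReader(webClient.OpenRead(startUrl + deltaRequestParameter)))
            {
                while (!reader.EndOfStream)
                {
                    var row = reader.ReadLine();
                    numberOfLines++;
                    if (numberOfLines == 1)
                    {
                        // Each page repeats the header row; write it only once.
                        if (!headerWritten)
                        {
                            writer.WriteLine(row);
                            headerWritten = true;
                        }
                        continue;
                    }
                    writer.WriteLine(row);
                    // Remember the last ID to build the &next parameter for the next page.
                    deltaRequestParameter = "&next=" + row.Split(',')[0].Replace("\"", "");
                }
            }
        } while (numberOfLines == 1001); // header + 1000 data rows until the last page
    }
}
```

This avoids buffering the whole result set in memory: each page is written to the Data Lake file as soon as it is read.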
Hope this helps.