How to create an Azure on-demand HDInsight Spark cluster using Data Factory


Question


I am trying to use Azure Data Factory to create an on-demand HDInsight Spark cluster using HDI version 3.5. The data factory refuses to create it and reports the error message:


HdiVersion:'3.5' is not supported


If there is currently no way of creating an on-demand HDInsight Spark cluster, what is the other sensible option? It seems very strange to me that Microsoft hasn't added an on-demand HDInsight Spark cluster option to Azure Data Factory.

Answer


Here is a full solution that uses ADF to schedule a custom .NET activity in C#, which in turn uses ARM templates and SSH.NET to execute the command that runs the R script.


So ADF is used to schedule the .NET activity, the Batch service is used to run the code in the dll, and the JSON template file for the HDInsight cluster is stored in blob storage, where it can be configured as needed.
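As a rough illustration of the blob-hosted configuration mentioned above (the parameter names and values here are hypothetical, not from the original answer), the parameters file would pair with the ARM template and might look something like this, with its contentVersion matching the "1.0.0.0" the deployment code expects:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "clusterName":          { "value": "myhdi" },
    "clusterLoginUserName": { "value": "admin" },
    "clusterLoginPassword": { "value": "<cluster login password>" },
    "sshUserName":          { "value": "sshuser" },
    "sshPassword":          { "value": "<ssh password>" }
  }
}
```

Each entry must correspond to a parameter declared in the template file; adjusting cluster size or version then only means editing these JSON files in blob storage, not recompiling the dll.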


The full description is in the article "Automating Azure: Creating an On-Demand HDInsight Cluster", but here is the C# code which is the essence of the automation (everything else is just admin work to set up the bits):

using System;
using System.Collections.Generic;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;
using Microsoft.Azure.Management.ResourceManager.Fluent;
using Microsoft.Azure.Management.ResourceManager.Fluent.Core;
using Renci.SshNet;

namespace VM
{
    public class StartVM : IDotNetActivity
    {
        private IActivityLogger _logger;

        public IDictionary<string, string> Execute(
            IEnumerable<LinkedService> linkedServices,
            IEnumerable<Dataset> datasets,
            Activity activity,
            IActivityLogger logger)
        {
            _logger = logger;
            _logger.Write("Starting execution...");

            var credentials = SdkContext.AzureCredentialsFactory.FromServicePrincipal(
                ""   // enter clientId here, this is the ApplicationID
                , "" // this is the Application secret key
                , "" // this is the tenant id
                , AzureEnvironment.AzureGlobalCloud);

            var azure = Microsoft.Azure.Management.Fluent.Azure
                .Configure()
                .WithLogLevel(HttpLoggingDelegatingHandler.Level.Basic)
                .Authenticate(credentials)
                .WithDefaultSubscription();

            var groupName = "myResourceGroup";
            var location = Region.EuropeNorth;

            // create the resource group
            var resourceGroup = azure.ResourceGroups.Define(groupName)
                .WithRegion(location)
                .Create();

            // deploy the template
            var templatePath = "https://myblob.blob.core.windows.net/blobcontainer/myHDI_template.JSON";
            var paramPath = "https://myblob.blob.core.windows.net/blobcontainer/myHDI_parameters.JSON";
            var deployment = azure.Deployments.Define("myDeployment")
                .WithExistingResourceGroup(groupName)
                .WithTemplateLink(templatePath, "0.9.0.0") // make sure it matches the file
                .WithParametersLink(paramPath, "1.0.0.0")  // make sure it matches the file
                .WithMode(Microsoft.Azure.Management.ResourceManager.Fluent.Models.DeploymentMode.Incremental)
                .Create();

            _logger.Write("The cluster is ready...");
            executeSSHCommand();
            _logger.Write("The SSH command was executed...");

            _logger.Write("Deleting the cluster...");
            // delete the resource group
            azure.ResourceGroups.DeleteByName(groupName);

            return new Dictionary<string, string>();
        }

        private void executeSSHCommand()
        {
            ConnectionInfo ConnNfo = new ConnectionInfo("myhdi-ssh.azurehdinsight.net", "sshuser",
                new AuthenticationMethod[]{
                    // Password based Authentication
                    new PasswordAuthenticationMethod("sshuser", "Addso@1234523123"),
                });

            // Execute a (SHELL) command - copy the R script locally, run it, and copy the output back to blob
            using (var sshclient = new SshClient(ConnNfo))
            {
                sshclient.Connect();
                using (var cmd = sshclient.CreateCommand(
                    "hdfs dfs -copyToLocal \"wasbs:///rscript/test.R\"; env -i R CMD BATCH --no-save --no-restore \"test.R\"; hdfs dfs -copyFromLocal -f \"test-output.txt\" \"wasbs:///rscript/test-output.txt\""))
                {
                    cmd.Execute();
                }
                sshclient.Disconnect();
            }
        }
    }
}
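To tie this into ADF, the compiled dll is zipped, uploaded to blob storage, and referenced from a pipeline that runs on an Azure Batch linked service. A hedged sketch of that ADF (v1) pipeline definition follows; the linked service names, zip path, and timeout are hypothetical, while the entryPoint must match the namespace and class above:

```json
{
  "name": "RScriptPipeline",
  "properties": {
    "activities": [
      {
        "name": "StartHDInsightAndRunR",
        "type": "DotNetActivity",
        "linkedServiceName": "AzureBatchLinkedService",
        "typeProperties": {
          "assemblyName": "VM.dll",
          "entryPoint": "VM.StartVM",
          "packageLinkedService": "AzureStorageLinkedService",
          "packageFile": "blobcontainer/VM.zip"
        },
        "policy": { "timeout": "02:00:00" }
      }
    ]
  }
}
```

The Batch service downloads the package, loads the dll, and calls Execute on the schedule defined by the pipeline, which is what makes the cluster effectively "on demand": created, used for the R job, and deleted on every run.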

Good luck!

Feodor
