Google Dataproc API: Spark cluster with C#


Question

I have data in BigQuery that I want to analyze in a Spark cluster. According to the documentation, if I instantiate a Spark cluster it should come with a BigQuery connector. I looked for sample code to do this and found one in PySpark, but could not find any C# examples. I also found some documentation on the functions in the Dataproc API NuGet package.

I am looking for a sample that starts a Spark cluster in Google Cloud using C#.

Solution

After installing the Google.Apis.Dataproc.v1 NuGet package, version 1.10.0.40 or higher, you can use the quick sample console app below to create a Dataproc cluster in C#:

using Google.Apis.Auth.OAuth2; 
using Google.Apis.Services;
using Google.Apis.Dataproc.v1; 
using Google.Apis.Dataproc.v1.Data;

using System; 
using System.Threading;

namespace DataprocSample {
    class Program
    {
        static void Main(string[] args)
        {
            string project = "YOUR PROJECT HERE";
            string dataprocGlobalRegion = "global";
            string zone = "us-east1-b";
            string machineType = "n1-standard-4";
            string clusterName = "sample-cluster";
            int numWorkers = 2;
            // See the docs for Application Default Credentials:
            // https://developers.google.com/identity/protocols/application-default-credentials
            // In general, a previous 'gcloud auth login' will suffice if running as yourself.
            // If running from a VM, ensure the VM was started such that the service account has
            // the CLOUD_PLATFORM scope. 
            GoogleCredential credential = GoogleCredential.GetApplicationDefaultAsync().Result;
            if (credential.IsCreateScopedRequired)
            {
                credential = credential.CreateScoped(new[] { DataprocService.Scope.CloudPlatform });
            }

            DataprocService service = new DataprocService(
                new BaseClientService.Initializer()
                {
                    HttpClientInitializer = credential,
                    ApplicationName = "Dataproc Sample",
                });

            // Create a new cluster:
            Cluster newCluster = new Cluster
            {
                ClusterName = clusterName,
                Config = new ClusterConfig
                {
                    GceClusterConfig = new GceClusterConfig
                    {
                        ZoneUri = String.Format(
                            "https://www.googleapis.com/compute/v1/projects/{0}/zones/{1}",
                            project, zone),
                    },
                    MasterConfig = new InstanceGroupConfig
                    {
                        NumInstances = 1,
                        MachineTypeUri = String.Format(
                            "https://www.googleapis.com/compute/v1/projects/{0}/zones/{1}/machineTypes/{2}",
                            project, zone, machineType),
                    },
                    WorkerConfig = new InstanceGroupConfig
                    {
                        NumInstances = numWorkers,
                        MachineTypeUri = String.Format(
                            "https://www.googleapis.com/compute/v1/projects/{0}/zones/{1}/machineTypes/{2}",
                            project, zone, machineType),
                    },
                },
            };
            Operation createOperation = 
                service.Projects.Regions.Clusters.Create(newCluster, project, dataprocGlobalRegion).Execute();
            // Poll the operation:
            while (!IsDone(createOperation))
            {
                Console.WriteLine("Polling operation {0}", createOperation.Name);
                createOperation =
                    service.Projects.Regions.Operations.Get(createOperation.Name).Execute();
                Thread.Sleep(1000);
            }

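            // A finished operation may still have failed, so check Error before reporting success.
            if (createOperation.Error != null)
            {
                Console.WriteLine("Cluster creation failed: {0}", createOperation.Error.Message);
                return;
            }

            Console.WriteLine("Done creating cluster {0}", newCluster.ClusterName);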
        }
        static bool IsDone(Operation op)
        {
            return op.Done ?? false;
        }
    }
}
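
The polling loop in the sample keeps calling the API until the operation reports that it is done, so it never gives up if cluster creation stalls. One possible way to bound the wait is sketched below. The OperationPoller class and its PollUntilDone method are hypothetical names introduced here for illustration; they are not part of the Dataproc client library, and the helper deliberately depends only on System so it can be tried without any Google packages.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper (not part of Google.Apis.Dataproc.v1): repeats a fetch
// until a caller-supplied predicate reports completion or a timeout elapses.
static class OperationPoller
{
    public static T PollUntilDone<T>(
        Func<T> fetch, Func<T, bool> isDone, TimeSpan interval, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        T current = fetch();
        while (!isDone(current))
        {
            // Give up once the allowed time has passed instead of looping forever.
            if (elapsed.Elapsed >= timeout)
            {
                throw new TimeoutException(
                    "Operation did not finish within " + timeout);
            }
            Thread.Sleep(interval);
            current = fetch();
        }
        return current;
    }
}

// Possible use with the sample above (requires the Dataproc client library):
//   string opName = createOperation.Name;
//   createOperation = OperationPoller.PollUntilDone(
//       () => service.Projects.Regions.Operations.Get(opName).Execute(),
//       op => op.Done ?? false,
//       TimeSpan.FromSeconds(1), TimeSpan.FromMinutes(10));
```

The one-second interval matches the sample; the ten-minute timeout in the usage comment is an arbitrary illustrative value and should be adjusted to how long cluster creation usually takes in your project.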
