Error creating an on-premises multi-machine Service Fabric cluster


Problem description

To test and evaluate Service Fabric (SF) for production use, I created a single-machine test cluster with three nodes on a production machine, which worked fine. However, I could not create a multi-machine cluster with three nodes.

I followed these instructions: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-cluster-creation-for-windows-server/

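All machines:

  • are on the same (secured) network with the following IPs: 10.0.10.12, 10.0.11.12, 10.0.12.12.
  • are virtual and were freshly created from the same image.
  • are not part of a domain. The setup was done with an administrator account that has the same password on all machines.
  • run Windows Server 2012 R2 with PowerShell 4.0.
  • have the firewall disabled (public and private).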
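
The three addresses differ in their third octet, so whether they really share one network depends on the subnet mask, which is not given here. The following is only a small sketch of how one might check this with Python's standard `ipaddress` module. The prefix lengths 16 and 24 and the helper name `networks_for` are assumptions for illustration, not values taken from the actual setup.

```python
import ipaddress

# Node addresses as listed above.
NODE_IPS = ["10.0.10.12", "10.0.11.12", "10.0.12.12"]

def networks_for(ips, prefix_len):
    """Return the set of networks the addresses fall into for a prefix length."""
    return {ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False) for ip in ips}

# With an assumed /16 mask all nodes share one network;
# with an assumed /24 mask each node sits in its own subnet.
same_16 = len(networks_for(NODE_IPS, 16)) == 1
same_24 = len(networks_for(NODE_IPS, 24)) == 1
print(same_16, same_24)
```

If the machines turn out to be in separate subnets, routing between them matters in addition to the firewall settings.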

This is the clusterConfig.json:

{
   "name":"SampleCluster",
   "clusterManifestVersion":"1.0.0",
   "apiVersion":"2015-01-01-alpha",
   "nodes":[
      {
         "nodeName":"vm1",
         "iPAddress":"10.0.10.12",
         "nodeTypeRef":"NodeType0",
         "faultDomain":"fd:/dc1/fd1",
         "upgradeDomain":"UD0"
      },
      {
         "nodeName":"vm2",
         "iPAddress":"10.0.11.12",
         "nodeTypeRef":"NodeType0",
         "faultDomain":"fd:/dc1/fd2",
         "upgradeDomain":"UD1"
      },
      {
         "nodeName":"vm3",
         "iPAddress":"10.0.12.12",
         "nodeTypeRef":"NodeType0",
         "faultDomain":"fd:/dc1/fd3",
         "upgradeDomain":"UD2"
      }
   ],
   "diagnosticsFileShare": {
        "etlReadIntervalInMinutes": "5",
        "uploadIntervalInMinutes": "10",
        "dataDeletionAgeInDays": "7",
        "etwStoreConnectionString": "file:c:\\ProgramData\\SF\\FileshareETW",
        "crashDumpConnectionString": "file:c:\\ProgramData\\SF\\FileshareCrashDump",
        "perfCtrConnectionString": "file:c:\\ProgramData\\SF\\FilesharePerfCtr"
    },
   "properties":{
       "reliabilityLevel": "Bronze",
      "nodeTypes": [
          {
            "name": "NodeType0",
            "clientConnectionEndpointPort": "19000",
            "clusterConnectionEndpoint": "19001",
            "httpGatewayEndpointPort": "19080",
            "applicationPorts": {
                "startPort": "20001",
                "endPort": "20031"
            },
            "ephemeralPorts": {
                "startPort": "20032",
                "endPort": "20062"
            },
            "isPrimary": true
          }
      ],
      "fabricSettings": [
        {
          "name": "Setup",
          "parameters": [
            {
                "name": "FabricDataRoot",
                "value": "C:\\ProgramData\\SF"
            },
            {
                "name": "FabricLogRoot",
                "value": "C:\\ProgramData\\SF\\Log"
            }
          ]
        }
      ]
   }
}
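
Before running the setup, it can help to rule out simple mistakes in the config file. The sketch below is not the validation that Service Fabric itself performs. The function name `check_cluster_config` and the three checks (unique node names and IPs, defined node types, non-overlapping port ranges) are assumptions about what is worth verifying. The field names are the ones used in the clusterConfig.json above, and the sample is a trimmed copy of it.

```python
import json

def check_cluster_config(config):
    """Return a list of human-readable problems found in a clusterConfig dict."""
    problems = []
    nodes = config.get("nodes", [])
    node_types = {nt["name"]: nt
                  for nt in config.get("properties", {}).get("nodeTypes", [])}

    # Node names and IP addresses should each be unique.
    for key in ("nodeName", "iPAddress"):
        values = [n[key] for n in nodes]
        dupes = sorted({v for v in values if values.count(v) > 1})
        if dupes:
            problems.append(f"duplicate {key}: {', '.join(dupes)}")

    # Every node should reference a node type that is actually defined.
    for n in nodes:
        if n["nodeTypeRef"] not in node_types:
            problems.append(f"{n['nodeName']}: unknown nodeTypeRef {n['nodeTypeRef']}")

    # Assumed check: application and ephemeral port ranges should not overlap.
    for name, nt in node_types.items():
        app, eph = nt["applicationPorts"], nt["ephemeralPorts"]
        a0, a1 = int(app["startPort"]), int(app["endPort"])
        e0, e1 = int(eph["startPort"]), int(eph["endPort"])
        if a0 <= e1 and e0 <= a1:
            problems.append(f"{name}: applicationPorts and ephemeralPorts overlap")
    return problems

# Trimmed copy of the config above (only the fields the checks read).
SAMPLE = json.loads("""
{
  "nodes": [
    {"nodeName": "vm1", "iPAddress": "10.0.10.12", "nodeTypeRef": "NodeType0"},
    {"nodeName": "vm2", "iPAddress": "10.0.11.12", "nodeTypeRef": "NodeType0"},
    {"nodeName": "vm3", "iPAddress": "10.0.12.12", "nodeTypeRef": "NodeType0"}
  ],
  "properties": {
    "nodeTypes": [
      {"name": "NodeType0",
       "applicationPorts": {"startPort": "20001", "endPort": "20031"},
       "ephemeralPorts": {"startPort": "20032", "endPort": "20062"}}
    ]
  }
}
""")

print(check_cluster_config(SAMPLE))  # prints [] for the config shown above
```

For this config the sketch reports no problems, which is consistent with the setup passing the "Processing and validating cluster config" step in the output below.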

When I start the cluster setup from one of the machines (it was 10.0.10.12), this is written to the PowerShell console:

Cab extracted.
Creating Service Fabric Cluster...
If it's taking too long, please check in Task Manager details and see if Fabric.exe for each node is running. If not, please look at: 1. traces in DeploymentTraces directory and 2. traces in FabricLogRoot configured in ClusterConfig.json.
Trace folder doesn't exist. Creating trace folder: C:\copy\DeploymentTraces
Verifying remote procedure call access against cluster machines.
Processing and validating cluster config.
Creating FabricSettingsMetadata from C:\copy\ServiceFabricPackage\bin\Fabric\Fabric.Code\Configurations.csv
Configuring nodes.
Copying installer & package to all machines.
Configuring machine 10.0.10.12
Configuring machine 10.0.11.12

The setup hangs here for a few minutes, then a timeout occurs:

Timed out waiting for Installer Service to start for machine 10.0.11.12.
CreateCluster Error: System.InvalidOperationException: Cannot start service FabricInstallerSvc on computer '10.0.11.12'.
 ---> System.ComponentModel.Win32Exception: The system cannot find the file specified
   --- End of inner exception stack trace ---
   at System.ServiceProcess.ServiceController.Start(String[] args)
   at System.Fabric.DeploymentManager.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Fabric.DeploymentManager.<CreateClusterAsyncInternal>d__a.MoveNext()
Errors occurred during cluster creation.
CreateCluster Exception 0: System.AggregateException: One or more errors occurred. ---> System.InvalidOperationException: Cannot start service FabricInstallerSvc on computer '10.0.11.12'. ---> System.ComponentModel.Win32Exception: The system cannot find the file specified
   --- End of inner exception stack trace ---
   at System.ServiceProcess.ServiceController.Start(String[] args)
   at System.Fabric.DeploymentManager.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Fabric.DeploymentManager.<CreateClusterAsyncInternal>d__a.MoveNext()
   --- End of inner exception stack trace ---
---> (Inner Exception #0) System.InvalidOperationException: Cannot start service FabricInstallerSvc on computer '10.0.11.12'. ---> System.ComponentModel.Win32Exception: The system cannot find the file specified
   --- End of inner exception stack trace ---
   at System.ServiceProcess.ServiceController.Start(String[] args)
   at System.Fabric.DeploymentManager.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Fabric.DeploymentManager.<CreateClusterAsyncInternal>d__a.MoveNext()<---

When I checked the services on that machine (10.0.11.12), I found the Service Fabric Installer Service in the list, but it was not running. I also found an error in the Windows Event Log, which is in line with the error message above:

The Service Fabric Installer Service service failed to start due to the following error: 
The system cannot find the file specified.

On the particular machine, I located the following log file: C:\ProgramData\SF\Log\traces\FabricInstallerService_5.1.150.9590_131111077992093094.trace. It contains this:

2016-06-22 22:23:19.224,Info    ,708,General.FabricInstallerServiceImpl,FabricInstallerService starting ...
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b4480bcf0,Attempting to attach child AsyncOperation 3b4480bdf0.
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b4480bdf0,Calling OnStart
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b4480bdf0,Attempting to attach child AsyncOperation 3b4480c9b0.
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b4480c9b0,Calling OnStart
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b4480c9b0,FinishComplete called with S_OK
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b44811270,Attempting to attach child AsyncOperation 3b44811630.
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b44811630,Calling OnStart
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b4480bdf0,FinishComplete called with S_OK
2016-06-22 22:23:19.224,Noise   ,1652,General.FabricInstallerServiceImpl,FabricUpgradeManager open returned S_OK
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b4480bcf0,Detaching child AsyncOperation 3b4480bdf0.
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b4480bdf0,Detaching child AsyncOperation 3b4480c9b0.
2016-06-22 22:23:19.224,Info    ,1652,FabricInstallerService.FabricUpgradeManager,Upgrade started with FabricDataRoot:C:\ProgramData\SF, FabricLogRoot:C:\ProgramData\SF\Log, FabricCodePath:C:\Program Files\Microsoft Service Fabric\bin\fabric\fabric.code, FabricRoot:C:\Program Files\Microsoft Service Fabric, TargetInformationFilePath:C:\ProgramData\SF\TargetInformation.xml, TargetInformationDescription:TargetInformationFileDescription { CurrentInstallation = WindowsFabricDeploymentDescription { IsValid = true, InstanceId = 0, MSILocation = , ClusterManifestLocation = , InfrastructureManifestLocation = , NodeName = , UpgradeEntryPointExe = , UpgradeEntryPointExeParameters = , UndoUpgradeEntryPointExe = FabricSetup.exe, UndoUpgradeEntryPointExeParameters = /operation:Uninstall , }TargetInstallation = WindowsFabricDeploymentDescription { IsValid = false, InstanceId = , MSILocation = , ClusterManifestLocation = , InfrastructureManifestLocation = , NodeName = , UpgradeEntryPointExe = , UpgradeEntryPointExeParameters = , UndoUpgradeEntryPointExe = , UndoUpgradeEntryPointExeParameters = , }}
2016-06-22 22:23:19.224,Info    ,1652,FabricInstallerService.FabricUpgradeManager,Stopping fabric host
2016-06-22 22:23:19.224,Info    ,1652,FabricInstallerService.FabricUpgradeManager,Error 0x80070424 while waiting for fabric host service to stop.
2016-06-22 22:23:19.224,Error   ,1652,FabricInstallerService.FabricUpgradeManager,Unable to stop fabric host service; error 
2016-06-22 22:23:19.224,Error   ,1652,FabricInstallerService.FabricUpgradeManager,Error E_FAIL while trying to stop fabric host service
2016-06-22 22:23:19.224,Noise   ,1652,General.AsyncOperation@3b44811630,FinishComplete called with E_FAIL
2016-06-22 22:23:19.224,Warning ,1652,FabricInstallerService.FabricUpgradeManager,Upgrade finished with error E_FAIL
2016-06-22 22:23:19.224,Info    ,1636,General.FabricInstallerServiceImpl,service stopping (shutdown = false) ...
2016-06-22 22:23:19.224,Info    ,1636,General.FabricInstallerServiceImpl,Stop FabricUpgradeManager called
2016-06-22 22:23:19.240,Info    ,2472,General.FabricInstallerServiceImpl,Close FabricUpgradeManager, with timeout 5:00.000 
2016-06-22 22:23:19.240,Noise   ,2472,General.AsyncOperation@3b4480be00,Attempting to attach child AsyncOperation 3b4480c4d0.
2016-06-22 22:23:19.240,Noise   ,2472,General.AsyncOperation@3b4480c4d0,Calling OnStart
2016-06-22 22:23:19.240,Noise   ,2472,General.AsyncOperation@3b4480c4d0,Attempting to attach child AsyncOperation 3b4480c5d0.
2016-06-22 22:23:19.240,Noise   ,2472,General.AsyncOperation@3b4480c5d0,Calling OnStart
2016-06-22 22:23:19.240,Noise   ,2472,General.AsyncOperation@3b4480c5d0,FinishComplete called with S_OK
2016-06-22 22:23:19.240,Noise   ,2472,General.AsyncOperation@3b4480c4d0,FinishComplete called with S_OK
2016-06-22 22:23:19.240,Noise   ,2472,General.FabricInstallerServiceImpl,Close FabricUpgradeManager returned S_OK
2016-06-22 22:23:19.240,Noise   ,2472,General.AsyncOperation@3b4480be00,Detaching child AsyncOperation 3b4480c4d0.
2016-06-22 22:23:19.240,Noise   ,2472,General.AsyncOperation@3b4480c4d0,Detaching child AsyncOperation 3b4480c5d0.
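
The only concrete error code in this trace is 0x80070424. As a rough aid for reading it, the sketch below splits an HRESULT into its failure bit, facility, and code, and names the Win32 codes that appear in this post. The helper names `decode_hresult` and `explain_hresults` are hypothetical. The error names are the standard ones from winerror.h, and the sample text is two lines copied from the trace above.

```python
import re

# Win32 error names from winerror.h for the codes that appear in this post.
WIN32_NAMES = {2: "ERROR_FILE_NOT_FOUND", 1060: "ERROR_SERVICE_DOES_NOT_EXIST"}
FACILITY_WIN32 = 7

def decode_hresult(value):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    failed = bool(value & 0x80000000)   # top bit set means failure
    facility = (value >> 16) & 0x7FF    # 11-bit facility field
    code = value & 0xFFFF               # low 16 bits carry the error code
    return failed, facility, code

def explain_hresults(trace_text):
    """Find 0x-prefixed 8-digit HRESULTs in the text and name the Win32 ones."""
    found = []
    for match in re.finditer(r"0x([0-9A-Fa-f]{8})\b", trace_text):
        failed, facility, code = decode_hresult(int(match.group(1), 16))
        if failed and facility == FACILITY_WIN32:
            found.append((match.group(0), code, WIN32_NAMES.get(code, "unknown")))
    return found

TRACE = """\
2016-06-22 22:23:19.224,Info    ,1652,FabricInstallerService.FabricUpgradeManager,Stopping fabric host
2016-06-22 22:23:19.224,Info    ,1652,FabricInstallerService.FabricUpgradeManager,Error 0x80070424 while waiting for fabric host service to stop.
"""

for text, code, name in explain_hresults(TRACE):
    print(text, code, name)  # prints: 0x80070424 1060 ERROR_SERVICE_DOES_NOT_EXIST
```

Win32 error 1060 means that the specified service does not exist as an installed service. This suggests, but does not prove, that the fabric host service was not registered on 10.0.11.12 when the installer service tried to stop it.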

This is the point where I am stuck. My thoughts are:

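  • Since the installation files were copied and the setup process started, communication and accessibility between the machines seem to be fine.
  • The Service Fabric Installer Service seems to play an important role here.
  • On the machine where I started the setup, the Service Fabric Installer Service seems to work properly, but on the remote machine it fails.

Any ideas? Thanks.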

Recommended answer

I'm not exactly sure about your scenario, as I've only had experience installing clusters using a Windows machine group or a gMSA account, and it has always been a rather painful experience... but if you persist, you'll get there in the end!

You mention that the machines run in a secured network but are not part of a domain? Typically, in Active Directory environments, SF runs under the NETWORK SERVICE account, so you could try adding all the machines to the local Administrators group of each machine.

I know that with a gMSA account, I had to add the account to the local Administrators group on each machine, as well as grant it the "Log on as a service" right.

Other than that, I also suggest you check the event logs, in particular the Administrators and Security Audit logs.
