Azure Data Lake Store - existing connection was forcibly closed by the remote host


Problem description


I use the DataLakeStoreFileSystemManagementClient class for reading files from Data Lake Store. We open a stream for the file with code like the following, read it byte by byte, and process it. It is a specific case where we cannot use U-SQL for data processing.

m_adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(…);
return m_adlsFileSystemClient.FileSystem.OpenAsync(m_connection.AccountName, path);


The process may take up to 60 minutes to read and process the file. The problem is: I frequently get an "An existing connection was forcibly closed by the remote host." exception during the stream reading process, especially when the reading takes 20 minutes or more. It should not be a timeout, because I create the DataLakeStoreFileSystemManagementClient with a correct client timeout setting. You can find the exception details below. The exception looks random and it is difficult to predict when you will get it. It can occur at the 15th minute as well as the 50th minute of processing time.


Is this a normal situation for reading files from Data Lake Store? Are there any restrictions (or recommendations) on the total time a stream for a file in Data Lake Store can be kept open?
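For context, the `CopyStream` helper that appears in the stack trace below presumably follows the standard buffered copy pattern; the following is a hypothetical sketch (the buffer size and method shape are assumptions, not the asker's actual code):

```csharp
// Hypothetical reconstruction of a CopyStream-style helper:
// copies the response stream to an output stream in buffered reads.
static void CopyStream(Stream input, Stream output)
{
    var buffer = new byte[8192]; // buffer size is an assumption
    int bytesRead;
    // Read until the source stream is exhausted; a long-lived
    // network stream can fail mid-loop with an IOException.
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, bytesRead);
    }
}
```

With a single long-lived HTTP response stream, any transient network fault inside this loop surfaces as the IOException shown below, with no way to resume.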

Exception:

   System.AggregateException: One or more errors occurred. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.ConnectStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.Net.Http.HttpClientHandler.WebExceptionWrapperStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.Net.Http.DelegatingStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at DataLake.Timeout.Research.FileDownloader.CopyStream(Stream input, Stream output) in C:\TFS-SED\Main\Platform\DataNode\DataLake\DataLake.Timeout.Research\FileDownloader.cs:line 107
   at DataLake.Timeout.Research.FileDownloader.<DownloadFileAsync>d__6.MoveNext() in C:\TFS-SED\Main\Platform\DataNode\DataLake\DataLake.Timeout.Research\FileDownloader.cs:line 96

Recommended answer


To avoid these types of issues, the following is recommended:

  • Read in smaller, retriable chunks. In my experience I have found that 4MB chunks work best and offer the best performance. Additionally, by reading in smaller increments, you can incorporate retry logic to retry from the same offset in the event of a failure.


If you do not know how large your stream is (for example, it is being appended to by another worker while you are reading), you can check for a 400 error with a RemoteException of "BadOffsetException" in the payload. This indicates that you have started at an offset beyond the end of the file.

const int MAX_BYTES_TO_READ = 4 * 1024 * 1024; // 4MB
…
long offset = 0;
bool notDone = true;
while (notDone)
{
    try
    {
        // Read the next chunk starting at the current offset
        var myStream = client.Read(accountName, offset, MAX_BYTES_TO_READ);
        // do stuff with stream
        offset += MAX_BYTES_TO_READ;
    }
    catch (WebException ex)
    {
        // Read the web exception response body
        using (var reader = new StreamReader(ex.Response.GetResponseStream()))
        {
            string response = reader.ReadToEnd();
            if (response.Contains("BadOffsetException"))
                notDone = false; // offset is past the end of the file
        }
    }
}
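The snippet above handles end-of-file detection; the first recommendation also mentions retrying from the same offset on transient failures. That part could be sketched as follows, assuming a hypothetical `ReadChunk(accountName, offset, length)` helper that wraps the client call shown above (the retry count and backoff values are illustrative, not prescribed):

```csharp
// Hypothetical per-chunk retry wrapper: because each read starts at an
// explicit offset, a failed chunk can be retried from the same offset
// without skipping or duplicating data.
const int MaxRetries = 3;

Stream ReadChunkWithRetry(string accountName, long offset, int length)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return ReadChunk(accountName, offset, length); // assumed helper
        }
        catch (IOException) when (attempt < MaxRetries)
        {
            // Transient network failure: back off briefly, then retry
            // the same offset.
            Thread.Sleep(TimeSpan.FromSeconds(attempt * 2));
        }
    }
}
```

This is the key advantage of chunked reads over one long-lived stream: a mid-transfer connection reset costs at most one chunk, not the whole file.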

