WCF timeout exception detailed investigation

Problem description

We have an application that has a WCF service (*.svc) running on IIS7 and various clients querying the service. The server is running Windows 2008 Server. The clients are running either Windows 2008 Server or Windows 2003 Server. I am getting the following exception, which I have seen can in fact be related to a large number of potential WCF issues.

System.TimeoutException: The request channel timed out while waiting for a reply after 00:00:59.9320000. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: The HTTP request to 'http://www.domain.com/WebServices/myservice.svc/gzip' has exceeded the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. 

I have increased the timeout to 30min and the error still occurred. This tells me that something else is at play, because the quantity of data could never take 30min to upload or download.

The error comes and goes. At the moment, it is more frequent. It does not seem to matter if I have 3 clients running simultaneously or 100, it still occurs once in a while. Most of the time, there are no timeouts but I still get a few per hour. The error comes from any of the methods that are invoked. One of these methods does not have parameters and returns a bit of data. Another takes in lots of data as a parameter but executes asynchronously. The errors always originate from the client and never reference any code on the server in the stack trace. It always ends with:

   at System.Net.HttpWebRequest.GetResponse()
   at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)

On the server: I've tried (and currently have) the following binding settings:

maxBufferSize="2147483647" maxReceivedMessageSize="2147483647" maxBufferPoolSize="2147483647"

It does not seem to have an impact.

I've tried (and currently have) the following throttling settings:

<serviceThrottling maxConcurrentCalls="1500" maxConcurrentInstances="1500" maxConcurrentSessions="1500"/>

It does not seem to have an impact.

I currently have the following settings for the WCF service.

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single, ConcurrencyMode = ConcurrencyMode.Single)]

I ran with ConcurrencyMode.Multiple for a while, and the error still occurred.

I've tried restarting IIS, restarting my underlying SQL Server, and restarting the machine. None of these seems to have an impact.

I've tried disabling the Windows firewall. It does not seem to have an impact.

On the client, I have these settings:

maxReceivedMessageSize="2147483647"

<system.net>
    <connectionManagement>
        <add address="*" maxconnection="16"/>
    </connectionManagement>
</system.net>

My client closes its connections:

var client = new MyClient();

try
{
    return client.GetConfigurationOptions();
}
finally
{
    client.Close();
}

I have changed the registry settings to allow more outgoing connections:

MaxConnectionsPerServer=24, MaxConnectionsPer1_0Server=32.

I have just recently tried SvcTraceViewer.exe. I managed to catch one exception on the client end. I see that its duration is 1 minute. Looking at the server-side trace, I can see that the server is not aware of this exception. The maximum duration I can see there is 10 seconds.

I have looked at active database connections using exec sp_who on the server. I only have a few (2-3). I have looked at TCP connections from one client using TCPview. It usually is around 2-3 and I have seen up to 5 or 6.

Simply put, I am stumped. I have tried everything I could find, and must be missing something very simple that a WCF expert would be able to see. My gut feeling is that something is blocking my clients at a low level (TCP) before the server actually receives the message, and/or that something is queuing the messages at the server level and never letting them process.

If you have any performance counters I should look at, please let me know. (Please indicate what values are bad, as some of these counters are hard to decipher.) Also, how could I log the WCF message size? Finally, are there any tools out there that would allow me to test how many connections I can establish between my client and server (independently from my application)?

Thanks for your time!

Extra information added June 20th:

My WCF application does something similar to the following.

while (true)
{
   Step1GetConfigurationSettingsFromServerViaWCF(); // can change between calls
   Step2GetWorkUnitFromServerViaWCF();
   DoWorkLocally(); // takes 5-15 minutes
   Step3SendBackResultsToServerViaWCF();
}

Using Wireshark, I did see that when the error occurs, I have five TCP retransmissions followed by a TCP reset later on. My guess is that the RST is coming from WCF killing the connection. The exception report I get is from Step3 timing out.

I discovered this by looking at the tcp stream "tcp.stream eq 192". I then expanded my filter to "tcp.stream eq 192 and http and http.request.method eq POST" and saw 6 POSTs during this stream. This seemed odd, so I checked with another stream such as tcp.stream eq 100. I had three POSTs, which seems a bit more normal because I am doing three calls. However, I do close my connection after every WCF call, so I would have expected one call per stream (but I don't know much about TCP).

Investigating a bit more, I dumped the HTTP packet payload to disk to look at what these six calls were.

1) Step3
2) Step1
3) Step2
4) Step3 - corrupted
5) Step1
6) Step2

My guess is that two concurrent clients are using the same connection, which is why I saw duplicates. However, I still have a few more issues that I can't comprehend:

a) Why is the packet corrupted? Random network fluke - maybe? The load is gzipped using this sample code: http://msdn.microsoft.com/en-us/library/ms751458.aspx - Could the code be buggy once in a while when used concurrently? I should test without the gzip library.

b) Why would I see Step 1 and Step 2 running AFTER the corrupted operation timed out? It seems to me that these operations should not have occurred. Maybe I am not looking at the right stream because my understanding of TCP is flawed. I have other streams that occur at the same time. I should investigate other streams: a quick glance at streams 190-194 shows that the Step3 POSTs have proper payload data (not corrupted). This pushes me to look at the gzip library again.

Recommended answer

If you are using a .NET client, then you may not have set:

// This sets how many outgoing connections you can make to a single endpoint. The default value is 2.
System.Net.ServicePointManager.DefaultConnectionLimit = 200;

Here is the original question and answer: WCF Service Throttling

Update:

This setting goes in the .NET client application, at startup or at any later point, as long as it runs before you start your tests.
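As a minimal sketch of the "at startup" placement, assuming a console client whose entry point is a hypothetical Program.Main (the original post does not show one), the limit could be set before any WCF proxy is created:

```csharp
using System;
using System.Net;

internal static class Program
{
    private static void Main()
    {
        // Set the limit first: it applies only to ServicePoint objects
        // that are created after this assignment.
        ServicePointManager.DefaultConnectionLimit = 200;

        Console.WriteLine("Connection limit: " + ServicePointManager.DefaultConnectionLimit);

        // Hypothetical placeholder: create the WCF client proxies
        // and start the work loop here.
    }
}
```

The value 200 is the one used in this answer, not a tuned recommendation. Setting it early matters because connections that were already opened to an endpoint keep the limit that was in effect when they were created.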

You can also set it in the app.config file, as follows:

<system.net>
    <connectionManagement>
        <add maxconnection="200" address="*" />
    </connectionManagement>
</system.net>
