WCF timeouts are a nightmare

Question

We have a bunch of WCF services that work almost all of the time, using various bindings, ports, max sizes, etc. The super-frustrating thing about WCF is that when it (rarely) fails, we are powerless to find out why it failed. Sometimes you will get a message that looks like this:

System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '01:00:00'. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.

The problem is that the local socket timeout it's giving you is merely an attempt to be convenient. It may or may not be the cause of the problem. But OK, sometimes networks have issues. No big deal. We can retry or something. But here's the huge problem. On top of failing to tell you precisely which timeout (if any) resulted in the failure ("your server-side receive timeout was exceeded," or something, would be helpful), WCF seems to have two types of timeouts.

Timeout Type #1) A timeout that, if increased, would increase the chance of your operation's success. So, the pertinent timeout is an hour, and you are uploading a huge file that will take an hour and twenty minutes. It fails. You increase the timeout, and it succeeds. I have no problem with this type of timeout.

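Timeout Type #2) A timeout that merely defines how long you have to wait for the service to actually fail and give you an error, but modifying the value of this timeout has no effect on the chance of success. Basically, something that mucks things up happens during the first second of the service request, and it never recovers. WCF won't magically retry the network connection for you. OK, sometimes establishing a network connection doesn't go well. But if your timeout is 2 hours, you have to wait 2 whole hours, with no chance of it ever working, before it finally admits that it didn't work and gives you the error.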

But the error you see in both cases looks the same. With timeout Type #2, it still looks like you are running into a timeout. But, you could increase all of your timeouts to 4 years, and all it would do is make it take 4 years to get an error message. I know that Type #2 exists because I can do an operation that is known to complete in less than a minute when successful, and have it take 2 hours to fail. But, if I kill it and retry, it succeeds quickly. (If you are wondering why there might be a 2 hour timeout on an operation that takes less than a minute, there are times I run the operation with a much larger file and it could take over an hour.)

So, to combat the problem with Type #2, you'd want your timeout to be really quick so you immediately know if there is a problem. Then you can retry. But the insurmountable problem is that because I don't know which timeouts are the cause of failure, I don't know what timeouts are Type #1 and which ones are Type #2. There may be one timeout (let's say the client-side send timeout) that acts like Type #1 in some cases and Type #2 in others. I have no idea, and I have no way of finding out.

Does anyone know how to track down Type #2 timeouts so I can set them to low values without having to shorten actual (read: Type #1) timeouts and lower the chance of success?

Thank you.

Clarification of Type #2 timeouts, in response to Andrew Anderson's comment:

My belief is that something goes wrong between the client request and the code starting to execute on the server. In all cases where we have the server code indicate partial progress, it's never finished some of the operation without finishing the whole thing. So, the server code never gets to execute, and how long it would take to execute is irrelevant (other than that it affects what we set our timeout values to in the first place in order to accommodate it).

Recommended answer

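I always put a "heartbeat" message in my long-running WCF services. Then you can set Type #1 timeouts to a low value (2-3 times the interval between heartbeat calls), and Type #2 timeouts become obvious: if the heartbeats never start, the call fails within a few intervals instead of after hours.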
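The heartbeat idea can be sketched outside of WCF. The following is a minimal, language-neutral illustration in Python, not WCF code: `HeartbeatMonitor`, `run_with_heartbeat`, and the `beat` callback are hypothetical names invented for this example, and the factor of 3 is just the upper end of the "2-3 times" suggested above. In a real service, the heartbeat would presumably be a cheap progress or ping message from the long-running operation; that mapping is an assumption and is not shown here.

```python
import threading
import time


class HeartbeatMonitor:
    """Remembers when the last heartbeat arrived (hypothetical helper)."""

    def __init__(self, interval, tolerance=3.0):
        self.interval = interval
        # Silence longer than `limit` seconds is treated as a stall.
        self.limit = interval * tolerance
        self._last_beat = time.monotonic()
        self._lock = threading.Lock()

    def beat(self):
        """Called by the long-running operation whenever it makes progress."""
        with self._lock:
            self._last_beat = time.monotonic()

    def stalled(self):
        """True when no heartbeat has arrived within the allowed window."""
        with self._lock:
            return time.monotonic() - self._last_beat > self.limit


def run_with_heartbeat(operation, interval, tolerance=3.0):
    """Run operation(beat) in a worker thread and fail fast if heartbeats stop.

    The total running time is unbounded; only the gap between heartbeats
    is limited, so a slow but healthy operation is never cut off.
    """
    monitor = HeartbeatMonitor(interval, tolerance)
    outcome = {}

    def worker():
        try:
            outcome["value"] = operation(monitor.beat)
        except Exception as exc:  # hand the failure back to the caller
            outcome["error"] = exc

    # Daemon thread: a hung operation must not keep the process alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while thread.is_alive():
        if monitor.stalled():
            raise TimeoutError(
                "no heartbeat for more than %.2f seconds" % monitor.limit
            )
        thread.join(timeout=interval / 2)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


if __name__ == "__main__":
    def upload(beat):
        # Stand-in for a long transfer that reports progress per chunk.
        for _ in range(10):
            time.sleep(0.05)
            beat()
        return "uploaded"

    print(run_with_heartbeat(upload, interval=0.1))
```

The design point is that the limit applies to silence rather than to total duration. An operation that keeps reporting progress can run for hours (the Type #1 case), while one that hangs in its first second is reported after a few heartbeat intervals (the Type #2 case), at which point the caller can retry.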
