迁移从.NET 1.1到3.5后意外行为 [英] Unexpected behaviour after migrating from .NET 1.1 to 3.5

查看:167
本文介绍了迁移从.NET 1.1到3.5后意外行为的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在工作中我们最近迁移从Framework 1.1的一整套服务,框架3.5。我们刚刚导入和重新编译旧的(1.1)的项目。

at work we recently migrated a set of services from framework 1.1 to framework 3.5. We just imported and recompiled old (1.1) projects.

这些服务检查每7秒的DB中,门控器多项任务做的,并为每个任务创建一个线程(使用线程池),使同步(在线程范围)的web服务调用(和其他一些东西)。呼叫完成之后该线程完成

Those services check every 7 seconds the DB, gater a number of task to do and for each task create a thread (using ThreadPool) that make a synchronous (in the thread scope) web service call (and some other thing). After the call is done the thread is completed.

我们创建新线程仅当线程池得到了可用的总线程的至少50%(我们使用ThreadPool.GetAvailableThreads检查)。如果是这样,则线程被创建,否则服务等待第二(Thread.sleep代码(1000))。然后进行新的检查

We create a new thread only if the ThreadPool got at least 50% of the total threads available (we use ThreadPool.GetAvailableThreads to check that). If so, the thread is created, otherwise the service wait a second (Thread.Sleep(1000);) and then make a new check.

在1.1所有这一切工作正常,并似乎仍然在3.5做工精细,但只有在试机。

All this worked fine in 1.1 and still seem to work fine in 3.5, but only in test machine.

所有的问题开始一经发布到生产机器:检查和线程做正确,但Web服务调用启动一个巨大的延迟(甚至20多分钟),并慢慢解决:虽然1.1版本完整的3个呼叫每秒(更多或更少),3.5版本,要解决的呼叫每20秒(并且,如所述的,每20分钟的延迟)。我们没有看到任何连接问题,也没有超时。

All the problem started once released to production machine: the check and the threads are correctly done, but web service call start with a huge delay (even 20 and more minutes) and are resolved slowly: while the 1.1 version complete 3 call per second (more or less), the 3.5 version resolve a call every 20 seconds (and, as said, each of them with 20 minute delay). We didn't see any connection problem nor timeouts.

我们看到的是,当我们启动服务大量的线程(大约30)被分配在很小的时候,每它立刻开始WS调用(即code执行。()同步调用)。现在的问题是将这些电话只有一个在时间和几分钟解决当作出他们之后

What we see is that when we start the service a number of thread (about 30) is allocated in very small time and each of it immediatly start a WS call (i.e. the code execute the .() synchronous call). The problem is that those calls are resolved only one at time and several minutes after when they are made.

现在:为什么这只会发生在生产机器?为什么在其他试验机不会出现这种情况?

Now: why this happen only in production machine? Why in other test machine this won't happen?

任何建议?

在此先感谢。

修改

我们在做服务的一些变化。现在我们最多允许螺纹(35)的固定数量的被分配的线程池。当这个数达到服务等待一个线程创建一个新的前完成。这个过程做工精细,但我们仍然注意到,在一个更​​大的时间跨度执行的每个Web请求:

We made some changes in the service. Now we allow at most a fixed number of thread (35) to be allocated in the threadpool. When this number is reached the service await for a thread to complete prior to create a new one. The process work fine, but we still notice that each web request is performed in a larger time span:

[11:48:40,365] [  336] INFO  MyClass.Call(): Time elapsed: 00:04:07.0172059
[11:49:14,771] [ 3052] INFO  MyClass.Call(): Time elapsed: 00:04:41.4236761
[11:49:27,256] [ 3928] INFO  MyClass.Call(): Time elapsed: 00:04:53.9081310
[11:49:51,303] [ 5444] INFO  MyClass.Call(): Time elapsed: 00:05:17.9551599
[11:50:34,787] [ 5332] INFO  MyClass.Call(): Time elapsed: 00:06:00.2054303
[11:50:55,366] [ 4672] INFO  MyClass.Call(): Time elapsed: 00:06:19.7211802
[11:51:15,288] [ 1248] INFO  MyClass.Call(): Time elapsed: 00:06:38.6588014
[11:51:34,975] [ 4168] INFO  MyClass.Call(): Time elapsed: 00:06:57.3307959
[11:53:00,429] [ 5224] INFO  MyClass.Call(): Time elapsed: 00:08:18.7531920
[11:53:18,367] [  908] INFO  MyClass.Call(): Time elapsed: 00:08:35.7064255
[11:53:44,398] [ 3616] INFO  MyClass.Call(): Time elapsed: 00:09:00.7222106
[11:54:04,523] [ 6108] INFO  MyClass.Call(): Time elapsed: 00:09:19.8942083
[11:54:29,523] [ 5536] INFO  MyClass.Call(): Time elapsed: 00:09:43.8943619
[11:54:47,242] [ 5048] INFO  MyClass.Call(): Time elapsed: 00:10:00.5819687
[11:55:05,992] [ 3756] INFO  MyClass.Call(): Time elapsed: 00:10:18.3164572
[11:55:26,508] [ 4628] INFO  MyClass.Call(): Time elapsed: 00:10:37.8322071
[11:55:52,493] [ 5812] INFO  MyClass.Call(): Time elapsed: 00:11:02.8167420
[11:56:22,305] [ 5752] INFO  MyClass.Call(): Time elapsed: 00:11:32.0356790
[11:56:31,680] [ 4688] INFO  MyClass.Call(): Time elapsed: 00:11:41.0513617
[11:57:00,556] [ 5844] INFO  MyClass.Call(): Time elapsed: 00:12:08.8952899
[11:57:10,759] [ 5760] INFO  MyClass.Call(): Time elapsed: 00:12:18.0203483
[11:57:49,321] [ 5684] INFO  MyClass.Call(): Time elapsed: 00:12:35.3017089
[11:57:59,056] [ 5920] INFO  MyClass.Call(): Time elapsed: 00:12:26.9110302

进一步的检查证明,有效的TCP最后调用在20秒左右,但只有澈code通话后进行几分钟的时间。

Further inspections evidenced that the effective TCP call last in about 20 seconds, but is performed only several minutes after che code call.

例如,下面的code:

For example, the following code:

slg.Info("Beginning connection.");

DateTime callStart = DateTime.Now;

odOutput = ws_Proxy.WSMethod(odInput);;

log.InfoFormat("Time elapsed: {0}", DateTime.Now - callStart);

产生该日志

[11:47:59,396] [ 5468] INFO  MyClass.Call(): Beginning connection.
[12:00:17,026] [ 5468] INFO  MyClass.Call(): Time elapsed: 00:12:17.6297208

但TCP连接只发11:59:56039,并在完成12:00:17404

but the TCP connection is made only at 11:59:56,039 and completed at 12:00:17,404

11:59:56.0399905 TCP Connect    SUCCESS Length: 0, mss: 1300, sackopt: 4, tsopt: 0, wsopt: 0, rcvwin: 64240, rcvwinscale: 0, sndwinscale: 0, connid: 2254643208, seqnum: 155953480  0
11:59:56.8962917 TCP Send...    SUCCESS Length: 302, startime: 1012803, endtime: 1012811, connid: 2254643208, seqnum: 0 0
11:59:56.9794909 TCP Receive    SUCCESS Length: 2, connid: 2254643208, seqnum: 0    0
11:59:57.7751084 TCP Send...    SUCCESS Length: 625, startime: 1012806, endtime: 1012820, connid: 2254643208, seqnum: 0 0
11:59:57.7751656 TCP Receive    SUCCESS Length: 2, connid: 2254643208, seqnum: 0    0
12:00:08.3723058 TCP Receive    SUCCESS Length: 1300, connid: 2254643208, seqnum: 0 0
12:00:16.5336435 TCP Unknown    SUCCESS Length: 0, connid: 2254643208, seqnum: 0    0
12:00:16.5337723 TCP Receive    SUCCESS Length: 1265, connid: 2254643208, seqnum: 0 0
12:00:17.4046865 TCP Disconnect SUCCESS Length: 0, connid: 2158355531, seqnum: 4133859281   0

11之间发生什么事:47:59396和11:59:56039?为什么呼叫和实际连接之间的间隔长了这么多?为什么这种行为只发生时都是异步执行的一些要求?这就像框架是无法执行的时间超过一个连接...

What happen between 11:47:59,396 and 11:59:56,039? Why the interval between the call and the actual connection grow so much? Why this behaviour happen only when a number of request are performed asynchronously? It's like the framework is unable to perform more than one connection at time...

推荐答案

最后,我们发现似乎是由代理造成的问题(系统管理员也说不清谁或为什么配置该代理)。

At last we found the problem that seem to be caused by a proxy (sysadmins cannot tell who or why configured this proxy).

此代理仅由进程在运行的系统(如服务)激活。运行过程中的用户(如用于测试小的应用程序)不使用这个代理,这就是为什么一般的小测试应用程序运行,而服务与一些问题上运行。

This proxy is activated only by process running as system (like the services). Process running as user (like the small app used to test) doesn't use this proxy, and this is why the small test app run normally while the services run with a number of issues.

我们解决了这个问题,只需添加如下的代码到服务配置:

We resolved the issue simply adding following lines to the services configuration:

<system.net>
  <defaultProxy enabled="false" />
</system.net>

这篇关于迁移从.NET 1.1到3.5后意外行为的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆