Weird network problem


Problem description


We have software that processes raw image data into other formats. To increase throughput, it is installed on three servers (call them A, B, and C) that share storage over the network (10G, Cisco). Four instances of the software run on each server. This works very well for 2-3 days. After that, once an instance on server A has used a source file aaa, no other instance can use that file (it is inaccessible or takes a very long time to respond). The same happens with Microsoft applications such as the file manager. We can see the directory structure without any problem, so the network is connected. However, once we open a file with any software (for example Notepad), we cannot open it on another machine. The OS is Windows Server 2003.
We have no clue how to resolve this. One thing to note is that the power supply to the Cisco network switch is not stable. It was reset once in a month. However, the frequency of the problem does not match that event.

Any suggestions for resolving this problem?

Many thanks in advance.

Moon

Recommended answer

Well, first let me say that these forums are geared more towards Network Monitor-specific issues. This question might be better handled by a file systems expert. However, we should be able to use Network Monitor to tell what is going on.

My guess is that you are running into an opportunistic locking (oplock) issue. With opportunistic locking, a file can be locked by the OS. When another machine wants to open the file and it is locked, the server sends an oplock break to the machine holding the lock, and that can take up to 35 seconds to release. Here are some related articles:

http://support.microsoft.com/kb/296264

http://support.microsoft.com/kb/885451

http://msdn.microsoft.com/en-us/library/aa365433(VS.85).aspx
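Before taking a trace, it can help to measure the delay directly from a second machine. The following is only a minimal sketch: the helper names and the 30-second threshold are assumptions chosen to sit just under the 35-second figure mentioned above, and the UNC path in the usage note is hypothetical.

```python
import time


def time_open(path, read_bytes=1):
    """Return the seconds taken to open `path` and read a few bytes from it."""
    start = time.monotonic()
    with open(path, "rb") as f:
        f.read(read_bytes)
    return time.monotonic() - start


def looks_stalled(elapsed_seconds, threshold_seconds=30.0):
    """True when an open took long enough to suggest a stalled oplock break.

    The threshold is an assumption, not a value taken from any Microsoft article.
    """
    return elapsed_seconds >= threshold_seconds
```

For example, running `looks_stalled(time_open(r"\\serverA\share\aaa"))` on server B while server A has the file open would show whether the open is being held up for roughly the time described above.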

You can prove this with Network Monitor by taking a trace of the instance. Then, if you know the file it is pausing on, apply a filter to search for that file. I would just search the whole frame.

ContainsBin(FrameData, ASCII, "Sample") OR ContainsBin(FrameData, UTF16BE, "Sample")

You could also use the SimpleSearch expert to find the traffic, which will help if you don't know the case of the file name.
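To make concrete what the ContainsBin filter above is looking for, here is a rough Python equivalent that searches a frame's raw bytes for the file name in both encodings. It is an illustration only, not how Network Monitor implements the filter; `frame_contains` is a hypothetical helper and the frame bytes in the usage note are made up.

```python
def frame_contains(frame_data: bytes, text: str,
                   encodings=("ascii", "utf-16-be")) -> bool:
    """Return True if `text`, encoded in any of `encodings`, occurs in the frame.

    The match is case-sensitive, like a plain byte comparison.
    """
    for encoding in encodings:
        try:
            needle = text.encode(encoding)
        except UnicodeEncodeError:
            # The text cannot be represented in this encoding; try the next one.
            continue
        if needle in frame_data:
            return True
    return False
```

For example, `frame_contains(b"\x00\x10" + "Sample".encode("utf-16-be"), "Sample")` matches on the second encoding, while a frame containing only `b"sample"` does not match because the case differs.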

Once you find the related frames, there may be more than one, since you are apparently opening the file twice. For each instance, right-click and choose Find Conversation, then pick TCP. Then look at the Time Offset and look for a 35-second jump in each one.
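If a conversation has many frames, the jump can be found mechanically. The sketch below assumes you have copied the Time Offset values of one conversation into a list of seconds, in frame order; `find_gaps` and the 30-second cutoff are hypothetical choices for illustration, not part of Network Monitor.

```python
def find_gaps(time_offsets, min_gap=30.0):
    """Return (frame_index, gap_seconds) for each pause of at least `min_gap`.

    `time_offsets` holds the offsets, in seconds, of consecutive frames in one
    conversation. `frame_index` is the position of the frame that arrives
    after the pause.
    """
    gaps = []
    for i in range(1, len(time_offsets)):
        gap = time_offsets[i] - time_offsets[i - 1]
        if gap >= min_gap:
            gaps.append((i, gap))
    return gaps
```

For example, `find_gaps([0.0, 0.5, 1.0, 36.0, 36.5])` returns `[(3, 35.0)]`, pointing at the frame that follows a 35-second stall.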

You can turn off opportunistic locking, but be sure to read the related articles first to make sure it is not required on your server.

Thanks,

Paul

