SQL Azure: More Intermittent Timeouts


Question

We have a set of 5 online auction systems running on Windows Azure and SQL Azure. Each system consists of a single worker role and one or more web roles. Each system uses ASP.NET MVC 3 and Entity Framework, with the Repository pattern and StructureMap.

The worker role is responsible for housekeeping and runs two groups of processes. One group runs every ten seconds, the other every second. Each process will likely run a database query or stored procedure. These are scheduled with Quartz.NET.

The web role serves the public interface and the back office. Among other basic CRUD functionality, both provide screens which, when open, repeatedly call controller methods that execute read-only stored procedures. Each client repeats the call about every 2-3 seconds. A typical use case would be 5 back office windows and 25 end user windows open, all hitting the system repeatedly.

For a long time we have been experiencing intermittent SQL timeout errors. Three of the most common ones are:

System.Data.SqlClient.SqlException: A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)

System.Data.SqlClient.SqlException: A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)

System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

The only predictable scenario is during an auction, where a specific controller -> sproc starts to time out during the event (presumably due to load). At all other times the errors appear to be completely random and come in ones, twos, and threes, even during periods of user inactivity. For example, the system will go 18 hours without an error and then there could be 5-10 errors from different housekeeping methods, or perhaps from a user who logged on and viewed their account.

Other info:

I have tried running the affected queries/sprocs on SQL Azure using both local SSMS and the Azure web-based query tool. All seem to execute quickly, 1 second max. The query plans do not show anything too suspicious, although I am by no means a SQL query performance expert, or any other kind of expert for that matter :)

We have wrapped all affected areas in Azure SQL Transient Fault Handling Blocks – but as is discussed here http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/7a50985d-92c2-472f-9464-a6591efec4b3, they do not catch timeouts, and according to "Valery M" this is for good reason.

We are not storing any session information in the database, although ASP.NET membership information is stored in the database.

We use 1 "SQL Azure server instance" which hosts all 5 databases, two for staging and three for production. All 5 systems are generally active at the same time although it is unlikely that more than one will be in a state of live load use at any given time. All web roles, worker roles and the SQL Azure server reside in the same Azure Geographical Region.

Any thoughts on where we should be looking? Would it help to give each system its own SQL Azure server? Failing a solution of our own, is it possible to get Microsoft to open a support ticket and take a look under the hood at what's going on with our application? How does one go about this?

Thanks in advance.

Ilan

Answer

SQL Azure is a multitenant system, and you could be suffering from overuse by other tenants. Microsoft does an OK job of keeping other tenants throttled, but once in a while SQL Azure queries do time out.

To open support with Microsoft visit this page: https://support.microsoft.com/oas/default.aspx?gprid=14919&st=1&wfxredirect=1&sd=gn
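
If occasional timeouts are to be expected on a shared service, one possible mitigation is to retry the read-only, idempotent queries with exponential backoff. The sketch below is illustrative only and is written in Python rather than the .NET stack described in the question. `TransientDbError` and `run_with_retry` are hypothetical names, not part of any Azure or SQL client library. Note that the question reports the transient fault handling block deliberately does not retry timeouts, so whether retrying them is appropriate is a judgment call, and it is only reasonable for operations that are safe to run twice.

```python
import random
import time


class TransientDbError(Exception):
    """Hypothetical stand-in for a timeout or dropped-connection error."""


def run_with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on TransientDbError, wait and try again.

    The delay doubles after each failed attempt, with random jitter added
    so that many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientDbError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

A caller would wrap only the query itself, for example `run_with_retry(lambda: load_auction_status(auction_id))`, where `load_auction_status` is a placeholder for one of the read-only stored procedure calls. Logging each retry would also show whether the timeouts cluster around particular times or procedures.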
