Handling large databases


Question

I have been working on a web project (ASP.NET) for around six months, and the final product is about to go live. The project uses SQL Server as the database. We have done performance testing with some large volumes of data, and the results show that performance degrades when the data becomes too large, say 2 million rows (timeout issues, delayed responses, etc.). At first we were using a fully normalized database, but we have now made it partially denormalized because of performance issues (to reduce joins). First of all, is that the right decision? Also, what are the possible solutions when the data size becomes very large, as the number of clients will increase in the future?

I would like to add further:


  • The 2 million rows are in entity tables; the tables resolving the relations have many more rows.
  • Performance degrades as the data and the number of users increase.
  • Denormalization was done after identifying the heavily used queries.
  • We are also making heavy use of XML columns and XQuery. Can this be the cause?
  • A bit off topic: some folks on my project say that dynamic SQL queries are faster than a stored-procedure approach. They have done some kind of performance testing to prove their point. I think the opposite is true. Some of the heavily used queries are created dynamically, whereas most of the other queries are encapsulated in stored procedures.
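On the XML point: filtering or extracting values with .value()/.exist() shreds the XML blob at query time for every row it touches, which can dominate the cost on large tables. One standard mitigation in SQL Server is an XML index, which materializes the shredded node table once. A hedged sketch, assuming a hypothetical dbo.Orders table with an OrderXml column:

```sql
-- Hypothetical table/column names, for illustration only.
-- The primary XML index persists the shredded node table so XQuery
-- predicates no longer parse the XML per row at query time.
CREATE PRIMARY XML INDEX PXI_Orders_OrderXml
    ON dbo.Orders (OrderXml);

-- Optional secondary index tuned for path-based predicates
-- such as OrderXml.exist('/Order/Status[text()="open"]').
CREATE XML INDEX SXI_Orders_OrderXml_Path
    ON dbo.Orders (OrderXml)
    USING XML INDEX PXI_Orders_OrderXml
    FOR PATH;
```

XML indexes can be large, so whether they pay off depends on how often the columns are queried versus written; comparing execution plans before and after is the honest test.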

Answer

In the scheme of things, a few million rows is not a particularly large database.

Assuming we are talking about an OLTP database, denormalizing without first identifying the root cause of your bottlenecks is a very, very bad idea.

The first thing you need to do is profile your query workload over a representative time period to identify where most of the work is being done (for instance, using SQL Profiler if you are using SQL Server). Look at the number of logical reads a query performs multiplied by the number of times it is executed. Once you have identified the ten worst-performing queries, you need to examine their query execution plans in detail.
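The "logical reads × executions" check can also be run directly against the plan cache instead of tracing, using the sys.dm_exec_query_stats DMV (available since SQL Server 2005). A minimal sketch:

```sql
-- Top ten cached statements by cumulative logical reads.
-- total_logical_reads is already summed over all executions,
-- so it is the "reads x executions" figure described above.
SELECT TOP (10)
    qs.total_logical_reads,
    qs.execution_count,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```

Note this only sees what is still in the plan cache, so it complements rather than replaces a Profiler trace over a representative period.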

I'm going to go out on a limb here (because it is usually the case), but I would be surprised if your problem is not either


  1. Missing indexes on the 'expensive' queries

  2. A poorly configured or under-specified disk subsystem
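For the first suspect, SQL Server itself records index suggestions the optimizer wished it had while compiling plans. A sketch using the missing-index DMVs; treat the output as hints to investigate, not as indexes to create blindly:

```sql
-- Missing-index suggestions recorded by the optimizer since the last
-- restart, roughly ranked by estimated benefit.
SELECT
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_total_user_cost * migs.avg_user_impact * migs.user_seeks
        AS rough_improvement_measure
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY rough_improvement_measure DESC;
```

Cross-check any suggestion against the execution plans of the top queries before creating it, since every index added also taxes inserts and updates.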

This SO answer describes how to profile to find the worst-performing queries in a workload.

