数据库设计和数据检索从一张沉重的表 [英] DB Design and Data Retrieval from a heavy table

查看:169
本文介绍了数据库设计和数据检索从一张沉重的表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要在我的数据库表中有612列。根据数据类型的列数为:



BigInt - 150(PositionCol1,PositionCol2 ............ PositionCol150)



Int - 5



SmallInt - 5



日期 - 150(SourceDateCol1,SourceDate2,......... ... .SourceDate150)



DateTime - 2



Varchar(2000) - 150(FormulaCol1,FormulaCol2 ......... ......... FormulaCol150)



位 - 150(IsActive1,IsActive2,..................IsActive150)



当用户首次导入时,数据存储在PositionCol1,SourceDateCol1,FormulaCol1,IsActiveCol1等(其他datetime,Int,Smallint列)中。



当用户第二次导入时,数据存储在PositionCol2,SourceDateCol2,FormulaCol2,IsActiveCol2等(其他datetime,Int,Smallint列)...等等。



表中正在导入数据中的一个ProjectID列。



在开始导入过程之前,用户将excel列名与数据库映射列名称(PositionCol1,SourceDateCol1,FormulaCol1,IsActiveCol1),此映射获取存储在单独的表中;这样当检索到的数据可以显示在这些映射列名称而不是DB列名称下。例如



PositionCol1可能映射到SAPDATA



SourceDateCol1可能映射到SAPDATE



FormulaCol1可能映射到SAPFORMULA



IsActiveCol1可能映射到SAPISACTIVE



每天会在这个表中添加40,000行,我的问题是,SQL能够长期处理大量数据的负载吗?



的时代,一行将有大约200-300列的数据;在最坏的情况下,它将在所有612列中都有数据。考虑到这一点,我应该在设计中进行一些改变,以避免将来的任何性能问题?如果是这样,请建议可以做什么?



如果我坚持我目前的设计,除了Indexing之外,我应该注意什么要具有最佳性能从这个巨大的表格中检索数据?



如果我需要检索特定实体的数据,例如SAPDATA,我必须去我的映射表,在这种情况下获取数据库列名称对SAPDATA即PositionCol1;并检索它。但是,我必须编写动态查询。有没有其他更好的方法?

解决方案

不要坚持你目前的设计。你的重复团体是不自信和自我限制的吗?有人上传151次会发生什么?规范化此表,以便您每行具有每种类型之一,而不是150.您不需要按照这种方式进行映射,因为您可以从positioncol选择SAPDATA,而不用担心是否为1-150。



您可能需要PROJECTS表与ID,PROJECT_UPLOADS表与ID和FK PROJECTS表。这个表将会给你上面的用例的Position,SourceDate,Formula和IsActive。



然后你可以做这样的事情,例如
$ b $选择p.name,pu.position from PROJECTS p inner join PROJECT_UPLOADS pu on pu.projectid = p.id WHERE pu.position ='SAPDATA'



等等。


I have a requirement to have 612 columns in my database table. The # of columns as per data type are:

BigInt – 150 (PositionCol1, PositionCol2…………PositionCol150)

Int - 5

SmallInt – 5

Date – 150 (SourceDateCol1, SourceDate2,………….SourceDate150)

DateTime – 2

Varchar(2000) – 150 (FormulaCol1, FormulaCol2………………FormulaCol150)

Bit – 150 (IsActive1, IsActive2,……………….IsActive150)

When user does the import for first time the data gets stored in PositionCol1, SourceDateCol1, FormulaCol1, IsActiveCol1, etc. (other datetime, Int, Smallint columns).

When user does the import for second time the data gets stored in PositionCol2, SourceDateCol2, FormulaCol2, IsActiveCol2, etc. (other datetime, Int, Smallint columns)….. so and so on.

There is a ProjectID column in the table for which data is being imported.

Before starting the import process, user maps the excel column names with the database column names (PositionCol1, SourceDateCol1, FormulaCol1, IsActiveCol1) and this mapping get stored in a separate table; so that when retrieved data can be shown under these mapping column names instead of DB column names. E.g.

PositionCol1 may be mapped to SAPDATA

SourceDateCol1 may be mapped to SAPDATE

FormulaCol1 may be mapped to SAPFORMULA

IsActiveCol1 may be mapped to SAPISACTIVE

40,000 rows will be added in this table every day, my questions is that will the SQL be able to handle the load of that much of data in the long run?

Most of the times, a row will have data in about 200-300 columns; in the worst case it’ll have data in all of the 612 columns. Keeping in view this point, shall I make some changes in the design to avoid any future performance issues? If so, please suggest what could be done?

If I stick to my current design, what points I should take care of, apart from Indexing, to have optimal performance while retrieving the data from this huge table?

If I need to retrieve data of a particular entity e.g. SAPDATA, I’ll have to go to my mapping table, get the database column name against SAPDATA i.e. PositionCol1 in this case; and retrieve it. But, in that way, I’ll have to write dynamic queries. Is there any other better way?

解决方案

Don't stick with your current design. Your repeating groups are unweildy and self limiting... What happens when somebody uploads 151 times? Normalise this table so that you have one of each type per row rather than 150. You won't need mapping this way as you can select SAPDATA from the positioncol without worring if it is 1-150.

You probably want a PROJECTS table with an ID, a PROJECT_UPLOADS table with an ID and an FK to the PROJECTS table. This table would have Position, SourceDate, Formula and IsActive given your use-case above.

Then you could do things like

select p.name, pu.position from PROJECTS p inner join PROJECT_UPLOADS pu on pu.projectid = p.id WHERE pu.position = 'SAPDATA'

etc.

这篇关于数据库设计和数据检索从一张沉重的表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆