Many tables or many rows: which is more efficient in SQL?

Question

I'm building a program that stores news headlines for companies, along with their timestamps, from various sources.

Let's say there are 1,000 companies: Apple, Google, Microsoft, etc.

I can think of two options.


  1. One table with numerous rows (the code below is just an example).

CREATE TABLE news
(
    news_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    company VARCHAR(10) NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    source TEXT NOT NULL,
    content TEXT NOT NULL
    -- ... (other columns)
);

(I could also make company and timestamp the composite primary key, with news_id as a unique key.)


  2. 1,000 tables, one per company

CREATE TABLE news_apple  -- likewise news_google, news_microsoft, news_... (x 1000)
(
    news_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    timestamp TIMESTAMP NOT NULL,
    source TEXT NOT NULL,
    content TEXT NOT NULL
    -- ... (other columns)
);

Most of the time, I will look up the news for a specific company. Let's say there are more than 10,000 news items per company. I wonder whether using a WHERE clause in the first option would make it slower than the second option.

Which one is more efficient in terms of performance, and why?

Recommended answer

Relational databases are designed to store many rows per table. There are many mechanisms that make large tables efficient, such as:


• Indexes on any combination of columns to speed up searches
• Page caching, so commonly used pages remain in memory
• Vertical partitioning (columnar databases) to further speed up requests
• Advanced algorithms such as hash joins and hash-based GROUP BY (at least in databases other than MySQL, which only gained hash joins in version 8.0.18)
• Use of multiple processors and disks to process queries
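
To tie the first bullet to the question's WHERE concern, here is a minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL and a hypothetical index name `idx_news_company_ts`. It prints the query plan for "all news for one company, newest first" before and after adding a composite index on (company, timestamp).

```python
import sqlite3

# Sketch only: in-memory SQLite stands in for MySQL; the table mirrors
# the question's single "news" table with a reduced set of columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE news (
        news_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        company   TEXT NOT NULL,
        timestamp TEXT NOT NULL,
        source    TEXT NOT NULL,
        content   TEXT NOT NULL
    )
""")

def plan(sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

query = ("SELECT timestamp, content FROM news "
         "WHERE company = ? ORDER BY timestamp DESC")

# Without an index, the engine has to scan every row of the table.
before = plan(query, ("AAPL",))

# Hypothetical composite index matching the "one company, by time" access path.
conn.execute("CREATE INDEX idx_news_company_ts ON news (company, timestamp)")
after = plan(query, ("AAPL",))

print("before:", before)
print("after: ", after)
```

The first plan is a full table scan; the second searches the index for just the one company's rows, so the WHERE clause does not have to touch the other 999 companies' news. In MySQL the equivalent check would be EXPLAIN, and the exact plan text varies by engine and version.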

There is one thing that is more difficult when putting data in a single table, and that is security. In fact, in some circumstances this is a primary concern and essentially requires that the data go in separate tables. Those applications are few and far between.

To illustrate how wasteful storing data in many tables can be, imagine that your system has one record per company, stored in a table. The record holds information about the company: its name, address, and so on. Call it 100 bytes of information.

In your schema there is a separate table for each company, so that is one row per table. That record will reside on its own data page. A data page could be 16 KB (the InnoDB default), so you are wasting about 15.9 KB to store this data. Storing 1,000 such records occupies 16 MB instead of about 7 pages (1,000 × 100 bytes = 100,000 bytes, which fits in 7 pages, or 112 KB). That can be a significant performance hit, because far more pages have to be read and cached.
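
The page arithmetic above can be checked with a few lines. This is a simplified sketch that assumes a 16 KB page and the 100-byte record from the example, and ignores per-row and per-page overhead as well as any extra space an engine allocates for each table.

```python
import math

PAGE_BYTES = 16 * 1024  # assumed page size (InnoDB's default is 16 KB)
ROW_BYTES = 100         # the answer's "call it 100 bytes" company record
COMPANIES = 1000

# One table per company: each one-row table still occupies a whole page.
per_table_bytes = COMPANIES * PAGE_BYTES

# One shared table: rows are packed together, so only as many pages as the data needs.
shared_pages = math.ceil(COMPANIES * ROW_BYTES / PAGE_BYTES)
shared_bytes = shared_pages * PAGE_BYTES

print(f"one table per company: {per_table_bytes / 1024:.0f} KB")
print(f"one shared table: {shared_pages} pages = {shared_bytes / 1024:.0f} KB")
print(f"ratio: {per_table_bytes / shared_bytes:.0f}x")
```

With these assumptions the per-company layout needs 1,000 pages (about 16 MB) against 7 pages (112 KB) for a single table, roughly 143 times as much space for the same data.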

In addition, with multiple tables you are not taking into account the challenge of maintaining all of them and keeping the data correct across the different tables. Maintenance updates have to be applied to a thousand tables instead of a handful.
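
A minimal sketch of that maintenance burden, assuming in-memory SQLite as a stand-in for MySQL and hypothetical company names (`company0` … `company999`). It applies one schema change (adding a `source` column) under both designs and builds the query needed to read news across all companies.

```python
import sqlite3

# Sketch only: in-memory SQLite stands in for MySQL, and the company
# names below are hypothetical placeholders.
COMPANIES = [f"company{i}" for i in range(1000)]
conn = sqlite3.connect(":memory:")

# Option 2: one table per company. SQL parameters can bind values but not
# table names, so every statement is assembled as a string per company.
for name in COMPANIES:
    conn.execute(
        f"CREATE TABLE news_{name} (news_id INTEGER PRIMARY KEY, "
        "timestamp TEXT NOT NULL, content TEXT NOT NULL)"
    )

# A single schema change has to be repeated for every table.
alter_statements = [f"ALTER TABLE news_{name} ADD COLUMN source TEXT" for name in COMPANIES]
for stmt in alter_statements:
    conn.execute(stmt)

# Reading across all companies needs a UNION ALL over every table.
cross_company_sql = " UNION ALL ".join(
    f"SELECT '{name}' AS company, timestamp, content FROM news_{name}"
    for name in COMPANIES
)

# Option 1: one table, one ALTER, one plain SELECT.
conn.execute(
    "CREATE TABLE news (news_id INTEGER PRIMARY KEY, company TEXT NOT NULL, "
    "timestamp TEXT NOT NULL, content TEXT NOT NULL)"
)
conn.execute("ALTER TABLE news ADD COLUMN source TEXT")
single_table_sql = "SELECT company, timestamp, content FROM news"

print(len(alter_statements), "ALTER statements vs 1")
print(cross_company_sql.count("SELECT"), "SELECTs vs", single_table_sql.count("SELECT"))
```

Any step in the per-table loop can fail partway through, leaving the tables with different schemas; with a single table, the change is one statement.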
