Handling very large data with mysql


Question




  • Sorry for the long post!

    I have a database containing ~30 tables (InnoDB engine). Only two of these tables, namely "transaction" and "shift", are quite large (the first one has 1.5 million rows and shift has 23k rows). Right now everything works fine and I don't have a problem with the current database size.

    However, we will have a similar database (same datatypes, design, ...) but much larger, e.g., the "transaction" table will have about 1 billion records (about 2.3 million transactions per day), and we are thinking about how we should deal with such a volume of data in MySQL (it is both read and write intensive). I read a lot of related posts to see if MySQL (and more specifically the InnoDB engine) can perform well with billions of records, but I still have some questions. Some of the related posts I've read are the following:

    What I've understood so far to improve the performance for very large tables:

    1. (for InnoDB tables, which is my case) increasing the innodb_buffer_pool_size (e.g., up to 80% of RAM). Also, I found some other MySQL performance tuning settings here in the Percona blog
    2. having proper indexes on the table (using EXPLAIN on queries)
    3. partitioning the table
    4. MySQL Sharding or clustering
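As an illustration of item 1, a minimal my.cnf sketch. The values are assumptions for a dedicated database server with 32 GB of RAM, not recommendations for any particular hardware:

```ini
[mysqld]
# ~80% of RAM on a dedicated DB server (assumed 32 GB here)
innodb_buffer_pool_size = 25G
# A larger redo log smooths heavy write bursts; tune to your workload
innodb_log_file_size    = 2G
# Skip double-buffering through the OS page cache
innodb_flush_method     = O_DIRECT
```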

    Here are my questions/confusions:

    • About partitioning, I have some doubts whether we should use it or not. On one hand, many people suggest it to improve performance when the table is very large. On the other hand, I've read many posts saying it does not improve query performance and it does not make queries run faster (e.g., here and here). Also, I read in the MySQL Reference Manual that InnoDB foreign keys and MySQL partitioning are not compatible (we have foreign keys).

    • Regarding indexes, right now they perform well, but as far as I understand, for very large tables indexing is more restrictive (as Kevin Bedell mentioned in his answer here). Also, indexes speed up reads while slowing down writes (inserts/updates). So, for the new similar project where we will have this large DB, should we first insert/load all the data and then create the indexes? (to speed up the inserts)

    • If we cannot use partitioning for our big table (the "transaction" table), what is an alternative option to improve the performance? (except MySQL variable settings such as innodb_buffer_pool_size). Should we use MySQL Cluster? (we also have lots of joins)
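The load-first, index-later approach asked about above can be sketched in SQL. The staging table name, file path, and the exact set of dropped indexes are placeholders; the idea is to bulk-load with only the PRIMARY KEY present and build secondary indexes once at the end:

```sql
-- Clone the structure, then drop the secondary indexes before loading
CREATE TABLE transaction_bulk LIKE transaction;
ALTER TABLE transaction_bulk
  DROP INDEX start_fuel_time_idx,
  DROP INDEX fuel_terminal_idx;   -- drop the remaining secondary indexes likewise

-- Bulk-load; LOAD DATA is much faster than row-by-row INSERTs
LOAD DATA INFILE '/tmp/transactions.csv'
  INTO TABLE transaction_bulk
  FIELDS TERMINATED BY ',';

-- Rebuild the secondary indexes in one pass after the load
ALTER TABLE transaction_bulk
  ADD INDEX start_fuel_time_idx (start_fuel_time),
  ADD INDEX fuel_terminal_idx (fuel_terminal_id);
```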

    EDIT

    This is the show create table statement for our largest table named "transaction":

      CREATE TABLE `transaction` (
     `id` int(11) NOT NULL AUTO_INCREMENT,
     `terminal_transaction_id` int(11) NOT NULL,
     `fuel_terminal_id` int(11) NOT NULL,
     `fuel_terminal_serial` int(11) NOT NULL,
     `xboard_id` int(11) NOT NULL,
     `gas_station_id` int(11) NOT NULL,
     `operator_id` text NOT NULL,
     `shift_id` int(11) NOT NULL,
     `xboard_total_counter` int(11) NOT NULL,
     `fuel_type` int(11) NOT NULL,
     `start_fuel_time` int(11) NOT NULL,
     `end_fuel_time` int(11) DEFAULT NULL,
     `preset_amount` int(11) NOT NULL,
     `actual_amount` int(11) DEFAULT NULL,
     `fuel_cost` int(11) DEFAULT NULL,
     `payment_cost` int(11) DEFAULT NULL,
     `purchase_type` int(11) NOT NULL,
     `payment_ref_id` text,
     `unit_fuel_price` int(11) NOT NULL,
     `fuel_status_id` int(11) DEFAULT NULL,
     `fuel_mode_id` int(11) NOT NULL,
     `payment_result` int(11) NOT NULL,
     `card_pan` text,
     `state` int(11) DEFAULT NULL,
     `totalizer` int(11) NOT NULL DEFAULT '0',
     `shift_start_time` int(11) DEFAULT NULL,
     PRIMARY KEY (`id`),
     UNIQUE KEY `terminal_transaction_id` (`terminal_transaction_id`,`fuel_terminal_id`,`start_fuel_time`) USING BTREE,
     KEY `start_fuel_time_idx` (`start_fuel_time`),
     KEY `fuel_terminal_idx` (`fuel_terminal_id`),
     KEY `xboard_idx` (`xboard_id`),
     KEY `gas_station_id` (`gas_station_id`) USING BTREE,
     KEY `purchase_type` (`purchase_type`) USING BTREE,
     KEY `shift_start_time` (`shift_start_time`) USING BTREE,
     KEY `fuel_type` (`fuel_type`) USING BTREE
    ) ENGINE=InnoDB AUTO_INCREMENT=1665335 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT
    

    Thanks for your time,

    Solution

    • Can MySQL reasonably perform queries on billions of rows? -- MySQL can 'handle' billions of rows. "Reasonably" depends on the queries; let's see them.

    • Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows? -- 5.7 has some improvements, but 5.5 is pretty good, in spite of being nearly 6 years old, and on the verge of no longer being supported.

    • Best data store for billions of rows -- If you mean 'Engine', then InnoDB.

    • How big can a MySQL database get before performance starts to degrade -- Again, that depends on the queries. I can show you a 1K row table that will melt down; I have worked with billion-row tables that hum along.

    • Why MySQL could be slow with large tables? -- range scans lead to I/O, which is the slow part.

    • Can Mysql handle tables which will hold about 300 million records? -- again, yes. The limit is somewhere around a trillion rows.

    • (for InnoDB tables, which is my case) increasing the innodb_buffer_pool_size (e.g., up to 80% of RAM). Also, I found some other MySQL performance tuning settings here in the Percona blog -- yes

    • having proper indexes on the table (using EXPLAIN on queries) -- well, let's see them. There are a lot of mistakes that can be made in this critical area.
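For example, running EXPLAIN on a typical range query against the transaction table above (the query itself is an assumption) shows whether an index is actually being used:

```sql
EXPLAIN
SELECT gas_station_id, SUM(fuel_cost)
FROM   transaction
WHERE  start_fuel_time BETWEEN 1451606400 AND 1454284800
GROUP  BY gas_station_id;
-- Check that `key` shows start_fuel_time_idx and that `rows` is a small
-- fraction of the table; type=ALL would mean a full table scan.
```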

    • partitioning the table -- "Partitioning is not a panacea!" I harp on that in my blog

    • MySQL Sharding -- Currently this is DIY

    • MySQL clustering -- Currently the best answer is some Galera-based option (PXC, MariaDB 10, DIY w/Oracle)

    • Partitioning does not support FOREIGN KEY or "global" UNIQUE.

    • UUIDs, at the scale you are talking about, will not just slow down the system, but actually kill it. Type 1 UUIDs may be a workaround.
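The difference is easy to see with Python's standard uuid module: type 1 UUIDs embed a timestamp, so consecutive values are time-ordered and cluster near each other in a B-tree index, while type 4 UUIDs are random and scatter inserts across the whole index:

```python
import uuid

# Type 1: time-based -- the embedded 60-bit timestamp increases
# monotonically, so new rows land on nearby index pages.
a = uuid.uuid1()
b = uuid.uuid1()
assert a.version == 1 and a.time <= b.time

# Type 4: random -- every insert hits a random index page, which at
# billions of rows means constant buffer-pool misses.
c = uuid.uuid4()
assert c.version == 4
```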

    • Insert and index-build speed -- There are too many variations to give a single answer. Let's see your tentative CREATE TABLE and how you intend to feed the data in.

    • Lots of joins -- "Normalize, but don't over-normalize." In particular, do not normalize datetimes or floats or other "continuous" values.

    • Do build summary tables
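A minimal summary-table sketch, assuming a daily-totals-per-station reporting need (the reporting need itself is an assumption; column names follow the schema above):

```sql
CREATE TABLE transaction_daily_summary (
  stat_date       DATE         NOT NULL,
  gas_station_id  INT          NOT NULL,
  tx_count        INT UNSIGNED NOT NULL,
  total_fuel_cost BIGINT       NOT NULL,
  PRIMARY KEY (stat_date, gas_station_id)
) ENGINE=InnoDB;

-- Refresh incrementally (e.g., hourly for the current day), so reports
-- read a few thousand summary rows instead of millions of raw rows
REPLACE INTO transaction_daily_summary
SELECT DATE(FROM_UNIXTIME(start_fuel_time)),
       gas_station_id, COUNT(*), SUM(fuel_cost)
FROM   transaction
WHERE  start_fuel_time >= UNIX_TIMESTAMP(CURDATE())
GROUP  BY 1, 2;
```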

    • 2.3 million transactions per day -- If that is 2.3M inserts (roughly 30/sec), then there is not much of a performance problem. If more complex, then RAID, SSD, batching, etc., may be necessary.
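The batching idea can be sketched in Python. The chunking helper below is self-contained; the DB-API usage in the trailing comment is illustrative (connection details are placeholders):

```python
def batches(rows, size=1000):
    """Yield lists of at most `size` rows, so inserts can be grouped
    into multi-row statements with one commit per batch instead of
    one fsync per row."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# Usage sketch (cursor/connection are placeholders):
# for chunk in batches(incoming_rows, 1000):
#     cursor.executemany("INSERT INTO transaction (...) VALUES (...)", chunk)
#     connection.commit()   # one commit per 1000 rows, not per row
```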

    • deal with such volume of data -- If most activity is with the "recent" rows, then the buffer_pool will nicely 'cache' the activity, thereby avoiding I/O. If the activity is "random", then MySQL (or anyone else) will have I/O issues.
