How do I increase the speed of this data analysis?

Views: 45 | Posted: 2020/8/11 6:01:37 | Tags: php mysql performance


Problem description

I need to optimize how I analyze a rather large data set, and I'm unsure what the next steps are. I have already done a fair bit of MySQL configuration tuning.

I have this InnoDB table:

+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int(250)     | NO   | PRI | NULL    | auto_increment |
| memory         | int(15)      | YES  | MUL | NULL    |                |
| q              | varchar(250) | YES  | MUL | NULL    |                |
| created        | datetime     | YES  |     | NULL    |                |
| modified       | datetime     | YES  |     | NULL    |                |
| dt             | datetime     | YES  | MUL | NULL    |                |
| site_id        | int(250)     | NO   | MUL | NULL    |                |
| execution_time | int(11)      | YES  | MUL | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+

Here is a sample of 10 rows:

+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
| id        | memory   | q               | created             | modified            | dt                  | site_id | execution_time |
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
| 266864867 | 38011080 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:04:44 |     890 |           1534 |
| 266864868 | 46090184 | node/16432      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:04:46 |     890 |            840 |
| 266864869 | 50329248 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:05:16 |     890 |           2500 |
| 266864870 | 38011272 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:01 |     890 |           1494 |
| 266864871 | 46087732 | node/16432      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:03 |     890 |            850 |
| 266864872 | 30304428 | node/303        | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:12 |     890 |            113 |
| 266864873 | 50329412 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:25 |     890 |           2465 |
| 266864874 | 28253112 | front_page      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:25 |     890 |             86 |
| 266864875 | 28256044 | front_page      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:08:32 |     890 |             81 |
| 266864876 | 38021072 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:08:55 |     890 |           1458 |
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+

These are the table's indexes:

+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table    | Non_unique | Key_name             | Seq_in_index | Column_name    | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| memories |          0 | PRIMARY              |            1 | id             | A         |     8473766 |     NULL | NULL   |      | BTREE      |         |
| memories |          1 | index_dt             |            1 | dt             | A         |     1210538 |     NULL | NULL   | YES  | BTREE      |         |
| memories |          1 | index_execution_time |            1 | execution_time | A         |        2344 |     NULL | NULL   | YES  | BTREE      |         |
| memories |          1 | index_memory         |            1 | memory         | A         |     8473766 |     NULL | NULL   | YES  | BTREE      |         |
| memories |          1 | index_site_id        |            1 | site_id        | A         |          16 |     NULL | NULL   |      | BTREE      |         |
| memories |          1 | index_q              |            1 | q              | A         |      338950 |     NULL | NULL   | YES  | BTREE      |         |
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+

It stores over a million records for many different sites (site_id). For a given site, there might be 20,000 rows. The information stored is performance metrics for individual page requests. In case it matters, the non-obvious fields are: memory is how much memory the script used, q is the path, and site_id is a reference to the Sites table.

I have two slow queries that I run on this data. The first gets the 25 most memory-hungry pages for a given site:

Select 
  Memory.q, count(*) as count, 
  AVG(Memory.memory) as average_memory, 
  MAX(Memory.memory) as peak_memory,
  AVG(Memory.execution_time) as average_execution_time,
  MAX(Memory.execution_time) as peak_execution_time 
FROM Memory 
WHERE site_id = $some_site_id 
GROUP BY Memory.q
ORDER BY average_memory DESC
LIMIT 25

The second query gets the 25 pages with the slowest average execution time for a given site:

Select 
  Memory.q, count(*) as count, 
  AVG(Memory.memory) as average_memory, 
  MAX(Memory.memory) as peak_memory,
  AVG(Memory.execution_time) as average_execution_time,
  MAX(Memory.execution_time) as peak_execution_time 
FROM Memory 
WHERE site_id = $some_site_id 
GROUP BY Memory.q
ORDER BY average_execution_time DESC
LIMIT 25
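
To see how MySQL plans either query, EXPLAIN can be run against it. The sketch below assumes that the `Memory` name in the queries refers to the `memories` table shown in the index listing, and it uses site_id 890 from the sample rows as a stand-in for $some_site_id. The composite index at the end is a hypothetical candidate to test, not an index that exists on the table; whether the optimizer uses it, and whether it helps, depends on the MySQL version and the data.

```sql
-- Assumption: `Memory` in the queries above maps to the `memories` table.
-- 890 is taken from the sample rows and stands in for $some_site_id.
EXPLAIN
SELECT q, COUNT(*) AS count,
       AVG(memory) AS average_memory,
       MAX(memory) AS peak_memory,
       AVG(execution_time) AS average_execution_time,
       MAX(execution_time) AS peak_execution_time
FROM memories
WHERE site_id = 890
GROUP BY q
ORDER BY average_memory DESC
LIMIT 25;

-- Hypothetical composite index to test (the name is illustrative).
-- It lists every column the two queries read, so MySQL may be able to
-- answer them from the index alone and group by q in index order.
ALTER TABLE memories
  ADD INDEX index_site_q_metrics (site_id, q, memory, execution_time);
```

If the index is picked up, re-running EXPLAIN should list it in the `key` column, and `Using index` in the `Extra` column would indicate that the query is served from the index alone. On some MySQL versions and row formats a long varchar column such as q can exceed the index key length limit, in which case a column prefix would be needed and the index would no longer be covering.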

I recently converted the table from MyISAM to InnoDB so that these reads would not lock the table. The locking was causing operations that update this table to queue up and lag.

Beyond throwing more RAM at the problem (to increase the InnoDB cache size), I want to see if there are other options. I've never worked with a NoSQL database, but from what I understand they won't be of much help here because I rely on aggregate functions and queries.
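
Before adding RAM, it may be worth checking how the current InnoDB buffer pool compares with the size of the table. The statements below are a minimal sketch, assuming the table is named `memories` (as in the index listing) and lives in the currently selected database.

```sql
-- Configured size of the InnoDB buffer pool, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Logical read requests versus reads that had to go to disk.
-- A high share of disk reads suggests the working set does not fit.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';

-- Approximate on-disk size of the table's data and indexes, in bytes.
SELECT data_length, index_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'memories';
```

The status counters are cumulative since server start, so comparing them before and after running one of the slow queries gives a clearer picture than a single reading.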

The app is written in PHP, if it matters.

Any ideas for a better way to approach the storage and analysis of this data?

Update:

Profiling the query shows that nearly all of the time is spent in the "Copying to tmp table" step. I will research how to make this step faster.

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000030 |
| checking query cache for query | 0.000065 |
| Opening tables                 | 0.000013 |
| System lock                    | 0.000004 |
| Table lock                     | 0.000014 |
| init                           | 0.000032 |
| optimizing                     | 0.000010 |
| statistics                     | 0.008119 |
| preparing                      | 0.000042 |
| Creating tmp table             | 0.000317 |
| executing                      | 0.000005 |
| Copying to tmp table           | 5.349280 |
| Sorting result                 | 0.006511 |
| Sending data                   | 0.000092 |
| end                            | 0.000005 |
| removing tmp table             | 0.001510 |
| end                            | 0.000007 |
| query end                      | 0.000004 |
| freeing items                  | 0.001163 |
| logging slow query             | 0.000006 |
| cleaning up                    | 0.000006 |
+--------------------------------+----------+
21 rows in set (0.01 sec)

mysql> show profile cpu for query 4;
+--------------------------------+----------+----------+------------+
| Status                         | Duration | CPU_user | CPU_system |
+--------------------------------+----------+----------+------------+
| starting                       | 0.000030 | 0.000000 |   0.000000 |
| checking query cache for query | 0.000065 | 0.000000 |   0.000000 |
| Opening tables                 | 0.000013 | 0.000000 |   0.000000 |
| System lock                    | 0.000004 | 0.000000 |   0.000000 |
| Table lock                     | 0.000014 | 0.000000 |   0.000000 |
| init                           | 0.000032 | 0.000000 |   0.000000 |
| optimizing                     | 0.000010 | 0.000000 |   0.000000 |
| statistics                     | 0.008119 | 0.001000 |   0.000000 |
| preparing                      | 0.000042 | 0.000000 |   0.000000 |
| Creating tmp table             | 0.000317 | 0.000000 |   0.000000 |
| executing                      | 0.000005 | 0.000000 |   0.000000 |
| Copying to tmp table           | 5.349280 | 0.687896 |   0.412937 |
| Sorting result                 | 0.006511 | 0.004999 |   0.001999 |
| Sending data                   | 0.000092 | 0.000000 |   0.000000 |
| end                            | 0.000005 | 0.000000 |   0.000000 |
| removing tmp table             | 0.001510 | 0.000000 |   0.001000 |
| end                            | 0.000007 | 0.000000 |   0.000000 |
| query end                      | 0.000004 | 0.000000 |   0.000000 |
| freeing items                  | 0.001163 | 0.000000 |   0.001000 |
| logging slow query             | 0.000006 | 0.000000 |   0.000000 |
| cleaning up                    | 0.000006 | 0.000000 |   0.000000 |
+--------------------------------+----------+----------+------------+
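
Profiles like the ones above can be collected with MySQL's session-level profiling. The sketch below shows the general sequence; the query number 4 matches the `show profile cpu for query 4` command above, but the number will differ from session to session and should be read from the SHOW PROFILES output.

```sql
-- Enable profiling for the current session only.
SET profiling = 1;

-- ... run the slow query here ...

-- List the profiled statements with their Query_ID and total duration.
SHOW PROFILES;

-- Per-stage timings for one statement, then the same with CPU columns.
SHOW PROFILE FOR QUERY 4;
SHOW PROFILE CPU FOR QUERY 4;

-- Turn profiling back off when done.
SET profiling = 0;
```

SHOW PROFILE is deprecated in newer MySQL releases in favor of the Performance Schema, so this applies mainly to the older server versions that produce output in this form.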
