Is it possible to find out which columns are (not) explicitly queried in MySQL/MariaDB?

Problem description

We have a very large, very old table with a few hundred columns. Some of the columns are historical and aren't used by any writing client. They are mostly empty (except in very old records). I want to clean up the DB and get rid of old, unused columns in certain tables.

The problem is the third-party clients that access this DB (read-only). I can't expect all providers to update their clients. As long as they query with SELECT * ..., it doesn't matter. But I expect them to query explicitly (SELECT colA, colB, ...), and removing colA from the table would obviously result in errors on the client side.

Now I would like to know which columns are explicitly used by any query statements, so I can remove the unused ones. I guess I could use the query log, analyze it and find explicitly used columns, but:

  1. We receive millions of queries per hour.
  2. Some clients may access our DB only once a week, others as often as once per second.

That means the query log would have to run for months in a production environment and I don't know if that could/would have any negative impact on the servers or overall performance.
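
A minimal sketch of the "analyze the query log" step, assuming the individual query texts have already been extracted from the log into Python strings (the log format is not handled here, and the table and column names are hypothetical). It reports the columns that no logged query names explicitly:

```python
import re
from typing import Iterable

# Unquoted MySQL/MariaDB identifiers; a backticked `colA` still yields colA.
_WORD = re.compile(r"[A-Za-z0-9_$]+")


def unmentioned_columns(columns: Iterable[str], queries: Iterable[str]) -> set[str]:
    """Return (lowercased) column names that never appear as a word in any query.

    This is plain text matching, not SQL parsing: a name that only occurs in a
    string literal or comment still counts as "mentioned", and a SELECT * query
    mentions no column at all.
    """
    remaining = {c.lower() for c in columns}  # column names are case-insensitive
    for q in queries:
        if not remaining:
            break  # every column has been seen; no need to read further
        remaining -= {w.lower() for w in _WORD.findall(q)}
    return remaining


if __name__ == "__main__":
    cols = ["id", "name", "fax_number", "legacy_flag"]  # hypothetical columns
    log = [
        "SELECT id, name FROM customers WHERE id = 42",
        "SELECT `fax_number` FROM customers",
    ]
    print(sorted(unmentioned_columns(cols, log)))  # ['legacy_flag']
```

A column reported here may still be read through SELECT *; matching text only tells you which names are spelled out.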

Is there any other, more solid solution? Are my concerns regarding the query log exaggerated? I was hoping that MariaDB/MySQL stores statistics somewhere that show column usage, but I couldn't find anything like that.

Recommended answer

There is no log of which columns are mentioned in queries.

The "general log" copies every query to a file. This can be a serious disk hog (both space and speed), especially with "millions of queries per hour". But it would at least give you a way to attempt an answer.

The general log can, I think, be summarized via pt-query-digest.

Another possibility is to use tcpdump together with pt-query-digest to grab all the queries.

The advantage of the digest is that it consolidates multiple 'similar' queries into a single entry. You would still have to sift through the output manually (or programmatically).
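
The consolidation idea can be illustrated with a simplified fingerprint function. This is not pt-query-digest's actual implementation, only a sketch under the assumption that replacing literals with ? and normalizing case and whitespace is enough to group "similar" queries:

```python
import re
from collections import Counter
from typing import Iterable

_STRING = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")  # quoted literals
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")                        # numeric literals
_SPACE = re.compile(r"\s+")
_IN_LIST = re.compile(r"\bin\s*\(\s*\?(?:\s*,\s*\?)*\s*\)")        # IN (?, ?, ...)


def fingerprint(query: str) -> str:
    """Reduce a query to a normalized shape so that similar queries compare equal."""
    q = _STRING.sub("?", query)
    q = _NUMBER.sub("?", q)
    q = _SPACE.sub(" ", q).strip().lower()
    return _IN_LIST.sub("in(?+)", q)  # IN lists of any length collapse together


def digest(queries: Iterable[str]) -> Counter:
    """Count how many logged queries share each fingerprint."""
    return Counter(fingerprint(q) for q in queries)


if __name__ == "__main__":
    log = [
        "SELECT id, name FROM customers WHERE id = 42",
        "select id,  name from customers where id = 7",
        "SELECT fax_number FROM customers WHERE name = 'Smith'",
    ]
    print(digest(log).most_common(1))
    # [('select id, name from customers where id = ?', 2)]
```

Sifting a few thousand distinct fingerprints is far more manageable than millions of raw log lines, which is the point of digesting first.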

Most columns are not worth removing, even if they are never used. I would suggest focusing on the 10% that are the bulkiest. There might be a way to use tcpdump | egrep to look for just those column names. Refine that a few times, and you might discover some prime candidate(s) for removal.
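
A rough sketch of the "bulkiest 10%" approach. It assumes you have already measured each column's size (for example with a query such as SELECT SUM(LENGTH(colA)) FROM tbl; the byte counts below are made up) and that the captured query text is available as lines. The single combined regex plays the role of the egrep pattern:

```python
import re
from collections import Counter
from typing import Iterable, Mapping


def bulkiest(col_bytes: Mapping[str, int], percent: int = 10) -> list[str]:
    """Names of the largest `percent`% of columns by stored bytes (at least one)."""
    n = max(1, (len(col_bytes) * percent + 99) // 100)  # integer ceiling division
    return sorted(col_bytes, key=lambda c: col_bytes[c], reverse=True)[:n]


def count_mentions(columns: list[str], lines: Iterable[str]) -> Counter:
    """Count the lines that mention each column as a whole word, case-insensitively.

    The alternation pattern is the Python counterpart of an egrep pattern such
    as 'colA|colB'; columns left with a count of 0 were never seen.
    """
    if not columns:
        return Counter()
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, columns)) + r")\b", re.IGNORECASE
    )
    hits = Counter({c.lower(): 0 for c in columns})  # keep zero counts visible
    for line in lines:
        for name in {m.lower() for m in pattern.findall(line)}:  # once per line
            hits[name] += 1
    return hits


if __name__ == "__main__":
    sizes = {  # made-up sizes in bytes
        "id": 4_000, "name": 90_000, "notes_blob": 5_000_000, "fax_number": 120_000,
    }
    candidates = bulkiest(sizes, percent=50)
    captured = [
        "SELECT id, NOTES_BLOB FROM customers WHERE id = 1",
        "SELECT name FROM customers",
    ]
    print(candidates)  # ['notes_blob', 'fax_number']
    print(dict(count_mentions(candidates, captured)))  # {'notes_blob': 1, 'fax_number': 0}
```

Restricting the pattern to a handful of large columns keeps the scan cheap, and a zero count after a long capture window marks a prime candidate.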

Unfortunately, if the clients do SELECT *, then "all" columns are being used.
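
Because SELECT * hides which columns a client actually relies on, it may help to separate those queries from the ones that spell out column names before doing any name matching. A minimal regex-based sketch (it does not parse SQL, so unusual formatting or comments can fool it):

```python
import re
from typing import Iterable

# "SELECT *", "SELECT DISTINCT *" or "SELECT t.*", case-insensitively.
# COUNT(*) is not matched: it counts rows rather than reading a particular column.
_STAR = re.compile(r"\bselect\s+(?:distinct\s+)?(?:`?\w+`?\.)?\*", re.IGNORECASE)


def split_star_queries(queries: Iterable[str]) -> tuple[list[str], list[str]]:
    """Split queries into (those using a SELECT * wildcard, all others)."""
    star: list[str] = []
    other: list[str] = []
    for q in queries:
        (star if _STAR.search(q) else other).append(q)
    return star, other


if __name__ == "__main__":
    star, other = split_star_queries([
        "SELECT * FROM customers",
        "select t.* from customers t",
        "SELECT COUNT(*) FROM customers",
        "SELECT id, name FROM customers",
    ])
    print(len(star), len(other))  # 2 2
```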
