JOIN或2个查询-1个大表,1个小表,硬件有限 [英] JOIN or 2 queries - 1 large table, 1 small, hardware limited

查看:94
本文介绍了JOIN或2个查询-1个大表,1个小表,硬件有限的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个页面,其中有一个<select>菜单,其中包含一个小表(229行)中的所有值,例如<option value='KEY'>VALUE</option>.

I have a page in which there is a <select> menu, which contains all of the values from a small table (229 rows), such that <option value='KEY'>VALUE</option>.

此选择菜单是针对在大型表(350万行)上运行的查询的过滤器. 大表中有一个外键,它引用小表中的KEY.

This select menu is a filter for a query which runs on a large table (3.5M rows). In the large table is a foreign key which references KEY from small table.

但是,在大表查询的结果中,我还需要显示小表中的相对VALUE.

However, in the results of the large table query, I also need to display the relative VALUE from the small table.

我可以很容易地执行INNER JOIN来检索结果,或者我可以对较小的表进行单独的'pre'查询,将其值提取到数组中,然后让应用程序获取VALUE从小表结果中.

I could quite easily do an INNER JOIN to retrieve the results, OR I could do a separate 'pre'-query to my smaller table, fetch it's values into an array, and then let the application get the VALUE from small table results.

该应用程序是用PHP编写的.

The application is written in PHP.

硬件资源是一个问题(老板限制,目前无法升级到更高的实例)-我正在Amazon Web Services实例的t2.micro RDS上运行它. 我在WHERE&中的列上添加了单索引和覆盖索引. HAVING子句,并且我的服务器报告我有46mb RAM可用.

Hardware resources IS an issue (cannot upgrade to higher instance right now, boss constrained) - I am running this on a t2.micro RDS on Amazon Web Services instance. I have added both single and covering indexes on columns in WHERE & HAVING clauses, and my server is reporting that I have 46mb RAM available.

鉴于上述情况,我知道JOIN可能会很昂贵,尤其是在大桌子上.在这里进行2个查询,让应用程序处理一些工作,直到我可以协商更好的资源,这才有意义吗?

Given the above, I know that JOIN can be expensive especially on big tables. Does it just make sense here to do 2 queries, and let the application handle some of the work, until I can negotiate better resources?

不加入:6.9秒

SELECT nationality_id, COUNT(DISTINCT(txn_id)) as numtrans,
        SUM(sales) as sales, SUM(units) as units, YrQtr
FROM 1_txns
 GROUP BY nationality_id;

EXPLAIN
'1', 'SIMPLE', '1_txns', 'index', 'covering,nat', 'nat', '5', NULL, '3141206', NULL

有加入:59.03秒

SELECT 4_nationality.nationality, COUNT(DISTINCT(txn_id)) as numtrans,
        SUM(sales) as sales, SUM(units) as units, YrQtr
FROM 1_txns INNER JOIN 4_nationality USING (nationality_id)
 GROUP BY nationality_id
 HAVING YrQtr LIKE :period;
EXPLAIN
'1', 'SIMPLE', '4_nationality', 'ALL', 'PRIMARY', NULL, NULL, NULL, '229', 'Using temporary; Using filesort'
'1', 'SIMPLE', '1_txns', 'ref', 'covering,nat', 'nat', '5', 'reports.4_nationality.nationality_id', '7932', NULL

架构为

Table 1_txns (txn_id, nationality_id, yrqtr, sales, units)
Table 4_nationality (nationality_id, nationality)

我在每个nationality_id,txn_id,yrqtr上都有单独的索引.在我的大型交易表中.还有我的小桌子上的主键索引.

I have separate indexes on each nationality_id, txn_id, yrqtr. in my Large Transactions Table. And just a primary key index on my small table.

奇怪的是,没有联接的查询在结果中丢失了一行!

Something strange also, is that the query WITHOUT the join, is missing a row from it's results!

推荐答案

如果您的查找菜单"列表表仅是所述的229行,并且具有唯一键,并且菜单表的索引为(键,值),则该连接可以忽略不计...尤其是如果您仅基于单个键查询结果的话.

If your lookup "menu" list table is only the 229 rows as stated, and it has a unique key, and your menu table has index on (key, value), the join would be negligible... especially if your only querying the results based on a single key anyhow.

对我来说,更大的问题是在您拥有350万条记录的表上.在229个菜单"项上,每次平均返回超过15,000条记录.而且我敢肯定,并非每个类别都均衡地平衡了……有些类别可能有数百或数千个条目,另一些类别可能有30k +条目.还有其他一些条件可以允许返回较小的子集吗?显然没有足够的信息来量化.

The bigger question to me would be on your table of 3.5 million records. At 229 "menu" items, it would be returning an average of over 15k records each time. And I am sure that not every category is evenly balanced... some could have a few hundred or thousand entries, others could have 30k+ entries. Is there some other criteria that would allow smaller subsets to be returned? Obviously not enough info to quantify.

现在,在输入内容后看到您修改过的帖子后,我看到您正在尝试获取汇总.否则该表将固定用于历史数据.我建议在每个国籍/年鉴的基础上做一个汇总表.这样,您可以直接查询该时间段是否早于所讨论的当前时间段.如果是当前时期,则来自生产的总计.同样,由于交易历史上不会发生变化,因此交易计数也不会更改,您可以从汇总表中立即得到响应.

Now, after seeing your revised post while entering this, I see you are trying to get aggregations. The table would otherwise be fixed for historical data. I would suggest a summary table be done on a per Nationality/YrQtr basis. This way, you can query that directly if the period is PRIOR to the current period in question. If current period, then sum aggregates from production. Again, since transactions wont change historically, neither would their counts and you would have immediate response from the pre-summary table.

反馈

关于如何/何时实施汇总表.我将使用所需的相应列来创建表格...国籍,期间(年/月)以及不同交易的相应计数等.

As for how / when to implement a summary table. I would create the table with the respective columns you need... Nationality, Period (Yr/Month), and respective counts for distinct transactions, etc.

然后,我将针对所有现有数据(直至TO)(但不包括当前期间(年/月))进行一次预汇总.现在,您已经建立了总结的基线.

I would then pre-aggregate once for all your existing data for everything UP TO but not including the current period (Yr/Month). Now you have your baseline established in summary.

然后,在插入时向您的事务表添加一个触发器.然后,执行类似...的操作(并且请注意,这不是实际的触发器,但要做的事情却如此)

Then, add a trigger to your transaction table on insert. Then, process something like... (AND NOTE, THIS IS NOT ACTUAL TRIGGER, BUT CONTEXT OF WHAT TO DO)

update summaryTable
   set numTrans = numTrans + 1,
       TotSales = TotSales + NEWENTRY.Sales,
       TotUnits = TotUnits + NEWENTRY.Units
   where
           Nationality = NEWENTRY.Nationality
       AND YrQtr = NEWENTRY.YrQtr

if # records affected by the update = 0
   Insert into SummaryTable 
      ( Nationality, 
        YrQtr, 
        NumTrans, 
        TotSales, 
        TotUnits )
     values
     (  NEWENTRY.Nationality,
        NEWENTRY.YrQtr,
        1,
        NEWENTRY.Sales,
        NEWENTRY.Units )

现在,将每条记录插入到事务表中之后,汇总表中的汇总将始终保持同步.您可以始终查询此摘要表,而不是完整的事务表.如果您从没有针对给定国籍/年鉴的活动,那么将不存在任何记录.

Now, your aggregates will ALWAYS be in synch in the summary table after EVERY record inserted into the transaction table. You can ALWAYS query this summary table instead of the full transaction table. If you never have activity for a given Nationality / YrQtr, no record will exist.

这篇关于JOIN或2个查询-1个大表,1个小表,硬件有限的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆