MySql中的大表和分析 [英] Big tables and analysis in MySql

查看:121
本文介绍了MySql中的大表和分析的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对于我的创业公司,我自己跟踪所有内容,而不是依靠Google Analytics(分析).很好,因为我实际上可以拥有ips和用户ID以及所有内容.

For my startup, I track everything myself rather than rely on google analytics. This is nice because I can actually have ips and user ids and everything.

这很好,直到我的跟踪表增加了大约200万行.该表称为acts,并记录:

This worked well until my tracking table rose about 2 million rows. The table is called acts, and records:

  • ip
  • 网址
  • 注意
  • account_id

...(如果有).

现在,尝试做这样的事情:

Now, trying to do something like this:

SELECT COUNT(distinct ip) 
  FROM acts
  JOIN users ON(users.ip = acts.ip) 
 WHERE acts.url LIKE '%some_marketing_page%';

基本上永远不会结束.我切换到此:

Basically never finishes. I switched to this:

SELECT COUNT(distinct ip) 
  FROM acts
  JOIN users ON(users.ip = acts.ip) 
 WHERE acts.note = 'some_marketing_page';

但是,尽管有索引需要注意,但它仍然非常缓慢.

But it is still very slow, despite having an index on note.

我显然不是mysql专业人士.我的问题是:

I am obviously not pro at mysql. My question is:

拥有大量数据的公司如何跟踪渠道转换率之类的内容?是否可以在mysql中做而我只是缺少一些知识?如果没有,我可以阅读哪些书籍/博客来了解网站是如何做到的?

How do companies with lots of data track things like funnel conversion rates? Is it possible to do in mysql and I am just missing some knowledge? If not, what books / blogs can I read about how sites do this?

推荐答案

为什么要加入?如果我们可以假设没有IP使其在用户中没有相关记录的情况下付诸行动,那么您就不需要联接:

Why the JOIN? If we can assume that no IP makes it into acts without an associated record in users then you don't need the join:

SELECT COUNT(distinct ip) FROM acts
WHERE acts.url LIKE '%some_marketing_page%';

如果您确实需要JOIN,则可能需要先从行为中选择不同的IP,然后再将这些结果JOIN给用户(您必须查看执行计划并进行试验,看看这样做是否更快),这可能会花费很多.

If you really do need the JOIN it might pay to first select the distinct IPs from acts, then JOIN those results to users (you'll have to look at the execution plan and experiment to see if this is faster).

其次,具有领先通配符的LIKE将导致对行为进行全表扫描,并且还需要进行一些昂贵的文本搜索.您可以通过三种选择来改善这一点:

Secondly, that LIKE with a leading wild card is going to cause a full table scan of acts and also necessitate some expensive text searching. You have three choices to improve this:

  1. 在存储URL之前将其分解为组成部分,以使搜索与列值完全匹配.

  1. Decompose the url into component parts before you store it so that the search matches a column value exactly.

要求搜索词出现在url字段的开头,而不是中间.

Require the search term to appear at the beginning of the of the url field, not in the middle.

研究一个全文搜索引擎,该引擎将对url字段建立索引,从而甚至可以对索引执行内部LIKE搜索.

Investigate a full text search engine that will index the url field in such a way that even an internal LIKE search can be performed against indexes.

最后,在搜索act.notes的情况下,如果note上的索引不能提供足够的搜索改进,我会考虑在note上计算并存储一个整数散列并进行搜索.

Finally, in the case of searching on acts.notes, if an index on notes doesn't provide sufficient search improvement, I'd consider calculating and storing an integer hash on notes and searching for that.

这篇关于MySql中的大表和分析的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆