Simple but heavy application consuming a lot of resources. How to optimize?


Problem description


I currently have a monitoring application in production. Its job is to collect specific entries from social networks such as Facebook, Twitter, YouTube and so on.

Here is a simple example of an API call to Twitter:

http://search.twitter.com/search?q=Stackoverflow&format=json

Basically, this is what the system does:

  1. Select the search terms from the database in a specific order
  2. Call the API
  3. Collect all tweet status IDs and user IDs from the current search
  4. Check whether they already exist in the database
  5. Insert the tweets, skipping tweets and users that already exist so that no duplicate-entry errors occur.

We ended up with two tables, one for users and another for tweets.
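Steps 4 and 5 could possibly be folded into a single statement if the tweet's status ID is the primary (or a unique) key. The following is only a minimal sketch: the column names `id`, `user_id` and `content` and the sample values are assumptions for illustration, not the original schema. MySQL's `INSERT IGNORE` skips rows whose key already exists instead of raising a duplicate-entry error.

```sql
-- Stand-in for the existing tweets table; column names are assumptions.
CREATE TABLE IF NOT EXISTS tweets (
    id      BIGINT UNSIGNED NOT NULL PRIMARY KEY,  -- Twitter status ID
    user_id BIGINT UNSIGNED NOT NULL,
    content TEXT NOT NULL
) ENGINE=InnoDB;

-- Rows whose primary key is already present are skipped (reported as
-- warnings), so no separate existence check is needed before inserting.
INSERT IGNORE INTO tweets (id, user_id, content)
VALUES (1001, 501, 'first tweet mentioning Stackoverflow'),
       (1002, 502, 'second tweet mentioning Stackoverflow');
```

The same pattern would apply to the users table, assuming the Twitter user ID is its key.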

THE PROBLEM

After the tweets table in the MySQL database reached 200,000 entries (within the first few months), the application that visualizes that data started to consume too many resources when running the SELECT query on the existing tweets.

Why?

The system has separate accounts, and each one has certain search terms related to its specific business. When we perform a select, we need to select only the tweets that are associated with the terms of our account. We must not see tweets that aren't related to us. However, one tweet can belong to many accounts.

The actual query (Hurting my eyes)

SELECT *
   FROM tweets
 WHERE
   content LIKE '%searchterm1%'
     OR content LIKE '%searchterm2%'
     OR content LIKE '%searchterm3%'
     OR content LIKE '%searchterm4%'
     OR content LIKE '%searchterm5%'
     OR content LIKE '%searchterm6%'
     OR content LIKE '%searchterm7%'
     OR content LIKE '%searchterm8%'
   -- (and so on...)

The two possible solutions

a) Create a tweets_searches table with two foreign keys, tweet_id and search_id, so that each tweet in the tweets table can be related to a specific search term.

So instead of searching for a specific string, we will join these tables.
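Option a) could look roughly like the sketch below. Only `tweets_searches`, `tweet_id` and `search_id` come from the question. The `searches` table, its `account_id` and `term` columns, the `tweets` stand-in, the index names and the literal IDs are all assumptions made for illustration.

```sql
-- Stand-in for the existing tweets table; column names are assumptions.
CREATE TABLE IF NOT EXISTS tweets (
    id      BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    user_id BIGINT UNSIGNED NOT NULL,
    content TEXT NOT NULL
) ENGINE=InnoDB;

-- Hypothetical table holding each account's search terms.
CREATE TABLE IF NOT EXISTS searches (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    account_id INT UNSIGNED NOT NULL,
    term       VARCHAR(255) NOT NULL,
    KEY idx_searches_account (account_id)
) ENGINE=InnoDB;

-- Link table: one row per (search term, tweet found by that term).
CREATE TABLE IF NOT EXISTS tweets_searches (
    tweet_id  BIGINT UNSIGNED NOT NULL,
    search_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (search_id, tweet_id),
    KEY idx_ts_tweet (tweet_id),
    FOREIGN KEY (tweet_id)  REFERENCES tweets (id),
    FOREIGN KEY (search_id) REFERENCES searches (id)
) ENGINE=InnoDB;

-- All tweets visible to one account (account 42 is a made-up example).
-- DISTINCT is needed because a tweet may match several of the account's terms.
SELECT DISTINCT t.*
FROM tweets AS t
JOIN tweets_searches AS ts ON ts.tweet_id = t.id
JOIN searches AS s ON s.id = ts.search_id
WHERE s.account_id = 42;
```

With this layout the collector would record the link at insertion time, since it already knows which search term produced each tweet. The read query then uses indexed key lookups instead of scanning `content`.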

b) Keep searching the content, but with fulltext searches using MATCH () AGAINST () instead.

THE MAIN QUESTION

Is that enough to reduce the consumption of resources such as CPU and RAM? Is there anything better I can do?

Solution

Disclaimer: this is one of my comments on this question which might be the answer:


I think match ... against is appropriate here. It is the so-called "fulltext search". For more complex searches, I'd use Sphinx: it indexes your database on its own (it has its own mechanism for that) and performs searches much faster than MySQL does.
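As a rough illustration of the match ... against suggestion, the query from the question might be rewritten as below. This is a sketch under assumptions: the index name `ft_tweets_content` is made up, `content` is assumed to be a CHAR, VARCHAR or TEXT column, and the table must use MyISAM, or InnoDB on MySQL 5.6 or later, for FULLTEXT indexes to be available.

```sql
-- One-time step: build a fulltext index on the searched column.
ALTER TABLE tweets ADD FULLTEXT INDEX ft_tweets_content (content);

-- In boolean mode, words without an operator are optional, so a row is
-- returned when it contains at least one of the listed terms.
SELECT *
FROM tweets
WHERE MATCH (content)
      AGAINST ('searchterm1 searchterm2 searchterm3' IN BOOLEAN MODE);
```

Unlike LIKE '%term%', a fulltext search matches whole words rather than arbitrary substrings. Words shorter than the configured minimum length are not indexed, so the results may differ from those of the original query.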
