Elasticsearch - Do I need the JDBC driver?

Question

Goal

To synchronize my Elasticsearch server with new and expired data in my SQL database.

Problem

There are two very different ways to achieve this, and I don't know which is better. I can pull data into Elasticsearch over a direct connection to the SQL database using the JDBC river plugin. Alternatively, I can push data to Elasticsearch with the PHP client (Elastica), as in the example below:

// Assumes $elasticaType is an \Elastica\Type obtained from an existing index.
// The id of the document
$id = 1;

// Create a document
$tweet = array(
    'id'      => $id,
    'user'    => array(
        'name'      => 'mewantcookie',
        'fullName'  => 'Cookie Monster'
    ),
    'msg'     => 'Me wish there were expression for cookies like there is for apples. "A cookie a day make the doctor diagnose you with diabetes" not catchy.',
    'tstamp'  => '1238081389',
    'location'=> '41.12,-71.34',
    '_boost'  => 1.0
);
// First parameter is the id of document.
$tweetDocument = new \Elastica\Document($id, $tweet);

// Add tweet to type
$elasticaType->addDocument($tweetDocument);

// Refresh Index
$elasticaType->getIndex()->refresh();

I was going to have a cron job run every thirty minutes to check for items in my database that have an "active" flag but no "indexed" flag, which means I need to add them to the index.
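
A minimal sketch of one such cron pass, assuming the rows have already been fetched from SQL into an array. The row shape, the function name runSyncPass and the $indexer callback are hypothetical; the callback stands in for the Elastica addDocument() call above.

```php
<?php
// Hypothetical rows as they might come back from the SQL database.
$rows = [
    ['id' => 1, 'msg' => 'first',  'active' => true,  'indexed' => false],
    ['id' => 2, 'msg' => 'second', 'active' => true,  'indexed' => true],
    ['id' => 3, 'msg' => 'third',  'active' => false, 'indexed' => false],
];

/**
 * One cron pass: hand every active, not-yet-indexed row to $indexer,
 * then flag it as indexed so the next pass skips it.
 * In SQL the selection would be roughly: WHERE active = 1 AND indexed = 0.
 */
function runSyncPass(array &$rows, callable $indexer): int
{
    $count = 0;
    foreach ($rows as &$row) {
        if ($row['active'] && !$row['indexed']) {
            $indexer($row);          // stand-in for pushing to Elasticsearch
            $row['indexed'] = true;  // would be an UPDATE in the real database
            $count++;
        }
    }
    unset($row); // break the reference left over from the loop
    return $count;
}

// Record which ids were sent instead of talking to a real cluster.
$sent = [];
$processed = runSyncPass($rows, function (array $row) use (&$sent): void {
    $sent[] = $row['id'];
});
```

Note that every pass has to query the database, and a new row can wait up to thirty minutes before it becomes searchable.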

QUESTION

Given that I have two different ways to synchronize data between Elasticsearch and MySQL, what are the advantages and disadvantages of each option? Is there a specific use case that favors one over the other?

Recommended answer

If you forget for a moment that you need to import initial data into Elasticsearch, I would use an event system to push data to Elasticsearch. This is more efficient in the long run.

Your application knows exactly when something needs to be indexed by Elasticsearch. To take your tweet example, at some point a new tweet will enter your application (a user writes one for example). This would trigger a newTweet event. You have a listener in place that will listen to that event, and store the tweet in Elasticsearch whenever such an event is dispatched.
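
The event-and-listener idea can be sketched roughly as follows. The EventDispatcher class is a hypothetical, minimal stand-in for whatever event system your framework provides, and the $index array stands in for Elasticsearch; a real listener would call something like $elasticaType->addDocument() from the question instead.

```php
<?php
// Minimal in-process event dispatcher (hypothetical; a framework would normally supply one).
final class EventDispatcher
{
    /** @var array<string, callable[]> */
    private array $listeners = [];

    // Register a listener for a named event.
    public function listen(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    // Call every listener registered for the event, in registration order.
    public function dispatch(string $event, array $payload): void
    {
        foreach ($this->listeners[$event] ?? [] as $listener) {
            $listener($payload);
        }
    }
}

// Stand-in for the search index, keyed by document id.
$index = [];

$dispatcher = new EventDispatcher();
$dispatcher->listen('newTweet', function (array $tweet) use (&$index): void {
    $index[$tweet['id']] = $tweet; // a real listener would push to Elasticsearch here
});

// Application code fires the event at the moment a tweet is created.
$dispatcher->dispatch('newTweet', ['id' => 1, 'msg' => 'Me want cookie']);
```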

If you don't want to spend resources and time in the web request on this (and you definitely don't), the listener could add a job to a queue (Gearman or Beanstalkd, for example). A worker would then pick up that job and store the tweet in Elasticsearch.
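
A rough sketch of the listener-plus-worker split, using PHP's built-in SplQueue as an in-memory stand-in for Gearman or Beanstalkd. The job format, the 'indexTweet' type and the runWorker function are assumptions for illustration; in practice the worker runs in a separate process and the queue lives outside the web server.

```php
<?php
// In-memory stand-in for a job queue such as Gearman or Beanstalkd (assumption).
$queue = new SplQueue();

// Listener: runs inside the web request and only enqueues a small serialized job.
$onNewTweet = function (array $tweet) use ($queue): void {
    $queue->enqueue(json_encode(['type' => 'indexTweet', 'tweet' => $tweet]));
};

// Worker: drains the queue and hands each tweet to $indexer.
function runWorker(SplQueue $queue, callable $indexer): int
{
    $handled = 0;
    while (!$queue->isEmpty()) {
        $job = json_decode($queue->dequeue(), true);
        if ($job['type'] === 'indexTweet') {
            $indexer($job['tweet']); // stand-in for storing the tweet in Elasticsearch
            $handled++;
        }
    }
    return $handled;
}

// Two tweets arrive during web requests.
$onNewTweet(['id' => 1, 'msg' => 'hello']);
$onNewTweet(['id' => 2, 'msg' => 'world']);

// The worker later processes them, recording into an array instead of a real cluster.
$index = [];
$handled = runWorker($queue, function (array $tweet) use (&$index): void {
    $index[$tweet['id']] = $tweet;
});
```

The web request only pays for the enqueue call; the indexing cost moves to the worker.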

The main advantage is that Elasticsearch is kept up to date in something much closer to real time. You won't need a cron job that would introduce a delay. You'll (mostly) handle a single document at a time. You won't need to bother the SQL database to find out what needs to be (re)indexed.

Another advantage is that you can easily scale when the amount of events/data gets out of hand. When Elasticsearch itself needs more power, add servers to the cluster. When the worker can't handle the load, simply add more of them (and place them on dedicated machines). Plus your webserver(s) and SQL database(s) won't feel a thing.
