How to filter and sort data from multiple microservices?


Question

We have microservices which work with different, but related data. For example, ads and their stats. We want to be able to filter, sort and aggregate this related data for the UI (and not only for it). For example, we want to show a user the ads which have 'car' in their text and which have more than 100 clicks.

Challenges:

  • There can be a lot of data. Some users have millions of rows after filtering
  • A service doesn't have all the data. For example, for the stats service, ads without statistics == ads that don't exist; it knows nothing about such ads. But sorting and filtering should still work correctly (ads without stats should be treated as ads with zero clicks)

Requirements:

  • Eventual consistency within a few seconds
  • Data loss is not acceptable
  • 5-10 seconds for filtering and sorting is acceptable for large customers with millions of rows

Solutions we can think of:

  • Load all the data needed for the query from all the services and filter/sort it in memory
  • Push updates from the services to Elasticsearch (or something similar). Elastic handles the query and returns the IDs of the required entities, which are then loaded from the services (see the sketch after this list)
  • One big database shared by all the services
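A rough sketch of how option 2 could look, assuming a hypothetical index named ads_search holding { adId, userId, text, clicks } documents kept up to date by the services, and using the request shape of the official @elastic/elasticsearch v8 client:

```typescript
import { Client } from "@elastic/elasticsearch";

const es = new Client({ node: "http://localhost:9200" });

// Query the denormalized search index, return only IDs; the full entities
// are then loaded from the owning services.
async function findAdIds(userId: string, word: string, minClicks: number): Promise<string[]> {
  const result = await es.search({
    index: "ads_search",
    size: 1000,
    _source: false,                        // IDs only, no document bodies
    query: {
      bool: {
        filter: [
          { term: { userId } },
          { match: { text: word } },       // e.g. "car" appears in the ad text
          { range: { clicks: { gte: minClicks } } },
        ],
      },
    },
    sort: [{ clicks: "desc" }],
  });
  return result.hits.hits.map((hit) => String(hit._id));
}
```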


What should we pay attention to? Are there other ways to solve our problem?

Answer

You could use CQRS. In this architectural style, the model used for writing data is split from the model used to read/query data. The write model is the canonical source of information, the source of truth.

The write model publishes events that are interpreted/projected by one or more read models, in an eventually consistent manner. Those events can even be published on a message queue and consumed by external read models (other microservices). There is no 1:1 mapping from write to read: you can have one model for writing and three models for reading. Each read model is optimized for its use case. This is the part that interests you: a speed-optimized read model.
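To make the split concrete, here is a minimal, framework-free sketch (all type and class names are illustrative assumptions): the write model only validates, persists and emits events, while each read model projects those events into whatever shape its queries need.

```typescript
type AdEvent =
  | { type: "AdCreated"; adId: string; userId: string; text: string }
  | { type: "AdClicked"; adId: string };

interface ReadModel {
  apply(event: AdEvent): void;
}

// Write side: the source of truth. It validates, persists, and emits events.
class AdWriteModel {
  constructor(private readModels: ReadModel[]) {}

  createAd(adId: string, userId: string, text: string): void {
    // ...persist to the canonical store first...
    this.emit({ type: "AdCreated", adId, userId, text });
  }

  registerClick(adId: string): void {
    this.emit({ type: "AdClicked", adId });
  }

  private emit(event: AdEvent): void {
    // In practice this would go through a message queue, not a direct call.
    this.readModels.forEach((rm) => rm.apply(event));
  }
}

// One of possibly several read models: this one only keeps click counts per ad.
class ClickCountReadModel implements ReadModel {
  readonly clicks = new Map<string, number>();

  apply(event: AdEvent): void {
    if (event.type === "AdCreated") this.clicks.set(event.adId, 0);
    if (event.type === "AdClicked") this.clicks.set(event.adId, (this.clicks.get(event.adId) ?? 0) + 1);
  }
}
```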

An optimized read model has everything it needs to answer its queries. The data is fully denormalized (which means it needs no joins) and already indexed.
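As an illustration of what "fully denormalized and indexed" can mean here, below is one possible shape for the read table, written as Postgres-flavored SQL strings in TypeScript; the table, column and index names are assumptions, not a prescription.

```typescript
// Ad text and click count live in the same row, so the query needs no joins.
export const createAdSearchTable = `
  CREATE TABLE ad_search (
    ad_id    TEXT PRIMARY KEY,
    user_id  TEXT NOT NULL,
    text     TEXT NOT NULL,
    clicks   BIGINT NOT NULL DEFAULT 0   -- ads without stats are stored with 0 clicks
  );
  CREATE INDEX ad_search_user_clicks ON ad_search (user_id, clicks);
`;

// Filtering and sorting touch only this one table.
export const findAdsQuery = `
  SELECT ad_id FROM ad_search
  WHERE user_id = $1 AND text ILIKE '%' || $2 || '%' AND clicks > $3
  ORDER BY clicks DESC;
`;
```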

A read model can have its data sharded. You do this in order to minimize the collection size (a small collection is faster than a big one). In your case, you could shard by user: each user would have their own collection of statistics (i.e. a table in SQL or a document collection in NoSQL). You can use the database's built-in sharding or you can shard manually, by splitting into separate collections (tables).
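A sketch of the manual variant, assuming MongoDB and a per-user collection naming scheme (both are assumptions; using the database's built-in sharding with userId as the shard key is the alternative):

```typescript
import { MongoClient, Collection } from "mongodb";

const mongo = new MongoClient("mongodb://localhost:27017");

// One collection per user, e.g. "ad_stats_user42", so a query only ever
// scans that user's own (small) collection.
function statsCollectionFor(userId: string): Collection {
  return mongo.db("read_models").collection(`ad_stats_${userId}`);
}

async function adsWithManyClicks(userId: string, minClicks: number) {
  await mongo.connect();                          // no-op if already connected
  return statsCollectionFor(userId)
    .find({ clicks: { $gt: minClicks } })
    .sort({ clicks: -1 })
    .toArray();
}
```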

Services don't have all the data.


A read model could subscribe to many sources of truth (i.e. microservices or event streams).
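A sketch of one read model fed by two sources of truth. The in-memory bus below stands in for whatever transport is actually used (Kafka, RabbitMQ, an event-store feed), and the topic names and event shapes are assumptions. Note how an ad the stats service knows nothing about simply gets zero clicks:

```typescript
type Handler = (event: any) => void;
const handlers = new Map<string, Handler[]>();

function subscribe(topic: string, handler: Handler): void {
  handlers.set(topic, [...(handlers.get(topic) ?? []), handler]);
}
function publish(topic: string, event: any): void {
  (handlers.get(topic) ?? []).forEach((h) => h(event));
}

// The projection merges both streams into one queryable structure.
const adSearch = new Map<string, { userId: string; text: string; clicks: number }>();

subscribe("ads-service.events", (e: { adId: string; userId: string; text: string }) => {
  // Ads without any stats still get a row, with zero clicks.
  adSearch.set(e.adId, { userId: e.userId, text: e.text, clicks: adSearch.get(e.adId)?.clicks ?? 0 });
});

subscribe("stats-service.events", (e: { adId: string; clicks: number }) => {
  const row = adSearch.get(e.adId);
  if (row) row.clicks = e.clicks;        // may arrive after the ad: eventual consistency
});

// Example: the ad arrives first, the click count later.
publish("ads-service.events",   { adId: "a1", userId: "u1", text: "red car for sale" });
publish("stats-service.events", { adId: "a1", clicks: 120 });
```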

One particular case that works very well with CQRS is event sourcing; it has the advantage that you have the events from the beginning of time, without the need to store them in a persistent message queue.
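A sketch of the rebuild that event sourcing makes possible: a new (or changed) read model can be built by replaying the whole history from the event store. The loadAllEvents function and the event shape are assumptions standing in for whatever event store is used.

```typescript
type StoredEvent = { type: string; adId: string; payload: unknown };

// Replay every event since the beginning of time into a fresh projection.
// Afterwards the projection keeps itself up to date from the live stream.
async function rebuildReadModel(
  loadAllEvents: () => AsyncIterable<StoredEvent>,
  apply: (event: StoredEvent) => void,
): Promise<void> {
  for await (const event of loadAllEvents()) {
    apply(event);
  }
}
```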

P.S. Given enough hardware resources, I can't think of a use case where a read model couldn't be made fast.

