Spring Data repository: list vs stream

Question

What are the recommendations for when a Spring Data repository method should return a List and when it should return a Stream?

https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#repositories.query-streaming

Example:

interface UserRepository extends Repository<User, Long> {

  List<User> findAllByLastName(String lastName);

  Stream<User> streamAllByFirstName(String firstName);                    
         
  // Other methods defined.
}

Please note that I am not asking about Page and Slice here. They are clear to me, and I found their description in the documentation.

My assumption (am I wrong?):

  1. Stream does not load all the records into the Java heap. Instead, it loads k records into the heap and processes them one by one; then it loads another k records, and so on.
  2. List loads all the records into the Java heap at once.
  3. If I need a background batch job (for example, to calculate analytics), I could use a stream operation because I would not load all the records into the heap at once.
  4. If I need to return a REST response with all the records, I need to load them into RAM anyway and serialize them into JSON. In this case, it makes sense to load a list at once.


I have seen some developers collect the stream into a list before returning a response:

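class UserController {

    private final UserRepository repository;

    UserController(UserRepository repository) {
        this.repository = repository;
    }

    public ResponseEntity<List<User>> getUsers(String firstName) {
        return new ResponseEntity<>(
                repository.streamAllByFirstName(firstName)
                        // A mapper would be nice syntactic sugar here,
                        // but let's imagine there is no map step for now...
                        // .map(someMapper)
                        .collect(Collectors.toList()),
                HttpStatus.OK);
    }
}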

For this case, I do not see any advantage of Stream; using a List would produce the same end result.

Are there, then, any examples where using a List is justified?

Recommended answer

tl;dr

The primary difference between Collection and Stream lies in the following two aspects:

  1. Time to first result: when does the client code see the first element?
  2. The state of resources while processing: in what state are the underlying infrastructure resources while the stream is processed?
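
To make the first aspect concrete, here is a minimal, JDK-only sketch. FakeStore, its cursor, and the rowsRead counter are hypothetical stand-ins for a real data store and are not part of Spring Data; the sketch only illustrates eager versus lazy reading and says nothing about how a particular driver fetches rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical stand-in for a data store; counts how many rows have been read. */
class FakeStore {

    final AtomicInteger rowsRead = new AtomicInteger();
    private final int totalRows;

    FakeStore(int totalRows) {
        this.totalRows = totalRows;
    }

    /** Cursor-like iterator: a row counts as "read" only when next() is called. */
    private Iterator<String> cursor() {
        return new Iterator<String>() {
            private int position = 0;

            @Override
            public boolean hasNext() {
                return position < totalRows;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                rowsRead.incrementAndGet();
                return "customer-" + position++;
            }
        };
    }

    /** List variant: every row is read before the caller sees anything. */
    List<String> findAll() {
        List<String> result = new ArrayList<>();
        cursor().forEachRemaining(result::add);
        return result;
    }

    /** Stream variant: rows are read lazily as the caller consumes them. */
    Stream<String> streamAll() {
        Spliterator<String> rows =
                Spliterators.spliteratorUnknownSize(cursor(), Spliterator.ORDERED);
        return StreamSupport.stream(rows, false);
    }

    public static void main(String[] args) {
        FakeStore listStore = new FakeStore(100_000);
        String firstFromList = listStore.findAll().get(0);
        System.out.println(firstFromList + " visible after rows read: " + listStore.rowsRead.get());

        FakeStore streamStore = new FakeStore(100_000);
        String firstFromStream = streamStore.streamAll().findFirst().orElse(null);
        System.out.println(firstFromStream + " visible after rows read: " + streamStore.rowsRead.get());
    }
}
```

In this sketch the List variant has read every row by the time the caller sees the first element, while the sequential Stream variant has read only one.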

Working with collections

Let's talk this through with an example. Say we need to read 100k Customer instances from a repository. The way you (have to) handle the result hints at both of the aspects described above.

List<Customer> result = repository.findAllBy();

The client code receives the list only once all elements have been completely read from the underlying data store, not a moment before. By that point, the underlying database connections may already have been closed. For example, in a Spring Data JPA application the underlying EntityManager will be closed and the entities detached unless you actively keep it open in a broader scope, e.g. by annotating the surrounding methods with @Transactional or by using an OpenEntityManagerInViewFilter. Also, you don't need to actively close the resources.

A Stream has to be handled like this:

@Transactional
void someMethod() {

  try (Stream<Customer> result = repository.streamAllBy()) {
    // … processing goes here
  }
}

With a Stream, processing can start as soon as the first element (e.g. a row in a database) arrives and is mapped. That is, you can already consume elements while the rest of the result set is still being processed. This also means that the underlying resources need to be actively kept open, as they are usually bound to the repository method invocation. Note that the Stream also has to be actively closed (hence the try-with-resources), as it binds underlying resources and we have to signal it to release them somehow.
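
The need for explicit closing can be illustrated with plain JDK streams. In the sketch below, StreamCloseDemo and its streamAll method are hypothetical stand-ins for a repository method that returns a resource-backed Stream; the AtomicBoolean merely records whether the close callback has run.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

class StreamCloseDemo {

    /** Hypothetical stand-in for a repository method returning a resource-backed Stream. */
    static Stream<String> streamAll(AtomicBoolean resourceReleased) {
        return Stream.of("alice", "bob", "carol")
                // onClose registers a callback that runs when close() is called on the stream
                .onClose(() -> resourceReleased.set(true));
    }

    public static void main(String[] args) {
        AtomicBoolean released = new AtomicBoolean(false);

        // A terminal operation on its own does not call close()
        streamAll(released).forEach(name -> { });
        System.out.println("released after forEach only: " + released.get());

        // try-with-resources calls close() when the block exits, even on exceptions
        try (Stream<String> names = streamAll(released)) {
            names.forEach(name -> { });
        }
        System.out.println("released after try-with-resources: " + released.get());
    }
}
```

A terminal operation such as forEach does not close the stream by itself, which is why the callback only fires in the try-with-resources variant.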

With JPA, the Stream cannot be processed properly without @Transactional, because the underlying EntityManager is closed when the method returns. You would see a few elements processed and then an exception in the middle of processing.

So while you can theoretically use a Stream to, e.g., build up JSON arrays efficiently, it significantly complicates the picture, as you need to keep the core resources open until you have written all elements. That usually means writing the code that maps objects to JSON and writes them to the response manually (using e.g. Jackson's ObjectMapper and the HttpServletResponse).
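
As a rough illustration of what "writing them to the response manually" involves, the following JDK-only sketch writes a JSON array element by element. It is an assumption-laden simplification: JsonArrayStreamer is a hypothetical helper, a java.io.Writer stands in for the servlet response's writer, and the hand-rolled escape method only covers backslashes and double quotes, where real code would delegate to Jackson.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.stream.Stream;

class JsonArrayStreamer {

    /** Writes each element to the writer as soon as it is pulled from the stream. */
    static void writeNames(Stream<String> names, Writer out) {
        // The stream (and whatever it binds) stays open until the last element is written
        try (Stream<String> source = names) {
            out.write("[");
            Iterator<String> it = source.iterator();
            boolean first = true;
            while (it.hasNext()) {
                if (!first) {
                    out.write(",");
                }
                out.write("\"" + escape(it.next()) + "\"");
                first = false;
            }
            out.write("]");
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deliberately simplistic escaping (backslash and double quote only). */
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        writeNames(Stream.of("alice", "bob"), out);
        System.out.println(out);
    }
}
```

The point of the sketch is the lifecycle: the stream cannot be closed until the closing bracket has been written, so the resources behind it stay open for the whole duration of the response.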

While the memory footprint will likely improve, this mostly stems from the fact that you are likely avoiding the creation of intermediate collections and additional collections in the mapping steps (ResultSet -> Customer -> CustomerDTO -> JSON object). Elements that have already been processed are not guaranteed to be evicted from memory, as they might be held onto for other reasons. Again, in JPA for example, you have to keep the EntityManager open as it controls the resource lifecycle, so all elements stay bound to that EntityManager and are kept around until all elements have been processed.
