Batch/offline processing design books/documentation

Problem description

Is there a book or any documentation available that describes the best practice for designing batch (offline) processes for sharing data between two parties?

I have found some useful information on the Spring Batch site, but it is quite low-level: the batch processing strategies and the batch principles and guidelines pages.

There are a lot of considerations for batch processing, for example:


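  1. Data transfer method (e.g. files)

  2. Control protocol between the two parties

  3. Error handling

  4. File naming conventions (if file transfer is used)

  5. Synchronized cut-off times for both parties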


It would be good if there were an authoritative document or checklist to ensure that designs follow best practice in the field.



Update:

I'll add answers to this section as I come across them.

This section is taken from @user1813068's answer.

You can find some architectural design patterns at this link (http://mike2.openmethodology.org/wiki/Architecture_Patterns_for_Partner-to-Partner_Integration), and also at this link, that describe approaches for partner-to-partner integration and for data synchronization.

This Wikipedia page also gives a high-level overview of architectural patterns and includes patterns for data integration: architectural patterns.

The book Data Integration Blueprint and Modeling is also very good.

Most of the content in this section has come from here: source

The use of headers and footers for flat file exchange is considered best practice. Flat files can also be exchanged without headers and footers, in which case the file name can convey some of the same information as the header. When using a delimited file, a header row listing the fields is always required.
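A minimal sketch of why the field-list header matters, assuming a tab-delimited file whose first line names the fields: the reader maps each data row to those names, so it does not depend on a fixed column order. The field names (`order_id`, `amount`) and the function name are hypothetical.

```python
def read_delimited(lines, delimiter="\t"):
    """Map each data row to the field names given in the field-list header row."""
    rows = iter(lines)
    try:
        fields = next(rows).rstrip("\n").split(delimiter)
    except StopIteration:
        raise ValueError("empty file: the field-list header row is required")
    records = []
    # Data starts on line 2, right after the field-list header.
    for line_no, line in enumerate(rows, start=2):
        values = line.rstrip("\n").split(delimiter)
        if len(values) != len(fields):
            raise ValueError(f"line {line_no}: expected {len(fields)} fields, got {len(values)}")
        records.append(dict(zip(fields, values)))
    return records


sample = ["order_id\tamount\n", "A-1\t10.50\n", "A-2\t7.25\n"]
print(read_delimited(sample))
# [{'order_id': 'A-1', 'amount': '10.50'}, {'order_id': 'A-2', 'amount': '7.25'}]
```

Because values are looked up by name, a sender can reorder columns without breaking the receiver.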

When exchanging data between systems, it is very important for the receiving party to know exactly what type of data is being sent. One way to ensure this is to provide a header row that includes pertinent information regarding the content of the data and how it should be processed.

When working with flat files, the file name itself can also be used to tell the receiving party what the file contains. However, a header row gives better support for all the options that may be available.
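As an illustration of carrying metadata in the file name, here is a minimal sketch using a hypothetical convention, `SENDER_RECEIVER_FILETYPE_YYYYMMDDTHHMMSSZ.tsv`. The pattern, the party codes and the function names are assumptions for the example, not an established standard; the point is that both sides build and validate names with the same rule.

```python
import re
from datetime import datetime, timezone

# Hypothetical convention: SENDER_RECEIVER_FILETYPE_YYYYMMDDTHHMMSSZ.tsv
NAME_PATTERN = re.compile(
    r"^(?P<sender>[A-Z0-9]+)_(?P<receiver>[A-Z0-9]+)_(?P<file_type>[A-Z0-9]+)_"
    r"(?P<created>\d{8}T\d{6}Z)\.tsv$"
)


def build_file_name(sender, receiver, file_type, created):
    """Encode the file's metadata in its name; `created` must be timezone-aware."""
    stamp = created.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{sender}_{receiver}_{file_type}_{stamp}.tsv"


def parse_file_name(name):
    """Recover the metadata from a name, rejecting names that break the convention."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name}")
    meta = match.groupdict()
    meta["created"] = datetime.strptime(meta["created"], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return meta


name = build_file_name("ACME", "GLOBEX", "ORDERS", datetime(2024, 3, 1, 1, 30, tzinfo=timezone.utc))
print(name)  # ACME_GLOBEX_ORDERS_20240301T013000Z.tsv
print(parse_file_name(name)["file_type"])  # ORDERS
```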

When working with an API, these header fields can be provided in a similar fashion; how they are implemented is up to the developer of the API service.
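One possible shape for this, sketched under assumptions: a JSON envelope whose `header` object plays the role of the file header row and whose `records` list holds the data. The envelope layout, field names (`file_type`, `version`, `record_count`) and functions are hypothetical; the real API developer would define their own.

```python
import json


def build_payload(file_type, version, records):
    """Wrap records in a JSON envelope; "header" mirrors a flat file's header row."""
    return json.dumps({
        "header": {"file_type": file_type, "version": version, "record_count": len(records)},
        "records": records,
    })


def read_payload(body):
    """Check the header before processing the records, as a file consumer would."""
    payload = json.loads(body)
    header = payload["header"]
    if header["record_count"] != len(payload["records"]):
        raise ValueError("record_count in header does not match the number of records")
    return header, payload["records"]


body = build_payload("ORDERS", "1.0", [{"order_id": "A-1", "amount": "10.50"}])
header, records = read_payload(body)
print(header["file_type"], len(records))  # ORDERS 1
```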

If a header is included, it consists of a single record and must always be the first data in the file.
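A minimal sketch of the "header comes first" rule. The layout is hypothetical: every row starts with a record type, and the header row is `H` followed by a file type, a format version and a UTC creation time, separated by tabs. The reader refuses a file whose first line is not such a record.

```python
HEADER_TAG = "H"  # hypothetical record-type marker for the header row


def make_header(file_type, version, created_utc):
    """Build the single header record that must be written first."""
    return "\t".join([HEADER_TAG, file_type, version, created_utc])


def read_header(lines):
    """Return the header fields, failing if the first line is not a header record."""
    if not lines:
        raise ValueError("empty file: a header record was expected first")
    parts = lines[0].rstrip("\n").split("\t")
    if parts[0] != HEADER_TAG or len(parts) != 4:
        raise ValueError("the first line of the file is not a valid header record")
    _, file_type, version, created_utc = parts
    return {"file_type": file_type, "version": version, "created_utc": created_utc}


lines = [make_header("ORDERS", "1.0", "2024-03-01T01:30:00Z") + "\n", "D\tA-1\t10.50\n"]
print(read_header(lines))
# {'file_type': 'ORDERS', 'version': '1.0', 'created_utc': '2024-03-01T01:30:00Z'}
```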

A footer may be provided when using file-based formats to indicate that there is no more data left to process.

When processing, any data found after the footer row should be ignored. Likewise, when creating the data, be aware that anything written after the footer row will be ignored.
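A minimal sketch of the footer rule, assuming a hypothetical layout in which every row after the header starts with a record type (`D` for data, `F` for footer) and the footer carries the number of data rows. The function takes the lines that follow the header record. Treating a missing footer as a truncated file is an extra assumption that only holds if both parties agree the footer is mandatory.

```python
def read_until_footer(lines):
    """Collect data rows up to the footer and ignore anything after it."""
    data = []
    for line in lines:
        record_type, *fields = line.rstrip("\n").split("\t")
        if record_type == "F":
            # Hypothetical footer content: the number of data rows sent.
            expected = int(fields[0])
            if expected != len(data):
                raise ValueError(f"footer says {expected} rows, found {len(data)}")
            return data  # anything after the footer is never read
        if record_type != "D":
            raise ValueError(f"unexpected record type: {record_type!r}")
        data.append(fields)
    # Assumption: the footer is mandatory, so reaching the end without one means truncation.
    raise ValueError("no footer found: the file may be incomplete")


lines = ["D\tA-1\t10.50\n", "D\tA-2\t7.25\n", "F\t2\n", "junk after footer\n"]
print(read_until_footer(lines))  # [['A-1', '10.50'], ['A-2', '7.25']]
```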

Delimited Files

The de facto industry standard is delimited files.

Comma-delimited (CSV, or comma-separated values) files usually require the data to be enclosed, typically in double quotes ("); any double quotes inside the data must then be escaped, either with a backslash (\) or by doubling them (""). Because CSV implementations are inconsistent, it is recommended to use tabs as the delimiter, with no enclosing quotes. In this case, tab characters must be removed from the data. Delimited files are usually quicker to process than XML files.
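A minimal sketch of the tab-delimited recommendation: tabs are removed from each value, so fields can be joined with tabs and split on tabs with no quoting or escaping at all. Also stripping line breaks is an added assumption (each record is one line); the function names are hypothetical.

```python
def clean_field(value):
    """Remove tabs (and, as an extra assumption, line breaks) so no quoting is ever needed."""
    return str(value).replace("\t", "").replace("\r", "").replace("\n", "")


def to_tsv_line(values):
    """Join cleaned fields with tabs: no enclosing quotes, no escape characters."""
    return "\t".join(clean_field(v) for v in values) + "\n"


def from_tsv_line(line):
    """Splitting on tabs is enough, because no field can contain a tab."""
    return line.rstrip("\n").split("\t")


line = to_tsv_line(["A-1", 'Say "hi"\tthere', 10.5])
print(repr(line))  # 'A-1\tSay "hi"there\t10.5\n'
print(from_tsv_line(line))  # ['A-1', 'Say "hi"there', '10.5']
```

Note that the double quotes pass through untouched: unlike CSV, nothing in the data needs escaping.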

XML Files

There are some in the industry who prefer XML files. XML allows for a clearer representation of the information, since it supports nested data. However, many companies have limited or no support for this format, so it is not recommended.
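To show what "supports nested data" means in practice, here is a small sketch with a hypothetical order that has several line items. XML nests the items inside the order; a flat delimited file has to repeat the parent key on every child row instead. Element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical order with nested line items: XML expresses the nesting directly.
order = ET.Element("order", id="A-1")
for sku, qty in [("SKU-1", "2"), ("SKU-2", "1")]:
    ET.SubElement(order, "line", sku=sku, qty=qty)
print(ET.tostring(order, encoding="unicode"))
# <order id="A-1"><line sku="SKU-1" qty="2" /><line sku="SKU-2" qty="1" /></order>

# The flat equivalent repeats the parent key ("A-1") on every child row.
flat_rows = [[order.get("id"), item.get("sku"), item.get("qty")] for item in order.iter("line")]
print(flat_rows)  # [['A-1', 'SKU-1', '2'], ['A-1', 'SKU-2', '1']]
```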

UTF-8 Encoding

All data should be UTF-8 encoded to ensure maximum compatibility between all systems.
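A minimal sketch of following this rule in Python: name the encoding explicitly on both the writing and the reading side, because the default encoding of `open()` depends on the platform locale. The file name and sample values are hypothetical.

```python
import os
import tempfile

# Non-ASCII sample data that would be corrupted by a mismatched encoding.
rows = ["Zoë\tMünchen\n", "李雷\t北京\n"]
path = os.path.join(tempfile.mkdtemp(), "partners.tsv")  # hypothetical file name

# newline="" writes "\n" as-is instead of translating it to the platform line ending.
with open(path, "w", encoding="utf-8", newline="") as f:
    f.writelines(rows)

with open(path, encoding="utf-8", newline="") as f:
    read_back = f.readlines()

print(read_back == rows)  # True
```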

Dates & Times

It is recommended to use UTC time for all date & time fields to prevent confusion.
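A minimal sketch of normalizing timestamps to UTC before they are written out. The ISO 8601 form with a `Z` suffix is an assumed output format, and rejecting naive datetimes (which carry no time zone) is a design choice for the example.

```python
from datetime import datetime, timedelta, timezone


def to_utc_string(moment):
    """Convert a timezone-aware datetime to an ISO 8601 UTC string ending in 'Z'."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: its time zone is ambiguous")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 09:30 local time in a UTC+8 zone is 01:30 UTC on the same day.
local = datetime(2024, 3, 1, 9, 30, tzinfo=timezone(timedelta(hours=8)))
print(to_utc_string(local))  # 2024-03-01T01:30:00Z
```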

Some more best practices: EDI Scheduling and File Transfer
