Generating Record Layouts for EBCDIC Data Files

Question

We are trying to write a tool in Perl that parses a fixed-length EBCDIC data file and generates the record layout by looking at the hex value of each byte in the record.

We assume that each data file, written by a COBOL program whose source code we do not have, can contain multiple record layouts. The aim of the tool is to support data migration (EBCDIC to ASCII) by generating a layout that is then fed to a converter.

The problem is that each byte can give rise to hundreds of permutations and combinations. I thought that comparing the hex value of a byte with the corresponding byte in the record below it might give some clue as to what it is. But even then there is no definite answer to arrive at: decisions have to be made at every juncture, and each one may affect the end result.

Could someone please point out any patterns I can look for? For example, in a COMP-3 field each nibble can represent a value from 0-9, so the hex value of a byte might look like [0-9][0-9]. Essentially, for data migration one need not bother about COMP and COMP-3 fields, as their values would not be affected by the migration. But identifying which fields are DISPLAY fields is also turning out to be a huge task. Can someone offer some ideas or point me in a direction I can explore further?

Any help would be highly appreciated. I am really stuck in a mire here.

Thanks, Aditya.

Recommended answer

I guess you have to go with probabilities, and hope the data is varied enough to get a lot out of that.

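  • Any field that contains only the EBCDIC values for alphanumerics and punctuation is probably a text (PIC X) field.
  • Numeric DISPLAY fields will be the easiest, containing only EBCDIC 0-9 (hex F0-F9). Note that if the field is signed, one digit (by default the last one) carries the sign in its zone nibble and so shows up as a letter or brace instead of a digit: hex C0-C9 ("{", A-I) for positive and hex D0-D9 ("}", J-R) for negative, so a final digit of -1 appears as J.
  • A random distribution of values starting with hex zeros is likely to be a binary-number COMP field.
  • COMP-3 fields hold one decimal digit in each hex digit of the data. So if all the hex digits happen to be 0-9, that is a fairly clear sign of a COMP-3 field. The exception is the last hex digit of the field, which holds C for positive, D for negative, or F for unsigned.
  • Some programs put spaces in numeric fields, so if a field looks entirely binary but also contains hex 40s (spaces), it is best to leave the hex 40s out of the analysis. They can still tell you that a group of bytes forms one field, though, if the bytes are always either all spaces or all data together.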
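
The heuristics in the list above can be tried out mechanically. What follows is a minimal sketch, in Python instead of the Perl the question mentions, that labels each byte offset by the values seen at that offset across all records and then merges neighbouring offsets with the same label into candidate fields. The function names (`classify_offset`, `profile_records`, `candidate_fields`), the label strings, and the choice of code page 037 are assumptions made for illustration; the signed-digit test also assumes the sign sits in the zone nibble (C or D) of a digit byte. The labels are guesses to be checked by eye, not a definitive classification.

```python
from itertools import groupby

SPACE = 0x40  # EBCDIC space


def is_text(b):
    """True if the byte decodes to a printable ASCII character (cp037 assumed)."""
    ch = bytes([b]).decode("cp037")
    return ch.isascii() and ch.isprintable()


def classify_offset(column):
    """Guess what one byte offset holds, given that byte from every record."""
    values = set(column) - {SPACE}  # ignore spaces, as the last bullet suggests
    if not values:
        return "blank"
    if all(0xF0 <= b <= 0xF9 for b in values):
        return "zoned-digit"  # unsigned DISPLAY digit
    if all((b >> 4) in (0xC, 0xD, 0xF) and (b & 0x0F) <= 9 for b in values):
        return "zoned-sign"  # digit with the sign in its zone nibble
    if all(is_text(b) for b in values):
        return "text"
    if all((b >> 4) <= 9 and (b & 0x0F) <= 9 for b in values):
        return "packed-digits"  # two decimal digits per byte
    if all((b >> 4) <= 9 and (b & 0x0F) in (0xC, 0xD, 0xF) for b in values):
        return "packed-sign"  # last byte of a COMP-3 field
    return "binary?"  # fallback: possibly a COMP field


def profile_records(records):
    """Label every offset; all records must have the same length."""
    return [classify_offset(column) for column in zip(*records)]


def candidate_fields(labels):
    """Merge runs of identical labels into (offset, length, label) tuples."""
    fields, offset = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        fields.append((offset, length, label))
        offset += length
    return fields
```

Some ambiguity remains by design: a column holding only the letters A-R also passes the signed-digit test, and a packed byte such as hex 5C is also a printable `*`, so the output is best treated as a starting point to confirm against more records.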

As for multiple layouts, that's tough. A common convention for records that can have multiple layouts is to have a field near the front of the record that takes a limited set of values saying what type of data this is. Like significantID, recordType, data. So the significantID should increase steadily, while the recordType field will vary between just a few values and recycle.
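
One way to hunt for such a record-type field is to count the distinct values at each offset near the front of the record and keep the offsets that take only a handful of values. The sketch below does that and then splits the records by the chosen byte so each group can be analysed as its own layout. It is written in Python for illustration; the function names and the `search_width` and `max_types` thresholds are assumptions, and it only considers single-byte type codes.

```python
from collections import Counter, defaultdict


def candidate_type_offsets(records, search_width=16, max_types=8):
    """Return (offset, Counter) pairs for offsets that take only a few values.

    Offsets with a single value are skipped: a constant byte cannot
    distinguish one layout from another.
    """
    width = min(search_width, min(len(r) for r in records))
    candidates = []
    for offset in range(width):
        counts = Counter(r[offset] for r in records)
        if 2 <= len(counts) <= max_types:
            candidates.append((offset, counts))
    return candidates


def split_by_type(records, offset):
    """Group records by the byte at the suspected record-type offset."""
    groups = defaultdict(list)
    for record in records:
        groups[record[offset]].append(record)
    return dict(groups)
```

On a small sample a steadily increasing ID can also show few distinct values in its last digit, so a candidate offset is more convincing when its values recur throughout the file instead of counting up.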
