COMP-3 data unpacking in Java (Embedded in Pentaho)
Question
We are facing a challenge reading COMP-3 data in Java embedded inside Pentaho ETL. A few float values are stored as packed decimals in a flat file, alongside other plain text. The plain text is read correctly, but the packed fields are not: we tried using Charset.forName("CP500"), but it never worked. We still get junk characters.
Since Pentaho scripts don't support COMP-3, the Pentaho forums suggest using a User Defined Java Class. Could anyone who has come across and solved this problem help us?
Recommended answer
Is it a Cobol file? Do you have a Cobol Copybook? Possible options include:
- As Bill said, convert the Comp-3 to text on the source machine
- Write your own conversion code
- Use a library like JRecord. Note: I am the author of JRecord
Converting Comp-3
In Comp-3, values are stored as follows (Zoned-Decimal shown for comparison):

Value   Comp-3 (signed)   Comp-3 (unsigned)   Zoned-Decimal
 123    x'123c'           x'123f' ??          "12C"
-123    x'123d'                               "12L"
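To make the layout in the table concrete, here is a minimal sketch that packs a value into Comp-3 bytes and prints them as hex. The class and method names (Comp3Layout, pack, hex) are hypothetical and not part of the original answer or of JRecord; the sketch assumes one decimal digit per nibble with the sign in the final nibble (c = positive, d = negative, f = unsigned).

```java
final class Comp3Layout {

    /**
     * Packs a value into 'length' Comp-3 bytes: one decimal digit per nibble,
     * with the sign code in the low nibble of the last byte.
     */
    static byte[] pack(long value, int length, boolean signed) {
        if (!signed && value < 0) {
            throw new IllegalArgumentException("unsigned field cannot hold " + value);
        }
        // Assumed sign codes: 0xC positive, 0xD negative, 0xF unsigned
        int sign = !signed ? 0xF : (value < 0 ? 0xD : 0xC);
        long digits = Math.abs(value);
        byte[] out = new byte[length];

        // Last byte: lowest-order digit in the high nibble, sign in the low nibble
        out[length - 1] = (byte) (((digits % 10) << 4) | sign);
        digits /= 10;

        // Remaining bytes, right to left: two digits per byte
        for (int i = length - 2; i >= 0; i--) {
            int low = (int) (digits % 10);
            digits /= 10;
            int high = (int) (digits % 10);
            digits /= 10;
            out[i] = (byte) ((high << 4) | low);
        }
        if (digits != 0) {
            throw new IllegalArgumentException("value does not fit in " + length + " bytes");
        }
        return out;
    }

    /** Renders bytes as lower-case hex, two characters per byte. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

With these assumptions, hex(pack(123, 2, true)) gives "123c" and hex(pack(-123, 2, true)) gives "123d", matching the signed column of the table. Because these bytes are binary digits and not characters, decoding them with any Charset produces junk, which is consistent with what the question describes.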
There is more than one way to convert a comp-3 to a decimal integer. One way is to:
- Convert x'123c' to the String "123c"
- Remove the last character and test it for the sign
Java code to convert comp3 (from a byte array):
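/**
 * Converts a packed-decimal (comp-3) field to its decimal digits as a String,
 * with a leading "-" for negative values.
 */
public static String getMainframePackedDecimal(final byte[] record,
                                               final int start,
                                               final int len) {
    // Hex text of the field, e.g. bytes 0x12 0x3c become "123c"
    String hex = getDecimal(record, start, start + len);
    String ret = "";
    String sign = "";
    if (! "".equals(hex)) {
        // The last hex character is the sign nibble
        switch (hex.substring(hex.length() - 1).toLowerCase().charAt(0)) {
        case 'd' : sign = "-";    // negative: deliberately falls through
        case 'a' :
        case 'b' :
        case 'c' :
        case 'e' :
        case 'f' :
            ret = sign + hex.substring(0, hex.length() - 1);
            break;
        default:
            ret = hex;            // no sign nibble: keep all the digits
        }
    }
    if ("".equals(ret)) {
        ret = "0";
    }
    return ret;
}

/** Returns the bytes record[start] .. record[fin - 1] as a hex String. */
public static String getDecimal(final byte[] record, final int start, final int fin) {
    StringBuilder ret = new StringBuilder();
    for (int i = start; i < fin; i++) {
        int b = toPostiveByte(record[i]);
        String s = Integer.toHexString(b);
        if (s.length() == 1) {
            ret.append('0');      // keep two hex characters per byte
        }
        ret.append(s);
    }
    return ret.toString();
}

// Helper not shown in the original answer; assumed to map a signed
// Java byte (-128..127) to its unsigned value (0..255).
private static int toPostiveByte(final byte b) {
    return b & 0xFF;
}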
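The question mentions float values, and a packed field carries no decimal point of its own; the number of decimal places normally comes from the Cobol Copybook. As a rough sketch of a nibble-based alternative to the hex-string approach, the following unpacks a field straight into a BigDecimal. The names PackedDecimalSketch and unpack and the scale parameter are hypothetical, introduced only for illustration. This sketch also treats sign nibble 0xB as negative, following the sign codes IBM documents for packed decimal, which differs slightly from the listing above.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class PackedDecimalSketch {

    /**
     * Unpacks a Comp-3 field of 'len' bytes starting at 'start'.
     * 'scale' is the assumed number of implied decimal places.
     */
    static BigDecimal unpack(byte[] record, int start, int len, int scale) {
        if (len < 1) {
            throw new IllegalArgumentException("field length must be at least 1");
        }
        StringBuilder digits = new StringBuilder(len * 2);
        boolean negative = false;
        int last = start + len - 1;

        for (int i = start; i <= last; i++) {
            int b = record[i] & 0xFF;          // unsigned byte value
            int high = b >>> 4;
            int low = b & 0x0F;

            appendDigit(digits, high);
            if (i < last) {
                appendDigit(digits, low);
            } else {
                // Low nibble of the last byte is the sign code
                if (low < 0xA) {
                    throw new IllegalArgumentException("missing sign nibble: " + low);
                }
                negative = (low == 0xD || low == 0xB);
            }
        }

        BigInteger unscaled = new BigInteger(digits.toString());
        if (negative) {
            unscaled = unscaled.negate();
        }
        return new BigDecimal(unscaled, scale);
    }

    private static void appendDigit(StringBuilder sb, int nibble) {
        if (nibble > 9) {
            throw new IllegalArgumentException("not a decimal digit: " + nibble);
        }
        sb.append((char) ('0' + nibble));
    }
}
```

For example, unpack(new byte[] {0x12, 0x3D}, 0, 2, 2) yields -1.23 under these assumptions. In a Pentaho User Defined Java Class, the important point is to pass the field as raw bytes (a Binary field) so that no character-set conversion touches the packed data before it is unpacked.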
JRecord
In JRecord, if you have a Cobol Copybook, the following are available:
- Cobol2Csv: a program to convert a Cobol data file to CSV using a Cobol Copybook
- Data2Xml: converts a Cobol data file to Xml using a Cobol Copybook
- Read a Cobol data file with a Cobol Copybook
- Read a fixed-width file with an Xml description
- Define the fields in Java
ICobolIOBuilder ioBldr = JRecordInterface1.COBOL
        .newIOBuilder(copybookName)
            .setDialect(ICopybookDialects.FMT_MAINFRAME)
            .setFont("cp037")
            .setFileOrganization(Constants.IO_FIXED_LENGTH)
            .setDropCopybookNameFromFields(true);

AbstractLine saleRecord;
AbstractLineReader reader = ioBldr.newReader(salesFile);
while ((saleRecord = reader.read()) != null) {
    ....
}
reader.close();
Defining the File in Java with JRecord
AbstractLineReader reader = JRecordInterface1.FIXED_WIDTH.newIOBuilder()
        .defineFieldsByLength()
            .addFieldByLength("Sku"  , Type.ftChar, 8, 0)
            .addFieldByLength("Store", Type.ftNumRightJustified, 3, 0)
            .addFieldByLength("Date" , Type.ftNumRightJustified, 6, 0)
            .addFieldByLength("Dept" , Type.ftNumRightJustified, 3, 0)
            .addFieldByLength("Qty"  , Type.ftNumRightJustified, 2, 0)
            .addFieldByLength("Price", Type.ftNumRightJustified, 6, 2)
        .endOfRecord()
        .newReader(this.getClass().getResource("DTAR020_tst1.bin.txt").getFile());

AbstractLine saleRecord;
while ((saleRecord = reader.read()) != null) {
}
Zoned Decimal
Another Mainframe-Cobol numeric format is Zoned-Decimal. It is a text format in which the sign is overtyped on the last digit, so the last character encodes both a digit and a sign. In zoned-decimal, 123 is "12C" while -123 is "12L".
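As an illustration, here is a minimal sketch that decodes a zoned-decimal value after the field has already been converted from EBCDIC to a Java String. The class and method names (ZonedDecimalSketch, parse) are hypothetical. It assumes the usual overpunch mapping: '{' and 'A' to 'I' stand for positive 0 to 9, '}' and 'J' to 'R' stand for negative 0 to 9, and a plain digit means no sign is present.

```java
final class ZonedDecimalSketch {

    /** Parses zoned-decimal text such as "12C" (123) or "12L" (-123). */
    static long parse(String text) {
        if (text == null || text.isEmpty()) {
            throw new IllegalArgumentException("empty zoned-decimal field");
        }
        char last = text.charAt(text.length() - 1);
        int digit;
        boolean negative = false;

        if (last >= '0' && last <= '9') {
            digit = last - '0';               // unsigned: plain digit
        } else if (last == '{') {
            digit = 0;                        // positive zero
        } else if (last >= 'A' && last <= 'I') {
            digit = last - 'A' + 1;           // positive 1..9
        } else if (last == '}') {
            digit = 0;                        // negative zero
            negative = true;
        } else if (last >= 'J' && last <= 'R') {
            digit = last - 'J' + 1;           // negative 1..9
            negative = true;
        } else {
            throw new IllegalArgumentException("unexpected sign character: " + last);
        }

        // Leading characters are ordinary digits; append the decoded last digit
        String digits = text.substring(0, text.length() - 1) + digit;
        long magnitude = Long.parseLong(digits);
        return negative ? -magnitude : magnitude;
    }
}
```

Unlike Comp-3, a zoned-decimal field survives a character-set conversion such as CP500 or cp037, which is why it can be handled as text first and decoded afterwards.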