Parquet file issue while loading through PolyBase
Problem description
Hi,
We are facing an issue while loading data from a Parquet file generated with ADF. We are copying Oracle data (whose NLS_CHARACTERSET is UTF-8) into a Parquet file on Azure Blob storage, and when we create an external table over it and try to access it with a SELECT
statement, we get the error below:
"HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: parquet.io.api.Binary$ByteArraySliceBackedBinary cannot be cast to java.base/java.lang.Long"
Then, following a suggestion on the internet, I tried to change the same Parquet file's encoding to UTF-8 with a PowerShell command, and I got the error below:
EXTERNAL TABLE access failed due to internal error: 'File /oracle/sod_utf: HdfsBridge::CreateRecordReader - Unexpected error encountered creating the record reader: RuntimeException: wasbs:/xxx.blob.core.windows.net/oracle/sod_utf is not a Parquet file.
expected magic number at tail [80, 65, 82, 49] but found [82, 49, 13, 10]'
I used the PowerShell command below to convert the encoding:
Get-Content sod | Set-Content -Encoding utf8 sod_utf
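For reference, the "found" bytes in the second error, [82, 49, 13, 10], are ASCII 'R', '1', CR, LF: the trailing "PAR1" marker ([80, 65, 82, 49]) has been pushed back by a Windows line terminator, which is consistent with a line-oriented text round-trip such as `Get-Content | Set-Content` being applied to a binary file (it may alter other bytes too). A minimal Python sketch that checks the framing every Parquet file must have: "PAR1" at the head, and a 4-byte little-endian footer length followed by "PAR1" at the tail. The helper name `check_parquet_framing` is hypothetical, and it only checks the framing, not the footer contents.

```python
import struct

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of every Parquet file


def check_parquet_framing(data: bytes) -> str:
    """Return a short diagnosis of the Parquet framing in `data`.

    Layout: b"PAR1" + column data + footer
            + 4-byte little-endian footer length + b"PAR1".
    Only the framing is checked; the footer itself is not parsed.
    """
    if len(data) < 12:  # head magic + footer length + tail magic
        return "too short to be a Parquet file"
    if data[:4] != PARQUET_MAGIC:
        return "missing PAR1 at head"
    tail = data[-4:]
    if tail != PARQUET_MAGIC:
        return f"bad tail {list(tail)}"
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        return "footer length out of range"
    return f"ok, footer is {footer_len} bytes"


# Toy bytes framed like a Parquet file: magic, 2 "data" bytes,
# a 3-byte stand-in footer, its length as little-endian uint32, magic.
fake = PARQUET_MAGIC + b"\x01\x02" + b"abc" + struct.pack("<I", 3) + PARQUET_MAGIC
print(check_parquet_framing(fake))            # ok, footer is 3 bytes
# Simulate a text round-trip that appends a CRLF line terminator.
print(check_parquet_framing(fake + b"\r\n"))  # bad tail [82, 49, 13, 10]
```

To check a real file, pass its raw bytes, e.g. `check_parquet_framing(pathlib.Path("sod").read_bytes())`. A Parquet file is binary, so it should never be passed through text-encoding conversion.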
When I changed it with C# code, I got a different error.
Looking forward to your help.
Cheers,
Recommended answer
Hi Amit,
Can you detail the T-SQL you used to create the external data source? Please follow the guidance in this document: CREATE EXTERNAL FILE FORMAT (Transact-SQL)
Example:
CREATE EXTERNAL FILE FORMAT parquetfile1
WITH (
FORMAT_TYPE = PARQUET,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
If you are using Gzip compression, set DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec' instead.
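If the DDL is generated from a script, the codec choice above can be made explicit. A minimal Python sketch, assuming you build the statement as a string: the two codec strings are the ones quoted in this answer, while the helper name `external_file_format_ddl` and its validation are hypothetical.

```python
# Codec strings quoted in the answer; keys are illustrative labels.
CODECS = {
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
}


def external_file_format_ddl(name: str, compression: str) -> str:
    """Build a CREATE EXTERNAL FILE FORMAT statement for Parquet data.

    `name` is interpolated as-is, so only pass trusted identifiers.
    """
    try:
        codec = CODECS[compression.lower()]
    except KeyError:
        raise ValueError(f"unsupported compression: {compression!r}") from None
    return (
        f"CREATE EXTERNAL FILE FORMAT {name}\n"
        "WITH (\n"
        "FORMAT_TYPE = PARQUET,\n"
        f"DATA_COMPRESSION = '{codec}'\n"
        ");"
    )


print(external_file_format_ddl("parquetfile1", "snappy"))
```

With "snappy" this reproduces the example statement above; with "gzip" only the DATA_COMPRESSION line changes.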
Creating an external file format is a prerequisite for creating an external table. By creating an external file format, you specify the actual layout of the data referenced by the external table.
谢谢,
Mike