Parquet file issue while loading through PolyBase


Problem description






Hi,

We are facing an issue while loading data from a Parquet file generated with ADF. We are copying Oracle data (which has nls_characterset set to UTF-8) into a Parquet file on Azure Blob storage. When we create an external table over it and query it with a SELECT statement, we get the error below:

"HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: parquet.io.api.Binary$ByteArraySliceBackedBinary cannot be cast to java.base/java.lang.Long"


Then, following a suggestion found on the internet, I tried to change the encoding of the same Parquet file to UTF-8 with a PowerShell command, and got the error below:

EXTERNAL TABLE access failed due to internal error: 'File /oracle/sod_utf: HdfsBridge::CreateRecordReader - Unexpected error encountered creating the record reader: RuntimeException: wasbs:/xxx.blob.core.windows.net/oracle/sod_utf is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [82, 49, 13, 10]'

I used the PowerShell command below to convert the encoding:

Get-Content sod | Set-Content -Encoding utf8 sod_utf
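
A note on the second error: Parquet is a binary format whose files begin and end with the 4-byte magic `PAR1`, i.e. bytes [80, 65, 82, 49]. The tail reported above, [82, 49, 13, 10], is `R1` followed by CR LF, which is consistent with the file having been rewritten as text with a trailing line break. The Python sketch below is only an illustration: Python is not used in the original thread, `tail_bytes` and `looks_like_parquet` are hypothetical helper names, and the demo data is a stand-in rather than a real Parquet file. It checks only the magic bytes of a local file.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # bytes [80, 65, 82, 49]


def tail_bytes(path, n=4):
    """Return the last n bytes of the file as a list of integers."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(size - n, 0))
        return list(f.read(n))


def looks_like_parquet(path):
    """Check only the leading and trailing magic bytes, not the full format."""
    if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and bytes(tail_bytes(path)) == PARQUET_MAGIC


if __name__ == "__main__":
    # Stand-in data, NOT a real Parquet file: only the magic bytes are realistic.
    original = PARQUET_MAGIC + b"placeholder-body" + PARQUET_MAGIC
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "sod")
        bad = os.path.join(tmp, "sod_utf")
        with open(good, "wb") as f:
            f.write(original)
        # Assumption: the text rewrite appended a CR LF line break.
        with open(bad, "wb") as f:
            f.write(original + b"\r\n")
        print(looks_like_parquet(good), tail_bytes(good))
        print(looks_like_parquet(bad), tail_bytes(bad))
```

Run as a script, the first line printed is `True [80, 65, 82, 49]` and the second is `False [82, 49, 13, 10]`, matching the bytes in the error message. A check like this suggests that re-encoding a Parquet file as text damages it, so the original, unmodified file is the one to keep investigating.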

When I changed the encoding with C# code instead, I got a different error.

Looking forward to your help.

Cheers,




Recommended answer

Hi Amit,

Can you detail the T-SQL you used to create the external data source? Please follow the guidance in this document: CREATE EXTERNAL FILE FORMAT (Transact-SQL)

Example:

CREATE EXTERNAL FILE FORMAT parquetfile1  
WITH (  
    FORMAT_TYPE = PARQUET,  
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'  
);  

If you are using GzipCodec, specify 'org.apache.hadoop.io.compress.GzipCodec' instead.

Creating an external file format is a prerequisite for creating an External Table. By creating an External File Format, you specify the actual layout of the data referenced by an external table.

Thanks,

Mike

