Deserialize an Avro file with C#
Problem description
I can't find a way to deserialize an Apache Avro file with C#. The file was generated by the Archive feature in Microsoft Azure Event Hubs.
With Java I can use Avro Tools from Apache to convert the file to JSON:
java -jar avro-tools-1.8.1.jar tojson --pretty inputfile > output.json
Using NuGet package Microsoft.Hadoop.Avro I am able to extract SequenceNumber, Offset and EnqueuedTimeUtc, but since I don't know what type to use for Body an exception is thrown. I've tried with Dictionary<string, object> and other types.
static void Main(string[] args)
{
    var fileName = "...";
    using (Stream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        using (var reader = AvroContainer.CreateReader<EventData>(stream))
        {
            using (var streamReader = new SequentialReader<EventData>(reader))
            {
                var record = streamReader.Objects.FirstOrDefault();
            }
        }
    }
}

[DataContract(Namespace = "Microsoft.ServiceBus.Messaging")]
public class EventData
{
    [DataMember(Name = "SequenceNumber")]
    public long SequenceNumber { get; set; }

    [DataMember(Name = "Offset")]
    public string Offset { get; set; }

    [DataMember(Name = "EnqueuedTimeUtc")]
    public string EnqueuedTimeUtc { get; set; }

    [DataMember(Name = "Body")]
    public foo Body { get; set; }

    // More properties...
}
The schema looks like this:
{
    "type": "record",
    "name": "EventData",
    "namespace": "Microsoft.ServiceBus.Messaging",
    "fields": [
        {
            "name": "SequenceNumber",
            "type": "long"
        },
        {
            "name": "Offset",
            "type": "string"
        },
        {
            "name": "EnqueuedTimeUtc",
            "type": "string"
        },
        {
            "name": "SystemProperties",
            "type": {
                "type": "map",
                "values": [ "long", "double", "string", "bytes" ]
            }
        },
        {
            "name": "Properties",
            "type": {
                "type": "map",
                "values": [ "long", "double", "string", "bytes" ]
            }
        },
        {
            "name": "Body",
            "type": [ "null", "bytes" ]
        }
    ]
}
Solution
I was able to get full data access working using dynamic. Here's the code for accessing the raw Body data, which is stored as an array of bytes. In my case, those bytes contain UTF8-encoded JSON, but of course it depends on how you initially created the EventData instances that you published to the Event Hub:
using (var reader = AvroContainer.CreateGenericReader(stream))
{
    while (reader.MoveNext())
    {
        foreach (dynamic record in reader.Current.Objects)
        {
            var sequenceNumber = record.SequenceNumber;
            var bodyText = Encoding.UTF8.GetString(record.Body);
            Console.WriteLine($"{sequenceNumber}: {bodyText}");
        }
    }
}
If someone can post a statically-typed solution, I'll upvote it, but given that the bigger latency in any system will almost certainly be the connection to the Event Hub Archive blobs, I wouldn't worry about parsing performance. :)
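Since the schema declares Body as [ "null", "bytes" ], the body can legally be null, and the bytes are only JSON if the publisher wrote JSON. Below is a minimal sketch of how the decoded bytes might be handled defensively once they have been read. It does not touch the Avro library at all: BodyDecoder, DecodeText, TryParseJson and the sample payload are hypothetical names and data invented for illustration, and it assumes System.Text.Json is available (.NET Core 3.0 or later).

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper, not part of Microsoft.Hadoop.Avro.
public static class BodyDecoder
{
    // The schema allows a null Body, so pass null through instead of throwing.
    public static string DecodeText(byte[] body)
    {
        return body == null ? null : Encoding.UTF8.GetString(body);
    }

    // Returns true and the parsed root element when the body is valid UTF-8 JSON.
    public static bool TryParseJson(byte[] body, out JsonElement root)
    {
        root = default;
        if (body == null || body.Length == 0)
        {
            return false;
        }

        try
        {
            using (JsonDocument doc = JsonDocument.Parse(new ReadOnlyMemory<byte>(body)))
            {
                // Clone so the element stays usable after the document is disposed.
                root = doc.RootElement.Clone();
                return true;
            }
        }
        catch (JsonException)
        {
            // The publisher did not send JSON (or sent malformed JSON).
            return false;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up payload standing in for record.Body from the reader loop above.
        byte[] body = Encoding.UTF8.GetBytes("{\"deviceId\":\"sensor-1\",\"temperature\":21.5}");

        Console.WriteLine(BodyDecoder.DecodeText(body));

        if (BodyDecoder.TryParseJson(body, out JsonElement root))
        {
            Console.WriteLine(root.GetProperty("deviceId").GetString());
        }
    }
}
```

In the reader loop, record.Body would be cast to byte[] and passed to these helpers. Whether JSON parsing is appropriate at all depends entirely on what was originally published to the Event Hub.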