Deserialize an Avro file with C#
Question
I can't find a way to deserialize an Apache Avro file with C#. The Avro file is a file generated by the Archive feature in Microsoft Azure Event Hubs.
With Java I can use Avro Tools from Apache to convert the file to JSON:
java -jar avro-tools-1.8.1.jar tojson --pretty inputfile > output.json
Using the NuGet package Microsoft.Hadoop.Avro I am able to extract SequenceNumber, Offset and EnqueuedTimeUtc, but since I don't know what type to use for Body, an exception is thrown. I've tried Dictionary<string, object> and other types.
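using System.IO;
using System.Linq;
using System.Runtime.Serialization;
using Microsoft.Hadoop.Avro.Container;

static void Main(string[] args)
{
    var fileName = "...";
    using (Stream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        // Typed reader: each Avro record is mapped onto the EventData class below.
        using (var reader = AvroContainer.CreateReader<EventData>(stream))
        {
            using (var streamReader = new SequentialReader<EventData>(reader))
            {
                var record = streamReader.Objects.FirstOrDefault();
            }
        }
    }
}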
[DataContract(Namespace = "Microsoft.ServiceBus.Messaging")]
public class EventData
{
    [DataMember(Name = "SequenceNumber")]
    public long SequenceNumber { get; set; }

    [DataMember(Name = "Offset")]
    public string Offset { get; set; }

    [DataMember(Name = "EnqueuedTimeUtc")]
    public string EnqueuedTimeUtc { get; set; }

    // "foo" is a placeholder: which type to use here is the open question.
    [DataMember(Name = "Body")]
    public foo Body { get; set; }

    // More properties...
}
The schema looks like this:
{
  "type": "record",
  "name": "EventData",
  "namespace": "Microsoft.ServiceBus.Messaging",
  "fields": [
    {
      "name": "SequenceNumber",
      "type": "long"
    },
    {
      "name": "Offset",
      "type": "string"
    },
    {
      "name": "EnqueuedTimeUtc",
      "type": "string"
    },
    {
      "name": "SystemProperties",
      "type": {
        "type": "map",
        "values": [ "long", "double", "string", "bytes" ]
      }
    },
    {
      "name": "Properties",
      "type": {
        "type": "map",
        "values": [ "long", "double", "string", "bytes" ]
      }
    },
    {
      "name": "Body",
      "type": [ "null", "bytes" ]
    }
  ]
}
Recommended answer
I was able to get full data access working using dynamic. Here's the code for accessing the raw body data, which is stored as an array of bytes. In my case, those bytes contain UTF8-encoded JSON, but of course it depends on how you initially created the EventData instances that you published to the Event Hub:
using (var reader = AvroContainer.CreateGenericReader(stream))
{
    while (reader.MoveNext())
    {
        foreach (dynamic record in reader.Current.Objects)
        {
            var sequenceNumber = record.SequenceNumber;
            var bodyText = Encoding.UTF8.GetString(record.Body);
            Console.WriteLine($"{sequenceNumber}: {bodyText}");
        }
    }
}
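One thing to watch for: the schema declares Body as [ "null", "bytes" ], so a record may have no body at all. Below is a minimal sketch of a null-safe decode step that could replace the direct Encoding.UTF8.GetString call in the loop above. BodyDecoder and DecodeUtf8 are hypothetical names made up for illustration, not part of Microsoft.Hadoop.Avro, and the sketch assumes the generic reader surfaces the "null" branch of the union as a C# null and the "bytes" branch as byte[]. The values in Main are stand-ins so the snippet runs without an Avro file.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; it is not part of Microsoft.Hadoop.Avro.
public static class BodyDecoder
{
    // Returns the body decoded as UTF-8 text, or null when there is no body
    // or the value is not a byte array.
    public static string DecodeUtf8(object body)
    {
        return body is byte[] bytes ? Encoding.UTF8.GetString(bytes) : null;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in values; in the reader loop this would be record.Body.
        object present = Encoding.UTF8.GetBytes("{\"temperature\": 21.5}");
        object missing = null;

        Console.WriteLine(BodyDecoder.DecodeUtf8(present) ?? "<no body>");
        Console.WriteLine(BodyDecoder.DecodeUtf8(missing) ?? "<no body>");
    }
}
```

Inside the loop the call would look something like string bodyText = BodyDecoder.DecodeUtf8((object)record.Body); the cast to object keeps the call statically bound instead of being resolved at run time through dynamic.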
If someone can post a statically-typed solution, I'll upvote it, but given that the bigger latency in any system will almost certainly be the connection to the Event Hub Archive blobs, I wouldn't worry about parsing performance. :)