Deserialize an Avro file with C#

Question

I can't find a way to deserialize an Apache Avro file with C#. The Avro file is a file generated by the Archive feature in Microsoft Azure Event Hubs.

With Java I can use Avro Tools from Apache to convert the file to JSON:

java -jar avro-tools-1.8.1.jar tojson --pretty inputfile > output.json

Using NuGet package Microsoft.Hadoop.Avro I am able to extract SequenceNumber, Offset and EnqueuedTimeUtc, but since I don't know what type to use for Body an exception is thrown. I've tried with Dictionary<string, object> and other types.

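// Namespaces the snippet below relies on.
using System.IO;
using System.Linq;
using System.Runtime.Serialization;
using Microsoft.Hadoop.Avro.Container;

static void Main(string[] args)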
{
    var fileName = "...";

    using (Stream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        using (var reader = AvroContainer.CreateReader<EventData>(stream))
        {
            using (var streamReader = new SequentialReader<EventData>(reader))
            {
                var record = streamReader.Objects.FirstOrDefault();
            }
        }
    }
}

[DataContract(Namespace = "Microsoft.ServiceBus.Messaging")]
public class EventData
{
    [DataMember(Name = "SequenceNumber")]
    public long SequenceNumber { get; set; }

    [DataMember(Name = "Offset")]
    public string Offset { get; set; }

    [DataMember(Name = "EnqueuedTimeUtc")]
    public string EnqueuedTimeUtc { get; set; }

    [DataMember(Name = "Body")]
    public foo Body { get; set; }

    // More properties...
}

The schema looks like this:

{
  "type": "record",
  "name": "EventData",
  "namespace": "Microsoft.ServiceBus.Messaging",
  "fields": [
    {
      "name": "SequenceNumber",
      "type": "long"
    },
    {
      "name": "Offset",
      "type": "string"
    },
    {
      "name": "EnqueuedTimeUtc",
      "type": "string"
    },
    {
      "name": "SystemProperties",
      "type": {
        "type": "map",
        "values": [ "long", "double", "string", "bytes" ]
      }
    },
    {
      "name": "Properties",
      "type": {
        "type": "map",
        "values": [ "long", "double", "string", "bytes" ]
      }
    },
    {
      "name": "Body",
      "type": [ "null", "bytes" ]
    }
  ]
}    

Recommended answer

I was able to get full data access working using dynamic. Here's the code for accessing the raw body data, which is stored as an array of bytes. In my case, those bytes contain UTF8-encoded JSON, but of course it depends on how you initially created your EventData instances that you published to the Event Hub:

using (var reader = AvroContainer.CreateGenericReader(stream))
{
    while (reader.MoveNext())
    {
        foreach (dynamic record in reader.Current.Objects)
        {
            var sequenceNumber = record.SequenceNumber;
            var bodyText = Encoding.UTF8.GetString(record.Body);
            Console.WriteLine($"{sequenceNumber}: {bodyText}");
        }
    }
}
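
One detail worth guarding against: the schema in the question declares Body as [ "null", "bytes" ], so a record may carry no body at all, and Encoding.UTF8.GetString throws ArgumentNullException when handed null. Below is a minimal sketch of a null-safe decoder. EventBodyDecoder and DecodeUtf8 are hypothetical names made up for illustration; they are not part of Microsoft.Hadoop.Avro, and the sketch assumes (as in my case) that the bytes are UTF-8 text.

```csharp
using System.Text;

// Hypothetical helper for illustration; not part of Microsoft.Hadoop.Avro.
public static class EventBodyDecoder
{
    // Returns the body decoded as UTF-8 text, or null when the value
    // is null or is not a byte array.
    public static string DecodeUtf8(object body)
    {
        // Body is declared as ["null", "bytes"] in the schema,
        // so null is a legal value and must not reach GetString.
        return body is byte[] bytes ? Encoding.UTF8.GetString(bytes) : null;
    }
}
```

Inside the loop above, the call would then read var bodyText = EventBodyDecoder.DecodeUtf8((object)record.Body); the cast to object makes the call bind at compile time rather than through dynamic dispatch.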

If someone can post a statically-typed solution, I'll upvote it, but given that the bigger latency in any system will almost certainly be the connection to the Event Hub Archive blobs, I wouldn't worry about parsing performance. :)
