Deserialize an Avro file with C#

Question

I can't find a way to deserialize an Apache Avro file with C#. The Avro file is a file generated by the Archive feature in Microsoft Azure Event Hubs.

With Java I can use Avro Tools from Apache to convert the file to JSON:

java -jar avro-tools-1.8.1.jar tojson --pretty inputfile > output.json

Using NuGet package Microsoft.Hadoop.Avro I am able to extract SequenceNumber, Offset and EnqueuedTimeUtc, but since I don't know what type to use for Body an exception is thrown. I've tried with Dictionary<string, object> and other types.

static void Main(string[] args)
{
    var fileName = "...";

    using (Stream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        using (var reader = AvroContainer.CreateReader<EventData>(stream))
        {
            using (var streamReader = new SequentialReader<EventData>(reader))
            {
                var record = streamReader.Objects.FirstOrDefault();
            }
        }
    }
}

[DataContract(Namespace = "Microsoft.ServiceBus.Messaging")]
public class EventData
{
    [DataMember(Name = "SequenceNumber")]
    public long SequenceNumber { get; set; }

    [DataMember(Name = "Offset")]
    public string Offset { get; set; }

    [DataMember(Name = "EnqueuedTimeUtc")]
    public string EnqueuedTimeUtc { get; set; }

    [DataMember(Name = "Body")]
    public foo Body { get; set; }

    // More properties...
}

The schema looks like this:

{
  "type": "record",
  "name": "EventData",
  "namespace": "Microsoft.ServiceBus.Messaging",
  "fields": [
    {
      "name": "SequenceNumber",
      "type": "long"
    },
    {
      "name": "Offset",
      "type": "string"
    },
    {
      "name": "EnqueuedTimeUtc",
      "type": "string"
    },
    {
      "name": "SystemProperties",
      "type": {
        "type": "map",
        "values": [ "long", "double", "string", "bytes" ]
      }
    },
    {
      "name": "Properties",
      "type": {
        "type": "map",
        "values": [ "long", "double", "string", "bytes" ]
      }
    },
    {
      "name": "Body",
      "type": [ "null", "bytes" ]
    }
  ]
}    

Solution

I was able to get full data access working using dynamic. Here's the code for accessing the raw body data, which is stored as an array of bytes. In my case, those bytes contain UTF8-encoded JSON, but of course it depends on how you initially created your EventData instances that you published to the Event Hub:

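// Requires: using System; using System.Text; using Microsoft.Hadoop.Avro.Container;
// `stream` is the FileStream opened on the Avro file, as in the question.
using (var reader = AvroContainer.CreateGenericReader(stream))
{
    // Each MoveNext() advances to the next block of records in the container.
    while (reader.MoveNext())
    {
        foreach (dynamic record in reader.Current.Objects)
        {
            var sequenceNumber = record.SequenceNumber;
            // Body is declared as [ "null", "bytes" ] in the schema, so it may be null.
            byte[] body = record.Body;
            var bodyText = body == null ? null : Encoding.UTF8.GetString(body);
            Console.WriteLine($"{sequenceNumber}: {bodyText}");
        }
    }
}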

If someone can post a statically typed solution, I'll upvote it. That said, the biggest latency in any system will almost certainly be the connection to the Event Hub Archive blobs, so I wouldn't worry about parsing performance. :)
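
Since the body arrives as raw bytes and the schema allows it to be null, it may help to isolate the decoding step. The following is a minimal sketch, not part of the original answer: `BodyDecoder`, `DecodeUtf8`, `ReadStringProperty` and the `deviceId` property are hypothetical names chosen for illustration, and it assumes the payload is UTF-8 JSON and that `System.Text.Json` is available (built into .NET Core 3.0 and later, otherwise a NuGet package).

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper for illustration; not part of Microsoft.Hadoop.Avro.
public static class BodyDecoder
{
    // Returns the body as text, or null when the Avro union held "null".
    public static string DecodeUtf8(byte[] body)
    {
        return body == null ? null : Encoding.UTF8.GetString(body);
    }

    // Reads one top-level string property from a JSON body.
    // Returns null when the body is empty, the root is not a JSON object,
    // or the property is missing or not a string.
    // Throws JsonException when the body is not valid JSON.
    public static string ReadStringProperty(byte[] body, string propertyName)
    {
        var text = DecodeUtf8(body);
        if (string.IsNullOrWhiteSpace(text))
        {
            return null;
        }

        using (var document = JsonDocument.Parse(text))
        {
            var root = document.RootElement;
            if (root.ValueKind != JsonValueKind.Object)
            {
                return null;
            }

            if (root.TryGetProperty(propertyName, out var value)
                && value.ValueKind == JsonValueKind.String)
            {
                return value.GetString();
            }

            return null;
        }
    }
}
```

Inside the `foreach` loop this could be called as `BodyDecoder.ReadStringProperty((byte[])record.Body, "deviceId")`; the explicit cast gives the dynamic value a static type, so overload resolution happens at compile time.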
