AudioSet and Tensorflow Understanding


Problem Description

With AudioSet released, providing a brand new area of research for those who do sound analysis, I've been trying hard these last few days to dig into how to analyze and decode such data.

The data is given in .tfrecord files; here's a small snippet.

�^E^@^@^@^@^@^@C�bd
u
^[
^Hvideo_id^R^O

^KZZcwENgmOL0
^^
^Rstart_time_seconds^R^H^R^F
^D^@^@�C
^X
^Flabels^R^N^Z^L

�^B�^B�^B�^B�^B
^\
^Pend_time_seconds^R^H^R^F
^D^@^@�C^R�

�

^Oaudio_embedding^R�

�^A
�^A
�^A3�^] q^@�Z�r�����w���Q����.���^@�b�{m�^@P^@^S����,^]�x�����:^@����^@^@^Z0��^@]^Gr?v(^@^U^@��^EZ6�$
�^A

The example proto given is:

context: {
  feature: {
    key  : "video_id"
    value: {
      bytes_list: {
        value: [YouTube video id string]
      }
    }
  }
  feature: {
    key  : "start_time_seconds"
    value: {
      float_list: {
        value: 6.0
      }
    }
  }
  feature: {
    key  : "end_time_seconds"
    value: {
      float_list: {
        value: 16.0
      }
    }
  }
  feature: {
    key  : "labels"
    value: {
      int64_list: {
        value: [1, 522, 11, 172] # The meaning of the labels can be found here.
      }
    }
  }
}
feature_lists: {
  feature_list: {
    key  : "audio_embedding"
    value: {
      feature: {
        bytes_list: {
          value: [128 8bit quantized features]
        }
      }
      feature: {
        bytes_list: {
          value: [128 8bit quantized features]
        }
      }
    }
    ... # Repeated for every second of the segment
  }

}

My very direct question here, something I can't seem to find good information on, is: how do I convert cleanly between the two?

If I have a machine-readable file, how do I make it human-readable, and vice versa?
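In protobuf terms, those two directions are text formatting (`str()`) for the human-readable side and `SerializeToString()` / `FromString()` for the machine-readable side. A minimal sketch, using hypothetical stand-in values modeled on the example proto below (zero bytes in place of real quantized embeddings):

```python
import tensorflow as tf

# Hypothetical values modeled on the example proto; not real AudioSet data.
context = tf.train.Features(feature={
    'video_id': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[b'ZZcwENgmOL0'])),
    'start_time_seconds': tf.train.Feature(
        float_list=tf.train.FloatList(value=[6.0])),
    'end_time_seconds': tf.train.Feature(
        float_list=tf.train.FloatList(value=[16.0])),
    'labels': tf.train.Feature(
        int64_list=tf.train.Int64List(value=[1, 522, 11, 172])),
})
feature_lists = tf.train.FeatureLists(feature_list={
    'audio_embedding': tf.train.FeatureList(feature=[
        # One 128-byte quantized embedding per second (zeros as a stand-in).
        tf.train.Feature(bytes_list=tf.train.BytesList(value=[bytes(128)]))
        for _ in range(2)
    ]),
})
seq = tf.train.SequenceExample(context=context, feature_lists=feature_lists)

human_readable = str(seq)                    # text-format proto, like the sample below
machine_readable = seq.SerializeToString()   # the bytes stored in a .tfrecord

# Round trip back from bytes to a parsed proto.
parsed = tf.train.SequenceExample.FromString(machine_readable)
print(parsed.context.feature['video_id'].bytes_list.value[0])
```

The same `FromString()` call works on each raw record pulled out of a .tfrecord file, which is what makes the binary file human-readable again.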

I have found this, which takes a tfrecord of a picture and converts it to a readable format... but I can't seem to get it into a form that works with AudioSet.

Recommended Answer

This worked for me, storing the features in feat_audio. To plot them, convert them to an ndarray and reshape them accordingly.

import tensorflow as tf

audio_record = '/audioset_v1_embeddings/eval/_1.tfrecord'
vid_ids = []
labels = []
start_time_seconds = []  # in seconds
end_time_seconds = []
feat_audio = []
count = 0
for example in tf.python_io.tf_record_iterator(audio_record):
    # Context features (video id, labels, segment times) parse as a tf.train.Example.
    tf_example = tf.train.Example.FromString(example)
    vid_ids.append(tf_example.features.feature['video_id'].bytes_list.value[0].decode(encoding='UTF-8'))
    labels.append(tf_example.features.feature['labels'].int64_list.value)
    start_time_seconds.append(tf_example.features.feature['start_time_seconds'].float_list.value)
    end_time_seconds.append(tf_example.features.feature['end_time_seconds'].float_list.value)

    # The per-second embeddings live in the feature_lists of a SequenceExample.
    tf_seq_example = tf.train.SequenceExample.FromString(example)
    n_frames = len(tf_seq_example.feature_lists.feature_list['audio_embedding'].feature)

    sess = tf.InteractiveSession()
    audio_frame = []
    # Each frame is a 128-byte string: decode the raw bytes to uint8, then cast to float32.
    for i in range(n_frames):
        audio_frame.append(tf.cast(tf.decode_raw(
                tf_seq_example.feature_lists.feature_list['audio_embedding'].feature[i].bytes_list.value[0], tf.uint8),
                tf.float32).eval())
    sess.close()

    feat_audio.append([])
    feat_audio[count].append(audio_frame)
    count += 1
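To actually plot the decoded features, one way, sketched here with fabricated data standing in for a decoded audio_frame list, is to stack them into a (n_frames, 128) matrix, one row per second of audio:

```python
import numpy as np

# Fabricated stand-in for audio_frame: n_frames decoded 128-dim embeddings.
n_frames = 10
audio_frame = [np.arange(128, dtype=np.float32) for _ in range(n_frames)]

# One row per second of audio, one column per embedding dimension.
emb = np.asarray(audio_frame).reshape(n_frames, 128)
print(emb.shape)  # (10, 128)

# emb can then be visualized, e.g. with matplotlib:
#   plt.imshow(emb.T, aspect='auto'); plt.show()
```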
