How can I extract the audio embeddings (features) from Google’s AudioSet?

Question

I’m talking about the audio features dataset available at https://research.google.com/audioset/download.html as a tar.gz archive consisting of frame-level audio tfrecords.

Extracting everything else from the tfrecord files works fine (I could extract the keys video_id, start_time_seconds, end_time_seconds and labels), but the actual embeddings needed for training do not seem to be there at all. When I iterate over the contents of any tfrecord file from the dataset, only those four keys (video_id, start_time_seconds, end_time_seconds and labels) are printed.

This is the code I'm using:

import tensorflow as tf
import numpy as np

def readTfRecordSamples(tfrecords_filename):

    record_iterator = tf.python_io.tf_record_iterator(path=tfrecords_filename)

    for string_record in record_iterator:
        example = tf.train.Example()
        example.ParseFromString(string_record)
        print(example)  # this prints the four keys mentioned above, but NOT audio_embedding

        # the first label can be then parsed like this:
        label = (example.features.feature['labels'].int64_list.value[0])
        print('label 1: ' + str(label))

        # this, however, does not work:
        #audio_embedding = (example.features.feature['audio_embedding'].bytes_list.value[0])

readTfRecordSamples('embeddings/01.tfrecord')

Is there any trick to extracting the 128-dimensional embeddings? Or are they really not in this dataset?

Answer

Solved it: the tfrecord files need to be read as sequence examples (tf.train.SequenceExample), not as plain examples (tf.train.Example). The above code works if the line

example = tf.train.Example()

is replaced with

example = tf.train.SequenceExample()

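Parsed as a SequenceExample, the four metadata keys show up under context and the per-second embeddings under feature_lists (key audio_embedding). The embeddings and all other content can then be viewed by simply running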

print(example)
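A slightly fuller sketch that pulls the embeddings out as arrays rather than just printing them. It assumes TensorFlow 2.x (so tf.data.TFRecordDataset replaces the TF1-only tf.python_io iterator) and the record layout implied above: video_id as a bytes value and labels as int64 values in the context, and each entry of the audio_embedding feature list holding one bytes value of 128 quantized uint8 numbers (one frame per second). The helper names frames_to_array and read_audioset_embeddings are hypothetical, chosen only for illustration.

```python
import numpy as np


def frames_to_array(frame_bytes):
    """Decode embedding frames into a (num_frames, 128) uint8 array.

    Assumes each frame is a single bytes value containing 128 quantized
    uint8 values, one per embedding dimension.
    """
    if not frame_bytes:
        return np.zeros((0, 128), dtype=np.uint8)
    # np.frombuffer reinterprets each bytes object as uint8 values;
    # np.stack copies the frames into one contiguous 2-D array.
    return np.stack([np.frombuffer(b, dtype=np.uint8) for b in frame_bytes])


def read_audioset_embeddings(tfrecords_filename):
    """Yield (video_id, labels, embeddings) for every record in a tfrecord file."""
    # Imported lazily so that frames_to_array can be used without TensorFlow.
    import tensorflow as tf

    for raw_record in tf.data.TFRecordDataset(tfrecords_filename):
        # Parse as a SequenceExample, not an Example (the fix described above).
        example = tf.train.SequenceExample.FromString(raw_record.numpy())

        # Clip-level metadata lives in the context...
        context = example.context.feature
        video_id = context['video_id'].bytes_list.value[0].decode('utf-8')
        labels = list(context['labels'].int64_list.value)

        # ...while the per-second embeddings live in the feature lists.
        frames = example.feature_lists.feature_list['audio_embedding'].feature
        embeddings = frames_to_array([f.bytes_list.value[0] for f in frames])
        yield video_id, labels, embeddings
```

Usage would look like `for vid, labels, emb in read_audioset_embeddings('embeddings/01.tfrecord'): print(vid, labels, emb.shape)`. The values stay as quantized uint8; converting them back to floats, if needed for training, is left out here because it depends on the quantization range used when the dataset was built.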
