How can I save a Librosa spectrogram plot as a specific sized image?

Views: 25 Posted: 2021/12/20 23:31:02 Tags: python matplotlib audio librosa

Question

I want to feed spectrogram images to a convolutional neural network to try to classify various sounds. I want each image to be exactly 384x128 pixels. However, when I actually save the image, it is only 297x98. Here's my code:

def save_spectrogram(num):
  dpi = 128
  x_pixels = 384
  y_pixels = 128
  samples, sr = load_wave(num)
  stft = np.absolute(librosa.stft(samples))
  db = librosa.amplitude_to_db(stft, ref=np.max)
  fig = plt.figure(figsize=(x_pixels//dpi, y_pixels//dpi), dpi=dpi, frameon=False)
  ax = fig.add_subplot(111)
  ax.axes.get_xaxis().set_visible(False)
  ax.axes.get_yaxis().set_visible(False)
  ax.set_frame_on(False)
  librosa.display.specshow(db, y_axis='linear')
  plt.savefig(TRAIN_IMG+str(num)+'.jpg', bbox_inches='tight', pad_inches=0, dpi=dpi)

Does anyone have any pointers on how I can fix this? I've also tried doing it without the subplot, but then the image still saves at the wrong size and also has white space/background.

Recommended answer

Plots are for humans to look at, and they contain things like axis markers, labels, etc. that are not useful for machine learning. To feed a model with an 'image' of the spectrogram, one should output only the data. This data can be stored in any format, but if you want to use a standard image format then you should use PNG. Lossy compression such as JPEG introduces compression artifacts.
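
If a matplotlib rendering is still wanted at an exact pixel size, one possible approach is sketched below. It is an illustration, not part of the original answer. The idea is to let the axes fill the whole figure and to save without bbox_inches='tight'. In the question's code, matplotlib's default subplot margins leave the axes about 77.5% of the figure width and 77% of its height, and the tight bounding box most likely crops to that area, which is consistent with the 297x98 result. The helper name save_exact_size and the random demo array are assumptions made for this example.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt


def save_exact_size(data, out, x_pixels=384, y_pixels=128, dpi=128):
    """Render a 2-D array to an image of exactly x_pixels by y_pixels."""
    # figure size in inches times dpi gives the pixel size (true division, not //)
    fig = plt.figure(figsize=(x_pixels / dpi, y_pixels / dpi), dpi=dpi, frameon=False)
    ax = fig.add_axes([0, 0, 1, 1])  # axes span the whole figure, so no margins remain
    ax.set_axis_off()
    ax.imshow(data, aspect="auto", origin="lower")
    fig.savefig(out, dpi=dpi)  # no bbox_inches='tight', which would re-crop the figure
    plt.close(fig)


# stand-in for a dB-scaled spectrogram (hypothetical data)
demo = np.random.default_rng(0).random((128, 384))
buf = io.BytesIO()
save_exact_size(demo, buf)

# read the PNG back to check its pixel dimensions
buf.seek(0)
saved = plt.imread(buf, format="png")
print(saved.shape[:2])
```

Note that the rendered pixels are still resampled and colour-mapped by matplotlib, so writing the array directly remains the more faithful option for model input.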

The following working example saves the spectrogram directly as an image. Note that to get a fixed-size image output, the code extracts a fixed-length window of the audio signal. Dividing an audio stream into such fixed-length analysis windows is standard practice.

import librosa
import numpy
import skimage.io

def scale_minmax(X, min=0.0, max=1.0):
    X_std = (X - X.min()) / (X.max() - X.min())
    X_scaled = X_std * (max - min) + min
    return X_scaled

def spectrogram_image(y, sr, out, hop_length, n_mels):
    # use log-melspectrogram
    mels = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                            n_fft=hop_length*2, hop_length=hop_length)
    mels = numpy.log(mels + 1e-9) # add small number to avoid log(0)

    # min-max scale to fit inside 8-bit range
    img = scale_minmax(mels, 0, 255).astype(numpy.uint8)
    img = numpy.flip(img, axis=0) # put low frequencies at the bottom in image
    img = 255-img # invert. make black==more energy

    # save as PNG
    skimage.io.imsave(out, img)


if __name__ == '__main__':
    # settings
    hop_length = 512 # number of samples per time-step in spectrogram
    n_mels = 128 # number of bins in spectrogram. Height of image
    time_steps = 384 # number of time-steps. Width of image

    # load audio. Using example from librosa
    # (librosa.example replaces librosa.util.example_audio_file, which newer releases no longer provide)
    path = librosa.example('vibeace')
    y, sr = librosa.load(path, offset=1.0, duration=10.0, sr=22050)
    out = 'out.png'

    # extract a fixed length window
    start_sample = 0 # starting at beginning
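    # with librosa's default center=True, N samples give 1 + N//hop_length frames,
    # so (time_steps-1)*hop_length samples give exactly time_steps columns
    length_samples = (time_steps-1)*hop_length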
    window = y[start_sample:start_sample+length_samples]
    
    # convert to PNG
    spectrogram_image(window, sr=sr, out=out, hop_length=hop_length, n_mels=n_mels)
    print('wrote file', out)

Output: [image of the spectrogram saved as out.png]
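
The fixed-length windowing mentioned in the answer can be sketched with NumPy alone. This is a minimal illustration under stated assumptions: the helper names split_fixed_windows and centered_frame_count are hypothetical, the signal is silence standing in for real audio, and the frame-count formula assumes an STFT that pads the signal so frames are centered (librosa's default, center=True). Under that assumption N samples yield 1 + N // hop_length frames, so a window of time_steps * hop_length samples produces one column more than time_steps.

```python
import numpy as np


def split_fixed_windows(y, length_samples):
    """Split a 1-D signal into consecutive windows of exactly length_samples.

    The last window is zero-padded when the signal does not divide evenly.
    """
    windows = []
    for start in range(0, len(y), length_samples):
        window = y[start:start + length_samples]
        if len(window) < length_samples:
            window = np.pad(window, (0, length_samples - len(window)))
        windows.append(window)
    return windows


def centered_frame_count(n_samples, hop_length):
    """Number of STFT frames when the signal is padded so frames are centered."""
    return 1 + n_samples // hop_length


hop_length = 512
time_steps = 384
# chosen so that the frame count equals time_steps
length_samples = (time_steps - 1) * hop_length

# 20 s of silence at 22050 Hz as a stand-in for real audio
signal = np.zeros(22050 * 20, dtype=np.float32)
windows = split_fixed_windows(signal, length_samples)

print(len(windows), centered_frame_count(length_samples, hop_length))
```

Zero-padding the final window is only one option. Dropping short windows, or overlapping windows with a smaller step, are common alternatives depending on the dataset.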
