Microsoft SpeechSynthesizer crackles when outputting to files and streams

Problem Description

I'm writing a thing that uses the SpeechSynthesizer to generate wave files on request, but I'm having problems with crackling noises. The weird thing is that output directly to the sound card is just fine.

This short PowerShell script demonstrates the issue, though I'm writing my program in C#.

Add-Type -AssemblyName System.Speech
$speech = New-Object System.Speech.Synthesis.SpeechSynthesizer
$speech.Speak('Guybrush Threepwood, mighty pirate!')
$speech.SetOutputToWaveFile("${PSScriptRoot}\foo.wav")
$speech.Speak('Guybrush Threepwood, mighty pirate!')

What this should do is output to the speakers, and then save that same sound as "foo.wav" next to the script.

What it does is output to the speakers, and then save a crackling, old-record-player-sounding version as a wave file. I've tested this on three different machines, and though they select different voices by default (all Microsoft-provided defaults), they all sound like garbage falling down stairs in the wave file.

Why?

I am testing this on Windows 10 Pro, with the latest updates that add that annoying "People" button on the taskbar.

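Here are links to sample sounds generated with the script above. Note the crackling, which does not occur when the script outputs directly to the speakers.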

Female voice, where it is even more noticeable

The same voice as above, saved to a file with TextAloud 3: no crackling, no vertical spikes.

Recommended Answer

This is an issue with the SpeechSynthesizer API, which simply produces poor-quality, crackling audio, as heard in the samples above. The solution is to do what TextAloud does: use the SpeechLib COM objects directly.

This is done by adding a COM reference to "Microsoft Speech Object Library (5.4)". Here is a snippet of the code I ended up with, which produces audio clips of the same quality as TextAloud:

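using System;
using System.IO;
using System.Runtime.InteropServices; // Marshal.ReleaseComObject
using System.Threading;               // Timeout.Infinite
using NAudio.Wave;                    // WaveFormat, RawSourceWaveStream, WaveFileWriter
using SpeechLib;                      // COM reference: Microsoft Speech Object Library (5.4)

// `Order` is the author's own request type (Text, Voice.Name, Volume, Rate, Mp3).
public new static byte[] GetSound(Order o)
{
    const SpeechVoiceSpeakFlags speechFlags = SpeechVoiceSpeakFlags.SVSFlagsAsync;
    var synth = new SpVoice();
    var wave = new SpMemoryStream();
    var voices = synth.GetVoices();
    try
    {
        // synth setup: clamp into SAPI's ranges (Volume up to 100, Rate -10..10)
        synth.Volume = Math.Max(1, Math.Min(100, o.Volume ?? 100));
        synth.Rate = Math.Max(-10, Math.Min(10, o.Rate ?? 0));
        foreach (SpObjectToken voice in voices)
        {
            if (voice.GetAttribute("Name") == o.Voice.Name)
            {
                synth.Voice = voice;
                break;
            }
        }

        // render into memory as raw 22 kHz, 16-bit, mono PCM
        wave.Format.Type = SpeechAudioFormatType.SAFT22kHz16BitMono;
        synth.AudioOutputStream = wave;
        synth.Speak(o.Text, speechFlags);
        synth.WaitUntilDone(Timeout.Infinite);

        // wrap the headerless PCM in a WAV container; must match SAFT22kHz16BitMono
        var waveFormat = new WaveFormat(22050, 16, 1);
        using (var ms = new MemoryStream((byte[])wave.GetData()))
        using (var reader = new RawSourceWaveStream(ms, waveFormat))
        using (var outStream = new MemoryStream())
        using (var writer = new WaveFileWriter(outStream, waveFormat))
        {
            reader.CopyTo(writer);
            writer.Flush(); // writes the final RIFF/data chunk sizes into the header
            // ToArray, not GetBuffer: GetBuffer also returns unused trailing capacity
            return o.Mp3 ? ConvertToMp3(outStream) : outStream.ToArray();
        }
    }
    finally
    {
        Marshal.ReleaseComObject(voices);
        Marshal.ReleaseComObject(wave);
        Marshal.ReleaseComObject(synth);
    }
}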
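In GetSound, SpMemoryStream hands back headerless PCM, and RawSourceWaveStream plus WaveFileWriter are used there to put a standard WAV header in front of it. If you would rather not depend on NAudio for that step, the following is a minimal sketch of the same idea, assuming plain 16-bit PCM with an even byte count. The class name PcmWav and the method WrapPcm are hypothetical, not part of the original answer or any library. For SAFT22kHz16BitMono the block align is 1 channel × 16 bits / 8 = 2 bytes, and the byte rate is 22050 × 2 = 44100 bytes per second.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not from the original answer or any library): prepends a
// canonical 44-byte RIFF/WAVE header to raw little-endian PCM, assuming plain PCM
// with an even number of data bytes (true for 16-bit samples).
public static class PcmWav
{
    public static byte[] WrapPcm(byte[] pcm, int sampleRate, short bitsPerSample, short channels)
    {
        if (pcm == null) throw new ArgumentNullException(nameof(pcm));

        short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per sample frame
        int byteRate = sampleRate * blockAlign;                   // bytes per second

        using (var ms = new MemoryStream())
        {
            // BinaryWriter always writes little-endian, as RIFF requires.
            using (var w = new BinaryWriter(ms, Encoding.ASCII, leaveOpen: true))
            {
                w.Write(Encoding.ASCII.GetBytes("RIFF"));
                w.Write(36 + pcm.Length);          // RIFF size: bytes after this field
                w.Write(Encoding.ASCII.GetBytes("WAVE"));

                w.Write(Encoding.ASCII.GetBytes("fmt "));
                w.Write(16);                       // fmt chunk size for plain PCM
                w.Write((short)1);                 // format tag 1 = PCM
                w.Write(channels);
                w.Write(sampleRate);
                w.Write(byteRate);
                w.Write(blockAlign);
                w.Write(bitsPerSample);

                w.Write(Encoding.ASCII.GetBytes("data"));
                w.Write(pcm.Length);               // data chunk size
                w.Write(pcm);                      // the samples themselves
            }
            return ms.ToArray();
        }
    }
}
```

With such a helper, the NAudio block could be replaced by something like `PcmWav.WrapPcm((byte[])wave.GetData(), 22050, 16, 1)`. Whichever way the header is written, its sample rate, bit depth and channel count have to match the format set on the SpMemoryStream; a mismatched sample rate, for example, makes the clip play back too fast or too slow.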

This is the code to convert the wave data to MP3. It uses the NAudio.Lame package from NuGet.

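// Requires the NAudio.Lame NuGet package (using NAudio.Lame; using NAudio.Wave;).
internal static byte[] ConvertToMp3(Stream wave)
{
    wave.Position = 0;
    using (var mp3 = new MemoryStream())
    {
        using (var reader = new WaveFileReader(wave))
        using (var writer = new LameMP3FileWriter(mp3, reader.WaveFormat, 128)) // 128 kbps
        {
            reader.CopyTo(writer);
        } // disposing the writer flushes LAME's last buffered frames into mp3
        return mp3.ToArray(); // ToArray still works even if the writer closed the stream
    }
}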
