Increase/decrease audio play speed of AudioInputStream with Java


Problem description


Getting into the complex world of audio with Java, I am using this library, which I basically improved and published on GitHub.

The main class of the library is StreamPlayer; the code is commented and straightforward to understand.

The problem is that it supports many functionalities, but not increasing/decreasing the audio speed (say, like YouTube does when you change the video speed).

I have no clue how to implement such a functionality. I mean, what can I do when writing the audio at the sample rate of targetFormat? Do I have to restart the audio again and again every time...?

AudioFormat targetFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, sourceFormat.getSampleRate()*2, nSampleSizeInBits, sourceFormat.getChannels(),
                nSampleSizeInBits / 8 * sourceFormat.getChannels(), sourceFormat.getSampleRate(), false);
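One detail worth checking in the snippet above: for PCM data the frame rate should equal the sample rate, while the snippet doubles the sample rate but leaves the frame rate at the source value. Below is a minimal sketch of building a rate-consistent doubled format (the `doubledRateFormat` helper is hypothetical, not part of StreamPlayer); note that this naive route also raises the pitch, as discussed in the answer:

```java
import javax.sound.sampled.AudioFormat;

public class FormatSketch {
    // Build a PCM format at twice the source sample rate.
    // For PCM, frame rate must equal sample rate, so both are doubled here.
    static AudioFormat doubledRateFormat(AudioFormat src, int sampleSizeInBits) {
        int frameSize = sampleSizeInBits / 8 * src.getChannels();
        float newRate = src.getSampleRate() * 2;
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                newRate, sampleSizeInBits, src.getChannels(),
                frameSize, newRate, false);   // frame rate == sample rate
    }

    public static void main(String[] args) {
        AudioFormat src = new AudioFormat(44100f, 16, 2, true, false);
        AudioFormat fast = doubledRateFormat(src, 16);
        if (fast.getSampleRate() != 88200f || fast.getFrameRate() != 88200f)
            throw new AssertionError("rates should both be doubled");
        System.out.println(fast);
    }
}
```

Feeding such a format to `AudioSystem.getSourceDataLine` plays the same samples faster, but every frequency in the signal scales up with the rate.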

The code for playing the audio is:

/**
 * Main loop.
 *
 * Player Status == STOPPED || SEEKING = End of Thread + Freeing Audio Resources.<br>
 * Player Status == PLAYING = Audio stream data sent to Audio line.<br>
 * Player Status == PAUSED = Waiting for another status.
 */
@Override
public Void call() {
    int nBytesRead = 0;
    int audioDataLength = EXTERNAL_BUFFER_SIZE;
    ByteBuffer audioDataBuffer = ByteBuffer.allocate(audioDataLength);
    audioDataBuffer.order(ByteOrder.LITTLE_ENDIAN);

    // Lock stream while playing.
    synchronized (audioLock) {
        // Main play/pause loop.
        while ( ( nBytesRead != -1 ) && status != Status.STOPPED && status != Status.SEEKING && status != Status.NOT_SPECIFIED) {
            try {
                //Playing?
                if (status == Status.PLAYING) {

                    // System.out.println("Inside Stream Player Run method")
                    int toRead = audioDataLength;
                    int totalRead = 0;

                    // Read up to audioDataLength bytes from the audio stream,
                    // looping until the buffer is full or the stream ends
                    for (; toRead > 0
                            && ( nBytesRead = audioInputStream.read(audioDataBuffer.array(), totalRead, toRead) ) != -1; toRead -= nBytesRead, totalRead += nBytesRead)

                        // Check for underrun (the source data line has completely drained)
                        if (sourceDataLine.available() >= sourceDataLine.getBufferSize())
                            logger.info(() -> "Underrun> Available=" + sourceDataLine.available() + " , SourceDataLineBuffer=" + sourceDataLine.getBufferSize());

                    //Check if anything has been read
                    if (totalRead > 0) {
                        trimBuffer = audioDataBuffer.array();
                        if (totalRead < trimBuffer.length) {
                            // Copy only the bytes actually read into a right-sized buffer
                            trimBuffer = new byte[totalRead];
                            System.arraycopy(audioDataBuffer.array(), 0, trimBuffer, 0, totalRead);
                        }

                        //Writes audio data to the mixer via this source data line
                        sourceDataLine.write(trimBuffer, 0, totalRead);

                        // Compute position in bytes in encoded stream.
                        int nEncodedBytes = getEncodedStreamPosition();

                        // Notify all registered Listeners
                        listeners.forEach(listener -> {
                            if (audioInputStream instanceof PropertiesContainer) {
                                // Pass audio parameters such as instant
                                // bit rate, ...
                                listener.progress(nEncodedBytes, sourceDataLine.getMicrosecondPosition(), trimBuffer, ( (PropertiesContainer) audioInputStream ).properties());
                            } else
                                // Pass audio parameters
                                listener.progress(nEncodedBytes, sourceDataLine.getMicrosecondPosition(), trimBuffer, emptyMap);
                        });

                    }

                } else if (status == Status.PAUSED) {

                    //Flush and stop the source data line 
                    if (sourceDataLine != null && sourceDataLine.isRunning()) {
                        sourceDataLine.flush();
                        sourceDataLine.stop();
                    }
                    try {
                        while (status == Status.PAUSED) {
                            Thread.sleep(50);
                        }
                    } catch (InterruptedException ex) {
                        Thread.currentThread().interrupt();
                        logger.warning("Thread cannot sleep.\n" + ex);
                    }
                }
            } catch (IOException ex) {
                logger.log(Level.WARNING, "Decoder Exception", ex);
                status = Status.STOPPED;
                generateEvent(Status.STOPPED, getEncodedStreamPosition(), null);
            }
        }

        // Free audio resources.
        if (sourceDataLine != null) {
            sourceDataLine.drain();
            sourceDataLine.stop();
            sourceDataLine.close();
            sourceDataLine = null;
        }

        // Close stream.
        closeStream();

        // Notification of "End Of Media"
        if (nBytesRead == -1)
            generateEvent(Status.EOM, AudioSystem.NOT_SPECIFIED, null);

    }
    //Generate Event
    status = Status.STOPPED;
    generateEvent(Status.STOPPED, AudioSystem.NOT_SPECIFIED, null);

    //Log
    logger.info("Decoding thread completed");

    return null;
}

Feel free to download and check out the library if you want. :) I need some help on this... Library link.

Answer

Short answer:

For speeding up a single person speaking, use my Sonic.java, a native Java implementation of my Sonic algorithm. An example of how to use it is in Main.java. A C-language version of the same algorithm is used by Android's AudioTrack. For speeding up music or movies, find a WSOLA-based library.

Bloated answer:

Speeding up speech is more complex than it sounds. Simply increasing the sample rate without adjusting the samples will make speakers sound like chipmunks. There are basically two good schemes for linearly speeding up speech that I have listened to: fixed-frame schemes like WSOLA, and pitch-synchronous schemes like PICOLA, which Sonic uses for speeds up to 2X. One other scheme I've listened to is FFT-based, and IMO those implementations should be avoided. I hear rumors that FFT-based schemes can be done well, but no open-source version I am aware of was usable the last time I checked, probably in 2014.
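The chipmunk effect is easy to reproduce. A toy sketch of crude resampling (integer decimation, with no low-pass filtering, so it is not production resampling) shows how speed and pitch change together:

```java
public class Decimate {
    // Keep every `factor`-th sample: playback becomes `factor` times faster,
    // but every frequency (including the voice pitch) rises by `factor` too.
    static short[] decimate(short[] in, int factor) {
        short[] out = new short[(in.length + factor - 1) / factor];
        for (int i = 0; i < out.length; i++)
            out[i] = in[i * factor];
        return out;
    }

    public static void main(String[] args) {
        short[] tone = new short[8000];
        for (int i = 0; i < tone.length; i++)  // 100 Hz sine at 8 kHz
            tone[i] = (short) (10000 * Math.sin(2 * Math.PI * 100 * i / 8000.0));
        short[] fast = decimate(tone, 2);
        // Half the samples: plays in half the time, but the tone is now 200 Hz.
        System.out.println(tone.length + " -> " + fast.length);
    }
}
```

This is exactly what doubling the nominal sample rate does acoustically, which is why a time-scale modification algorithm is needed to change speed while preserving pitch.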

I had to invent a new algorithm for speeds greater than 2X, since PICOLA simply drops entire pitch periods, which works well so long as you don't drop two pitch periods in a row. For speeds above 2X, Sonic mixes in a portion of the samples from each input pitch period, retaining some frequency information from each. This works well for most speech, though some languages such as Hungarian appear to have parts of speech so short that even PICOLA mangles some phonemes. However, the general rule that you can drop one pitch period without mangling phonemes seems to hold most of the time.
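The drop-one-pitch-period idea can be illustrated with a toy sketch (this is not Sonic's actual implementation, and it assumes the pitch period is already known, which in practice requires a pitch estimator):

```java
public class PeriodDrop {
    // Remove one pitch period from `in` at offset `start`, cross-fading the
    // dropped period into the following one so the splice stays continuous.
    // Assumes start + 2 * period <= in.length.
    static short[] dropPitchPeriod(short[] in, int start, int period) {
        short[] out = new short[in.length - period];
        System.arraycopy(in, 0, out, 0, start);
        for (int i = 0; i < period; i++) {
            double w = (double) i / period;   // linear fade weights
            out[start + i] = (short) Math.round((1 - w) * in[start + i]
                                                + w * in[start + period + i]);
        }
        System.arraycopy(in, start + 2 * period, out, start + period,
                         in.length - start - 2 * period);
        return out;
    }

    public static void main(String[] args) {
        short[] signal = new short[1000];
        java.util.Arrays.fill(signal, (short) 100);
        short[] shorter = dropPitchPeriod(signal, 200, 100);
        if (shorter.length != 900 || shorter[250] != 100)
            throw new AssertionError("unexpected splice result");
        System.out.println(signal.length + " -> " + shorter.length);
    }
}
```

Because two adjacent pitch periods of voiced speech are nearly identical, the cross-fade is close to inaudible; dropping one period per N periods yields a speed-up of N / (N - 1) with the pitch unchanged.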

Pitch-synchronous schemes focus on one speaker, and will generally make that speaker clearer than fixed-frame schemes, at the expense of butchering non-speech sounds. However, the improvement of pitch-synchronous schemes over fixed-frame schemes is hard to hear at speeds below about 1.5X for most speakers. This is because fixed-frame algorithms like WSOLA basically emulate pitch-synchronous schemes like PICOLA when there is only one speaker and no more than one pitch period needs to be dropped per frame. The math works out basically the same in this case if WSOLA is tuned well to the speaker. For example, if it is able to select a sound segment within +/- one frame in time, then a 50 ms fixed frame will allow WSOLA to emulate PICOLA for most speakers with a fundamental pitch above 100 Hz. However, a male with a deep voice of, say, 95 Hz would be butchered by WSOLA using those settings. Parts of speech where the fundamental pitch drops significantly, such as the end of a sentence, can also be butchered when the parameters are not optimally tuned. And WSOLA generally falls apart at speeds greater than 2X, where, like PICOLA, it starts dropping multiple pitch periods in a row.

On the positive side, WSOLA will make most sounds, including music, understandable, if not high fidelity. Taking non-harmonic multi-voice sounds and changing their speed without introducing substantial distortion is impossible with overlap-and-add (OLA) schemes like WSOLA and PICOLA. Doing this well would require separating the different voices, changing their speeds independently, and mixing the results back together. However, most music is harmonic enough to sound OK with WSOLA.
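For intuition, here is a minimal fixed-frame OLA time stretch, i.e. WSOLA without the waveform-similarity search that aligns each frame, so audible artifacts are expected; it is a sketch only:

```java
public class OlaStretch {
    // Plain overlap-add time stretch: Hann-windowed frames are read `speed`
    // times farther apart in the input than they are written in the output.
    // `hop` must be frame / 2 so the overlapping windows sum to unit gain.
    static double[] stretch(double[] in, double speed, int frame, int hop) {
        double[] out = new double[(int) (in.length / speed) + frame];
        double[] win = new double[frame];
        for (int i = 0; i < frame; i++)
            win[i] = 0.5 - 0.5 * Math.cos(2 * Math.PI * i / frame);
        for (int outPos = 0; outPos + frame <= out.length; outPos += hop) {
            int inPos = (int) (outPos * speed);
            if (inPos + frame > in.length) break;
            for (int i = 0; i < frame; i++)
                out[outPos + i] += win[i] * in[inPos + i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] in = new double[8000];
        java.util.Arrays.fill(in, 1.0);
        double[] out = stretch(in, 2.0, 512, 256);
        // Interior samples keep unit amplitude; output is roughly half as long.
        System.out.println(in.length + " -> " + out.length);
    }
}
```

WSOLA adds one step on top of this: before pasting each frame, it searches a small neighborhood of `inPos` for the offset whose waveform best matches the previously written output, which is what avoids the phase discontinuities this plain version produces.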

It turns out that the poor quality of WSOLA above 2X is one reason folks rarely listen at speeds higher than 2X: people simply don't like it. Once Audible.com switched from WSOLA to a Sonic-like algorithm on Android, they were able to increase the supported speed range from 2X to 3X. I haven't listened on iOS in the last few years, but as of 2014, Audible.com on iOS was miserable to listen to at 3X speed, since they used the built-in iOS WSOLA library. They've likely fixed it since then.

