TRVFFmpegSpeechToTextProperty.BufferDuration

The duration of the speech recognition buffer, in milliseconds. A larger value increases latency and may cause longer pauses, but reduces overall system load and improves transcription quality.

property BufferDuration: Cardinal;

This is the maximum amount of audio that is queued before it is processed by the speech recognition model.

With a small value, the audio stream is processed more often, but transcription quality is lower and more processing power is required. With a large value (e.g. 10000-20000), results are more accurate and use less CPU/GPU, but transcription latency is higher, which makes such values unsuitable for real-time streams.
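
The trade-off above can be illustrated with simple arithmetic. The following is a minimal sketch, not library code: PassesPerMinute is a hypothetical helper, and it assumes continuous audio in which the buffer fills completely before each processing pass. Under that assumption, the worst-case added latency is roughly the BufferDuration value itself.

```pascal
program BufferDurationSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper (not part of the library): estimates how many times
// per minute a continuously filled buffer of the given duration would be
// handed to the speech recognition model.
function PassesPerMinute(BufferDurationMs: Cardinal): Cardinal;
begin
  if BufferDurationMs = 0 then
    Result := 0 // guard against division by zero; treated as "no passes" here
  else
    Result := 60000 div BufferDurationMs;
end;

begin
  // Default value: 20 passes per minute, up to about 3 seconds of delay
  WriteLn('3000 ms: ', PassesPerMinute(3000), ' passes per minute');
  // Large value: 4 passes per minute, up to about 15 seconds of delay
  WriteLn('15000 ms: ', PassesPerMinute(15000), ' passes per minute');
end.
```

Fewer passes mean less work for the CPU/GPU and more context per pass, at the cost of waiting longer for each piece of text.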

Consider combining a large BufferDuration value with the VAD (voice activity detection) model option.

If this property is changed during a speech recognition session, the new value is not applied to that session; it takes effect the next time speech recognition is started. See Active.

Default value:

3000 (3 seconds)