ASR and Communication Processing; XVF3510-INT - For Integrated Voice Interface Applications - XMOS VocalFusion XVF3510 User Manual


of. A reference copy of the audio is provided to the AEC in order for it to accurately estimate the
echo.
- Automatic Delay Estimation & Control (ADEC): Automatically monitors and compensates for the
  delay between the reference audio and the echo received by the microphone.
Following echo cancellation, the ASR and communications paths diverge to permit parameter tuning
appropriate for the individual audio output use cases.
- Interference Cancellation (IC): Suppresses static noise from point sources such as cooker
  hoods, washing machines, or radios for which there is no reference audio signal available.
- Voice Activity Detection (VAD): Controls adaptation of the IC and AGC to optimise output for
  near-end speech.
- Noise Suppression (NS): Suppresses diffuse noise from sources whose frequency
  characteristics do not change rapidly over time (i.e., diffuse stationary noise).
- Automatic Gain Control (AGC): Controls the audio output level via separate AGC channels for
  Automatic Speech Recognition (ASR) and communications output. The VAD is used to prevent
  gain changes during speech to improve speech recognition performance.
The pipeline has been designed to minimise the need to tune and modify these functions. However, if
tuning is required for specific use cases, later sections of this document provide details of the
relevant parameters and processes.
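
As a rough illustration of the VAD and AGC interaction described above, the following Python sketch
holds the gain constant on frames flagged as speech and accepts a new gain only on non-speech
frames. It is not the XVF3510 algorithm: the function name `hold_gain_during_speech`, the per-frame
`proposed_gains` values and the boolean `speech_flags` are all hypothetical, chosen only to show the
gating idea.

```python
def hold_gain_during_speech(proposed_gains, speech_flags, initial_gain=1.0):
    """Return the gain applied to each frame.

    proposed_gains: gain a level tracker would like to use for each frame
                    (hypothetical input, one value per frame).
    speech_flags:   True where a VAD has flagged the frame as speech.
    """
    if len(proposed_gains) != len(speech_flags):
        raise ValueError("one VAD flag is required per frame")

    applied = []
    gain = initial_gain
    for proposed, is_speech in zip(proposed_gains, speech_flags):
        if not is_speech:
            # Gain is only allowed to change outside speech.
            gain = proposed
        # During speech the previous gain is reused unchanged.
        applied.append(gain)
    return applied


print(hold_gain_during_speech([2.0, 3.0, 4.0, 5.0], [False, True, True, False]))
# [2.0, 2.0, 2.0, 5.0]
```

In this example the gain requested during the two speech frames is ignored, so the level presented
to a speech recogniser does not move in the middle of an utterance.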

2.3. ASR AND COMMUNICATION PROCESSING

The audio pipeline discussed above produces two separate audio streams, one specifically tuned for
integration with keyword and ASR services and the other designed for conferencing and
communication applications. Both processed audio streams are available to be output at the same
time using the left and right channels of USB and I2S. The default configuration is as follows:
Table 2-1 Default channel mapping (both USB and I2S)
CHANNEL        DEFAULT
[0] - Left     Automatic Speech Recognition (ASR) optimised
[1] - Right    Communications
In situations where an ASR is used to invoke a call it may be necessary to continually monitor the ASR
channel for an 'end call' intent. The parallel output of both ASR and Communications processed streams
allows the combination of high-quality calling audio with the tuned ASR capability.
The IO_MAP configuration parameter (see Signal flow and processing section) allows users to also
configure both channels to be ASR or Communications if required.

2.4. XVF3510-INT - FOR INTEGRATED VOICE INTERFACE APPLICATIONS

The XVF3510-INT product embeds the core audio processing pipeline in an audio infrastructure that
supports rate conversion, filtering and signal routing. This infrastructure is controllable by the host
system via a set of control registers. In addition, the XVF3510-INT provides a set of peripheral
interfaces that connect the host system to other devices, e.g. digital inputs, LEDs and SPI peripherals.
The peripheral interfaces supported include an interface to an optional QSPI Flash device containing
the XVF3510 firmware and configuration information that is loaded by the processor on start-up.
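
To illustrate the default channel mapping of Table 2-1 from the host side, the Python sketch below
separates a captured stereo buffer into its ASR and communications streams. It assumes the capture
is interleaved, signed 16-bit little-endian PCM; that sample format is an assumption made for the
example, not something stated in this manual, and the helper name `split_asr_comms` is hypothetical.
If IO_MAP has been changed from its default, the left/right interpretation no longer applies.

```python
import struct


def split_asr_comms(pcm_bytes):
    """Split interleaved stereo PCM into (asr_samples, comms_samples).

    Assumes signed 16-bit little-endian samples, left channel first.
    """
    if len(pcm_bytes) % 4 != 0:
        raise ValueError("expected whole stereo frames of 2 x 16-bit samples")

    count = len(pcm_bytes) // 2
    samples = struct.unpack("<%dh" % count, pcm_bytes)

    asr = list(samples[0::2])    # channel [0], left: ASR optimised by default
    comms = list(samples[1::2])  # channel [1], right: communications by default
    return asr, comms


# Three stereo frames of made-up sample values.
frames = struct.pack("<6h", 10, -10, 20, -20, 30, -30)
asr, comms = split_asr_comms(frames)
print(asr)    # [10, 20, 30]
print(comms)  # [-10, -20, -30]
```

With the streams separated like this, a host could feed the left channel to a keyword or ASR
engine while sending the right channel to a call, which is the parallel use described in section 2.3.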
