Convert PyTorch* QuartzNet to the Intermediate Representation

The NeMo project provides the QuartzNet model.

Download the Pre-Trained QuartzNet Model

To download the pre-trained model, refer to the NeMo Speech Models Catalog. The following code shows how to obtain QuartzNet in ONNX* format.

import nemo
import nemo.collections.asr as nemo_asr

quartznet = nemo_asr.models.ASRConvCTCModel.from_pretrained(model_info='QuartzNet15x5-En')
# Export QuartzNet model to ONNX* format
quartznet.export('qn.onnx')

This code produces three ONNX* model files: encoder_qt.onnx, decoder_qt.onnx, and qn.onnx. They are the encoder, the decoder, and a combined decoder(encoder(x)) model, respectively.
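Before converting, it can help to confirm that the export actually wrote the files listed above. The following is a minimal sketch using only the Python standard library. The helper name `missing_exports` is hypothetical, and the file names are taken from the text, so adjust them if your export produces different names.

```python
from pathlib import Path

# File names as listed in the text above; adjust if your export differs.
EXPECTED_FILES = ("encoder_qt.onnx", "decoder_qt.onnx", "qn.onnx")


def missing_exports(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    directory = Path(model_dir)
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_exports(".")
    if missing:
        print("Missing exported files:", ", ".join(missing))
    else:
        print("All expected ONNX files are present.")
```

An empty list means every expected file was found in the given directory.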

Convert ONNX* QuartzNet Model to IR

If using a combined model:

./mo.py --input_model <MODEL_DIR>/qn.onnx --input_shape [B,64,X]

If using separate models:

./mo.py --input_model <MODEL_DIR>/encoder_qt.onnx --input_shape [B,64,X]
./mo.py --input_model <MODEL_DIR>/decoder_qt.onnx --input_shape [B,1024,Y]

Here the shape depends on the length of the audio file's Mel-Spectrogram: B is the batch dimension, X is the dimension based on the input length, and Y is determined by the encoder output, usually X / 2.
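The relationship between B, X, and Y can be sketched as a small helper that builds the --input_shape strings. This is an illustration under stated assumptions, not part of Model Optimizer. The function names are hypothetical. The frame estimate assumes a center-padded STFT with a hop of 160 samples (10 ms at 16 kHz), and Y is rounded up from X / 2. The text only says "usually X / 2", so verify Y against the actual encoder output.

```python
import math


def mel_frames(num_samples, hop_length=160):
    """Estimate the Mel-Spectrogram frame count X (assumes a center-padded STFT)."""
    return 1 + num_samples // hop_length


def quartznet_input_shapes(batch, num_frames):
    """Return --input_shape strings for the encoder (or combined) and decoder models."""
    encoder_shape = f"[{batch},64,{num_frames}]"
    # Assumption: Y = ceil(X / 2); the text only says "usually X / 2".
    decoder_frames = math.ceil(num_frames / 2)
    decoder_shape = f"[{batch},1024,{decoder_frames}]"
    return encoder_shape, decoder_shape


if __name__ == "__main__":
    x = mel_frames(num_samples=16000 * 4)  # 4 s of audio at an assumed 16 kHz
    enc, dec = quartznet_input_shapes(batch=1, num_frames=x)
    print("encoder/combined:", enc)
    print("decoder:", dec)
```

The returned strings can be passed as the --input_shape values in the commands above.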