VGGVox for PyTorch


A PyTorch implementation of the VGGVox network, based on the descriptions given in the following papers:

  • A. Nagrani, J. S. Chung, A. Zisserman, VoxCeleb: a large-scale speaker identification dataset, INTERSPEECH, 2017
  • S. Albanie, A. Nagrani, A. Vedaldi, A. Zisserman, Emotion Recognition in Speech using Cross-Modal Transfer in the Wild, ACM Multimedia, 2018

This repository contains the implementation of the VGGVox network itself, some utility functions for audio processing and an example DataLoader for audio files.
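
To give a feel for the audio processing step, here is a minimal NumPy sketch of the spectrogram input described in the VoxCeleb paper: a sliding Hamming window of 25 ms with a 10 ms step, followed by mean and variance normalisation of every frequency bin. It is not the code shipped in this repository. The function name `spectrogram`, the 16 kHz sample rate, the 1024-point FFT and the choice to keep the lowest 512 bins are assumptions made for illustration.

```python
import numpy as np


def spectrogram(signal, sample_rate=16000, win_ms=25, hop_ms=10, n_fft=1024):
    """Hypothetical helper: normalised magnitude spectrogram of a mono signal.

    Returns an array of shape (n_fft // 2, n_frames).
    """
    signal = np.asarray(signal, dtype=np.float64)
    win = int(sample_rate * win_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)  # 160 samples at 16 kHz
    if signal.ndim != 1 or len(signal) < win:
        raise ValueError("expected a 1-D signal at least one window long")

    # Cut the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)

    # Magnitude spectrum per frame; each frame is zero-padded to n_fft.
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

    # Assumption: drop the Nyquist bin so that 512 frequency bins remain.
    spec = mag[:, : n_fft // 2].T

    # Mean and variance normalisation of every frequency bin over time.
    mean = spec.mean(axis=1, keepdims=True)
    std = spec.std(axis=1, keepdims=True)
    return (spec - mean) / (std + 1e-8)


if __name__ == "__main__":
    # Three seconds of noise stand in for a real utterance.
    audio = np.random.default_rng(0).standard_normal(3 * 16000)
    print(spectrogram(audio).shape)
```

With these settings, a three-second clip yields 512 frequency bins by roughly 300 time frames; the exact frame count depends on how the signal edges are padded. The utilities in this repository may use different parameters.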