

A computer-implemented method for training an anomaly detection neural network system comprising an encoder and a decoder is described. The method includes receiving training data comprising a plurality of training examples, each training example including a training audio waveform and a machine identity (ID); processing the training audio waveform to extract training audio features; receiving an environmental noise audio waveform; processing the environmental noise audio waveform to extract noise features; generating augmented features by combining the extracted training audio features and the noise features; processing, using the encoder, the augmented features to generate latent embeddings; processing, using the decoder, the latent embeddings to generate reconstructed audio features; processing, using a convolutional neural network, the augmented features to generate a predicted machine ID probability distribution; and adjusting, through backpropagation, the current values of the parameters of the encoder and the decoder to minimize an objective function.
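The training flow can be sketched in NumPy. This is a minimal sketch, not the claimed implementation, and it rests on several assumptions the abstract leaves open:

- log-magnitude STFT frames as the audio features;
- combining audio and noise by adding their magnitude spectra;
- per-bin standardization;
- a one-hidden-layer tanh encoder with a linear decoder;
- reconstruction mean squared error as the objective;
- a tiny random-weight 1-D CNN for machine-ID prediction.

All function names (`extract_features`, `augment`, `cnn_machine_id_probs`, `train_step`), hyperparameters, and the synthetic waveforms are hypothetical. The abstract does not say how the predicted machine ID distribution enters the objective function. The sketch therefore only computes that distribution and updates the encoder and decoder on reconstruction error alone.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_features(waveform, frame_len=256, hop=128):
    # Assumed feature type: log-magnitude STFT frames (the abstract names none).
    frames = sliding_window_view(waveform, frame_len)[::hop]      # (n_frames, frame_len)
    spec = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return np.log(spec + 1e-8)                                     # (n_frames, frame_len // 2 + 1)


def augment(audio_feats, noise_feats, noise_gain=0.5):
    # Assumed combination rule: add magnitude spectra, then return to the log domain.
    n = min(len(audio_feats), len(noise_feats))
    return np.log(np.exp(audio_feats[:n]) + noise_gain * np.exp(noise_feats[:n]))


def cnn_machine_id_probs(feats, conv_w, out_w):
    # Hypothetical tiny 1-D CNN over time: conv -> ReLU -> global average pool -> softmax.
    k = conv_w.shape[2]
    windows = sliding_window_view(feats, k, axis=0)               # (T - k + 1, n_bins, k)
    hidden = np.maximum(np.einsum("tbk,cbk->tc", windows, conv_w), 0.0)
    logits = hidden.mean(axis=0) @ out_w                          # (n_ids,)
    logits -= logits.max()                                        # numerical stability
    p = np.exp(logits)
    return p / p.sum()                                            # machine ID probability distribution


def train_step(params, x, lr):
    w_enc, b_enc, w_dec, b_dec = params
    z = np.tanh(x @ w_enc + b_enc)               # encoder -> latent embeddings
    x_hat = z @ w_dec + b_dec                    # decoder -> reconstructed features
    err = x_hat - x
    loss = np.mean(err ** 2)                     # assumed objective: reconstruction MSE
    # Backpropagation: hand-derived gradients of the MSE through decoder and encoder.
    d_xhat = 2.0 * err / err.size
    g_wdec, g_bdec = z.T @ d_xhat, d_xhat.sum(axis=0)
    d_pre = (d_xhat @ w_dec.T) * (1.0 - z ** 2)  # tanh'(a) = 1 - tanh(a)^2
    g_wenc, g_benc = x.T @ d_pre, d_pre.sum(axis=0)
    for p, g in zip(params, (g_wenc, g_benc, g_wdec, g_bdec)):
        p -= lr * g                              # in-place gradient-descent update
    return loss                                  # loss before this step's update


rng = np.random.default_rng(0)
n_ids, n_latent, k, channels = 3, 16, 5, 8
# Hypothetical training examples: random waveforms standing in for machine audio, one per ID.
examples = [(rng.normal(size=16000), machine_id) for machine_id in range(n_ids)]
noise_feats = extract_features(0.3 * rng.normal(size=16000))  # stand-in environmental noise

feats = [augment(extract_features(w), noise_feats) for w, _ in examples]
x = np.concatenate(feats)
x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)            # assumed per-bin standardization
n_bins = x.shape[1]

params = [rng.normal(0, 0.1, (n_bins, n_latent)), np.zeros(n_latent),
          rng.normal(0, 0.1, (n_latent, n_bins)), np.zeros(n_bins)]
conv_w = rng.normal(0, 0.1, (channels, n_bins, k))
out_w = rng.normal(0, 0.1, (channels, n_ids))

losses = [train_step(params, x, lr=1.0) for _ in range(200)]
id_probs = cnn_machine_id_probs(x[: len(feats[0])], conv_w, out_w)  # first example's frames
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}; ID probs {np.round(id_probs, 3)}")
```

Backpropagation is written out by hand so that the example needs only NumPy. An autodiff framework such as PyTorch or JAX would compute the same gradients automatically and would also make it straightforward to add a classification term for the predicted machine ID if the objective includes one.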






