Alternative Neural Networks for DeepVariant

1 ISP RAS, Moscow, Russia; 2 Skoltech, Moscow, Russia;
3 BIOPOLIS, Vairão, Portugal; 4 MSU, Moscow, Russia

Abstract


The Google DeepVariant pipeline for variant calling diverged from traditional statistics-based methods by reframing the task as an image-classification problem well suited to convolutional neural networks. Despite broad adoption and steady evolution, most DeepVariant improvements have focused on expanding and refining the training data. Surprisingly, there have been no efforts to replace the baseline Inception V3 neural network at the core of the pipeline, despite rapid advances in the image-classification field. We adapted the DeepVariant pipeline to use alternative neural networks for the first time and, as a proof of concept, tested whether replacing DeepVariant's original model could improve variant-calling accuracy.




Results


We extended the original training pipeline to support alternative neural networks and selected a representative model to showcase their potential. The model choice primarily aimed to balance accuracy and computational efficiency; the mid-sized EfficientNet-B3, which offers a strong accuracy gain at a favorable parameter count and GPU time, was chosen as a suitable candidate for training and integration into the DeepVariant pipeline.
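
As a rough illustration of what such a backbone swap involves (a minimal sketch, not the authors' actual integration, which lives inside DeepVariant's own training code), the snippet below builds an EfficientNet-B3 classifier over pileup-shaped tensors with tf.keras. The pileup dimensions (100 x 221 x 6) and the 1x1 channel-reduction layer are assumptions for illustration; the three output classes follow DeepVariant's genotype categories.

# Minimal sketch, not the authors' integration: an EfficientNet-B3 classifier
# over DeepVariant-style pileup tensors built with tf.keras. The pileup shape
# (100 x 221 x 6) and the 1x1 channel-reduction layer are assumptions; the
# three output classes (hom-ref, het, hom-alt) follow DeepVariant's design.
import tensorflow as tf

NUM_CLASSES = 3                   # hom-ref, het, hom-alt genotype calls
PILEUP_SHAPE = (100, 221, 6)      # assumed pileup image dimensions

def build_classifier(pileup_shape=PILEUP_SHAPE, num_classes=NUM_CLASSES):
    inputs = tf.keras.Input(shape=pileup_shape)
    # EfficientNet-B3 expects 3-channel images, so a 1x1 convolution maps
    # the six pileup channels down to three before the backbone.
    x = tf.keras.layers.Conv2D(3, kernel_size=1, padding="same")(inputs)
    backbone = tf.keras.applications.EfficientNetB3(
        include_top=False, weights=None, input_tensor=x, pooling="avg")
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(
        backbone.output)
    return tf.keras.Model(inputs, outputs)

model = build_classifier()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # reports the parameter count for comparison with Inception V3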




Training and testing were carried out on a downsampled Genome in a Bottle (GIAB) dataset, following the original pipeline. The proposed alternative model, with roughly half as many parameters and a shorter training time, exhibits more stable convergence and better generalization. On an independent test set this trend persists, with F1-score gains of 0.1% for SNPs and 0.2% for indels, enabling the detection of up to several hundred additional true variants per genome.
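
For context on the reported metric, the snippet below shows the F1 computation, F1 = 2PR / (P + R), from true-positive, false-positive, and false-negative counts of the kind produced by benchmarking tools such as hap.py. The counts used here are hypothetical and serve only to illustrate how extra true positives are tallied; they are not the study's numbers.

# Hedged illustration of the evaluation metric: F1 = 2PR / (P + R), computed
# from TP/FP/FN counts such as those reported by hap.py. All counts below are
# hypothetical and are NOT the study's results.

def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical SNP counts for one benchmark genome (illustrative only).
baseline = dict(tp=3_300_000, fp=3_500, fn=6_500)    # e.g. original model
candidate = dict(tp=3_300_600, fp=3_200, fn=5_900)   # e.g. alternative model

print(f"baseline  F1 = {f1_score(**baseline):.5f}")
print(f"candidate F1 = {f1_score(**candidate):.5f}")
print(f"additional true variants recovered: {candidate['tp'] - baseline['tp']}")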


We hope the case we present will contribute to rethinking current practices in genomic variant-calling pipelines by leveraging well-designed models rather than relying solely on scaling legacy architectures with ever-larger volumes of training data. We believe that exploring advanced neural network designs could improve both the accuracy and the efficiency of variant calling. Your experiments, feedback, and contributions toward furthering this work are welcome.