Toward Complex-Valued Neural Networks for Waveform Generation

Hyung-Seok Oh, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee
Neural vocoders have recently advanced waveform generation, yielding natural and expressive audio. Among these approaches, iSTFT-based vocoders have gained attention. They predict a complex-valued spectrogram and then synthesize the waveform via iSTFT, thereby avoiding redundant, computationally expensive upsampling. However, current approaches use real-valued networks that process the real and imaginary parts independently. This separation limits their ability to capture the inherent structure of complex spectrograms. We present ComVo, a complex-valued neural vocoder whose generator and discriminator use native complex arithmetic. This enables an adversarial training framework that provides structured feedback directly in the complex domain. To guide phase transformations in a structured manner, we introduce phase quantization, which discretizes phase values and regularizes the training process. Finally, we propose a block-matrix computation scheme to improve training efficiency by reducing redundant operations. Experiments demonstrate that ComVo achieves higher synthesis quality than comparable real-valued baselines, and that its block-matrix scheme reduces training time by 25%. Audio samples and code are available at https://hs-oh-prml.github.io/ComVo/.
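
Two ideas from the abstract can be illustrated with a small NumPy sketch. It is not the ComVo implementation, and the function names (`complex_matmul_naive`, `complex_matmul_block`, `quantize_phase`) and the parameter `num_levels` are hypothetical. The sketch assumes that the block-matrix scheme rests on the standard identity that a complex matrix product equals one real product against a 2x2 block matrix, and that phase quantization snaps each phase to a uniform grid while keeping the magnitude. The paper's actual formulation, including how gradients pass through the quantizer, may differ.

```python
import numpy as np


def complex_matmul_naive(x, w):
    """Complex product computed with four separate real matrix products."""
    real = x.real @ w.real - x.imag @ w.imag
    imag = x.real @ w.imag + x.imag @ w.real
    return real + 1j * imag


def complex_matmul_block(x, w):
    """Same product as a single real matmul against a 2x2 block matrix.

    [xr, xi] @ [[wr, wi], [-wi, wr]] = [xr wr - xi wi, xr wi + xi wr]
    """
    x_cat = np.concatenate([x.real, x.imag], axis=-1)
    w_block = np.block([[w.real, w.imag], [-w.imag, w.real]])
    y = x_cat @ w_block
    n_out = w.shape[1]
    return y[..., :n_out] + 1j * y[..., n_out:]


def quantize_phase(z, num_levels):
    """Snap the phase of z to num_levels uniform levels; keep the magnitude.

    Assumption: uniform quantization over the full circle.
    """
    step = 2 * np.pi / num_levels
    phase_q = np.round(np.angle(z) / step) * step
    return np.abs(z) * np.exp(1j * phase_q)


# Usage: a batch of 3 complex frames mapped from 4 to 5 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
w = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))

y_naive = complex_matmul_naive(x, w)
y_block = complex_matmul_block(x, w)   # agrees with y_naive up to rounding
y_quant = quantize_phase(y_block, 8)   # phases restricted to multiples of pi/4
```

The block form replaces four small matrix products with one larger one, which is the kind of redundancy reduction the abstract refers to. The quantizer leaves magnitudes unchanged, so only the phase is regularized.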

Speech

LibriTTS

LibriTTS-clean

Speech (each sample is provided as Ground Truth, HiFiGAN (V1), iSTFTNet, BigVGAN (base), and Vocos)
LibriTTS_test-clean_1580_141083_1580_141083_000070_000000
LibriTTS_test-clean_5683_32879_5683_32879_000004_000000
LibriTTS_test-clean_7127_75947_7127_75947_000031_000000
LibriTTS_test-clean_8230_279154_8230_279154_000021_000003

LibriTTS-others

Speech (each sample is provided as Ground Truth, HiFiGAN (V1), iSTFTNet, BigVGAN (base), and Vocos)
LibriTTS_test-other_1688_142285_1688_142285_000006_000002
LibriTTS_test-other_4198_61336_4198_61336_000005_000000
LibriTTS_test-other_533_131556_533_131556_000001_000000
LibriTTS_test-other_7975_280057_7975_280057_000014_000001

Out-of-distribution

MUSDB18-HQ and YouTube

MUSDB18-HQ Bass

Music (each sample is provided as Ground Truth, HiFiGAN (V1), iSTFTNet, BigVGAN (base), and Vocos)
Detsky Sad - Walkie Talkie
Moosmusic - Big Dummy Shake
Secretariat - Over The Top
The Doppler Shift - Atrophy

MUSDB18-HQ Drums

Music (each sample is provided as Ground Truth, HiFiGAN (V1), iSTFTNet, BigVGAN (base), and Vocos)
Bobby Nobody - Stitch Up
Buitraker - Revo X
Georgia Wonder - Siren
The Easton Ellises - Falcon 69

MUSDB18-HQ Mixture

Music (each sample is provided as Ground Truth, HiFiGAN (V1), iSTFTNet, BigVGAN (base), and Vocos)
James Elder & Mark M Thompson - The English Actor
Side Effects Project - Sing With Me
The Easton Ellises (Baumi) - SDRNR
The Sunshine Garcia Band - For I Am The Moon

MUSDB18-HQ Others

Music (each sample is provided as Ground Truth, HiFiGAN (V1), iSTFTNet, BigVGAN (base), and Vocos)
Al James - Schoolboy Facination
AM Contra - Heart Peripheral
The Doppler Shift - Atrophy
We Fell From The Sky - Not You

MUSDB18-HQ Vocals

Music (each sample is provided as Ground Truth, HiFiGAN (V1), iSTFTNet, BigVGAN (base), and Vocos)
Ben Carrigan - We'll Talk About It All Tonight
Forkupines - Semantics
Girls Under Glass - We Feel Alright
Side Effects Project - Sing With Me

YouTube Source

In-the-wild (each sample is provided as Ground Truth, HiFiGAN (V1), iSTFTNet, BigVGAN (base), and Vocos)
youtube1
youtube2
youtube3
youtube4