| Paper ID | MLSP-24.2 | 
| Paper Title | Efficient Adversarial Audio Synthesis via Progressive Upsampling |
| Authors | Youngwoo Cho, Korea Advanced Institute of Science and Technology (KAIST), South Korea; Minwook Chang, NCSOFT, South Korea; Sanghyeon Lee, Korea Advanced Institute of Science and Technology (KAIST), South Korea; Hyoungwoo Lee, Gerard Jounghyun Kim, Korea University, South Korea; Jaegul Choo, Korea Advanced Institute of Science and Technology (KAIST), South Korea |
| Session | MLSP-24: Applications in Audio and Speech Processing |
| Location | Gather.Town |
| Session Time | Wednesday, 09 June, 16:30 - 17:15 |
| Presentation Time | Wednesday, 09 June, 16:30 - 17:15 |
| Presentation | Poster |
| Topic | Machine Learning for Signal Processing: [MLR-APPL] Applications of machine learning |
| Abstract | This paper proposes a novel generative model called Progressive Upsampling GAN (PUGAN), which progressively synthesizes high-quality audio in the raw-waveform domain. PUGAN builds on the established idea of progressively generating higher-resolution output, realized here by stacking multiple encoder-decoder architectures. Compared with WaveGAN, the existing state-of-the-art model, which uses a single decoder architecture, our model generates audio signals and progressively converts them to higher resolutions while using significantly fewer parameters than WaveGAN, e.g., 3.17x fewer for 16 kHz output. Our experiments show that PUGAN generates audio in real time with quality comparable to that of WaveGAN in terms of inception score and human perception. |
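The abstract describes generation as a chain of stages, each raising the resolution of the previous stage's output. The sketch below illustrates only that control flow in plain Python, under stated assumptions: a 1 kHz base rate, a fixed upsampling factor of 2 per stage, linear interpolation as the upsampler, and a hypothetical `identity_refiner` standing in for each learned encoder-decoder stage. It is not PUGAN's architecture, and every function name here is illustrative.

```python
from typing import Callable, List, Tuple

Signal = List[float]


def linear_upsample(x: Signal, factor: int = 2) -> Signal:
    """Upsample by an integer factor using linear interpolation between neighbors."""
    if len(x) < 2:
        # Too short to interpolate: repeat each sample `factor` times.
        return [v for v in x for _ in range(factor)]
    out: Signal = []
    for i in range(len(x) - 1):
        a, b = x[i], x[i + 1]
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    # Repeat the last sample so the output length is exactly factor * len(x).
    out.extend([x[-1]] * factor)
    return out


def identity_refiner(x: Signal) -> Signal:
    """Hypothetical placeholder for a learned encoder-decoder stage (no-op here)."""
    return list(x)


def progressive_generate(
    base: Signal,
    base_rate_hz: int,
    target_rate_hz: int,
    refiner: Callable[[Signal], Signal] = identity_refiner,
    factor: int = 2,
) -> List[Tuple[int, Signal]]:
    """Raise resolution stage by stage until the target sample rate is reached.

    Returns (sample_rate_hz, signal) for the base input and for every stage.
    """
    stages = [(base_rate_hz, list(base))]
    rate, x = base_rate_hz, list(base)
    while rate < target_rate_hz:
        # Each stage: upsample, then let the (hypothetical) refiner process it.
        x = refiner(linear_upsample(x, factor))
        rate *= factor
        stages.append((rate, x))
    if rate != target_rate_hz:
        raise ValueError("target rate must equal base rate times a power of factor")
    return stages


if __name__ == "__main__":
    for rate, sig in progressive_generate([0.0, 1.0, 0.0, -1.0], 1000, 16000):
        print(rate, len(sig))
```

With a 4-sample base signal at 1 kHz, four doubling stages reach 16 kHz and the final output has 64 samples. In a trained model, each refiner would be a learned network that adds high-frequency detail, which interpolation alone cannot produce.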