Joon Son Chung
정준선
Research Scientist
Naver Corporation
Google Scholar
GitHub
Publications

2020

  • Perfect Match: Self-Supervised Embeddings for Cross-modal Retrieval
    S. W. Chung, J. S. Chung, H. G. Kang
    IEEE Journal of Selected Topics in Signal Processing
    PDF

  • Spot the conversation: speaker diarisation in the wild
    J. S. Chung*, J. Huh*, A. Nagrani*, T. Afouras, A. Zisserman
    Interspeech
    PDF | Project page

  • Now you’re speaking my language: Visual language identification
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF coming soon

  • In defence of metric learning for speaker recognition
    J. S. Chung, J. Huh, S. Mun, M. Lee, H. Heo, S. Choe, C. Ham, S. Jung, B. Lee, I. Han
    Interspeech
    PDF | Code

  • Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
    S. W. Chung, H. G. Kang, J. S. Chung
    Interspeech
    PDF

  • FaceFilter: Audio-visual speech separation using still images
    S. W. Chung, S. Choe, J. S. Chung, H. G. Kang
    Interspeech
    PDF | Video

  • Self-supervised learning of audio-visual objects from video
    T. Afouras, A. Owens, J. S. Chung, A. Zisserman
    European Conference on Computer Vision
    PDF

  • BSL-1K: Scaling up co-articulated sign recognition using mouthing cues
    S. Albanie, G. Varol, L. Momeni, T. Afouras, J. S. Chung, N. Fox, A. Zisserman
    European Conference on Computer Vision
    PDF

  • Delving into VoxCeleb: environment invariant speaker recognition
    J. S. Chung*, J. Huh*, S. Mun
    Speaker Odyssey
    PDF

  • ASR is all you need: Cross-modal distillation for lip reading
    T. Afouras, J. S. Chung, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

  • Disentangled Speech Embeddings using Cross-Modal Self-Supervision
    A. Nagrani*, J. S. Chung*, S. Albanie*, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

  • The sound of my voice: speaker representation loss for target voice separation
    S. Mun, S. Choe, J. Huh, J. S. Chung
    International Conference on Acoustics, Speech, and Signal Processing
    PDF

2019

  • Deep Audio-Visual Speech Recognition
    T. Afouras*, J. S. Chung*, A. Senior, O. Vinyals, A. Zisserman
    IEEE Transactions on Pattern Analysis and Machine Intelligence
    PDF | Dataset

  • You said that?: Synthesising talking faces from audio
    A. Jamaludin*, J. S. Chung*, A. Zisserman
    International Journal of Computer Vision
    PDF

  • VoxCeleb: Large-scale speaker verification in the wild
    A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman
    Computer Speech and Language
    PDF

  • Who said that?: Audio-visual speaker diarisation of real-world meetings
    J. S. Chung, B. Lee, I. Han
    Interspeech
    PDF

  • My lips are concealed: Audio-visual speech enhancement through obstructions
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF | Project page

  • Naver at ActivityNet Challenge 2019, Task B: Active Speaker Detection (AVA)
    J. S. Chung
    International Challenge on Activity Recognition
    PDF

  • Utterance-level Aggregation For Speaker Recognition In The Wild
    W. Xie, A. Nagrani, J. S. Chung, A. Zisserman
    International Conference on Acoustics, Speech, and Signal Processing
    PDF | Project page

  • Perfect match: Improved cross-modal embeddings for audio-visual synchronisation
    S. W. Chung, J. S. Chung, H. G. Kang
    International Conference on Acoustics, Speech, and Signal Processing
    PDF | Model

2018

  • Learning to Lip Read Words by Watching Videos
    J. S. Chung, A. Zisserman
    Computer Vision and Image Understanding
    PDF

  • VoxCeleb2: Deep Speaker Recognition
    J. S. Chung*, A. Nagrani*, A. Zisserman
    Interspeech
    PDF | Dataset

  • The Conversation: Deep Audio-Visual Speech Enhancement
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF | Project page

  • Deep Lip Reading: a comparison of models and an online application
    T. Afouras, J. S. Chung, A. Zisserman
    Interspeech
    PDF | Project page

2017

  • VoxCeleb: a large-scale speaker identification dataset
    A. Nagrani*, J. S. Chung*, A. Zisserman
    Interspeech
    Best Student Paper Award
    PDF | Dataset

  • Lip Reading in Profile
    J. S. Chung, A. Zisserman
    British Machine Vision Conference
    PDF

2016

  • Out of time: automated lip sync in the wild
    J. S. Chung, A. Zisserman
    Workshop on Multi-view Lip-reading, ACCV
    PDF | Project page

  • Lip Reading in the Wild
    J. S. Chung, A. Zisserman
    Asian Conference on Computer Vision
    Best Student Paper Award
    PDF | Dataset

  • Signs in time: Encoding human motion as a temporal image
    J. S. Chung, A. Zisserman
    Workshop on Brave New Ideas for Motion Representations, ECCV
    PDF | Video

Preprints and Technical Reports
  • Augmentation adversarial training for unsupervised speaker recognition
    J. Huh, H. Heo, J. Kang, S. Watanabe, J. S. Chung
    arXiv:2007.12085
    PDF

  • Metric Learning for Keyword Spotting
    J. Huh, M. Lee, H. Heo, S. Mun, J. S. Chung
    arXiv:2005.08776
    PDF

  • VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge
    J. S. Chung, A. Nagrani, E. Coto, W. Xie, M. McLaren, D. Reynolds, A. Zisserman
    arXiv:1912.02522
    PDF

  • LRS3-TED: a large-scale dataset for visual speech recognition
    T. Afouras, J. S. Chung, A. Zisserman
    arXiv:1809.00496
    PDF | Dataset

Teaching
M3224.000100 - Machine Learning for Visual Understanding, Seoul National University