Interactive Demos & Tools

  • SoundCLIP: Unified Audio-Visual Understanding
    Live Demo | GitHub
    Adapts the LLaVA latent space to ingest audio alongside video. An interactive demonstration of unified multimodal token alignment for audio-visual understanding tasks.
  • AVVA-Curation: Audio-Video Vector Alignment
    Project Site | GitHub
    LLM-gated curation system for large-scale audio-video data. Takes a quality-over-quantity approach to training data-efficient audio-video foundation models.
  • PW-VQA: Possible Worlds Visual Question Answering
    GitHub
    Causal VQA benchmark for investigating cross-modal bias. Interactive evaluation framework for testing multimodal reasoning fidelity.
  • MMPerspective: Multimodal Perspective Understanding
    GitHub
    Comprehensive benchmark for perspective perception, reasoning, and robustness in multimodal large language models.