Research Impact
M. Ali Vosoughi (محمد علی وثوقی) • PhD Candidate, University of Rochester • alivosoughi.com
Publications & Citations
- 345 total citations (Google Scholar)
- h-index: 8 across computer vision, audio processing, and multimodal AI
- First-author publications: IEEE TMM, ICASSP, EUSIPCO
- Survey leadership: Video Understanding with LLMs (IEEE TCSVT 2025)
Industry Research Collaborations
- Microsoft Research (2024): Unified audiovisual encoder → EUSIPCO 2025
- Bosch AI Research (2023): Counterfactual audio-language → ICASSP 2024 + patent pending
- Smule AI (2025): Spatial audio generation for real-time applications
Open Research Contributions
- Datasets: AVE-2 audiovisual dataset
- Benchmarks: PW-VQA (causal VQA), MMPerspective (MLLM evaluation)
- Code: GitHub repositories with reproducible implementations
- Demos: SoundCLIP interactive demo
Recognition
- US National Interest Waiver (2023): Permanent residency granted for contributions to AI research
- Patent applications: US 18/379,518 (audio-language learning)
- Program leadership: DARPA PTG (Perceptually-enabled Task Guidance) multimodal AI development
Research focus: Unified multimodal systems for controllable audio and video generation, grounded by language model reasoning.