Ali Vosoughi is a Ph.D. student in the Department of Electrical and Computer Engineering at the University of Rochester. His interests are multimodal information processing with deep learning, spanning multisensory perception, video and scene understanding, audio, image, and natural language processing (NLP), toward building the next generation of AI assistants that can solve complex and imaginative tasks.
He works with Prof. Chenliang Xu and Prof. Axel Wismueller on deep multimodal learning of audio, vision, image, speech, and language models for egocentric and third-person video understanding and for medical imaging applications. Ali is a scholar in the NSF’s Augmented and Virtual Reality project.
Write ✉️ to Ali Vosoughi: mvosough 🙂 ece.rochester.edu
- – Demo presentation of our conversational AI assistant!
- – Presentations at SPIE 2023
- – A paper is accepted in Nature Digital Medicine. Congratulations to the team!
- – Congratulations on being nominated for the 2022 Donald M. and Janet C. Barnard Fellowship!
- – One paper is accepted at ICASSP 2022 (Machine Learning track). Thanks to our team!
- – Two papers accepted at Defense and Commercial Sensing 2022! Congrats!
- – Four papers are accepted to SPIE. Congratulations to our team!
- – Our participation in the NSF research program on AR/VR
- – Code for our Nature paper
- – A new paper on causal learning is accepted for publication at EUSIPCO 2021
Audio-Visual Deep Learning
The work is currently under double-blind review…
Visual Question Answering
The work is currently under double-blind review…
Cross-Modal Language and Vision Models for Radiology
Is it possible to combine radiology images and their corresponding medical reports to train a deep-learning model that classifies better?
What about zero-shot classification? Here we explore these questions and show that language improves classification accuracy in both supervised and zero-shot settings.
The paper is publicly available on arXiv and is accepted at SPIE Medical Imaging (CAD) 2023.
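The zero-shot setup can be sketched as follows: embed the image and one text prompt per class into a shared space, then classify by cosine similarity. Below is a minimal numpy sketch with mock (random) embeddings standing in for encoder outputs; the class prompts and embeddings are hypothetical placeholders, not the model or labels from the paper.

```python
import numpy as np

def cosine_zero_shot(image_emb, class_text_embs):
    """Classify an image embedding by cosine similarity to per-class text embeddings."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    sims = txt @ img                      # one similarity score per class
    return int(np.argmax(sims)), sims

# Mock shared embedding space (in practice: outputs of trained image/text encoders).
rng = np.random.default_rng(0)
classes = ["no finding", "pneumonia", "cardiomegaly"]   # hypothetical prompts
text_embs = rng.normal(size=(3, 128))
image_emb = text_embs[1] + 0.1 * rng.normal(size=128)   # image near class 1
pred, _ = cosine_zero_shot(image_emb, text_embs)
print(classes[pred])  # pneumonia
```

No class-specific training is needed at test time; adding a new finding only requires a new text prompt.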
Point Clouds and Deep Learning
Visual information processing using point cloud (PC) data has attracted considerable attention over the last decade. Although PC data live in Euclidean space with well-defined analytic geometry, their use in deep neural networks raises challenges specific to analyzing point sets: a point cloud is a permutation-invariant set of points, which undermines the advantages of standard deep learning (DL) architectures for computer vision tasks. In this study, we use a Siamese neural network (SNN) for tracking and shape completion of 3D objects. Our contributions include a modified objective function and non-random sampling methods. Experiments on the KITTI self-driving dataset show that, with a Kalman-filter-based search space, our method improves the success and precision of 3D object tracking from 41.3% and 57.9% to 68.3% and 78.1%, respectively.
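The tracking step can be sketched as: encode the template and each candidate point set with a permutation-invariant encoder, then select the candidate closest to the template in embedding space. The sketch below uses a single random linear map with max-pooling (a PointNet-style simplification, not the paper's SNN); shapes and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 32))  # shared per-point map (stand-in for a learned MLP)

def encode(points):
    """Permutation-invariant embedding: per-point features, then max-pool over points."""
    return np.maximum(points @ W, 0.0).max(axis=0)

def track(template, candidates):
    """Pick the candidate region whose embedding is closest to the template's."""
    t = encode(template)
    dists = [np.linalg.norm(t - encode(c)) for c in candidates]
    return int(np.argmin(dists))

template = rng.normal(size=(64, 3))                         # tracked object's points
candidates = [rng.normal(size=(64, 3)) for _ in range(4)]   # search-space proposals
candidates[2] = template + 0.01 * rng.normal(size=(64, 3))  # near-copy of the target
print(track(template, candidates))  # 2
```

Because the max-pool is taken over the point axis, reordering the points of any cloud leaves its embedding, and hence the tracking decision, unchanged.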
Neural Networks and Causality
Causal queries provide essential insight across scientific discovery. Unfortunately, data observed by massive numbers of sensors are typically nonlinearly related and complex; systems may have millions of sensors but only a few decades of temporal samples, which leads to a curse of dimensionality in large-scale settings. Despite a rich literature on causal discovery, the problem remains challenging for large-scale datasets. We propose a novel two-step method that uses radial basis functions (RBFs) to tackle the curse of dimensionality in complex systems. The proposed method is probabilistic, captures nonlinear relations, and is suitable for large-scale data. Extensive simulations on synthetic data of different sizes and on real-world climatology data show that our method outperforms competing methods when nodal observations are temporally scarce.
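The role of the RBF expansion can be illustrated with a generic nonlinear Granger-style score (this is an illustration of RBF regression for directed influence, not the two-step method of the paper): expand each signal's past in RBF features, then ask how much the candidate cause's past reduces the prediction residual of the effect.

```python
import numpy as np

def rbf_features(x, centers, gamma=1.0):
    """Radial-basis expansion of a 1-D signal."""
    return np.exp(-gamma * (x[:, None] - centers[None, :]) ** 2)

def nonlinear_influence(x, y, n_centers=10):
    """Granger-style score: how much does x's past improve prediction of y
    beyond y's own past, using RBF ridge regression?  > 0 suggests x -> y."""
    yp, xp, yt = y[:-1], x[:-1], y[1:]
    centers = np.linspace(-3, 3, n_centers)
    def ridge_resid(F, t):
        w = np.linalg.solve(F.T @ F + 1e-3 * np.eye(F.shape[1]), F.T @ t)
        return np.mean((t - F @ w) ** 2)
    base = ridge_resid(rbf_features(yp, centers), yt)
    full = ridge_resid(np.hstack([rbf_features(yp, centers),
                                  rbf_features(xp, centers)]), yt)
    return np.log(base / full)

# Synthetic system: y is driven nonlinearly by x's past; x is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.5 * y[t - 1] + np.tanh(x[t - 1]) + 0.1 * rng.normal()
print(nonlinear_influence(x, y) > nonlinear_influence(y, x))  # True
```

The RBF basis lets the regression capture the tanh nonlinearity that a linear Granger test would miss, while keeping the fit a cheap linear solve.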
Anomaly Detection with Autoencoders
Labeled data are necessary for supervised neural network (NN) training. However, supervised learning does not scale favorably because human labeling of large datasets is expensive. Here, we propose a method that introduces interventions on the training set and enables NNs to learn features in a self-supervised learning (SSL) setting. The method intervenes in the training data by randomly changing image contrast and removing input image patches, creating a significantly augmented training dataset. This is fed into an autoencoder (AE) network, which learns to reconstruct input images given variable contrast and missing patches of pixels. The proposed technique enables few-shot learning of the most relevant image features by forcing NNs to exploit context information in a generative model. Here, we focus on a medical imaging application, where large labeled datasets are usually not available. We evaluate the proposed algorithm for anomaly detection on a small dataset of only 23 training and 35 test images of T2-weighted brain MRI scans from healthy controls (training) and tumor patients (test). We find that the image reconstruction error for healthy controls is significantly lower than for tumor patients (Mann-Whitney U-test, p < 10^-10), which can be exploited for anomaly detection of pathologic brain regions by human expert analysis of reconstructed images. Interestingly, this still holds for conventional AE training without SSL, although the reconstruction-error distributions for healthy and diseased subjects appear less dissimilar (p < 10^-7). We conclude that the proposed SSL method may be useful for anomaly detection in medical imaging, thus potentially enhancing radiologists’ productivity.
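The self-supervised intervention on the training data can be sketched as: randomly rescale image contrast and zero out random square patches, then train the autoencoder to reconstruct the original image from the corrupted one. Below is a minimal numpy sketch of the augmentation step only; patch size, patch count, and contrast range are illustrative values, not the paper's settings.

```python
import numpy as np

def intervene(img, rng, patch=8, n_patches=4, contrast=(0.5, 1.5)):
    """Random contrast change + removal of square patches (set to 0)."""
    out = img * rng.uniform(*contrast)          # global contrast rescaling
    h, w = img.shape
    for _ in range(n_patches):
        r = rng.integers(0, h - patch)
        c = rng.integers(0, w - patch)
        out[r:r + patch, c:c + patch] = 0.0     # remove a patch of pixels
    return out

rng = np.random.default_rng(0)
img = rng.uniform(0.2, 1.0, size=(64, 64))      # stand-in for an MRI slice
aug = intervene(img, rng)
# The AE is trained to reconstruct `img` from `aug`; at test time,
# a high reconstruction error flags a potentially anomalous region.
print((aug == 0).sum() >= 8 * 8)  # at least one full patch was zeroed
```

Each call produces a different corruption, so a tiny training set of 23 scans yields a much larger effective dataset of (corrupted, clean) pairs.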
Information Theory for Hardware Security
Information theory for security
CMOS-Based Optimization Accelerators
Optimization using mixed-signal analog Ising machines [website]
Camera Sensor Design
Terahertz (lensless) camera sensor design [website]
CMOS and Electronics
Noise analysis and sensor interface design [website]
Why can’t Persian be written right-to-left on this website?