Abstract

Recent efforts in machine learning (especially the new waves of deep learning introduced in the last decade) have obliterated records for regression and classification tasks that had previously seen only incremental accuracy improvements. Many other fields would benefit significantly from machine learning (ML)-based inference, but in these fields data collection or labeling is expensive. In these Small Data domains, the challenge we now face is how to learn efficiently, achieving the same performance with less data. Many applications would benefit from a strong inference framework with deep structure that: (i) works with limited labeled training samples; (ii) integrates explicit (structural or data-driven) domain knowledge into the inference model as editable priors to constrain the search space; and (iii) maximizes the generalization of learning across domains. My research aims to explore a generalized ML approach to solving the small data problem that leverages existing research and fills in key gaps with original work. There are two basic approaches to reducing data needs during model training: (1) decreasing the learning complexity of the inference model via data-efficient machine learning, and (2) incorporating domain knowledge into the learning pipeline through data-driven or simulation-based generative models. In this talk, I present my recent work on merging the benefits of these two approaches to enable the training of robust and accurate (i.e., strong) inference models that can be applied to real-world problems constrained by limited data.

My plan to achieve this aim is structured in four research thrusts: (i) introduction of physics- and/or data-driven computational models, here referred to as a weak generator, to synthesize enough labeled data in an adjacent domain; (ii) design and analysis of unsupervised domain adaptation techniques to close the gap between the domain-adjacent and domain-specific data distributions; (iii) combined use of the weak generator, a weak inference model, and an adversarial framework to refine the domain-adjacent dataset by employing a set of unlabeled domain-specific data; and (iv) development and analysis of co-labeling/active learning techniques to select the most informative data samples to refine and adapt the weak inference model into a strong inference model for the target application.
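
As a concrete, purely illustrative sketch of the kind of adversarial refinement described in thrust (iii): synthetic samples from the weak generator are pushed toward the unlabeled domain-specific distribution by a refiner network, while a self-regularization term keeps them close to their synthetic originals so the labels remain valid. The architectures, hyperparameters, and SimGAN-style loss below are assumptions made for illustration, not the actual method presented in the talk.

# Illustrative sketch: adversarially refining synthetic (domain-adjacent) data
# with unlabeled real (domain-specific) data. All module names, architectures,
# and hyperparameters are assumptions, not the speaker's implementation.
import torch
import torch.nn as nn

class Refiner(nn.Module):
    """Maps a synthetic image to a refined image that looks more 'real'."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 1),
        )
    def forward(self, x):
        return torch.tanh(self.net(x) + x)  # residual refinement

class Discriminator(nn.Module):
    """Scores whether an image comes from the real (domain-specific) set."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )
    def forward(self, x):
        return self.net(x)

def refine_step(refiner, disc, opt_r, opt_d, synthetic, real, lam=0.1):
    """One adversarial update: the refiner tries to fool the discriminator
    while staying close (L1) to the synthetic input so labels stay valid."""
    bce = nn.BCEWithLogitsLoss()
    # Discriminator update: real vs. refined-synthetic.
    refined = refiner(synthetic).detach()
    d_loss = bce(disc(real), torch.ones(real.size(0), 1)) + \
             bce(disc(refined), torch.zeros(refined.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Refiner update: adversarial loss plus self-regularization.
    refined = refiner(synthetic)
    r_loss = bce(disc(refined), torch.ones(refined.size(0), 1)) + \
             lam * nn.functional.l1_loss(refined, synthetic)
    opt_r.zero_grad(); r_loss.backward(); opt_r.step()
    return d_loss.item(), r_loss.item()

# Toy usage: random tensors stand in for the weak generator's output
# (synthetic) and the unlabeled domain-specific set (real).
refiner, disc = Refiner(), Discriminator()
opt_r = torch.optim.Adam(refiner.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
synthetic = torch.rand(8, 3, 64, 64) * 2 - 1
real = torch.rand(8, 3, 64, 64) * 2 - 1
print(refine_step(refiner, disc, opt_r, opt_d, synthetic, real))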

Biography

Dr. Sarah Ostadabbas is a professor at Northeastern University whose research focuses on machine learning/pattern recognition, computer vision, affective computing, and human-machine interaction.

Webinar link: https://zoom.us/j/97336697506?pwd=SnBiWlFPNUdzNWNzY2t3eU5rZ3J0QT09
Password: 759948