III: Effective Labeled Data Generation via Generative Adversarial Learning

Description

Recent successes in applying deep learning to many challenging data science problems are due in part to the availability of large-scale labeled training data. However, creating large-scale labeled datasets is time-consuming, labor-intensive, and costly, and often requires significant domain knowledge. Many real-world applications therefore come with only limited label information (i.e., a small amount of labeled data or none at all). The lack of labeled training data thus remains one of the major roadblocks in applying deep learning techniques to challenging data science problems. On the other hand, recent advances in generative adversarial learning have shown promising results in generating realistic data, which could offer a new perspective on alleviating the shortage of labeled training data. This project therefore explores effective labeled data generation via generative adversarial learning. The proposed research extends state-of-the-art labeled data generation and generative adversarial learning to a new frontier, investigates original problems that call for innovative solutions, and paves the way for a new research endeavor to effectively tame synthetic labeled data generation. As many real-world problems face the challenge of limited labeled data, the project has the potential to benefit applications in disciplines such as Computer Science, Education, Politics, Healthcare, and Bioinformatics.

This project proposes novel approaches based on generative adversarial learning for effective labeled data generation to facilitate deep learning with limited label information, investigates the associated fundamental research issues, and develops effective algorithms. It has three primary research objectives. First, when a small amount of labeled data is available, it explores estimating the underlying data distribution from unlabeled data and incorporating the label information for labeled data generation, including scenarios with extremely imbalanced data and incomplete labels. Second, when labeled data is not available, it adopts alternative weak supervision (e.g., inaccurate labels, inexact labels, and pairwise constraints) for generating labeled data. Third, when neither labeled data nor weak supervision is available, it explores integrating human involvement into generative adversarial learning to provide supervision.

Publications

  • Conferences
  • Journal
  • Workshop

Resources

  • Code
  • Dataset

Project Members

Acknowledgments

This project is supported by the National Science Foundation (NSF) under Grant #1909702. Any opinions, findings, and conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Created by Suhang Wang who can be reached at szw494 at psu.edu.
Webmaster: Enyan Dai, Email: emd5759 at psu.edu.

Last Updated: September 20th, 2022