III: Effective Labeled Data Generation via Generative Adversarial Learning

Description

Recent successes in applying deep learning to many challenging data science problems are due in part to the availability of large-scale labeled training data. However, creating large-scale labeled datasets is time-consuming, labor-intensive, costly, and often requires significant domain knowledge. Many real-world applications therefore come with only limited label information (i.e., a small amount of labeled data or no labeled data at all). The lack of labeled training data thus remains one of the major roadblocks to applying deep learning techniques to challenging data science problems. Meanwhile, recent advances in generative adversarial learning have shown promising results in generating realistic data, which offers a new perspective on alleviating the shortage of labeled training data. This project therefore explores effective labeled data generation via generative adversarial learning. The proposed research extends state-of-the-art labeled data generation and generative adversarial learning to a new frontier, investigates original problems that call for innovative solutions, and paves the way for a new research endeavor to effectively tame synthetic labeled data generation. Because many real-world problems face the challenge of limited labeled data, the project has the potential to benefit applications in various disciplines such as Computer Science, Education, Politics, Healthcare, and Bioinformatics.

This project proposes novel approaches based on generative adversarial learning for effective labeled data generation to facilitate deep learning with limited label information, investigates the associated fundamental research issues, and develops effective algorithms. It has three primary research objectives. First, when a small amount of labeled data is available, it explores estimating the underlying data distribution from unlabeled data and incorporating the label information for labeled data generation, including scenarios with extremely imbalanced data and incomplete labels. Second, when labeled data is not available, it adopts alternative weak supervision (e.g., inaccurate labels, inexact labels, and pairwise constraints) for generating labeled data. Third, when neither labeled data nor weak supervision is available, it explores integrating human involvement into generative adversarial learning to provide supervision.

Publications

Conferences

Hongliang Chi, Cong Qi, Suhang Wang, and Yao Ma. ``Active Learning for Graphs with Noisy Structures.'' In Proceedings of SIAM International Conference on Data Mining (SDM 2024)
Huaisheng Zhu, Enyan Dai, Hui Liu, and Suhang Wang. ``Learning Fair Models without Sensitive Attributes: A Generative Approach.'' Neurocomputing, 2023
Tianxiang Zhao, Dongsheng Luo, Xiang Zhang, and Suhang Wang. ``Faithful and Consistent Graph Neural Network Explanations with Rationale Alignment.'' ACM TIST, 2023
Fali Wang, Tianxiang Zhao, and Suhang Wang. ``Distribution Consistency based Self-Training for Graph Neural Networks with Sparse Labels.'' In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM 2024)
Tianxiang Zhao, Wenchao Yu, Suhang Wang, Lu Wang, Xiang Zhang, Yuncong Chen, Yanchi Liu, Wei Cheng, and Haifeng Chen. ``Interpretable Imitation Learning with Dynamic Causal Relations.'' In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM 2024)
Minhua Lin, Teng Xiao, Enyan Dai, and Suhang Wang. ``Certifiably Robust Graph Contrastive Learning.'' In Proceedings of Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
Teng Xiao, Huaisheng Zhu, Zhengyu Chen, and Suhang Wang. ``GraphACL: Simple Asymmetric Contrastive Learning of Graphs.'' In Proceedings of Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
Zhimeng Guo, Jialiang Li, Teng Xiao, Yao Ma, and Suhang Wang. ``Towards Fair Graph Neural Networks via Graph Counterfactual.'' In Proceedings of 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023)
Teng Xiao, Zhengyu Chen, and Suhang Wang. ``Reconsidering Learning Objectives in Unbiased Recommendation: A Distribution Shift Perspective.'' In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023)
Tianxiang Zhao, Wenchao Yu, Suhang Wang, Lu Wang, Xiang Zhang, Yuncong Chen, Yanchi Liu, Wei Cheng, Haifeng Chen. ``Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations.'' In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023)
Enyan Dai, Limeng Cui, Zhengyang Wang, Xianfeng Tang, Yinghan Wang, Monica Chen, Bing Yin, and Suhang Wang. ``A Unified Framework of Graph Information Bottleneck for Robustness and Membership Privacy.'' In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023)
Enyan Dai, Minhua Lin, Xiang Zhang, and Suhang Wang. ``Unnoticeable Backdoor Attacks on Graph Neural Networks.'' In Proceedings of the Web Conference (WWW 2023)
Wenqi Fan, Han Xu, Wei Jin, Xiaorui Liu, Xianfeng Tang, Suhang Wang, Qing Li, Jiliang Tang, Jianping Wang, and Charu Aggarwal. ``Jointly Attacking Graph Neural Network and its Explanations.'' In Proceedings of the 39th IEEE International Conference on Data Engineering (ICDE 2023)
Tianxiang Zhao, Dongsheng Luo, Xiang Zhang, and Suhang Wang. ``Towards Faithful and Consistent Explanations for Graph Neural Networks.'' In Proceedings of the 16th ACM International Conference on Web Search and Data Mining (WSDM 2023)
Huaisheng Zhu, Xianfeng Tang, Tianxiang Zhao, and Suhang Wang. ``You Need to Look Globally: Discovering Representative Topology Structures to Enhance Graph Neural Network.'' In Proceedings of the 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2023)
Tianxiang Zhao, Dongsheng Luo, Xiang Zhang, and Suhang Wang. ``TopoImb: Toward Topology-level Imbalance in Learning from Graphs.'' In Proceedings of the 1st Learning on Graphs Conference (LoG 2022)
Enyan Dai, Shijie Zhou, Zhimeng Guo, and Suhang Wang. ``Label-Wise Graph Convolutional Network for Heterophilic Graphs.'' In Proceedings of the 1st Learning on Graphs Conference (LoG 2022)
Teng Xiao, Zhengyu Chen, Zhimeng Guo, Zeyang Zhuang, and Suhang Wang. ``Decoupled Self-supervised Learning for Non-Homophilous Graphs.'' In Proceedings of Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
Junjie Xu, Enyan Dai, Xiang Zhang, and Suhang Wang. ``HP-GMN: Graph Memory Networks for Heterophilous Graphs.'' In Proceedings of the 22nd IEEE International Conference on Data Mining (ICDM 2022)
Teng Xiao, Zhengyu Chen, and Suhang Wang. ``Representation Matters When Learning From Biased Feedback in Recommendation.'' In Proceedings of 31st ACM International Conference on Information and Knowledge Management (CIKM 2022)
Tianxiang Zhao, Xiang Zhang, and Suhang Wang. ``Exploring Edge Disentanglement for Node Classification.'' In Proceedings of the Web Conference (WWW 2022)
Teng Xiao, and Suhang Wang. ``Towards Off-Policy Learning for Ranking Policies with Logged Feedback.'' In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022)
Enyan Dai, Wei Jin, Hui Liu, and Suhang Wang. ``Towards Robust Graph Neural Networks for Noisy Graphs with Sparse Labels.'' In Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM 2022)
Tianxiang Zhao, Enyan Dai, Kai Shu, and Suhang Wang. ``Towards Fair Classifiers Without Sensitive Attributes: Exploring Biases in Related Features.'' In Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM 2022)
Teng Xiao, and Suhang Wang. ``Towards Unbiased and Robust Causal Ranking for Recommender Systems.'' In Proceedings of the 15th ACM International Conference on Web Search and Data Mining (WSDM 2022)
Thai Le, Long Tran-Thanh, and Dongwon Lee. ``Socialbots on Fire: Modeling Adversarial Behaviors of Socialbots via Multi-Agent Hierarchical Reinforcement Learning.'' In Proceedings of the Web Conference (WWW 2022)
Limeng Cui, Xianfeng Tang, Sumeet Katariya, Nikhil Rao, Pallav Agrawal, Karthik Subbian, and Dongwon Lee. ``ALLIE: Active Learning on Large-scale Imbalanced Graphs.'' In Proceedings of the Web Conference (WWW 2022)
Joe McCalmon, Thai Le, Sarra Alqahtani, and Dongwon Lee. ``CAPS: Comprehensible Abstract Policy Summaries for Explaining Reinforcement Learning Agents.'' In Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021)
Limeng Cui, and Dongwon Lee. ``KETCH: Knowledge Graph Enhanced Thread Recommendation in Healthcare Forums.'' In Proceedings of the 45th International ACM Conference on Research and Development in Information Retrieval (SIGIR 2022)
Haeseung Seo, Aiping Xiong, Sian Lee, and Dongwon Lee. ``If You Have a Reliable Source, Say Something: Effects of Correction Comments on COVID-19 Misinformation.'' In Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM 2022)
Enyan Dai, and Suhang Wang. ``Towards Self-Explainable Graph Neural Networks.'' In Proceedings of 30th ACM International Conference on Information and Knowledge Management (CIKM-21)
Enyan Dai, Charu Aggarwal, and Suhang Wang. ``NRGNN: Learning a Label Noise Resistant Graph Neural Network on Sparsely and Noisily Labeled Graphs.'' In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-21)
Teng Xiao, Zhengyu Chen, Donglin Wang, and Suhang Wang. ``Learning How to Propagate Messages in Graph Neural Networks.'' In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-21)
Enyan Dai, Yiwei Sun, Kai Shu, and Suhang Wang. ``Labeled Data Generation with Inexact Supervision.'' In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-21)
Yao Ma, Suhang Wang, Lingfei Wu, and Jiliang Tang. ``Attacking Graph Convolutional Networks via Rewiring.'' In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-21)
Tsung-Yu Hsieh, Yiwei Sun, Xianfeng Tang, Suhang Wang, and Vasant Honavar. ``SrVARM: State Regularized Vector Autoregressive Model for Joint Learning of Hidden State Transitions and State-Dependent Inter-Variable Dependencies from Multi-variate Time Series.'' In Proceedings of the Web Conference (WWW 2021)
Tsung-Yu Hsieh, Yiwei Sun, Suhang Wang, and Vasant Honavar. ``Functional Autoencoders for Functional Data Representation Learning.'' In Proceedings of the Twenty-First SIAM International Conference on Data Mining (SDM-21)
Enyan Dai, and Suhang Wang. ``Say No to the Discrimination: Learning Fair Graph Neural Networks with Limited Sensitive Attribute Information.'' In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM 2021)
Tianxiang Zhao, Xiang Zhang, and Suhang Wang. ``GraphSMOTE: Imbalanced Node Classification on Graphs with Graph Neural Networks.'' In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM 2021)
Tsung-Yu Hsieh, Suhang Wang, Yiwei Sun, and Vasant Honavar. ``Explainable Multivariate Time Series: A Deep Neural Network Which Learns to Attend Important Variables As Well As Time Intervals.'' In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (WSDM 2021)
Duanshun Li, Jing Liu, Jinsung Jeon, Seoyoung Hong, Thai Le, Dongwon Lee, Noseong Park. ``Large-Scale Data-Driven Airline Market Influence Maximization.'' In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-21)
Maryam Tabar, Jared Gluck, Anchit Goyal, Fei Jiang, Derek Morr, Annalyse Kehs, Dongwon Lee, David Hughes, Amulya Yadav. ``A PLAN for Tackling the Locust Crisis in East Africa: Harnessing Spatiotemporal Deep Models for Locust Movement Forecasting.'' In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-21)
Won Chang Lee, Yeon-Chang Lee, Dongwon Lee, Sang-Wook Kim. ``Look Before You Leap: Confirming Edge Signs in Random Walk with Restart for Personalized Node Ranking in Signed Networks.'' In Proceedings of the 44th ACM Conference on Research and Development in Information Retrieval (SIGIR-21)
Minjin Choi, Heesoo Park, Sunkyung Lee, Eunseong Choi, Junhyuk Lee, Dongwon Lee, Jongwuk Lee. ``MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories.'' In Proceedings of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-21)
Maria Molina, S. Shyam Sundar, Md Main Uddin Rony, Naeemul Hassan, Thai Le, Dongwon Lee. ``Does Clickbait Actually Attract More Clicks? Three Clickbait Studies You Must Read.'' In Proceedings of the 2021 Annual Conference on Human Factors in Computing Systems (CHI-21)
Yiming Liang, Shuguang Wang, Dongwon Lee. ``WILSON: A Divide and Conquer Approach for Fast and Effective News Timeline Summarization.'' In Proceedings of the 24th International Conference on Extending Database Technology (EDBT-21)
Thai Le, Suhang Wang, and Dongwon Lee. ``MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models.'' In Proceedings of the 20th IEEE International Conference on Data Mining (ICDM-20)
Wentao Wang, Tyler Derr, Yao Ma, Suhang Wang, Hui Liu, Zitao Liu, and Jiliang Tang. ``Learning from Incomplete Labeled Data via Adversarial Data Generation.'' In Proceedings of the 20th IEEE International Conference on Data Mining (ICDM-20)
Xianfeng Tang, Huaxiu Yao, Yiwei Sun, Yiqi Wang, Jiliang Tang, Charu Aggarwal, Prasenjit Mitra, and Suhang Wang. ``Investigating and Mitigating Degree-Related Biases in Graph Convolutional Networks.'' In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM-20)
Tianxiang Zhao, Xianfeng Tang, Xiang Zhang, and Suhang Wang. ``Semi-Supervised Graph-to-Graph Translations.'' In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM-20)
Xianfeng Tang, Yozen Liu, Neil Shah, Xiaolin Shi, Prasenjit Mitra, and Suhang Wang. ``Knowing Your FATE: Friendship, Action and Temporal Explanations for User Engagement Prediction on Social Apps.'' In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-20)
Wei Jin, Yao Ma, Xiaorui Liu, Xianfeng Tang, Suhang Wang, and Jiliang Tang. ``Graph Structure Learning for Robust Graph Neural Networks.'' In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-20)
Thai Le, Suhang Wang, and Dongwon Lee. ``GRACE: Generating Concise and Informative Contrastive Sample to Explain Neural Network Model's Prediction.'' In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-20)
Limeng Cui, Haeseung Seo, Maryam Tabar, Fenglong Ma, Suhang Wang, and Dongwon Lee. ``DETERRENT: Knowledge Guided Graph Attention Network for Detecting Healthcare Misinformation.'' In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-20)
Enyan Dai, Yiwei Sun, and Suhang Wang. ``Ginger Cannot Cure Cancer: Battling Fake Health News with A Comprehensive Data Repository.'' In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM-20)
Wentao Wang, Suhang Wang, Wenqi Fan, Zitao Liu, and Jiliang Tang. ``Global-and-Local Aware Data Generation for the Class Imbalance Problem.'' In Proceedings of the Twentieth SIAM International Conference on Data Mining (SDM-20)
Xianfeng Tang, Huaxiu Yao, Yiwei Sun, Charu Aggarwal, Prasenjit Mitra, and Suhang Wang. ``Joint Modeling of Local and Global Temporal Dynamics of Multivariate Time Series Forecasting with Missing Values.'' In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
Xianfeng Tang, Yandong Li, Yiwei Sun, Huaxiu Yao, Prasenjit Mitra, and Suhang Wang. ``Transferring Robustness for Graph Neural Network Against Poisoning Attacks.'' In Proceedings of the 13th ACM International Conference on Web Search and Data Mining (WSDM-20)

Journal

Enyan Dai, and Suhang Wang. ``Learning Fair Graph Neural Networks with Limited and Private Sensitive Attribute Information.'' IEEE Transactions on Knowledge and Data Engineering (TKDE 2022)

Workshop

Michiharu Yamashita, Yunqi Li, Thanh Tran, Yongfeng Zhang, and Dongwon Lee. ``Looking Further into the Future: Career Pathway Prediction.'' ACM WSDM Workshop on Computational Jobs Marketplace (WSDM Workshop 2022)

Resources

Code

Dataset

FakeHealth [code, data] (Ginger Cannot Cure Cancer: Battling Fake Health News with A Comprehensive Data Repository)

Project Members

Suhang Wang (PI)
Dongwon Lee (Co-PI)
Enyan Dai (PhD Student)
Huaisheng Zhu (PhD Student)
Fali Wang (PhD Student)
Teng Xiao (PhD Student)
Thai Le (PhD Student)
Limeng Cui (PhD Student)

Acknowledgments

This project is supported by the National Science Foundation (NSF) under Grant #1909702. Any opinions, findings, and conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Created by Suhang Wang who can be reached at szw494 at psu.edu.
Webmaster: Enyan Dai, Email: emd5759 at psu.edu.

Last Updated: September 20th, 2022