A Comparative Analysis of Synthetic Data Generation with VAE and CTGAN Models on Financial Credit Loan Offer Data
Abstract
Synthetic data generation is a practical way to address the privacy and scalability problems that data poses in machine learning applications. Data science in finance faces a growing need for anonymized data for two reasons: strict privacy regulations and the need for balanced data in many modeling tasks. In this work, we address three problems in machine learning for financial applications and offer solutions. First, we generate synthetic data from a set of actual credit loan offer data by training a custom variational autoencoder (VAE) and a GAN model. Second, we present a comparative analysis of these models using statistical methods. Third, because, as far as we know, there is no gold standard for assessing synthetically generated data in finance applications, we introduce a performance comparison method to evaluate such data. Our experimental analysis shows that the proposed methods achieve satisfactory performance with the generated data across various machine learning models. © 2023 Elsevier B.V., All rights reserved.








