Generating and Assessing Synthetic Transaction Data
Bloomberg
Machine learning architect, Bloomberg
This talk explores how we used a GPT transformer model to learn the patterns in 24 million rows of credit card data and then infer 42 million more rows of highly representative synthetic data. The synthetic rows maintain the underlying data constructs while completely removing any reference to the original data. In the past, VAEs and GANs were typically used to generate synthetic data; we will discuss how and why transformers are better at this process, and how we tested this hypothesis using statistical approaches to validate the new data.
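As a rough illustration of what statistical validation of synthetic data can look like, the sketch below compares one numeric column of a real dataset with its synthetic counterpart using a two-sample Kolmogorov-Smirnov statistic. The abstract does not say which tests were used in the talk, so the choice of test, the column name (transaction amount), the log-normal toy data, and the helper name `ks_statistic` are all assumptions made for this example.

```python
import random
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical distributions, 1.0 means no overlap.
    """
    a, b = sorted(real), sorted(synthetic)
    # The maximum gap is always attained at one of the observed values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Toy data (assumption): log-normal "transaction amounts", not real card data.
rng = random.Random(0)
real_amounts = [rng.lognormvariate(3.0, 1.0) for _ in range(2000)]
good_synth = [rng.lognormvariate(3.0, 1.0) for _ in range(2000)]  # same distribution
bad_synth = [rng.lognormvariate(3.5, 1.0) for _ in range(2000)]   # shifted distribution

ks_good = ks_statistic(real_amounts, good_synth)
ks_bad = ks_statistic(real_amounts, bad_synth)
print(f"KS vs. faithful synthetic data: {ks_good:.3f}")
print(f"KS vs. shifted synthetic data:  {ks_bad:.3f}")
```

A lower statistic suggests the synthetic column tracks the real one more closely. A per-column test like this says nothing about relationships between columns or about privacy, so in practice it would be one check among several.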