Creative AI, the training of generative deep neural networks for text (e.g., poems, haiku, stories), images (e.g., painting, animation), and audio (e.g., classical and popular music generation, singing), has achieved impressive milestones in recent years, thanks to architectures such as attentive encoder-decoder models (Transformers), adversarial generator-discriminator frameworks (GANs), and variational autoencoders (VAEs). Meanwhile, conversational AI products, which support text- and speech-based multi-modal communication between chatbots and people, have reached millions of users in Japan and worldwide. To build a strong persona for these products, chatbots are being enhanced to interactively write poems, compose songs, sing, and even tell stories through multi-turn conversations.

We focus on introducing the following novel algorithms and frameworks: (1) conversational-AI-driven painting through a pipeline of (a) a sentence-to-image attentive GAN that maps an utterance to drawable concepts (e.g., selecting "horse running" and "field" for the input "I want freedom!"); (b) a progressive growing GAN (PG-GAN) that draws the foreground objects (the running horses); and (c) a CycleGAN that harmonizes the background (sky, grass, and so on) with the foreground objects. (2) conversational-AI-driven music generation using Transformer-XL and time-sensitive notes; the novel ideas include (a) music-score-inspired representations of time-valued notes and (b) joint training of note-on-to-note-on intervals, note-on-to-note-off intervals, pitch, and velocity under Transformer-XL. (3) a singing model for conversational AI built on a Transformer encoder-decoder framework, covering pop-to-opera singing of Chinese songs.
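
To make the sentence-to-concept step of the painting pipeline concrete, here is a minimal sketch of attention-style concept selection: score an utterance against a vocabulary of drawable concepts and keep the top-scoring ones. The word and concept vectors (`WORD_VECS`, `CONCEPT_VECS`) are toy hand-made values, and `select_concepts` is a hypothetical helper; a real system would use learned text and image embeddings rather than these assumptions.

```python
import math

# Toy embeddings, purely illustrative; a deployed system would use
# learned encoders, not hand-written 3-d vectors.
WORD_VECS = {
    "freedom": [0.9, 0.1, 0.0],
    "want":    [0.2, 0.3, 0.1],
}
CONCEPT_VECS = {
    "horse_running": [0.8, 0.2, 0.1],
    "field":         [0.7, 0.0, 0.3],
    "office_desk":   [0.0, 0.9, 0.6],
}

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def select_concepts(sentence_words, top_k=2):
    """Attention-style selection: average known word vectors into a crude
    sentence embedding, score every concept by dot product, normalize with
    softmax, and return the top_k concept names."""
    vecs = [WORD_VECS[w] for w in sentence_words if w in WORD_VECS]
    sent = [sum(col) / len(vecs) for col in zip(*vecs)]
    names = list(CONCEPT_VECS)
    scores = [sum(a * b for a, b in zip(sent, CONCEPT_VECS[n])) for n in names]
    attn = softmax(scores)
    ranked = sorted(zip(names, attn), key=lambda p: -p[1])
    return [name for name, _ in ranked[:top_k]]
```

With these toy vectors, the utterance "I want freedom!" selects "horse_running" and "field", mirroring the example in the text; the selected concepts would then be handed to the foreground GAN and the background-harmonizing CycleGAN.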
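
The score-inspired, time-valued note representation for the music model can be sketched as follows. The `Note` class, the `encode` function, and the exact per-note tuple layout (note-on-to-next-note-on shift, note-on-to-note-off duration, pitch, velocity) are illustrative assumptions about how such an encoding might look, not the system's actual tokenizer.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class Note:
    pitch: int          # MIDI pitch number (60 = middle C)
    velocity: int       # MIDI velocity, 0-127
    onset: Fraction     # note-on time in beats
    duration: Fraction  # note-on-to-note-off interval in beats

def encode(notes):
    """Turn a note list into per-note 4-tuples:
    (note-on-to-next-note-on shift, note-on-to-note-off duration,
     pitch, velocity).
    Score-inspired time values (quarter = 1, eighth = 1/2, ...) are kept
    as exact fractions rather than a fixed tick grid, so the model sees
    musically meaningful durations."""
    notes = sorted(notes, key=lambda n: n.onset)
    tokens = []
    for cur, nxt in zip(notes, notes[1:] + [None]):
        shift = (nxt.onset - cur.onset) if nxt else Fraction(0)
        tokens.append((shift, cur.duration, cur.pitch, cur.velocity))
    return tokens
```

Grouping the four quantities into one tuple per note reflects the joint-training idea: a sequence model such as Transformer-XL predicts onset spacing, duration, pitch, and velocity together rather than as independent streams.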
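
All three components lean on Transformer-style architectures, including the encoder-decoder singing model. As background, here is a plain-Python sketch of scaled dot-product attention, the core operation inside such an encoder-decoder; this is the standard textbook formula, not code from the described system.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: each query row attends over all key
    rows (dot products scaled by sqrt of the key dimension, then softmax)
    and returns the corresponding weighted mix of the value rows."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

A query that strongly matches one key pulls its output toward that key's value row, while a zero query mixes all value rows uniformly; stacking this operation with learned projections is what lets the decoder align sung acoustic output with the input lyrics and score.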