Effective descriptions of content within images and video clips can be generated with convolutional and recurrent neural networks. In this lab, users will apply a deep learning technique, via a framework, to caption data and generate their own captions.
Convolutional and recurrent neural networks can be combined to generate effective descriptions of content within images and video clips. In this TensorFlow lab, attendees will learn about data processing and preparation for network ingestion, network configuration, training, and inference. At the end of the lab, participants will know how to ingest and process input data from images as well as sentences, extract image feature vectors from a pretrained network, one-hot encode sentences, concatenate input data, configure an RNN and train it, and then perform inference with their own trained network.
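One step listed above, one-hot encoding sentences, can be sketched in plain Python. This is a minimal illustration with a made-up vocabulary and sentences, not the lab's MSCOCO data or its actual helper functions:

```python
# Minimal sketch: one-hot encoding sentences (illustrative data, not lab code).

def build_vocab(sentences):
    """Map each unique word across all sentences to an integer index."""
    words = sorted({w for s in sentences for w in s.split()})
    return {w: i for i, w in enumerate(words)}

def one_hot_encode(sentence, vocab):
    """Encode a sentence as a list of one-hot vectors, one per word."""
    vectors = []
    for word in sentence.split():
        vec = [0] * len(vocab)   # all zeros...
        vec[vocab[word]] = 1     # ...except a 1 at the word's index
        vectors.append(vec)
    return vectors

sentences = ["a dog runs", "a cat sits"]
vocab = build_vocab(sentences)              # {'a': 0, 'cat': 1, 'dog': 2, 'runs': 3, 'sits': 4}
encoded = one_hot_encode("a dog runs", vocab)
```

In the lab itself, TensorFlow handles this at scale; these one-hot word vectors are what get concatenated with the CNN's image feature vectors before being fed to the RNN.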
Level: Intermediate
Data, Tools and Frameworks used: Jupyter Notebook, Python, TensorFlow, MSCOCO, CNN, RNN
Prerequisite: Familiarity with TensorFlow, Python, and deep learning concepts is helpful but not required.