This talk is the second part in a series on Core Performance optimization techniques. It is intended for developers who are already familiar with the basics covered in the first part. We'll teach advanced techniques and show how to use some of the new features introduced in Hopper. The topics covered will include asynchronous copies and barriers, CUDA clusters, L2 persistency, CUDA graphs, memory pools, and dynamic parallelism 2.0.