|
CUDA Context Thread Management
Simple program illustrating how to use the CUDA Context Management API. CUDA contexts can be created separately and attached independently to different threads. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple Vote Intrinsics
Simple program which demonstrates how to use the Vote (any, all) intrinsic instructions in a CUDA kernel. Requires Compute Capability 1.2 or higher. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple D3D9 Texture
Simple program which demonstrates Direct3D9 texture interoperability with CUDA. The program creates a number of D3D9 textures (2D, 3D, and CubeMap) which are written to from CUDA kernels. Direct3D then renders the results on the screen. |
|
or later
Download - Windows x86
Download - Windows x64
|
|
|
Simple Atomic Intrinsics
A simple demonstration of global memory atomic instructions. Requires Compute Capability 1.1 or higher. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Separable Convolution
This sample implements a separable convolution filter of a 2D signal with a Gaussian kernel. |
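As a rough illustration of the technique named above, a separable 2D convolution can be written as a horizontal 1D pass followed by a vertical 1D pass. The following is a minimal CPU sketch in C++, not the sample's CUDA kernels; the `Image` type, the function names, and the zero-padding border policy are assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major float image used only for this sketch.
struct Image {
    std::size_t width, height;
    std::vector<float> data;
    // Samples outside the image are treated as zero (an assumed border policy).
    float at(long x, long y) const {
        if (x < 0 || y < 0 || x >= static_cast<long>(width) || y >= static_cast<long>(height))
            return 0.0f;
        return data[static_cast<std::size_t>(y) * width + static_cast<std::size_t>(x)];
    }
};

// Normalized 1D Gaussian kernel of length 2*radius+1.
std::vector<float> makeGaussianKernel(int radius, float sigma) {
    std::vector<float> kernel(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        float w = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        kernel[i + radius] = w;
        sum += w;
    }
    for (float& w : kernel) w /= sum;
    return kernel;
}

// Convolve along one axis with a 1D kernel of odd length.
Image convolveAxis(const Image& src, const std::vector<float>& kernel, bool horizontal) {
    const long radius = static_cast<long>(kernel.size()) / 2;
    Image dst{src.width, src.height, std::vector<float>(src.data.size(), 0.0f)};
    for (long y = 0; y < static_cast<long>(src.height); ++y) {
        for (long x = 0; x < static_cast<long>(src.width); ++x) {
            float sum = 0.0f;
            for (long k = -radius; k <= radius; ++k) {
                float sample = horizontal ? src.at(x - k, y) : src.at(x, y - k);
                sum += sample * kernel[static_cast<std::size_t>(k + radius)];
            }
            dst.data[static_cast<std::size_t>(y) * src.width + static_cast<std::size_t>(x)] = sum;
        }
    }
    return dst;
}

// Row pass followed by column pass; equivalent to a 2D convolution
// with the outer product of the 1D kernel with itself.
Image convolveSeparable(const Image& src, const std::vector<float>& kernel) {
    return convolveAxis(convolveAxis(src, kernel, true), kernel, false);
}
```

Splitting the filter this way reduces the work per pixel from (2r+1)^2 multiply-adds to 2(2r+1) for a kernel of radius r, which is the usual motivation for the separable form.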
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Texture-based Separable Convolution
Texture-based implementation of a separable 2D convolution with a Gaussian kernel. Used for performance comparison against convolutionSeparable. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Black-Scholes Option Pricing
This sample evaluates fair call and put prices for a given set of European options using the Black-Scholes formula. |
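For reference, the closed-form Black-Scholes prices for one European option can be sketched on the CPU as follows. This is a minimal C++ illustration, not the sample's kernel: the names `OptionPrices`, `normalCdf`, and `blackScholes` are hypothetical, and the normal CDF is computed with `std::erfc`, which is a choice made for this sketch and not necessarily the method the sample uses.

```cpp
#include <cmath>

// Hypothetical result type for this sketch.
struct OptionPrices {
    double call;
    double put;
};

// Standard normal cumulative distribution function via erfc.
double normalCdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// S: spot price, K: strike, T: years to expiry,
// r: risk-free rate, sigma: volatility.
OptionPrices blackScholes(double S, double K, double T, double r, double sigma) {
    const double sqrtT = std::sqrt(T);
    const double d1 = (std::log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrtT);
    const double d2 = d1 - sigma * sqrtT;
    const double discountedStrike = K * std::exp(-r * T);
    const double call = S * normalCdf(d1) - discountedStrike * normalCdf(d2);
    const double put = discountedStrike * normalCdf(-d2) - S * normalCdf(-d1);
    return {call, put};
}
```

Each option is priced independently of the others, so a set of options maps naturally onto one GPU thread per option.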
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Bitonic Sort
Bitonic sort is a very simple parallel sorting algorithm that is very
efficient when sorting a small number of elements:
http://citeseer.ist.psu.edu/blelloch98experimental.html
This implementation is based on:
http://www.tools-of-computing.com/tc/CS/Sorts/bitonic_sort.htm
|
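The structure of a bitonic sorting network can be sketched sequentially as below. This is a minimal C++ illustration under the assumption that the input length is a power of two; the function name `bitonicSort` is chosen for this example and the code is not taken from the sample.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts ascending in place. Assumes values.size() is a power of two.
void bitonicSort(std::vector<int>& values) {
    const std::size_t n = values.size();
    // k: length of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= n; k *= 2) {
        // j: distance between the two elements of each compare-exchange.
        for (std::size_t j = k / 2; j > 0; j /= 2) {
            // Every iteration of this loop is independent of the others,
            // which is what makes the network suitable for parallel threads.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    const bool ascending = (i & k) == 0;
                    if ((values[i] > values[partner]) == ascending) {
                        std::swap(values[i], values[partner]);
                    }
                }
            }
        }
    }
}
```

For n elements the network performs on the order of log²(n) compare-exchange steps, each of which touches all elements with a fixed, data-independent access pattern.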
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Scalar Product
This sample calculates scalar products of a given set of input vector pairs. |
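A CPU reference for this computation might look like the sketch below. The packed layout (all vectors stored back to back in one array) and the name `scalarProducts` are assumptions made for this example, not details taken from the sample.

```cpp
#include <cstddef>
#include <vector>

// a and b each hold vectorCount vectors of elementCount floats, stored back to back.
// Returns one scalar (dot) product per vector pair.
std::vector<float> scalarProducts(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t vectorCount,
                                  std::size_t elementCount) {
    std::vector<float> result(vectorCount, 0.0f);
    for (std::size_t v = 0; v < vectorCount; ++v) {
        const std::size_t base = v * elementCount;
        float sum = 0.0f;
        for (std::size_t i = 0; i < elementCount; ++i) {
            sum += a[base + i] * b[base + i];
        }
        result[v] = sum;
    }
    return result;
}
```

On a GPU the inner sum is typically computed as a parallel reduction, so results can differ from this sequential loop by floating-point rounding.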
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Clock
This example shows how to use the clock function to accurately measure the performance of a kernel. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple multi-GPU
This application demonstrates how to use the CUDA API to use multiple GPUs.
|
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Aligned Types
A simple test showing the large gap in access speed between aligned and misaligned structures. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Bicubic Texture Filtering
This sample demonstrates how to efficiently implement bicubic texture filtering in CUDA. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Volume rendering
This sample demonstrates basic volume rendering using 3D textures. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple Texture 3D
Simple example that demonstrates use of 3D textures in CUDA. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple Direct3D
Simple program which demonstrates interoperability between CUDA and Direct3D9. The program modifies vertex positions with CUDA and uses Direct3D9 to render the geometry. |
|
or later
Download - Windows x86
Download - Windows x64
|
|
|
asyncAPI
This sample uses CUDA streams and events to overlap execution on CPU and GPU. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
cudaOpenMP
This sample shows how to use the OpenMP API to write an application for multiple GPUs. |
|
or later
Download - Windows x86
Download - Windows x64
|
|
|
simpleStreams
This sample uses CUDA streams to overlap kernel executions with memcopies between the device and the host. Requires Compute Capability 1.1 or higher. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Device Query
This sample enumerates the properties of the CUDA devices present in the system. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple Templates
This sample is a templatized version of the template project. It also shows how to correctly templatize dynamically allocated shared memory arrays. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Bandwidth Test
This is a simple test program to measure the memory copy bandwidth of the GPU. It can currently measure device-to-device copy bandwidth, host-to-device copy bandwidth for pageable and page-locked memory, and device-to-host copy bandwidth for pageable and page-locked memory. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple Texture (Driver Version)
Simple example that demonstrates use of textures in CUDA using the driver API. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple Texture
Simple example that demonstrates use of textures in CUDA. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Matrix Multiplication (Driver Version)
This sample implements matrix multiplication using the CUDA driver API.
It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication.
CUBLAS provides high-performance matrix multiplication. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Template
A trivial template project that can be used as a starting point to create new CUDA projects. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple CUFFT
Example of using CUFFT. In this example, CUFFT is used to compute the 1D convolution of a signal with a filter by transforming both into the frequency domain, multiplying them together, and transforming the result back to the time domain. |
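The three steps described above (forward transform, pointwise multiply, inverse transform) can be sketched on the CPU as follows. This is a minimal C++ illustration that substitutes a small recursive radix-2 FFT for CUFFT; the function names `fft` and `fftConvolve` and the zero-padding to a power of two are assumptions made for this example.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Recursive radix-2 FFT, in place. data.size() must be a power of two.
// With inverse = true this computes the unnormalized inverse transform.
void fft(std::vector<Complex>& data, bool inverse) {
    const std::size_t n = data.size();
    if (n <= 1) return;
    std::vector<Complex> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = data[2 * i];
        odd[i] = data[2 * i + 1];
    }
    fft(even, inverse);
    fft(odd, inverse);
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = sign * 2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const Complex twiddle = std::polar(1.0, angle) * odd[k];
        data[k] = even[k] + twiddle;
        data[k + n / 2] = even[k] - twiddle;
    }
}

// Linear convolution of signal with filter via the frequency domain.
std::vector<double> fftConvolve(const std::vector<double>& signal,
                                const std::vector<double>& filter) {
    if (signal.empty() || filter.empty()) return {};
    const std::size_t outSize = signal.size() + filter.size() - 1;
    std::size_t n = 1;
    while (n < outSize) n *= 2;  // zero-pad so circular convolution equals linear

    std::vector<Complex> a(n), b(n);  // value-initialized to zero
    for (std::size_t i = 0; i < signal.size(); ++i) a[i] = signal[i];
    for (std::size_t i = 0; i < filter.size(); ++i) b[i] = filter[i];

    fft(a, false);                                   // 1. to frequency domain
    fft(b, false);
    for (std::size_t i = 0; i < n; ++i) a[i] *= b[i];  // 2. pointwise multiply
    fft(a, true);                                    // 3. back to time domain

    std::vector<double> out(outSize);
    for (std::size_t i = 0; i < outSize; ++i) {
        out[i] = a[i].real() / static_cast<double>(n);  // normalize the inverse
    }
    return out;
}
```

For example, convolving the signal {1, 2, 3} with the filter {1, 1} this way yields {1, 3, 5, 3} up to floating-point rounding.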
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple OpenGL
Simple program which demonstrates interoperability between CUDA and OpenGL. The program modifies vertex positions with CUDA and uses OpenGL to render the geometry. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Simple CUBLAS
Example of using CUBLAS. |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
Matrix Multiplication
This sample implements matrix multiplication and matches the example in Chapter 6 of the programming guide exactly.
It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication.
CUBLAS provides high-performance matrix multiplication. |
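To illustrate the kind of blocked (tiled) scheme that matrix multiplication examples commonly use, here is a minimal CPU sketch in C++. It is an assumption-laden stand-in, not the sample's kernel: the function name `matMulTiled`, the row-major layout, and the default tile size of 16 are all chosen for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Computes C (m x n) = A (m x k) * B (k x n); all matrices are row-major.
// The loops are blocked into tile x tile sub-problems so that each block of
// A and B is reused while it is still hot in fast memory.
std::vector<float> matMulTiled(const std::vector<float>& A,
                               const std::vector<float>& B,
                               std::size_t m, std::size_t k, std::size_t n,
                               std::size_t tile = 16) {
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i0 = 0; i0 < m; i0 += tile) {
        for (std::size_t j0 = 0; j0 < n; j0 += tile) {
            for (std::size_t p0 = 0; p0 < k; p0 += tile) {
                const std::size_t iEnd = std::min(i0 + tile, m);
                const std::size_t jEnd = std::min(j0 + tile, n);
                const std::size_t pEnd = std::min(p0 + tile, k);
                for (std::size_t i = i0; i < iEnd; ++i) {
                    for (std::size_t j = j0; j < jEnd; ++j) {
                        float sum = C[i * n + j];  // accumulate across p-tiles
                        for (std::size_t p = p0; p < pEnd; ++p) {
                            sum += A[i * k + p] * B[p * n + j];
                        }
                        C[i * n + j] = sum;
                    }
                }
            }
        }
    }
    return C;
}
```

On a GPU, the analogous idea is for each thread block to compute one tile of C while staging tiles of A and B in on-chip shared memory.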
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|
|
C++ Integration
This example demonstrates how to integrate CUDA into an existing C++ application: the CUDA entry point on the host side is just a function called from C++ code, and only the file containing this function is compiled with nvcc. It also demonstrates that vector types can be used from C++ source files (.cpp). |
|
or later
Download - Windows x86
Download - Windows x64
Download - Linux/Mac
|
|