Class KernelLauncher
java.lang.Object
jcuda.utils.KernelLauncher
This is a utility class that simplifies the setup and launching
of CUDA kernels using the JCuda Driver API.
Instances of this class may be created using one of the following methods:
- compile(String, String, String...) will compile a kernel from a String containing the CUDA source code
- create(String, String, String...) will create a kernel for a function that is contained in a CUDA source file
- load(String, String) will load a kernel from a PTX or CUBIN (CUDA binary) file
- load(InputStream, String) will load a kernel from PTX or CUBIN data which is provided via an InputStream (useful for packaging PTX or CUBIN files into JAR archives)
These instances may then be used to call a kernel function with the call(Object...) method. The kernel function arguments which are passed to this method will be set up automatically, and aligned appropriately for their respective size.

The setup of the execution may be performed similarly to the invocation of a kernel with the Runtime API in C. Such a call has the form
kernel<<<gridDim, blockDim, sharedMemorySize, stream>>>(...);
where
- gridDim is a dim3 which specifies the number of blocks per grid
- blockDim is a dim3 which specifies the number of threads per block
- sharedMemorySize is the size of the shared memory for the kernel
- stream is a stream for asynchronous kernel execution
These parameters may be set by calling the setup(dim3, dim3, int, CUstream) method:
kernelLauncher.setup(gridDim, blockDim, sharedMemorySize, stream).call(...);
When default values should be used for some of the parameters, one of the overloaded versions of the setup method may be called:
kernelLauncher.setup(gridDim, blockDim).call(...);
The parameters may also be set individually:
kernelLauncher.setGridSize(gridSize);
kernelLauncher.setBlockSize(blockSize);
kernelLauncher.call(...);
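As a complete illustration of this workflow, the following sketch compiles a small vector addition kernel from a source string, copies input data to the device, launches the kernel and copies the result back. The kernel source, the function name "add" and the data sizes are only examples, and error handling is omitted:

import static jcuda.driver.JCudaDriver.*;

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;
import jcuda.utils.KernelLauncher;

public class KernelLauncherExample
{
    public static void main(String[] args)
    {
        // Illustrative kernel source: element-wise addition of two vectors
        String sourceCode =
            "extern \"C\"" + "\n" +
            "__global__ void add(float *result, float *a, float *b)" + "\n" +
            "{" + "\n" +
            "    int i = threadIdx.x;" + "\n" +
            "    result[i] = a[i] + b[i];" + "\n" +
            "}";

        // Compile the source and obtain the launcher for the "add" function
        KernelLauncher kernelLauncher = KernelLauncher.compile(sourceCode, "add");

        // Prepare the host data and copy it to the device
        int size = 10;
        float a[] = new float[size];
        float b[] = new float[size];
        float result[] = new float[size];
        for (int i = 0; i < size; i++)
        {
            a[i] = i;
            b[i] = i;
        }
        CUdeviceptr dA = new CUdeviceptr();
        CUdeviceptr dB = new CUdeviceptr();
        CUdeviceptr dResult = new CUdeviceptr();
        cuMemAlloc(dA, size * Sizeof.FLOAT);
        cuMemAlloc(dB, size * Sizeof.FLOAT);
        cuMemAlloc(dResult, size * Sizeof.FLOAT);
        cuMemcpyHtoD(dA, Pointer.to(a), size * Sizeof.FLOAT);
        cuMemcpyHtoD(dB, Pointer.to(b), size * Sizeof.FLOAT);

        // Set up the execution configuration and call the kernel
        kernelLauncher.setGridSize(1, 1);
        kernelLauncher.setBlockSize(size, 1, 1);
        kernelLauncher.call(dResult, dA, dB);

        // Copy the result back to the host
        cuMemcpyDtoH(Pointer.to(result), dResult, size * Sizeof.FLOAT);
        System.out.println(java.util.Arrays.toString(result));

        // Clean up
        cuMemFree(dA);
        cuMemFree(dB);
        cuMemFree(dResult);
    }
}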
-
Method Summary
- void call(Object... args)
  Call the function of this KernelLauncher with the current grid size, block size, shared memory size and stream, and with the given arguments. The given arguments must all be either of the type Pointer, or of a primitive type except boolean.
- static KernelLauncher compile(String sourceCode, String functionName, String... nvccArguments)
  Create a new KernelLauncher for the function with the given name, that is defined in the given source code.
- static KernelLauncher create(String cuFileName, String functionName, String... nvccArguments)
  Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
- static KernelLauncher create(String cuFileName, String functionName, boolean forceRebuild, String... nvccArguments)
  Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
- KernelLauncher forFunction(String functionName)
  Create a new KernelLauncher which uses the same module as this KernelLauncher, but may be used to execute a different function.
- jcuda.driver.CUmodule getModule()
  Returns the module that was created from the PTX or CUBIN file, and which contains the function that should be executed.
- static KernelLauncher load(InputStream moduleInputStream, String functionName)
  Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX or CUBIN data that is read from the given input stream.
- static KernelLauncher load(String moduleFileName, String functionName)
  Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX or CUBIN (CUDA binary) file with the given name.
- KernelLauncher setBlockSize(int x, int y, int z)
  Set the block size (number of threads per block) for the function call. The default block size is (1,1,1).
- static void setCompilerPath(String path)
  Set the path to the NVCC compiler.
- static void setDeviceNumber(int number)
  Set the number (index) of the device which should be used by the KernelLauncher.
- KernelLauncher setGridSize(int x, int y)
  Set the grid size (number of blocks per grid) for the function call. The default grid size is (1,1,1).
- KernelLauncher setGridSize(int x, int y, int z)
  Set the grid size (number of blocks per grid) for the function call. The default grid size is (1,1,1).
- KernelLauncher setSharedMemSize(int sharedMemSize)
  Set the size of the shared memory for the function call. The default shared memory size is 0.
- KernelLauncher setStream(jcuda.driver.CUstream stream)
  Set the stream for the function call. The default stream is null (0).
- KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize)
  Set the given grid size and block size for this KernelLauncher.
- KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize)
  Set the given grid size, block size and shared memory size for this KernelLauncher.
- KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize, jcuda.driver.CUstream stream)
  Set the given grid size, block size, shared memory size and stream for this KernelLauncher.
-
Method Details
-
setCompilerPath
public static void setCompilerPath(String path)
Set the path to the NVCC compiler. For example:
setCompilerPath("C:/CUDA/bin");
By default, this path is empty, assuming that the compiler is in a path that is visible via an environment variable.
- Parameters:
  path - The path to the NVCC compiler
-
setDeviceNumber
public static void setDeviceNumber(int number)
Set the number (index) of the device which should be used by the KernelLauncher.
- Parameters:
  number - The number of the device to use
- Throws:
  jcuda.CudaException - If number<0 or number>=deviceCount
-
compile
public static KernelLauncher compile(String sourceCode, String functionName, String... nvccArguments)
Create a new KernelLauncher for the function with the given name, that is defined in the given source code.
The source code is stored in a temporary .CU CUDA source file, and a PTX file is compiled from this source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.
The NVCC has to be in a visible directory. On Windows, for example, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:
extern "C"
__global__ void functionName(...)
{
    ...
}
- Parameters:
  sourceCode - The source code containing the function
  functionName - The name of the function
  nvccArguments - Optional arguments for the NVCC
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the creation of the CU or PTX file fails, or the PTX may not be loaded, or the specified function can not be obtained.
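For example, a kernel may be compiled directly from a source string as in the following sketch. The kernel source, the function name "scale" and the NVCC architecture flag are only illustrative; pass whatever compiler arguments (or none) match the local setup:

import jcuda.utils.KernelLauncher;

public class CompileExample
{
    public static void main(String[] args)
    {
        // Illustrative kernel source; note the extern "C" declaration
        String sourceCode =
            "extern \"C\"" + "\n" +
            "__global__ void scale(float *data, float factor)" + "\n" +
            "{" + "\n" +
            "    data[threadIdx.x] *= factor;" + "\n" +
            "}";

        // The NVCC argument is only an example
        KernelLauncher launcher =
            KernelLauncher.compile(sourceCode, "scale", "-arch=sm_52");

        launcher.setBlockSize(128, 1, 1);
        // launcher.call(...) with a device pointer and a float value
    }
}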
-
create
public static KernelLauncher create(String cuFileName, String functionName, String... nvccArguments)
Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:
extern "C"
__global__ void functionName(...)
{
    ...
}
The extension of the given file name is replaced with "ptx". If the PTX file with the resulting name does not exist, or is older than the .CU file, it is compiled from the specified source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.
The NVCC has to be in a visible directory. On Windows, for example, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
- Parameters:
  cuFileName - The name of the source file
  functionName - The name of the function
  nvccArguments - Optional arguments for the NVCC
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the creation of the PTX file fails, or the PTX may not be loaded, or the specified function can not be obtained.
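A typical call of this method might look as follows. The file name "kernels/VectorOps.cu" and the function name "add" are placeholders:

import jcuda.utils.KernelLauncher;

public class CreateExample
{
    public static void main(String[] args)
    {
        // A file "kernels/VectorOps.ptx" will be compiled from the source
        // file and reused on later runs as long as it is up to date
        KernelLauncher launcher =
            KernelLauncher.create("kernels/VectorOps.cu", "add");

        launcher.setGridSize(1, 1);
        launcher.setBlockSize(256, 1, 1);
        // launcher.call(...) with the arguments that the kernel expects
    }
}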
-
create
public static KernelLauncher create(String cuFileName, String functionName, boolean forceRebuild, String... nvccArguments)
Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:
extern "C"
__global__ void functionName(...)
{
    ...
}
The extension of the given file name is replaced with "ptx". If the PTX file with the resulting name does not exist, or is older than the .CU file, it is compiled from the specified source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.
If the forceRebuild flag is 'true', then the PTX file will be recompiled from the given source file, even if it already exists and is newer than the source file, and the already existing PTX file will be overwritten.
The NVCC has to be in a visible directory. On Windows, for example, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
- Parameters:
  cuFileName - The name of the source file
  functionName - The name of the function
  forceRebuild - Whether the PTX file should be recompiled and overwritten if it already exists
  nvccArguments - Optional arguments for the NVCC
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the creation of the PTX file fails, or the PTX may not be loaded, or the specified function can not be obtained.
-
load
public static KernelLauncher load(String moduleFileName, String functionName)
Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX or CUBIN (CUDA binary) file with the given name.
- Parameters:
  moduleFileName - The name of the PTX or CUBIN file
  functionName - The name of the function
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the PTX or CUBIN may not be loaded, or the specified function can not be obtained.
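For example, assuming that a PTX file has already been created (for instance with nvcc -ptx), it may be loaded as follows. The file and function names are placeholders:

import jcuda.utils.KernelLauncher;

public class LoadFileExample
{
    public static void main(String[] args)
    {
        // "kernels.ptx" and "add" are placeholder names; the PTX (or CUBIN)
        // file must have been created beforehand, e.g. with
        //     nvcc -ptx kernels.cu -o kernels.ptx
        KernelLauncher launcher = KernelLauncher.load("kernels.ptx", "add");

        launcher.setBlockSize(256, 1, 1);
        // launcher.call(...);
    }
}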
-
load
public static KernelLauncher load(InputStream moduleInputStream, String functionName)
Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX or CUBIN data that is read from the given input stream.
- Parameters:
  moduleInputStream - The stream for the PTX or CUBIN data
  functionName - The name of the function
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the PTX or CUBIN may not be loaded, or the specified function can not be obtained.
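This overload is convenient when the PTX or CUBIN file is packaged as a resource, for example inside a JAR archive. The resource path and function name in the following sketch are placeholders:

import java.io.InputStream;

import jcuda.utils.KernelLauncher;

public class LoadStreamExample
{
    public static void main(String[] args)
    {
        // "/kernels/VectorOps.ptx" is a placeholder for a PTX (or CUBIN)
        // resource that has been packaged onto the class path
        InputStream ptxStream = LoadStreamExample.class
            .getResourceAsStream("/kernels/VectorOps.ptx");

        KernelLauncher launcher = KernelLauncher.load(ptxStream, "add");
        // launcher.setup(...).call(...);
    }
}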
-
forFunction
public KernelLauncher forFunction(String functionName)
Create a new KernelLauncher which uses the same module as this KernelLauncher, but may be used to execute a different function. All parameters (grid size, block size, shared memory size and stream) of the returned KernelLauncher will be independent of 'this' one and initially contain the default values.
- Parameters:
  functionName - The name of the function
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the specified function can not be obtained from the module of this KernelLauncher.
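For example, two functions from the same module may be launched with independent configurations. The file and function names in this sketch are placeholders:

import jcuda.utils.KernelLauncher;

public class ForFunctionExample
{
    public static void main(String[] args)
    {
        // "add" and "mul" are assumed to be defined in the same .CU file
        KernelLauncher addLauncher =
            KernelLauncher.create("kernels/VectorOps.cu", "add");
        KernelLauncher mulLauncher = addLauncher.forFunction("mul");

        // The launch configurations of the two launchers are independent
        addLauncher.setBlockSize(256, 1, 1);
        mulLauncher.setBlockSize(128, 1, 1);
    }
}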
-
getModule
public jcuda.driver.CUmodule getModule()
Returns the module that was created from the PTX or CUBIN file, and which contains the function that should be executed. This module may also be used to access symbols and texture references. However, clients should not modify or unload the module.
- Returns:
  The CUmodule
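For example, the returned module may be passed to driver API functions such as cuModuleGetGlobal to look up a __device__ symbol. The file name, function name and symbol name in this sketch are assumptions:

import static jcuda.driver.JCudaDriver.cuModuleGetGlobal;

import jcuda.driver.CUdeviceptr;
import jcuda.driver.CUmodule;
import jcuda.utils.KernelLauncher;

public class GetModuleExample
{
    public static void main(String[] args)
    {
        // The source file is assumed to contain a __device__ variable
        // called "globalCounter"
        KernelLauncher launcher =
            KernelLauncher.create("kernels/VectorOps.cu", "add");
        CUmodule module = launcher.getModule();

        // Look up the device pointer and the size of the global symbol
        CUdeviceptr symbolPointer = new CUdeviceptr();
        long symbolSize[] = new long[1];
        cuModuleGetGlobal(symbolPointer, symbolSize, module, "globalCounter");
        System.out.println("Symbol occupies " + symbolSize[0] + " bytes");
    }
}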
-
setGridSize
public KernelLauncher setGridSize(int x, int y)
Set the grid size (number of blocks per grid) for the function call.
This corresponds to the first parameter in the runtime call:
kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);
The default grid size is (1,1,1).
- Parameters:
  x - The number of blocks per grid in x-direction
  y - The number of blocks per grid in y-direction
- Returns:
  This instance
-
setGridSize
public KernelLauncher setGridSize(int x, int y, int z)
Set the grid size (number of blocks per grid) for the function call.
This corresponds to the first parameter in the runtime call:
kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);
The default grid size is (1,1,1).
- Parameters:
  x - The number of blocks per grid in x-direction
  y - The number of blocks per grid in y-direction
  z - The number of blocks per grid in z-direction
- Returns:
  This instance
-
setBlockSize
public KernelLauncher setBlockSize(int x, int y, int z)
Set the block size (number of threads per block) for the function call.
This corresponds to the second parameter in the runtime call:
kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);
The default block size is (1,1,1).
- Parameters:
  x - The number of threads per block in x-direction
  y - The number of threads per block in y-direction
  z - The number of threads per block in z-direction
- Returns:
  This instance
-
setStream
public KernelLauncher setStream(jcuda.driver.CUstream stream)
Set the stream for the function call.
This corresponds to the fourth parameter in the runtime call:
kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);
The default stream is null (0).
- Parameters:
  stream - The stream for the function call
- Returns:
  This instance
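Since each of these setters returns the KernelLauncher itself, the launch configuration may be assembled with chained calls, as in the following sketch. The file and function names are placeholders, and the stream is created with the JCuda driver API:

import static jcuda.driver.JCudaDriver.cuStreamCreate;
import static jcuda.driver.JCudaDriver.cuStreamSynchronize;

import jcuda.driver.CUstream;
import jcuda.utils.KernelLauncher;

public class SettersExample
{
    public static void main(String[] args)
    {
        // Placeholder kernel file and function name
        KernelLauncher launcher =
            KernelLauncher.create("kernels/VectorOps.cu", "add");

        // Create a stream for asynchronous kernel execution
        CUstream stream = new CUstream();
        cuStreamCreate(stream, 0);

        // Each setter returns the launcher itself, so the calls may be chained
        launcher.setGridSize(16, 1)
            .setBlockSize(256, 1, 1)
            .setSharedMemSize(0)
            .setStream(stream);

        // launcher.call(...);
        cuStreamSynchronize(stream);
    }
}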
-
setup
public KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize)
Set the given grid size and block size for this KernelLauncher.
- Parameters:
  gridSize - The grid size (number of blocks per grid)
  blockSize - The block size (number of threads per block)
- Returns:
  This instance
-
setup
public KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize)
Set the given grid size, block size and shared memory size for this KernelLauncher.
- Parameters:
  gridSize - The grid size (number of blocks per grid)
  blockSize - The block size (number of threads per block)
  sharedMemSize - The size of the shared memory
- Returns:
  This instance
-
setup
public KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize, jcuda.driver.CUstream stream)
Set the given grid size, block size, shared memory size and stream for this KernelLauncher.
- Parameters:
  gridSize - The grid size (number of blocks per grid)
  blockSize - The block size (number of threads per block)
  sharedMemSize - The size of the shared memory
  stream - The stream for the kernel invocation
- Returns:
  This instance
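The following sketch shows the two-argument overload, which mirrors a kernel<<<gridDim, blockDim>>>(...) launch in the Runtime API. The file and function names are placeholders:

import jcuda.runtime.dim3;
import jcuda.utils.KernelLauncher;

public class SetupExample
{
    public static void main(String[] args)
    {
        // Placeholder kernel file and function name
        KernelLauncher launcher =
            KernelLauncher.create("kernels/VectorOps.cu", "add");

        // Equivalent of kernel<<<gridDim, blockDim>>>(...) in the Runtime API
        dim3 gridDim = new dim3(16, 1, 1);
        dim3 blockDim = new dim3(256, 1, 1);
        launcher.setup(gridDim, blockDim);
        // launcher.call(...) then launches with this configuration
    }
}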
-
call
public void call(Object... args)
Call the function of this KernelLauncher with the current grid size, block size, shared memory size and stream, and with the given arguments.
The given arguments must all be either of the type Pointer, or of a primitive type except boolean. Otherwise, a CudaException will be thrown.
- Parameters:
  args - The arguments for the function call
- Throws:
  jcuda.CudaException - If an argument with an invalid type was given, or one of the internal functions for setting up and executing the kernel failed.
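For example, for a kernel declared (illustratively) as extern "C" __global__ void saxpy(float *data, int n, float alpha), a device pointer and primitive values may be passed directly. The file name in this sketch is a placeholder:

import static jcuda.driver.JCudaDriver.cuMemAlloc;

import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;
import jcuda.utils.KernelLauncher;

public class CallExample
{
    public static void main(String[] args)
    {
        // Placeholder file name; "saxpy" is the illustrative kernel above
        KernelLauncher launcher =
            KernelLauncher.create("kernels/VectorOps.cu", "saxpy");

        int n = 1024;
        CUdeviceptr data = new CUdeviceptr();
        cuMemAlloc(data, n * Sizeof.FLOAT);

        // A CUdeviceptr is a Pointer, and int/float are valid primitive
        // arguments; boolean arguments are not supported
        launcher.setGridSize(4, 1).setBlockSize(256, 1, 1);
        launcher.call(data, n, 2.5f);
    }
}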