Class KernelLauncher

java.lang.Object
jcuda.utils.KernelLauncher

public class KernelLauncher extends Object
This is a utility class that simplifies the setup and launching of CUDA kernels using the JCuda Driver API.

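Instances of this class may be created using one of the following methods:
  • compile(String, String, String...) compiles a kernel from a String containing the CUDA source code
  • create(String, String, String...) creates a kernel for a function that is contained in a CUDA source file
  • load(String, String) loads a kernel from a PTX or CUBIN file
  • load(InputStream, String) loads a kernel from PTX or CUBIN data that is read from an input stream
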
These instances may then be used to call a kernel function with the call(Object...) method. The actual kernel function arguments which are passed to this method will be set up automatically, and aligned appropriately for their respective size.
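
To illustrate what aligning arguments "appropriately for their respective size" can mean, the following minimal sketch computes byte offsets in a parameter buffer by rounding each offset up to a multiple of the argument's own size. The class ParamLayout and its methods are hypothetical and not part of JCuda, and the sketch only handles boxed primitives; it is not the library's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JCuda: lays out boxed primitive
// arguments in a byte buffer, aligning each one to its own size.
final class ParamLayout {

    // Size in bytes of a boxed primitive argument.
    static int sizeOf(Object arg) {
        if (arg instanceof Byte) return Byte.BYTES;
        if (arg instanceof Short) return Short.BYTES;
        if (arg instanceof Character) return Character.BYTES;
        if (arg instanceof Integer) return Integer.BYTES;
        if (arg instanceof Float) return Float.BYTES;
        if (arg instanceof Long) return Long.BYTES;
        if (arg instanceof Double) return Double.BYTES;
        throw new IllegalArgumentException("Unsupported argument: " + arg);
    }

    // Rounds offset up to the next multiple of alignment (a power of two).
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    // Returns the byte offset at which each argument would be stored.
    static List<Integer> offsets(Object... args) {
        List<Integer> result = new ArrayList<>();
        int offset = 0;
        for (Object arg : args) {
            int size = sizeOf(arg);
            offset = alignUp(offset, size);
            result.add(offset);
            offset += size;
        }
        return result;
    }
}
```

For the arguments (byte, int, double, float) this layout leaves three padding bytes after the first argument, so that the int starts on a 4-byte boundary.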

The execution may be set up in much the same way as a kernel is invoked with the Runtime API in C. Such a call has the form
    kernel<<<gridDim, blockDim, sharedMemorySize, stream>>>(...);
where
  • gridDim is a dim3 which specifies the number of blocks per grid
  • blockDim is a dim3 that specifies the number of threads per block
  • sharedMemorySize is the size of the shared memory for the kernel, in bytes
  • stream is a stream for asynchronous kernel execution
Similarly, the KernelLauncher allows specifying these parameters in the setup(dim3, dim3, int, CUstream) method:

    kernelLauncher.setup(gridDim, blockDim, sharedMemorySize, stream).call(...);

When default values for some of the parameters should be used, one of the overloaded versions of the setup method may be called:

    kernelLauncher.setup(gridDim, blockDim).call(...);

The parameters may also be set individually:

    kernelLauncher.setGridSize(gridSize);
    kernelLauncher.setBlockSize(blockSize);
    kernelLauncher.call(...);
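
A common way to choose the grid size is to cover a given number of elements with blocks of a fixed size. The following sketch shows that calculation; the class GridSizing is a hypothetical helper and not part of JCuda.

```java
// Hypothetical helper, not part of JCuda.
final class GridSizing {

    // Smallest number of blocks such that
    // blocks * threadsPerBlock >= elementCount (ceiling division).
    static int blocksFor(int elementCount, int threadsPerBlock) {
        if (elementCount < 0 || threadsPerBlock <= 0) {
            throw new IllegalArgumentException(
                "elementCount must be >= 0 and threadsPerBlock > 0");
        }
        // Use long arithmetic so that the sum cannot overflow.
        return (int) (((long) elementCount + threadsPerBlock - 1) / threadsPerBlock);
    }
}
```

With such a value, the launch could be configured as kernelLauncher.setGridSize(blocks, 1).setBlockSize(256, 1, 1).call(...). Since the last block may contain more threads than remaining elements, the kernel would need to check its index against the element count.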

  • Method Summary

    Modifier and Type
    Method
    Description
    void
    call(Object... args)
    Call the function of this KernelLauncher with the current grid size, block size, shared memory size and stream, and with the given arguments.

    The given arguments must all be either of the type Pointer, or of a primitive type except boolean.
    static KernelLauncher
    compile(String sourceCode, String functionName, String... nvccArguments)
    Create a new KernelLauncher for the function with the given name, that is defined in the given source code.
    static KernelLauncher
    create(String cuFileName, String functionName, boolean forceRebuild, String... nvccArguments)
    Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
    static KernelLauncher
    create(String cuFileName, String functionName, String... nvccArguments)
    Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
    KernelLauncher
    forFunction(String functionName)
    Create a new KernelLauncher which uses the same module as this KernelLauncher, but may be used to execute a different function.
    jcuda.driver.CUmodule
    getModule()
    Returns the module that was created from the PTX- or CUBIN file, and which contains the function that should be executed.
    static KernelLauncher
    load(InputStream moduleInputStream, String functionName)
    Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX- or CUBIN data that is read from the given input stream.
    static KernelLauncher
    load(String moduleFileName, String functionName)
    Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX- or CUBIN (CUDA binary) file with the given name.
    KernelLauncher
    setBlockSize(int x, int y, int z)
    Set the block size (number of threads per block) for the function call.

    This corresponds to the second parameter in the runtime call:

    kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

    The default block size is (1,1,1).
    static void
    setCompilerPath(String path)
    Set the path to the NVCC compiler.
    static void
    setDeviceNumber(int number)
    Set the number (index) of the device which should be used by the KernelLauncher.
    KernelLauncher
    setGridSize(int x, int y)
    Set the grid size (number of blocks per grid) for the function call.

    This corresponds to the first parameter in the runtime call:

    kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

    The default grid size is (1,1,1).
    KernelLauncher
    setGridSize(int x, int y, int z)
    Set the grid size (number of blocks per grid) for the function call.

    This corresponds to the first parameter in the runtime call:

    kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

    The default grid size is (1,1,1).
    KernelLauncher
    setSharedMemSize(int sharedMemSize)
    Set the size of the shared memory for the function call.

    This corresponds to the third parameter in the runtime call:

    kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

    The default shared memory size is 0.
    KernelLauncher
    setStream(jcuda.driver.CUstream stream)
    Set the stream for the function call.

    This corresponds to the fourth parameter in the runtime call:

    kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

    The default stream is null (0).
    KernelLauncher
    setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize)
    Set the given grid size and block size for this KernelLauncher.
    KernelLauncher
    setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize)
    Set the given grid size and block size and shared memory size for this KernelLauncher.
    KernelLauncher
    setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize, jcuda.driver.CUstream stream)
    Set the given grid size and block size, shared memory size and stream for this KernelLauncher.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • setCompilerPath

      public static void setCompilerPath(String path)
      Set the path to the NVCC compiler. For example:
      setCompilerPath("C:/CUDA/bin");
      By default, this path is empty, assuming that the compiler is in a path that is visible via an environment variable.
      Parameters:
      path - The path to the NVCC compiler.
    • setDeviceNumber

      public static void setDeviceNumber(int number)
      Set the number (index) of the device which should be used by the KernelLauncher.
      Parameters:
      number - The number of the device to use
      Throws:
      jcuda.CudaException - If number<0 or number>=deviceCount
    • compile

      public static KernelLauncher compile(String sourceCode, String functionName, String... nvccArguments)
      Create a new KernelLauncher for the function with the given name, that is defined in the given source code.

      The source code is stored in a temporary .CU CUDA source file, and a PTX file is compiled from this source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.

      The NVCC has to be in a visible directory. E.g. for Windows, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.

      Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:

      extern "C"
      __global__ void functionName(...)
      {
      ...
      }
      Parameters:
      sourceCode - The source code containing the function
      functionName - The name of the function.
      nvccArguments - Optional arguments for the NVCC
      Returns:
      The KernelLauncher for the specified function
      Throws:
      jcuda.CudaException - If the creation of the CU- or PTX file fails, or the PTX cannot be loaded, or the specified function cannot be obtained.
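
As an illustration of the kind of source string that could be passed to compile, the following sketch holds a small vector addition kernel in a Java constant. The class KernelSource, the kernel name add and its parameters are hypothetical and chosen only for this example.

```java
// Hypothetical example source, not taken from JCuda.
final class KernelSource {

    // The name under which the function is looked up in the compiled module.
    static final String FUNCTION_NAME = "add";

    // CUDA source code; extern "C" keeps the function name unmangled.
    static final String SOURCE =
        "extern \"C\"\n" +
        "__global__ void add(int n, const float *a, const float *b, float *sum)\n" +
        "{\n" +
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n" +
        "    if (i < n)\n" +
        "    {\n" +
        "        sum[i] = a[i] + b[i];\n" +
        "    }\n" +
        "}\n";
}
```

Such a string could then be passed as KernelLauncher.compile(KernelSource.SOURCE, KernelSource.FUNCTION_NAME), which requires the NVCC to be reachable as described above.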
    • create

      public static KernelLauncher create(String cuFileName, String functionName, String... nvccArguments)
      Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.

      Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:

      extern "C"
      __global__ void functionName(...)
      {
      ...
      }

      The extension of the given file name is replaced with "ptx". If the PTX file with the resulting name does not exist, or is older than the .CU file, it is compiled from the specified source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.

      The NVCC has to be in a visible directory. E.g. for Windows, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
      Parameters:
      cuFileName - The name of the source file.
      functionName - The name of the function.
      nvccArguments - Optional arguments for the NVCC
      Returns:
      The KernelLauncher for the specified function
      Throws:
      jcuda.CudaException - If the creation of the PTX file fails, or the PTX cannot be loaded, or the specified function cannot be obtained.
    • create

      public static KernelLauncher create(String cuFileName, String functionName, boolean forceRebuild, String... nvccArguments)
      Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.

      Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:

      extern "C"
      __global__ void functionName(...)
      {
      ...
      }

      The extension of the given file name is replaced with "ptx". If the PTX file with the resulting name does not exist, or is older than the .CU file, it is compiled from the specified source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.

      If the forceRebuild flag is true, the PTX file is recompiled from the given source file even if it already exists and is newer than the source file, and the existing PTX file is overwritten.

      The NVCC has to be in a visible directory. E.g. for Windows, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
      Parameters:
      cuFileName - The name of the source file.
      functionName - The name of the function.
      forceRebuild - Whether the PTX file should be recompiled and overwritten if it already exists.
      nvccArguments - Optional arguments for the NVCC
      Returns:
      The KernelLauncher for the specified function
      Throws:
      jcuda.CudaException - If the creation of the PTX file fails, or the PTX cannot be loaded, or the specified function cannot be obtained.
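
The file name handling and the rebuild rule described for create can be sketched as follows. The class PtxFreshness is a hypothetical helper and not the library's implementation; in particular, appending ".ptx" to a name without an extension is an assumption of this sketch.

```java
import java.io.File;

// Hypothetical helper, not part of JCuda.
final class PtxFreshness {

    // Replaces the extension of the given file name with "ptx".
    // Assumption: a name without an extension gets ".ptx" appended.
    static String ptxFileNameFor(String cuFileName) {
        int dot = cuFileName.lastIndexOf('.');
        int separator = Math.max(
            cuFileName.lastIndexOf('/'), cuFileName.lastIndexOf('\\'));
        if (dot <= separator) {
            return cuFileName + ".ptx";
        }
        return cuFileName.substring(0, dot + 1) + "ptx";
    }

    // Compile if forced, if the PTX is missing, or if it is older than the source.
    static boolean needsCompile(
        boolean ptxExists, long ptxModified, long cuModified, boolean forceRebuild) {
        return forceRebuild || !ptxExists || ptxModified < cuModified;
    }

    // Convenience overload that reads existence and time stamps from the files.
    static boolean needsCompile(File cuFile, File ptxFile, boolean forceRebuild) {
        return needsCompile(ptxFile.exists(), ptxFile.lastModified(),
            cuFile.lastModified(), forceRebuild);
    }
}
```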
    • load

      public static KernelLauncher load(String moduleFileName, String functionName)
      Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX- or CUBIN (CUDA binary) file with the given name.
      Parameters:
      moduleFileName - The name of the PTX- or CUBIN file
      functionName - The name of the function
      Returns:
      The KernelLauncher for the specified function
      Throws:
      jcuda.CudaException - If the PTX- or CUBIN file cannot be loaded, or the specified function cannot be obtained.
    • load

      public static KernelLauncher load(InputStream moduleInputStream, String functionName)
      Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX- or CUBIN data that is read from the given input stream.
      Parameters:
      moduleInputStream - The stream for the PTX- or CUBIN data
      functionName - The name of the function
      Returns:
      The KernelLauncher for the specified function
      Throws:
      jcuda.CudaException - If the PTX- or CUBIN data cannot be loaded, or the specified function cannot be obtained.
    • forFunction

      public KernelLauncher forFunction(String functionName)
      Create a new KernelLauncher which uses the same module as this KernelLauncher, but may be used to execute a different function. All parameters (grid size, block size, shared memory size and stream) of the returned KernelLauncher will be independent of 'this' one and initially contain the default values.
      Parameters:
      functionName - The name of the function
      Returns:
      The KernelLauncher for the specified function
      Throws:
      jcuda.CudaException - If the specified function cannot be obtained from the module of this KernelLauncher.
    • getModule

      public jcuda.driver.CUmodule getModule()
      Returns the module that was created from the PTX- or CUBIN file, and which contains the function that should be executed. This module may also be used to access symbols and texture references. However, clients should not modify or unload the module.
      Returns:
      The CUmodule
    • setGridSize

      public KernelLauncher setGridSize(int x, int y)
      Set the grid size (number of blocks per grid) for the function call.

      This corresponds to the first parameter in the runtime call:

      kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

      The default grid size is (1,1,1).
      Parameters:
      x - The number of blocks per grid in x-direction
      y - The number of blocks per grid in y-direction
      Returns:
      This instance
    • setGridSize

      public KernelLauncher setGridSize(int x, int y, int z)
      Set the grid size (number of blocks per grid) for the function call.

      This corresponds to the first parameter in the runtime call:

      kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

      The default grid size is (1,1,1).
      Parameters:
      x - The number of blocks per grid in x-direction
      y - The number of blocks per grid in y-direction
      z - The number of blocks per grid in z-direction
      Returns:
      This instance
    • setBlockSize

      public KernelLauncher setBlockSize(int x, int y, int z)
      Set the block size (number of threads per block) for the function call.

      This corresponds to the second parameter in the runtime call:

      kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

      The default block size is (1,1,1).
      Parameters:
      x - The number of threads per block in x-direction
      y - The number of threads per block in y-direction
      z - The number of threads per block in z-direction
      Returns:
      This instance
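
For a two-dimensional problem, grid and block sizes are usually chosen together. The following sketch derives a grid that covers a width x height domain for a given block shape and computes the total number of threads per block, which has to stay within the limit of the device. The class LaunchDims2D is a hypothetical helper and not part of JCuda.

```java
// Hypothetical helper, not part of JCuda.
final class LaunchDims2D {

    // Ceiling division: blocks needed to cover 'extent' items.
    static int ceilDiv(int extent, int blockExtent) {
        if (extent < 0 || blockExtent <= 0) {
            throw new IllegalArgumentException(
                "extent must be >= 0 and blockExtent > 0");
        }
        return (int) (((long) extent + blockExtent - 1) / blockExtent);
    }

    // Returns {gridX, gridY} so that the grid covers a width x height domain.
    static int[] gridFor(int width, int height, int blockX, int blockY) {
        return new int[] { ceilDiv(width, blockX), ceilDiv(height, blockY) };
    }

    // Total threads per block; the caller compares this to the device limit.
    static long threadsPerBlock(int x, int y, int z) {
        return (long) x * y * z;
    }
}
```

The result could be applied with kernelLauncher.setGridSize(grid[0], grid[1]).setBlockSize(16, 16, 1).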
    • setSharedMemSize

      public KernelLauncher setSharedMemSize(int sharedMemSize)
      Set the size of the shared memory for the function call.

      This corresponds to the third parameter in the runtime call:

      kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

      The default shared memory size is 0.
      Parameters:
      sharedMemSize - The size of the shared memory, in bytes
      Returns:
      This instance
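
Because the size is given in bytes, it is typically derived from the block size and the element type. The following sketch computes the size for a buffer of float values per thread; the class SharedMemSizing is a hypothetical helper and not part of JCuda.

```java
// Hypothetical helper, not part of JCuda.
final class SharedMemSizing {

    // Bytes of shared memory for 'floatsPerThread' float values
    // for each thread of a block with the given dimensions.
    static int bytesForFloats(int blockX, int blockY, int blockZ, int floatsPerThread) {
        long bytes = (long) blockX * blockY * blockZ * floatsPerThread * Float.BYTES;
        // Throws ArithmeticException if the result does not fit into an int.
        return Math.toIntExact(bytes);
    }
}
```

The result could be passed to setSharedMemSize(int). In the kernel, memory that is sized at launch time is declared as an unsized array, for example extern __shared__ float buffer[];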
    • setStream

      public KernelLauncher setStream(jcuda.driver.CUstream stream)
      Set the stream for the function call.

      This corresponds to the fourth parameter in the runtime call:

      kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);

      The default stream is null (0).
      Parameters:
      stream - The stream for the function call
      Returns:
      This instance
    • setup

      public KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize)
      Set the given grid size and block size for this KernelLauncher.
      Parameters:
      gridSize - The grid size (number of blocks per grid)
      blockSize - The block size (number of threads per block)
      Returns:
      This instance
    • setup

      public KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize)
      Set the given grid size and block size and shared memory size for this KernelLauncher.
      Parameters:
      gridSize - The grid size (number of blocks per grid)
      blockSize - The block size (number of threads per block)
      sharedMemSize - The size of the shared memory, in bytes
      Returns:
      This instance
    • setup

      public KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize, jcuda.driver.CUstream stream)
      Set the given grid size and block size, shared memory size and stream for this KernelLauncher.
      Parameters:
      gridSize - The grid size (number of blocks per grid)
      blockSize - The block size (number of threads per block)
      sharedMemSize - The size of the shared memory, in bytes
      stream - The stream for the kernel invocation
      Returns:
      This instance
    • call

      public void call(Object... args)
      Call the function of this KernelLauncher with the current grid size, block size, shared memory size and stream, and with the given arguments.

      The given arguments must all be either of the type Pointer, or of a primitive type except boolean. Otherwise, a CudaException will be thrown.
      Parameters:
      args - The arguments for the function call
      Throws:
      jcuda.CudaException - if an argument with an invalid type was given, or one of the internal functions for setting up and executing the kernel failed.
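
Since call takes Object..., primitive arguments arrive as their boxed wrapper types. The following sketch applies the documented rule literally: pointers and boxed primitives other than Boolean are accepted. The class ArgumentCheck is hypothetical, DevicePointerStandIn merely stands in for jcuda.Pointer so that the sketch compiles without JCuda, and accepting Character is an assumption based on the wording "primitive type except boolean".

```java
// Hypothetical helper, not part of JCuda.
final class ArgumentCheck {

    // Stand-in for jcuda.Pointer, used only to keep this sketch self-contained.
    static final class DevicePointerStandIn { }

    // True if the (boxed) argument matches the documented rule.
    static boolean isAcceptable(Object arg) {
        return arg instanceof DevicePointerStandIn
            || arg instanceof Byte
            || arg instanceof Short
            || arg instanceof Character
            || arg instanceof Integer
            || arg instanceof Long
            || arg instanceof Float
            || arg instanceof Double;
    }

    // Index of the first argument that violates the rule, or -1 if none does.
    static int firstInvalidIndex(Object... args) {
        for (int i = 0; i < args.length; i++) {
            if (!isAcceptable(args[i])) {
                return i;
            }
        }
        return -1;
    }
}
```

Boxing also means that the Java type of a literal decides what is passed: 2.0 is boxed as a Double (8 bytes), while 2.0f is boxed as a Float (4 bytes), so the Java argument types should match the parameter types of the kernel.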