Class KernelLauncher
java.lang.Object
jcuda.utils.KernelLauncher
This is a utility class that simplifies the setup and launching
of CUDA kernels using the JCuda Driver API.
Instances of this class may be created using one of the following methods:
- compile(String, String, String...) will compile a kernel from a String containing the CUDA source code
- create(String, String, String...) will create a kernel for a function that is contained in a CUDA source file
- load(String, String) will load a kernel from a PTX or CUBIN (CUDA binary) file
- load(InputStream, String) will load a kernel from PTX or CUBIN data which is provided via an InputStream (useful for packaging PTX or CUBIN files into JAR archives)
These instances may then be used to call a kernel function with the call(Object...) method. The kernel function arguments which are passed to this method will be set up automatically, and aligned appropriately for their respective size.

The setup of the execution may be performed similarly to the invocation of a kernel with the Runtime API in C. Such a call has the form
kernel<<<gridDim, blockDim, sharedMemorySize, stream>>>(...);
where
- gridDim is a dim3 which specifies the number of blocks per grid
- blockDim is a dim3 which specifies the number of threads per block
- sharedMemorySize is the size of the shared memory for the kernel
- stream is a stream for asynchronous kernel execution
These parameters may be set by calling the setup(dim3, dim3, int, CUstream) method:
kernelLauncher.setup(gridDim, blockDim, sharedMemorySize, stream).call(...);
When default values should be used for some of the parameters, one of the overloaded versions of the setup method may be called:
kernelLauncher.setup(gridDim, blockDim).call(...);
The parameters may also be set individually:
kernelLauncher.setGridSize(gridSize);
kernelLauncher.setBlockSize(blockSize);
kernelLauncher.call(...);
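As a complete illustration of this workflow, the following sketch compiles a small vector addition kernel from a source string, copies input data to the device, launches the kernel and copies the result back. The kernel source, the function name "add" and the data sizes are only examples, and error handling is omitted:

import static jcuda.driver.JCudaDriver.*;

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;
import jcuda.utils.KernelLauncher;

public class KernelLauncherExample
{
    public static void main(String[] args)
    {
        // Illustrative kernel source: element-wise addition of two vectors
        String sourceCode =
            "extern \"C\"" + "\n" +
            "__global__ void add(float *result, float *a, float *b)" + "\n" +
            "{" + "\n" +
            "    int i = threadIdx.x;" + "\n" +
            "    result[i] = a[i] + b[i];" + "\n" +
            "}";

        // Compile the source and obtain the launcher for the "add" function
        KernelLauncher kernelLauncher = KernelLauncher.compile(sourceCode, "add");

        // Prepare the host data and copy it to the device
        int size = 10;
        float a[] = new float[size];
        float b[] = new float[size];
        float result[] = new float[size];
        for (int i = 0; i < size; i++)
        {
            a[i] = i;
            b[i] = i;
        }
        CUdeviceptr dA = new CUdeviceptr();
        CUdeviceptr dB = new CUdeviceptr();
        CUdeviceptr dResult = new CUdeviceptr();
        cuMemAlloc(dA, size * Sizeof.FLOAT);
        cuMemAlloc(dB, size * Sizeof.FLOAT);
        cuMemAlloc(dResult, size * Sizeof.FLOAT);
        cuMemcpyHtoD(dA, Pointer.to(a), size * Sizeof.FLOAT);
        cuMemcpyHtoD(dB, Pointer.to(b), size * Sizeof.FLOAT);

        // Set up the execution configuration and call the kernel
        kernelLauncher.setGridSize(1, 1);
        kernelLauncher.setBlockSize(size, 1, 1);
        kernelLauncher.call(dResult, dA, dB);

        // Copy the result back to the host
        cuMemcpyDtoH(Pointer.to(result), dResult, size * Sizeof.FLOAT);
        System.out.println(java.util.Arrays.toString(result));

        // Clean up
        cuMemFree(dA);
        cuMemFree(dB);
        cuMemFree(dResult);
    }
}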
-
Method Summary
- void call(Object... args)
  Call the function of this KernelLauncher with the current grid size, block size, shared memory size and stream, and with the given arguments. The given arguments must all be either of the type Pointer, or of a primitive type except boolean.
- static KernelLauncher compile(String sourceCode, String functionName, String... nvccArguments)
  Create a new KernelLauncher for the function with the given name, that is defined in the given source code.
- static KernelLauncher create(String cuFileName, String functionName, String... nvccArguments)
  Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
- static KernelLauncher create(String cuFileName, String functionName, boolean forceRebuild, String... nvccArguments)
  Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
- KernelLauncher forFunction(String functionName)
  Create a new KernelLauncher which uses the same module as this KernelLauncher, but may be used to execute a different function.
- jcuda.driver.CUmodule getModule()
  Returns the module that was created from the PTX or CUBIN file, and which contains the function that should be executed.
- static KernelLauncher load(InputStream moduleInputStream, String functionName)
  Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX or CUBIN data that is read from the given input stream.
- static KernelLauncher load(String moduleFileName, String functionName)
  Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX or CUBIN (CUDA binary) file with the given name.
- KernelLauncher setBlockSize(int x, int y, int z)
  Set the block size (number of threads per block) for the function call. The default block size is (1,1,1).
- static void setCompilerPath(String path)
  Set the path to the NVCC compiler.
- static void setDeviceNumber(int number)
  Set the number (index) of the device which should be used by the KernelLauncher.
- KernelLauncher setGridSize(int x, int y)
  Set the grid size (number of blocks per grid) for the function call. The default grid size is (1,1,1).
- KernelLauncher setGridSize(int x, int y, int z)
  Set the grid size (number of blocks per grid) for the function call. The default grid size is (1,1,1).
- KernelLauncher setSharedMemSize(int sharedMemSize)
  Set the size of the shared memory for the function call. The default shared memory size is 0.
- KernelLauncher setStream(jcuda.driver.CUstream stream)
  Set the stream for the function call. The default stream is null (0).
- KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize)
  Set the given grid size and block size for this KernelLauncher.
- KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize)
  Set the given grid size, block size and shared memory size for this KernelLauncher.
- KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize, jcuda.driver.CUstream stream)
  Set the given grid size, block size, shared memory size and stream for this KernelLauncher.
-
Method Details
-
setCompilerPath
public static void setCompilerPath(String path)
Set the path to the NVCC compiler. For example:
setCompilerPath("C:/CUDA/bin");
By default, this path is empty, assuming that the compiler is in a path that is visible via an environment variable.
- Parameters:
  path - The path to the NVCC compiler
-
setDeviceNumber
public static void setDeviceNumber(int number)
Set the number (index) of the device which should be used by the KernelLauncher.
- Parameters:
  number - The number of the device to use
- Throws:
  jcuda.CudaException - If number<0 or number>=deviceCount
-
compile
public static KernelLauncher compile(String sourceCode, String functionName, String... nvccArguments)
Create a new KernelLauncher for the function with the given name, that is defined in the given source code.
The source code is stored in a temporary .CU CUDA source file, and a PTX file is compiled from this source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.
The NVCC has to be in a visible directory. On Windows, for example, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:
extern "C"
__global__ void functionName(...)
{
    ...
}
- Parameters:
  sourceCode - The source code containing the function
  functionName - The name of the function
  nvccArguments - Optional arguments for the NVCC
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the creation of the CU or PTX file fails, or the PTX may not be loaded, or the specified function can not be obtained.
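For example, a kernel may be compiled directly from a source string as in the following sketch. The kernel source, the function name "scale" and the NVCC architecture flag are only illustrative; pass whatever compiler arguments (or none) match the local setup:

import jcuda.utils.KernelLauncher;

public class CompileExample
{
    public static void main(String[] args)
    {
        // Illustrative kernel source; note the extern "C" declaration
        String sourceCode =
            "extern \"C\"" + "\n" +
            "__global__ void scale(float *data, float factor)" + "\n" +
            "{" + "\n" +
            "    data[threadIdx.x] *= factor;" + "\n" +
            "}";

        // The NVCC argument is only an example
        KernelLauncher launcher =
            KernelLauncher.compile(sourceCode, "scale", "-arch=sm_52");

        launcher.setBlockSize(128, 1, 1);
        // launcher.call(...) with a device pointer and a float value
    }
}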
-
create
public static KernelLauncher create(String cuFileName, String functionName, String... nvccArguments)
Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:
extern "C"
__global__ void functionName(...)
{
    ...
}
The extension of the given file name is replaced with "ptx". If the PTX file with the resulting name does not exist, or is older than the .CU file, it is compiled from the specified source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.
The NVCC has to be in a visible directory. On Windows, for example, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
- Parameters:
  cuFileName - The name of the source file
  functionName - The name of the function
  nvccArguments - Optional arguments for the NVCC
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the creation of the PTX file fails, or the PTX may not be loaded, or the specified function can not be obtained.
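A typical call of this method might look as follows. The file name "kernels/VectorOps.cu" and the function name "add" are placeholders:

import jcuda.utils.KernelLauncher;

public class CreateExample
{
    public static void main(String[] args)
    {
        // A file "kernels/VectorOps.ptx" will be compiled from the source
        // file and reused on later runs as long as it is up to date
        KernelLauncher launcher =
            KernelLauncher.create("kernels/VectorOps.cu", "add");

        launcher.setGridSize(1, 1);
        launcher.setBlockSize(256, 1, 1);
        // launcher.call(...) with the arguments that the kernel expects
    }
}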
-
create
public static KernelLauncher create(String cuFileName, String functionName, boolean forceRebuild, String... nvccArguments)
Create a new KernelLauncher for the function with the given name, that is contained in the .CU CUDA source file with the given name.
Note: In order to make the function accessible by the name it has in the source code, the function has to be declared as an extern "C" function:
extern "C"
__global__ void functionName(...)
{
    ...
}
The extension of the given file name is replaced with "ptx". If the PTX file with the resulting name does not exist, or is older than the .CU file, it is compiled from the specified source file using the NVCC (NVIDIA CUDA C Compiler) in a separate process. The optional nvccArguments are passed to the NVCC.
If the forceRebuild flag is 'true', then the PTX file will be recompiled from the given source file, even if it already exists and is newer than the source file, and the already existing PTX file will be overwritten.
The NVCC has to be in a visible directory. On Windows, for example, the NVCC.EXE has to be in a directory that is contained in the PATH environment variable. Alternatively, the path to the NVCC may be specified by calling setCompilerPath(String) with the respective path.
- Parameters:
  cuFileName - The name of the source file
  functionName - The name of the function
  forceRebuild - Whether the PTX file should be recompiled and overwritten if it already exists
  nvccArguments - Optional arguments for the NVCC
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the creation of the PTX file fails, or the PTX may not be loaded, or the specified function can not be obtained.
-
load
public static KernelLauncher load(String moduleFileName, String functionName)
Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX or CUBIN (CUDA binary) file with the given name.
- Parameters:
  moduleFileName - The name of the PTX or CUBIN file
  functionName - The name of the function
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the PTX or CUBIN may not be loaded, or the specified function can not be obtained.
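For example, assuming that a PTX file has already been created (for instance with nvcc -ptx), it may be loaded as follows. The file and function names are placeholders:

import jcuda.utils.KernelLauncher;

public class LoadFileExample
{
    public static void main(String[] args)
    {
        // "kernels.ptx" and "add" are placeholder names; the PTX (or CUBIN)
        // file must have been created beforehand, e.g. with
        //     nvcc -ptx kernels.cu -o kernels.ptx
        KernelLauncher launcher = KernelLauncher.load("kernels.ptx", "add");

        launcher.setBlockSize(256, 1, 1);
        // launcher.call(...);
    }
}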
-
load
public static KernelLauncher load(InputStream moduleInputStream, String functionName)
Create a new KernelLauncher which may be used to execute the specified function which is loaded from the PTX or CUBIN data that is read from the given input stream.
- Parameters:
  moduleInputStream - The stream for the PTX or CUBIN data
  functionName - The name of the function
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the PTX or CUBIN may not be loaded, or the specified function can not be obtained.
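This overload is convenient when the PTX or CUBIN file is packaged as a resource, for example inside a JAR archive. The resource path and function name in the following sketch are placeholders:

import java.io.InputStream;

import jcuda.utils.KernelLauncher;

public class LoadStreamExample
{
    public static void main(String[] args)
    {
        // "/kernels/VectorOps.ptx" is a placeholder for a PTX (or CUBIN)
        // resource that has been packaged onto the class path
        InputStream ptxStream = LoadStreamExample.class
            .getResourceAsStream("/kernels/VectorOps.ptx");

        KernelLauncher launcher = KernelLauncher.load(ptxStream, "add");
        // launcher.setup(...).call(...);
    }
}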
-
forFunction
public KernelLauncher forFunction(String functionName)
Create a new KernelLauncher which uses the same module as this KernelLauncher, but may be used to execute a different function. All parameters (grid size, block size, shared memory size and stream) of the returned KernelLauncher will be independent of 'this' one and initially contain the default values.
- Parameters:
  functionName - The name of the function
- Returns:
  The KernelLauncher for the specified function
- Throws:
  jcuda.CudaException - If the specified function can not be obtained from the module of this KernelLauncher.
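For example, two functions from the same module may be launched with independent configurations. The file and function names in this sketch are placeholders:

import jcuda.utils.KernelLauncher;

public class ForFunctionExample
{
    public static void main(String[] args)
    {
        // "add" and "mul" are assumed to be defined in the same .CU file
        KernelLauncher addLauncher =
            KernelLauncher.create("kernels/VectorOps.cu", "add");
        KernelLauncher mulLauncher = addLauncher.forFunction("mul");

        // The launch configurations of the two launchers are independent
        addLauncher.setBlockSize(256, 1, 1);
        mulLauncher.setBlockSize(128, 1, 1);
    }
}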
-
getModule
public jcuda.driver.CUmodule getModule()
Returns the module that was created from the PTX or CUBIN file, and which contains the function that should be executed. This module may also be used to access symbols and texture references. However, clients should not modify or unload the module.
- Returns:
  The CUmodule
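For example, the returned module may be passed to driver API functions such as cuModuleGetGlobal to look up a __device__ symbol. The file name, function name and symbol name in this sketch are assumptions:

import static jcuda.driver.JCudaDriver.cuModuleGetGlobal;

import jcuda.driver.CUdeviceptr;
import jcuda.driver.CUmodule;
import jcuda.utils.KernelLauncher;

public class GetModuleExample
{
    public static void main(String[] args)
    {
        // The source file is assumed to contain a __device__ variable
        // called "globalCounter"
        KernelLauncher launcher =
            KernelLauncher.create("kernels/VectorOps.cu", "add");
        CUmodule module = launcher.getModule();

        // Look up the device pointer and the size of the global symbol
        CUdeviceptr symbolPointer = new CUdeviceptr();
        long symbolSize[] = new long[1];
        cuModuleGetGlobal(symbolPointer, symbolSize, module, "globalCounter");
        System.out.println("Symbol occupies " + symbolSize[0] + " bytes");
    }
}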
-
setGridSize
public KernelLauncher setGridSize(int x, int y)
Set the grid size (number of blocks per grid) for the function call.
This corresponds to the first parameter in the runtime call:
kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);
The default grid size is (1,1,1).
- Parameters:
  x - The number of blocks per grid in x-direction
  y - The number of blocks per grid in y-direction
- Returns:
  This instance
-
setGridSize
public KernelLauncher setGridSize(int x, int y, int z)
Set the grid size (number of blocks per grid) for the function call.
This corresponds to the first parameter in the runtime call:
kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);
The default grid size is (1,1,1).
- Parameters:
  x - The number of blocks per grid in x-direction
  y - The number of blocks per grid in y-direction
  z - The number of blocks per grid in z-direction
- Returns:
  This instance
-
setBlockSize
public KernelLauncher setBlockSize(int x, int y, int z)
Set the block size (number of threads per block) for the function call.
This corresponds to the second parameter in the runtime call:
kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);
The default block size is (1,1,1).
- Parameters:
  x - The number of threads per block in x-direction
  y - The number of threads per block in y-direction
  z - The number of threads per block in z-direction
- Returns:
  This instance
-
setStream
public KernelLauncher setStream(jcuda.driver.CUstream stream)
Set the stream for the function call.
This corresponds to the fourth parameter in the runtime call:
kernel<<<gridSize, blockSize, sharedMemSize, stream>>>(...);
The default stream is null (0).
- Parameters:
  stream - The stream for the function call
- Returns:
  This instance
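Since each of these setters returns the KernelLauncher itself, the launch configuration may be assembled with chained calls, as in the following sketch. The file and function names are placeholders, and the stream is created with the JCuda driver API:

import static jcuda.driver.JCudaDriver.cuStreamCreate;
import static jcuda.driver.JCudaDriver.cuStreamSynchronize;

import jcuda.driver.CUstream;
import jcuda.utils.KernelLauncher;

public class SettersExample
{
    public static void main(String[] args)
    {
        // Placeholder kernel file and function name
        KernelLauncher launcher =
            KernelLauncher.create("kernels/VectorOps.cu", "add");

        // Create a stream for asynchronous kernel execution
        CUstream stream = new CUstream();
        cuStreamCreate(stream, 0);

        // Each setter returns the launcher itself, so the calls may be chained
        launcher.setGridSize(16, 1)
            .setBlockSize(256, 1, 1)
            .setSharedMemSize(0)
            .setStream(stream);

        // launcher.call(...);
        cuStreamSynchronize(stream);
    }
}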
-
setup
public KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize)
Set the given grid size and block size for this KernelLauncher.
- Parameters:
  gridSize - The grid size (number of blocks per grid)
  blockSize - The block size (number of threads per block)
- Returns:
  This instance
-
setup
public KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize)
Set the given grid size, block size and shared memory size for this KernelLauncher.
- Parameters:
  gridSize - The grid size (number of blocks per grid)
  blockSize - The block size (number of threads per block)
  sharedMemSize - The size of the shared memory
- Returns:
  This instance
-
setup
public KernelLauncher setup(jcuda.runtime.dim3 gridSize, jcuda.runtime.dim3 blockSize, int sharedMemSize, jcuda.driver.CUstream stream)
Set the given grid size, block size, shared memory size and stream for this KernelLauncher.
- Parameters:
  gridSize - The grid size (number of blocks per grid)
  blockSize - The block size (number of threads per block)
  sharedMemSize - The size of the shared memory
  stream - The stream for the kernel invocation
- Returns:
  This instance
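The following sketch shows the two-argument overload, which mirrors a kernel<<<gridDim, blockDim>>>(...) launch in the Runtime API. The file and function names are placeholders:

import jcuda.runtime.dim3;
import jcuda.utils.KernelLauncher;

public class SetupExample
{
    public static void main(String[] args)
    {
        // Placeholder kernel file and function name
        KernelLauncher launcher =
            KernelLauncher.create("kernels/VectorOps.cu", "add");

        // Equivalent of kernel<<<gridDim, blockDim>>>(...) in the Runtime API
        dim3 gridDim = new dim3(16, 1, 1);
        dim3 blockDim = new dim3(256, 1, 1);
        launcher.setup(gridDim, blockDim);
        // launcher.call(...) then launches with this configuration
    }
}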
-
call
public void call(Object... args)
Call the function of this KernelLauncher with the current grid size, block size, shared memory size and stream, and with the given arguments.
The given arguments must all be either of the type Pointer, or of a primitive type except boolean. Otherwise, a CudaException will be thrown.
- Parameters:
  args - The arguments for the function call
- Throws:
  jcuda.CudaException - If an argument with an invalid type was given, or one of the internal functions for setting up and executing the kernel failed.
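For example, for a kernel declared (illustratively) as extern "C" __global__ void saxpy(float *data, int n, float alpha), a device pointer and primitive values may be passed directly. The file name in this sketch is a placeholder:

import static jcuda.driver.JCudaDriver.cuMemAlloc;

import jcuda.Sizeof;
import jcuda.driver.CUdeviceptr;
import jcuda.utils.KernelLauncher;

public class CallExample
{
    public static void main(String[] args)
    {
        // Placeholder file name; "saxpy" is the illustrative kernel above
        KernelLauncher launcher =
            KernelLauncher.create("kernels/VectorOps.cu", "saxpy");

        int n = 1024;
        CUdeviceptr data = new CUdeviceptr();
        cuMemAlloc(data, n * Sizeof.FLOAT);

        // A CUdeviceptr is a Pointer, and int/float are valid primitive
        // arguments; boolean arguments are not supported
        launcher.setGridSize(4, 1).setBlockSize(256, 1, 1);
        launcher.call(data, n, 2.5f);
    }
}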