Helper for Numba Simulation

Check if the Numba CUDA simulator is enabled

Check for nvidia-smi availability

has_gpu


def has_gpu():

Check whether the system has a GPU by running nvidia-smi in a subprocess.

assert not has_gpu()
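A minimal sketch of how such a check could work, assuming it only needs to know whether nvidia-smi can be launched and exits successfully. has_gpu_sketch is a hypothetical name and not this module's actual implementation.

```python
import subprocess


def has_gpu_sketch() -> bool:
    """Hypothetical stand-in for has_gpu: True if nvidia-smi runs and exits with status 0."""
    try:
        subprocess.run(
            ["nvidia-smi"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return True
    except (OSError, subprocess.CalledProcessError):
        # OSError covers FileNotFoundError when nvidia-smi is not installed
        return False
```

On a CPU-only machine nvidia-smi is usually absent, so the call fails with FileNotFoundError and the sketch returns False.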

Check for the simulation flag


is_sim


def is_sim():

Check if we’re running in a simulator by checking the NUMBA_ENABLE_CUDASIM environment variable

assert not is_sim()
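A one-line sketch of the flag check, assuming the simulator counts as enabled only when the variable is exactly "1". is_sim_sketch is a hypothetical name; the real helper may parse the value differently.

```python
import os


def is_sim_sketch() -> bool:
    """Hypothetical stand-in for is_sim: True when NUMBA_ENABLE_CUDASIM is set to "1"."""
    return os.environ.get("NUMBA_ENABLE_CUDASIM", "0") == "1"
```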

Set sim

It has to be called before importing numba.cuda.


set_sim


def set_sim():

Set up the Numba CUDA simulator.

set_sim()
assert is_sim()
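A sketch of what setting the flag amounts to, assuming set_sim only needs to export NUMBA_ENABLE_CUDASIM. set_sim_sketch is a hypothetical name. The ordering is the important part: as noted above, the variable has to be in place before the CUDA module is imported.

```python
import os


def set_sim_sketch() -> None:
    """Hypothetical stand-in for set_sim: enable the Numba CUDA simulator.

    Must run before `from numba import cuda`, otherwise the flag is not picked up.
    """
    os.environ["NUMBA_ENABLE_CUDASIM"] = "1"


set_sim_sketch()
# Only now import the CUDA module, for example:
# from numba import cuda
```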

Check whether CUDA is available

Setup NumbaSim

1. Check whether nvidia-smi is available.
2. If it is not, initialize the simulator.
3. If it is, check whether CUDA is available.

In case 2, the flag must be set before numba is imported; otherwise numba raises an error.
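The three steps can be outlined as below. This is a sketch, not the module's code: setup_sketch, the injectable gpu_present check, and the returned labels ("sim", "cuda", and the fallback "cpu") are all hypothetical. The GPU check is injectable so the simulator branch can be exercised without a GPU or an installed numba.

```python
import os
import shutil
from typing import Callable


def _nvidia_smi_on_path() -> bool:
    """Default check: is an nvidia-smi executable on PATH?"""
    return shutil.which("nvidia-smi") is not None


def setup_sketch(gpu_present: Callable[[], bool] = _nvidia_smi_on_path) -> str:
    """Hypothetical outline of the setup steps; returns "sim", "cuda" or "cpu"."""
    if not gpu_present():
        # Case 2: no nvidia-smi, so enable the simulator *before* importing numba
        os.environ["NUMBA_ENABLE_CUDASIM"] = "1"
        return "sim"
    # Case 3: nvidia-smi exists, so ask Numba whether a usable CUDA device is present
    from numba import cuda
    return "cuda" if cuda.is_available() else "cpu"  # "cpu" is an assumed fallback label
```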

The device API mimics torch.device.

d = cuda.device_array(1)
type(d)

numba.cuda.simulator.cudadrv.devicearray.FakeCUDAArray

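For an array d allocated on a real CUDA device, the corresponding check would be:

isinstance(d, cuda.cudadrv.devicearray.DeviceNDArray)

Because the array type differs between the simulator (FakeCUDAArray) and a real device (DeviceNDArray), an isinstance check is not portable. Instead we check for copy_to_host: an array that has this method lives on the device and can be moved to the host.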


device


def device(x):

assert device(d) == 'cuda'
assert device(d.copy_to_host()) == 'cpu'
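The duck-typing idea can be sketched in a few lines, assuming device only needs to distinguish "has copy_to_host" from "does not". device_sketch is a hypothetical name; it treats both DeviceNDArray and the simulator's FakeCUDAArray as 'cuda' because both expose copy_to_host, while host NumPy arrays do not.

```python
import numpy as np


def device_sketch(x) -> str:
    """Hypothetical stand-in for device: 'cuda' if x can be copied to the host, else 'cpu'."""
    return "cuda" if hasattr(x, "copy_to_host") else "cpu"


print(device_sketch(np.zeros(3)))  # a host NumPy array
```

The design choice is that no numba type is imported at all, so the same check works whether or not the simulator is active.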


test_close


def test_close(a, b, tol:float=0.0001):

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([1.0001, 2.0001, 3.0001], dtype=np.float32)
assert test_close(a, b, tol=1e-4)
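A possible implementation, assuming test_close wraps np.allclose with tol as the absolute tolerance. is_close_sketch is a hypothetical name. The choice matters for this very example: in float32, 1.0001 is stored about 1.00017e-4 away from 1.0, slightly above 1e-4, so a strict max(|a - b|) <= tol check would reject it. np.allclose keeps its default relative tolerance (rtol=1e-05) on top of atol, which absorbs that rounding.

```python
import numpy as np


def is_close_sketch(a, b, tol: float = 1e-4) -> bool:
    """Hypothetical stand-in for test_close: elementwise closeness within tol.

    np.allclose checks |a - b| <= atol + rtol * |b|, here with atol=tol and the
    default rtol=1e-05.
    """
    return bool(np.allclose(np.asarray(a), np.asarray(b), atol=tol))
```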


dim


def dim(base:float, th:float):

assert dim(8, 5) == 2
assert dim(8, 8) == 1
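The two asserts are consistent with ceiling division. A sketch under that assumption follows; dim_sketch is a hypothetical name, and reading the result as a CUDA block count (enough blocks of th threads to cover base elements) is an interpretation, not something the source states.

```python
import math


def dim_sketch(base: float, th: float) -> int:
    """Hypothetical stand-in for dim: the smallest integer k with k * th >= base."""
    return math.ceil(base / th)


# e.g. 1000 elements with 256 threads per block -> 4 blocks (1024 threads)
blocks = dim_sketch(1000, 256)
```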

Performance Capture


timer


def timer():

CPU-only setup for Numba CUDA simulator

with timer():
    time.sleep(0.01)  # 10ms sleep

10.1534 ms
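A context-manager sketch that produces output in the same "N.NNNN ms" form, assuming timer measures wall-clock time with time.perf_counter. timer_sketch is a hypothetical name, and yielding a dict so the caller can read the elapsed time afterwards is an addition for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer_sketch():
    """Hypothetical stand-in for timer: print the wall-clock time of the block in ms."""
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["ms"] = (time.perf_counter() - start) * 1e3
        print(f"{result['ms']:.4f} ms")


with timer_sketch() as t:
    time.sleep(0.01)  # about 10 ms, plus scheduler overhead
```

The finally clause ensures the timing is still reported if the timed block raises.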

NumbaSim Setup

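import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(a, b, c):
    idx = cuda.grid(1)  # global index of this thread in the 1D grid
    if idx < a.size:    # skip threads that fall past the end of the array
        c[idx] = a[idx] + b[idx]

# Test data: a single element, launched as 1 block of 1 thread
N = 1
a = cuda.to_device(np.ones(N, dtype=np.float32))
b = cuda.to_device(np.ones(N, dtype=np.float32))
c = cuda.device_array(N, dtype=np.float32)

with timer():
    add_kernel[1, 1](a, b, c)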

11.5806 ms
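The launch above uses one block of one thread for a single element. For larger arrays the grid is usually rounded up to whole blocks, which is why the kernel guards with idx < a.size. The pure-Python emulation below needs no Numba; emulate_add_kernel and threads_per_block are hypothetical names. It mirrors what cuda.grid(1) computes for a 1D launch, blockIdx.x * blockDim.x + threadIdx.x, and runs the kernel body once per (block, thread) pair sequentially.

```python
import math

import numpy as np


def emulate_add_kernel(a: np.ndarray, b: np.ndarray, threads_per_block: int = 4):
    """Sequentially run add_kernel's body for every (block, thread) pair."""
    n = a.size
    blocks = math.ceil(n / threads_per_block)  # round up to whole blocks
    c = np.zeros_like(a)
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            idx = block_idx * threads_per_block + thread_idx  # what cuda.grid(1) returns
            if idx < n:  # threads past the end do nothing
                c[idx] = a[idx] + b[idx]
    return c, blocks * threads_per_block


c, launched = emulate_add_kernel(np.ones(10, dtype=np.float32), np.ones(10, dtype=np.float32))
# 10 elements with 4 threads per block -> 3 blocks, 12 threads, 2 of them idle
```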