spikingjelly.activation_based.auto_cuda package

Module contents

spikingjelly.activation_based.auto_cuda.base.wrap_with_comment(code: str, comment: str)[source]
spikingjelly.activation_based.auto_cuda.base.startswiths(x: str, prefixes: tuple)[source]
class spikingjelly.activation_based.auto_cuda.base.CKernel(kernel_name: str)[source]

Bases: object

Parameters:

kernel_name (str) – the name of the kernel

The base Python class for simplifying the use of custom CUDA kernels.

Some critical attributes:

cparams:

a dict that saves parameter names and types.

reserved_cnames:

a list that saves reserved variable names, which cannot be used to name variables again.

Here is an example:

from spikingjelly.activation_based.auto_cuda import base

example_ck = base.CKernel(kernel_name='example_ck')
print(example_ck.full_codes)

The outputs are:

#include <cuda_fp16.h>
extern "C" __global__
void example_ck(
)
{}

A CKernel is composed of four parts: declaration, head, core, and tail. When the logging level is set to <= DEBUG, some debug information will be added to the CUDA codes or printed, and we can check where each part is. Here is an example:

import logging
logging.basicConfig(level=logging.DEBUG)
from spikingjelly.activation_based.auto_cuda import base

example_ck = base.CKernel(kernel_name='example_ck')
print(example_ck.full_codes)

The outputs are:

//------declaration start------

#include <cuda_fp16.h>
extern "C" __global__
void example_ck(
)

//------declaration end--------


//------head start------
{
//------head end--------


//------core start------

//------core end--------


//------tail start------
}
//------tail end--------

In most cases, CKernel is used as a base class. Refer to CKernel1D and CKernel2D for more details.

check_attributes(**kwargs)[source]
Parameters:

kwargs (dict) – a dict of attributes

Returns:

whether every value kwargs[key] is identical to self.__getattribute__(key)

Return type:

bool

This function can be used to check whether a CKernel has been changed, i.e., whether any of its attributes has changed.
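
Here is a minimal sketch of how check_attributes might be used (checking kernel_name is only an illustration):

from spikingjelly.activation_based.auto_cuda import base

example_ck = base.CKernel(kernel_name='example_ck')

# True only if every given attribute equals the kernel's current value
print(example_ck.check_attributes(kernel_name='example_ck'))
print(example_ck.check_attributes(kernel_name='another_name'))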

property core
set_contiguous(py_dict: dict)[source]
Parameters:

py_dict (dict) – a dict whose values are torch.Tensor or cupy.ndarray

Check whether all values in py_dict are contiguous torch.Tensor or cupy.ndarray. If not, this function will raise an error.

get_device(py_dict: dict) → int[source]
Parameters:

py_dict (dict) – a dict

Traverse the dict and return the device id of the first torch.Tensor that is met. If there is no torch.Tensor in py_dict, this function will raise an error.

check_device(device: int, py_dict: dict)[source]
Parameters:
  • device (int) – the CUDA device id

  • py_dict (dict) – a dict

Check if the device id of each torch.Tensor or cupy.ndarray in py_dict is identical to device. If not, this function will raise an error.

check_keys(py_dict: dict)[source]
Parameters:

py_dict (dict) – a dict

Check if keys of py_dict are identical to keys of self.cparams. If not, this function will raise an error.

check_ctypes(py_dict: dict)[source]
Parameters:

py_dict (dict) – a dict

Check whether each value in py_dict has the corresponding ctype in self.cparams, which includes:

torch.float or np.float32 – 'const float' or 'float'

torch.half or np.float16 – 'const half2' or 'half2'

np.int_ – 'const int' or 'int'

If not, this function will raise an error.

check_half2(py_dict: dict)[source]

This function is implemented by subclasses when needed.

get_ptrs(py_dict: dict)[source]
Parameters:

py_dict (dict) – a dict

Returns:

a tuple of data pointers

Return type:

tuple

Get the address of the first element of each torch.Tensor or cupy.ndarray in py_dict.
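
Here is a minimal sketch (the kernel and tensors are only for illustration):

import torch
from spikingjelly.activation_based.auto_cuda import base

example_ck = base.CKernel(kernel_name='example_ck')

x = torch.rand([4], device='cuda:0')
y = torch.zeros_like(x)

# each element of the returned tuple is the raw device address of the
# first element of the corresponding tensor
ptrs = example_ck.get_ptrs({'x': x, 'y': y})
print(ptrs)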

__call__(grid: tuple, block: tuple, py_dict: dict, *args_1, **kwargs)[source]
Parameters:
  • grid (tuple) – the grid size of the CUDA kernel

  • block (tuple) – the block size of the CUDA kernel

  • py_dict (dict) – the dict that contains parameters for the CUDA kernel

Execute the CUDA kernel. *args_1, **kwargs are used as *args_1, **kwargs in cupy.RawKernel.

py_dict should contain key: value pairs where key is the CUDA kernel function param name and value is the variable. This dict should be in one-to-one correspondence with self.cparams.

For example, if self.cparams is

{
    'numel': 'const int &',
    'x': 'const float *',
    'y': 'const float *'
}

Then py_dict should be

{
    'numel': numel,
    'x': x,
    'y': y
}

where numel, x, y should be torch.Tensor or cupy.ndarray with the corresponding data type, e.g., x in py_dict should have data type torch.float because x in self.cparams has the value 'const float *'.

The key order is arbitrary because this function will sort the keys to align formal and actual parameters.

add_param(ctype: str, cname: str)[source]
Parameters:
  • ctype (str) – the type of the CUDA param

  • cname (str) – the name of the CUDA param

Add a param to self.cparams.

Note

When calling self.__call__, the params in the CUDA kernel are sorted in dictionary order. Thus, the user does not need to call add_param in any specific order.

Here is an example:

from spikingjelly.activation_based.auto_cuda import base

example_ck = base.CKernel(kernel_name='example_ck')
print('origin:')
print(example_ck.full_codes)


example_ck.add_param(ctype='const float*', cname='x')
example_ck.add_param(ctype='const float*', cname='y')
example_ck.add_param(ctype='float', cname='z')

print('after:')
print(example_ck.full_codes)

The outputs are:

origin:

        #include <cuda_fp16.h>
        extern "C" __global__
        void example_ck(
        const int & numel
        )

after:

        #include <cuda_fp16.h>
        extern "C" __global__
        void example_ck(
        const int & numel, const float* x, const float* y, float z
        )
property declaration
property head
property tail
property full_codes

Returns:

the full CUDA codes

Return type:

str

class spikingjelly.activation_based.auto_cuda.base.CKernel1D(*args, **kwargs)[source]

Bases: CKernel

Parameters:

kernel_name (str) – the name of the kernel

The 1D (element-wise) CUDA kernel, which is extended from CKernel. All input/output tensors will be regarded as 1D tensors.

Some critical attributes:

cparams:

A dict that saves parameter names and types. The default value is {'numel': 'const int &'}. numel represents the number of elements for the element-wise operation, which is also the number of CUDA threads.

reserved_cnames:

A list that saves reserved variable names, which cannot be used to name variables again. The default value is ['index']. index represents the index of the element, which is also the CUDA thread index.

Now let us check what the empty 1d kernel looks like:

from spikingjelly.activation_based.auto_cuda import base
temp_kernel = base.CKernel1D(kernel_name='temp_kernel')
print(temp_kernel.full_codes)

The outputs are:

#include <cuda_fp16.h>
extern "C" __global__
void temp_kernel(
const int & numel
)

{
    const int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < numel)
    {

    }
}

By setting the logging level, we can check each part of the kernel:

import logging
logging.basicConfig(level=logging.DEBUG)
from spikingjelly.activation_based.auto_cuda import base
temp_kernel = base.CKernel1D(kernel_name='temp_kernel')
print(temp_kernel.full_codes)

The outputs are:

//------declaration start------

        #include <cuda_fp16.h>
        extern "C" __global__
        void temp_kernel(
        const int & numel
        )

//------declaration end--------


//------head start------

        {
            const int index = blockIdx.x * blockDim.x + threadIdx.x;
            if (index < numel)
            {

//------head end--------


//------core start------

//------core end--------


//------tail start------

            }
        }

//------tail end--------

self.core can be specified by the user.

For example, if we want to write a heaviside kernel, we can implement it easily with the cuda code y[index] = x[index] >= 0.0f ? 1.0f: 0.0f;, and add two params x, y, which are inputs and outputs.

Here is the example:

from spikingjelly.activation_based.auto_cuda import base

c_heaviside = base.CKernel1D(kernel_name='heaviside')
c_heaviside.add_param(ctype='const float *', cname='x')
c_heaviside.add_param(ctype='float *', cname='y')
c_heaviside.core = '''
            y[index] = x[index] >= 0.0f ? 1.0f: 0.0f;
'''
print(c_heaviside.full_codes)

The outputs are:

#include <cuda_fp16.h>
extern "C" __global__
void heaviside(
const int & numel, const float * x, float * y
)

{
    const int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < numel)
    {

    y[index] = x[index] >= 0.0f ? 1.0f: 0.0f;

    }
}

Here is an example of how to execute the kernel:

import torch
import cupy
from spikingjelly.activation_based import cuda_utils

device = 'cuda:0'
x = torch.rand([4, 4], device=device) - 0.5
y = torch.zeros_like(x)

numel = x.numel()
threads = 1024
blocks = cuda_utils.cal_blocks(numel, threads)
print('x=')
print(x)

with cuda_utils.DeviceEnvironment(device=x.get_device()):
    numel = cupy.asarray(numel)
    py_dict = {
        'numel': numel,
        'x': x,
        'y': y
    }
    c_heaviside((blocks, ), (threads, ), py_dict)


print('y=')
print(y)

The outputs are:

x=
tensor([[-0.0423, -0.1383, -0.0238,  0.1018],
        [ 0.3422,  0.1449, -0.2938, -0.1858],
        [-0.3503,  0.0004, -0.4274, -0.2012],
        [-0.0227,  0.2229, -0.0776,  0.2687]], device='cuda:0')
y=
tensor([[0., 0., 0., 1.],
        [1., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 1.]], device='cuda:0')
property head
property tail
check_half2(py_dict: dict)[source]
Parameters:

py_dict (dict) – a dict

Check the values in py_dict. If a value is a torch.Tensor with value.dtype == torch.half or a cupy.ndarray with value.dtype == np.float16, this function will check whether its number of elements is even.

We require that the numel is even when using the half dtype because we will use half2 in the CUDA kernel.

Note

CKernel1D.__call__ will pad half tensor to even numel before executing the kernel. Thus, the user does not need to worry about padding.

__call__(grid: tuple, block: tuple, py_dict: dict, *args_1, **kwargs)[source]
Parameters:
  • grid (tuple) – the grid size of the CUDA kernel

  • block (tuple) – the block size of the CUDA kernel

  • py_dict (dict) – the dict that contains parameters for the CUDA kernel

Execute the CUDA kernel. *args_1, **kwargs are used as *args_1, **kwargs in cupy.RawKernel.

py_dict should contain key: value pairs where key is the CUDA kernel function param name and value is the variable. This dict should be in one-to-one correspondence with self.cparams.

For example, if self.cparams is

{
    'numel': 'const int &',
    'x': 'const float *',
    'y': 'const float *'
}

Then py_dict should be

{
    'numel': numel,
    'x': x,
    'y': y
}

where numel, x, y should be torch.Tensor or cupy.ndarray with the corresponding data type, e.g., x in py_dict should have data type torch.float because x in self.cparams has the value 'const float *'.

The key order is arbitrary because this function will sort the keys to align formal and actual parameters.

Note

All tensors in py_dict will be regarded as 1D.

Note

Any tensor x in py_dict with data type torch.half or np.float16 but an odd numel will be flattened and padded by x = [x, x[-1]] before executing the CUDA kernel. After execution, the padded values in x will be removed, and x will be reshaped to the original shape.
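
The following pure PyTorch snippet only illustrates this padding behavior; it is not the implementation used inside CKernel1D.__call__:

import torch

x = torch.rand([3, 3], device='cuda:0', dtype=torch.half)  # odd numel (9)

x_flat = x.flatten()
if x_flat.numel() % 2 != 0:
    # pad with the last element so that the numel becomes even
    x_flat = torch.cat((x_flat, x_flat[-1:]))

# ... the CUDA kernel would run on x_flat here ...

# remove the padded value and restore the original shape
x_restored = x_flat[:x.numel()].reshape(x.shape)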

simple_call(**kwargs)[source]
Parameters:

kwargs (dict) – the dict that contains parameters for the CUDA kernel

The simplified calling function, which is simplified from the standard calling function CKernel1D.__call__.

Compared with CKernel1D.__call__, the device, numel, and the numbers of CUDA threads and blocks are calculated automatically from the tensors in kwargs.

Here is the example:

import torch
from spikingjelly.activation_based import cuda_utils
from spikingjelly.activation_based.auto_cuda import base

c_heaviside = base.CKernel1D(kernel_name='heaviside')
c_heaviside.add_param(ctype='const float *', cname='x')
c_heaviside.add_param(ctype='float *', cname='y')
c_heaviside.core = '''
            y[index] = x[index] >= 0.0f ? 1.0f: 0.0f;
'''
device = 'cuda:0'

x = torch.rand([4, 4], device=device) - 0.5
y = torch.zeros_like(x)

print('x=')
print(x)
c_heaviside.simple_call(x=x, y=y)
print('y=')
print(y)

The outputs are:

x=
tensor([[-0.1706,  0.2063, -0.2077,  0.3335],
        [-0.0180, -0.2429,  0.3488,  0.1146],
        [ 0.0362,  0.1584,  0.4828, -0.1389],
        [-0.2684,  0.1898,  0.0560,  0.2058]], device='cuda:0')
y=
tensor([[0., 1., 0., 1.],
        [0., 0., 1., 1.],
        [1., 1., 1., 0.],
        [0., 1., 1., 1.]], device='cuda:0')
class spikingjelly.activation_based.auto_cuda.base.CKernel2D(kernel_name: str, reverse: bool = False)[source]

Bases: CKernel

Parameters:
  • kernel_name (str) – the name of the kernel

  • reverse (bool) – If False, then the for-loop in the kernel is for(int t = index; t < numel; t += dt). If True, then the for-loop in the kernel is for(int t = numel - N + index; t >= 0; t -= dt).

The 2D CUDA kernel, which is extended from CKernel.

All input/output tensors should have no more than 2 dimensions. All 2D tensors will be regarded as having shape = [T, N], where T is the sequence length and N is the number of elements of the data at one time-step.

Some critical attributes:

cparams:

A dict that saves parameter names and types. The default value is {'numel': 'const int &', 'N': 'const int &'}.

N: the number of elements of the sequence data at one time-step (the numel of the 1-th dimension)

numel: the number of elements in the input/output tensors, which is T * N

reserved_cnames:

A list that saves reserved variable names, which cannot be used to name variables again. The default value is ['index', 'dt', 't'].

index: the index in the 1-th dimension, which is also the CUDA thread index

t: the index in the 0-th dimension

dt: used in the CUDA kernel as the time-step stride. If x[t_py][j] in Python code corresponds to x[t] in CUDA code, then x[t_py + 1][j] in Python code corresponds to x[t + dt] in CUDA code (see the sketch below)
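
The following sketch only illustrates this index correspondence: when a contiguous [T, N] tensor is viewed as a 1D array, the element at flattened index t + dt (with dt = N) is the element at the same position j at the next time-step:

import torch

T, N = 4, 3
x = torch.arange(T * N).reshape(T, N)
x_flat = x.flatten()

dt = N
j = 2              # position inside one time-step (the CUDA thread index)
t_py = 1           # time-step index in python code
t = t_py * N + j   # the flattened index used in the CUDA kernel

print(x[t_py][j] == x_flat[t])           # tensor(True)
print(x[t_py + 1][j] == x_flat[t + dt])  # tensor(True)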

Now let us check what the empty 2d kernel looks like:

from spikingjelly.activation_based.auto_cuda import base

temp_kernel = base.CKernel2D(kernel_name='temp_kernel')
print(temp_kernel.full_codes)

The outputs are:

#include <cuda_fp16.h>
extern "C" __global__
void temp_kernel(
const int & numel, const int & N
)

{
    const int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < N)
    {
        const int dt = N;

        for(int t = index; t < numel; t += dt)
        {

        }

    }
}

By setting the logging level, we can check each part of the kernel:

import logging
logging.basicConfig(level=logging.DEBUG)
from spikingjelly.activation_based.auto_cuda import base

temp_kernel = base.CKernel2D(kernel_name='temp_kernel')
print(temp_kernel.full_codes)

The outputs are:

//------declaration start------

#include <cuda_fp16.h>
extern "C" __global__
void temp_kernel(
const int & numel, const int & N
)

//------declaration end--------


//------head start------

        {
            const int index = blockIdx.x * blockDim.x + threadIdx.x;
            if (index < N)
            {
                const int dt = N;

//------pre_core start------

//------pre_core end--------


                for(int t = index; t < numel; t += dt)
                {

//------head end--------


//------core start------

//------core end--------


//------tail start------

                }

//------post_core start------

//------post_core end--------


            }
        }

//------tail end--------

self.pre_core, self.post_core, and self.core can be specified by the user.

Here is the example of how to implement the cumsum operation:

import torch
import cupy
from spikingjelly.activation_based.auto_cuda import base
from spikingjelly.activation_based import cuda_utils

cumsum = base.CKernel2D(kernel_name='cumsum')
cumsum.add_param(ctype='const float *', cname='x')
cumsum.add_param(ctype='float *', cname='y')

cumsum.core = '''
                    if (t - dt < 0)
                    {
                        y[t] = x[t];
                    }
                    else
                    {
                        y[t] = x[t] + y[t - dt];
                    }
'''

print(cumsum.full_codes)

T = 4
N = 3
device = 'cuda:0'

x = torch.randint(low=0, high=4, size=[T, N], device=device).float()
y = torch.zeros_like(x)

threads = 1024
blocks = cuda_utils.cal_blocks(N, threads)

with cuda_utils.DeviceEnvironment(device=x.get_device()):
    numel = cupy.asarray(T * N)
    N = cupy.asarray(N)
    py_dict = {
        'N': N,
        'numel': numel,
        'x': x,
        'y': y
    }
    cumsum((blocks, ), (threads, ), py_dict)

print('x=')
print(x)
print('y=')
print(y)

The outputs are:

#include <cuda_fp16.h>
extern "C" __global__
void cumsum(
const int & numel, const int & N, const float * x, float * y
)

{
    const int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < N)
    {
        const int dt = N;

        for(int t = index; t < numel; t += dt)
        {

            if (t - dt < 0)
            {
                y[t] = x[t];
            }
            else
            {
                y[t] = x[t] + y[t - dt];
            }

        }

    }
}
x=
tensor([[3., 0., 2.],
        [2., 0., 0.],
        [2., 3., 2.],
        [2., 1., 0.]], device='cuda:0')
y=
tensor([[3., 0., 2.],
        [5., 0., 2.],
        [7., 3., 4.],
        [9., 4., 4.]], device='cuda:0')
property pre_core
property post_core
check_shape(py_dict: dict)[source]
check_half2(py_dict: dict)[source]
Parameters:

py_dict (dict) – a dict

Check the values in py_dict. If a value is a torch.Tensor with value.dtype == torch.half or a cupy.ndarray with value.dtype == np.float16, this function will check whether its number of elements is even.

If a tensor x is 1D, it will be padded when x.numel() % 2 != 0. If a tensor x is 2D, it will be padded when x.shape[1] % 2 != 0.

We require that the numel is even when using the half dtype because we will use half2 in the CUDA kernel.

Note

CKernel2D.__call__ will pad half tensor to even numel before executing the kernel. Thus, the user does not need to worry about padding.

__call__(grid: tuple, block: tuple, py_dict: dict, *args_1, **kwargs)[source]
Parameters:
  • grid (tuple) – the grid size of the CUDA kernel

  • block (tuple) – the block size of the CUDA kernel

  • py_dict (dict) – the dict that contains parameters for the CUDA kernel

Execute the CUDA kernel. *args_1, **kwargs are used as *args_1, **kwargs in cupy.RawKernel.

py_dict should contain key: value pairs where key is the CUDA kernel function param name and value is the variable. This dict should be in one-to-one correspondence with self.cparams.

For example, if self.cparams is

{
    'numel': 'const int &',
    'x': 'const float *',
    'y': 'const float *'
}

Then py_dict should be

{
    'numel': numel,
    'x': x,
    'y': y
}

where numel, x, y should be torch.Tensor or cupy.ndarray with the corresponding data type, e.g., x in py_dict should have data type torch.float because x in self.cparams has the value 'const float *'.

The key order is arbitrary because this function will sort the keys to align formal and actual parameters.

Note

All tensors in py_dict should be 1D or 2D.

Note

Any 1D tensor x in py_dict with data type torch.half or np.float16 but an odd numel will be flattened and padded by x = [x, x[-1]] before executing the CUDA kernel.

If a 2D tensor x with shape [T, N] in py_dict has data type torch.half or np.float16 but an odd N, then x will be padded as x = [x, x[:, -1]], whose shape is [T, N + 1].

After execution, the padded values in x will be removed, and x will be reshaped to the original shape.

property head
property tail
simple_call(**kwargs)[source]
Parameters:

kwargs (dict) – the dict that contains parameters for the CUDA kernel

The simplified calling function, which is simplified from the standard calling function CKernel2D.__call__.

Compared with CKernel2D.__call__, the device, N, numel, and the numbers of CUDA threads and blocks are calculated automatically from the tensors in kwargs.

Here is the example:

import torch
import cupy
from spikingjelly.activation_based.auto_cuda import base
from spikingjelly.activation_based import cuda_utils

cumsum = base.CKernel2D(kernel_name='cumsum')
cumsum.add_param(ctype='const float *', cname='x')
cumsum.add_param(ctype='float *', cname='y')

cumsum.core = '''
                    if (t - dt < 0)
                    {
                        y[t] = x[t];
                    }
                    else
                    {
                        y[t] = x[t] + y[t - dt];
                    }
'''

T = 4
N = 3
device = 'cuda:0'

x = torch.randint(low=0, high=4, size=[T, N], device=device).float()
y = torch.zeros_like(x)

cumsum.simple_call(x=x, y=y)
print('x=')
print(x)
print('y=')
print(y)

The outputs are:

x=
tensor([[0., 2., 1.],
        [1., 3., 1.],
        [2., 2., 0.],
        [2., 0., 1.]], device='cuda:0')
y=
tensor([[0., 2., 1.],
        [1., 5., 2.],
        [3., 7., 2.],
        [5., 7., 3.]], device='cuda:0')
class spikingjelly.activation_based.auto_cuda.base.CodeTyper(indent_num: int)[source]

Bases: object

Parameters:

indent_num (int) – the number of indents

A CUDA code formatter that adds indents. The full code can be accessed by self.codes.

Here is an example:

from spikingjelly.activation_based.auto_cuda import base, cfunction

code0 = cfunction.if_else(z='z', x='x', y='y', mask='mask', dtype='float')
code1 = cfunction.sigmoid_backward(y='y', x='x', alpha=2., dtype='float')

codes = ''
codes += code0
codes += code1

print('// Without CodeTyper:')
print('// ------------------')
print(codes)
print('// ------------------')

ctyper = base.CodeTyper(4)
ctyper.append(code0)
ctyper.append(code1)
print('// With CodeTyper:')
print('// ------------------')
print(ctyper.codes)
print('// ------------------')

The outputs are:

// Without CodeTyper:
// ------------------
z = x * mask + y * (1.0f - mask);const float sigmoid_backward__sigmoid_ax = 1.0f / (1.0f + expf(- (2.0f) * x));
y = (1.0f - sigmoid_backward__sigmoid_ax) * sigmoid_backward__sigmoid_ax * (2.0f);
// ------------------
// With CodeTyper:
// ------------------

    z = x * mask + y * (1.0f - mask);
    const float sigmoid_backward__sigmoid_ax = 1.0f / (1.0f + expf(- (2.0f) * x));
    y = (1.0f - sigmoid_backward__sigmoid_ax) * sigmoid_backward__sigmoid_ax * (2.0f);

// ------------------
append(codes: str)[source]
Parameters:

codes (str) – the CUDA codes to be added

Append codes to self.codes.

class spikingjelly.activation_based.auto_cuda.base.CodeBlock(env: CodeTyper)[source]

Bases: object

Parameters:

env (CodeTyper) – a CodeTyper

A tool for adding a CUDA code block into CodeTyper.codes. It is helpful when we want to use intermediate variables in calculations.

Here is an example:

from spikingjelly.activation_based.auto_cuda import base

ctyper = base.CodeTyper(4)
with base.CodeBlock(ctyper):
    ctyper.append('// swap x and y')
    ctyper.append('float temp_var = x;')
    ctyper.append('x = y;')
    ctyper.append('y = temp_var;')

print(ctyper.codes)

The outputs are:

{
 // swap x and y;
 float temp_var = x;
 x = y;
 y = temp_var;
}
spikingjelly.activation_based.auto_cuda.cfunction.wrap_return_codes(y: Optional[str], codes: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.float2half2(y: Optional[str], x: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.constant(y: Optional[str], x: float, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.abs(y: Optional[str], x: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.power(z: Optional[str], x: str, y: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.if_else(z: Optional[str], x: str, y: str, mask: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.if_else_else(w: Optional[str], x: str, y: str, z: str, mask_x: str, mask_y: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.greater_equal(z: Optional[str], x: str, y: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.greater_than(z: Optional[str], x: str, y: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.minimal(z: Optional[str], x: str, y: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.maximum(z: Optional[str], x: str, y: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.add(z: Optional[str], x: str, y: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.sub(z: Optional[str], x: str, y: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.mul(z: Optional[str], x: str, y: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.div(z: Optional[str], x: str, y: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.neg(y: Optional[str], x: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.heaviside(y: Optional[str], x: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.exp(y: Optional[str], x: str, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.sigmoid(y: Optional[str], x: str, alpha: float, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.sigmoid_backward(y: str, x: str, alpha: float, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.atan_backward(y: str, x: str, alpha: float, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.piecewise_leaky_relu_backward(y: str, x: str, w: float, c: float, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.s2nn_backward(y: str, x: str, alpha: float, beta: float, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.q_pseudo_spike_backward(y: str, x: str, alpha: float, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.leaky_k_relu_backward(y: str, x: str, leak: float, k: float, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.fake_numerical_gradient_backward(y: str, x: str, alpha: float, dtype: str)[source]
spikingjelly.activation_based.auto_cuda.cfunction.log_tailed_relu_backward(y: str, x: str, alpha: float, dtype: str)[source]
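
The functions above return CUDA code strings that can be used to fill a kernel's core. As a sketch (assuming cfunction.heaviside generates the same element-wise code as the hand-written core in the CKernel1D heaviside example above), the heaviside kernel could also be built as:

from spikingjelly.activation_based.auto_cuda import base, cfunction

c_heaviside = base.CKernel1D(kernel_name='heaviside')
c_heaviside.add_param(ctype='const float *', cname='x')
c_heaviside.add_param(ctype='float *', cname='y')

# generate the core from a cfunction helper instead of writing CUDA code by hand
c_heaviside.core = cfunction.heaviside(y='y[index]', x='x[index]', dtype='float')
print(c_heaviside.full_codes)
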
spikingjelly.activation_based.auto_cuda.neuron_kernel.neuronal_hard_reset(v_next: str, h: str, spike: str, v_reset: str, dtype: str = 'float')[source]
spikingjelly.activation_based.auto_cuda.neuron_kernel.neuronal_soft_reset(v_next: str, h: str, spike: str, v_th: str, dtype: str = 'float')[source]
spikingjelly.activation_based.auto_cuda.neuron_kernel.neuronal_fire(spike: str, v: str, v_th: str, dtype: str = 'float')[source]
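
These helpers generate the CUDA code strings for the fire and reset steps of a spiking neuron. Here is a small sketch that only prints the generated snippets (the variable names passed in, such as h_seq[t] and spike_seq[t], are purely illustrative):

from spikingjelly.activation_based.auto_cuda import neuron_kernel

print(neuron_kernel.neuronal_fire(spike='spike_seq[t]', v='h_seq[t]', v_th='v_th', dtype='float'))
print(neuron_kernel.neuronal_hard_reset(v_next='v_v_seq[t + dt]', h='h_seq[t]', spike='spike_seq[t]', v_reset='v_reset', dtype='float'))
print(neuron_kernel.neuronal_soft_reset(v_next='v_v_seq[t + dt]', h='h_seq[t]', spike='spike_seq[t]', v_th='v_th', dtype='float'))
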
class spikingjelly.activation_based.auto_cuda.neuron_kernel.NeuronFPTTKernel(hard_reset: bool, dtype: str)[source]

Bases: CKernel2D

neuronal_charge() → str[source]
Returns:

CUDA code

Return type:

str

Returns the CUDA code for calculating \(H[t] = f(X[t], V[t-1], ...)\).

This function should define how h_seq[t] is calculated from x_seq[t], v_v_seq[t], and other params if the neuron needs them.

For example, the IF neuron defines this function as:

def neuronal_charge(self) -> str:
    # note that v_v_seq[t] is v_seq[t - dt]
    return cfunction.add(z='h_seq[t]', x='x_seq[t]', y='v_v_seq[t]', dtype=self.dtype)
property core
class spikingjelly.activation_based.auto_cuda.neuron_kernel.NeuronBPTTKernel(surrogate_function: Callable, hard_reset: bool, detach_reset: bool, dtype: str)[source]

Bases: CKernel2D

property pre_core
property post_core
grad_h_next_to_v() → str[source]
Returns:

CUDA code

Return type:

str

Returns CUDA code for calculating \(\frac{\mathrm{d} H[t+1]}{\mathrm{d} V[t]}\).

This function should define how grad_h_next_to_v is calculated. Note that grad_h_next_to_v has not been declared. Thus, this function should also declare grad_h_next_to_v.

For example, the IF neuron defines this function as:

def grad_h_next_to_v(self) -> str:
    return cfunction.constant(y=f'const {self.dtype} grad_h_next_to_v', x=1., dtype=self.dtype)
grad_h_to_x() → str[source]
Returns:

CUDA code

Return type:

str

Returns CUDA code for calculating \(\frac{\mathrm{d} H[t]}{\mathrm{d} X[t]}\).

This function should define how grad_h_to_x is calculated. Note that grad_h_to_x has not been declared. Thus, this function should also declare grad_h_to_x.

For example, the IF neuron defines this function as:

def grad_h_to_x(self) -> str:
    return cfunction.constant(y=f'const {self.dtype} grad_h_to_x', x=1., dtype=self.dtype)
property core
class spikingjelly.activation_based.auto_cuda.neuron_kernel.IFNodeFPTTKernel(hard_reset: bool, dtype: str)[source]

Bases: NeuronFPTTKernel

neuronal_charge() → str[source]
class spikingjelly.activation_based.auto_cuda.neuron_kernel.IFNodeBPTTKernel(surrogate_function: Callable, hard_reset: bool, detach_reset: bool, dtype: str)[source]

Bases: NeuronBPTTKernel

grad_h_next_to_v() → str[source]
grad_h_to_x() → str[source]
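
As a minimal sketch, the generated forward (FPTT) kernel of the IF neuron can be inspected by printing its full_codes (hard_reset and dtype are the documented constructor arguments):

from spikingjelly.activation_based.auto_cuda import neuron_kernel

if_fptt = neuron_kernel.IFNodeFPTTKernel(hard_reset=True, dtype='float')
print(if_fptt.full_codes)
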
spikingjelly.activation_based.auto_cuda.neuron_kernel.if_requires_grad(items: Iterable)[source]
spikingjelly.activation_based.auto_cuda.neuron_kernel.scalar_to_cupy(py_dict: dict, ref: str = 'x_seq')[source]
spikingjelly.activation_based.auto_cuda.neuron_kernel.new_tensors(news: tuple, py_dict: dict, ref: str = 'x_seq')[source]
class spikingjelly.activation_based.auto_cuda.neuron_kernel.NeuronATGFBase[source]

Bases: object

static pre_forward(py_dict: dict)[source]
Parameters:

py_dict (dict) – a dict built from the neuron's forward autograd function. It should at least contain x_seq, v_init, and v_reset

Returns:

requires_grad, blocks, threads, py_dict

requires_grad: bool

if any tensor in py_dict requires grad, then requires_grad = True; else requires_grad = False

blocks: int

CUDA param used in calling CUDA kernel

threads: int

CUDA param used in calling CUDA kernel. The default value is spikingjelly.configure.cuda_threads

py_dict: dict

Compared with the input py_dict, the returned py_dict will:

  • convert all float/int scalars in py_dict to cupy.ndarray

  • add h_seq, spike_seq, v_v_seq to py_dict. h_seq, spike_seq are zero tensors with the same shape as x_seq. v_v_seq is concatenated from v_init and v_seq, where v_seq is a zero tensor with the same shape as x_seq

  • add N, numel to py_dict. Note that x_seq.shape = [T, N] and numel = T * N. A special case is when x_seq.dtype == torch.half; then N = math.ceil(N / 2) and numel = N * x_seq.shape[0]. Note that N, numel in the returned py_dict are cupy.ndarray

Return type:

tuple
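
A hypothetical sketch of calling pre_forward for an IF-like neuron (the exact set of keys depends on the neuron; x_seq, v_init, v_th, and v_reset are used here only for illustration):

import torch
from spikingjelly.activation_based.auto_cuda import neuron_kernel

T, N = 8, 16
device = 'cuda:0'

x_seq = torch.rand([T, N], device=device)
v_init = torch.zeros([N], device=device)

py_dict = {
    'x_seq': x_seq,
    'v_init': v_init,
    'v_th': 1.,
    'v_reset': 0.
}
requires_grad, blocks, threads, py_dict = neuron_kernel.NeuronATGFBase.pre_forward(py_dict)

# py_dict now also contains h_seq, spike_seq, v_v_seq, N and numel
print(py_dict.keys())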

static ctx_save(ctx, requires_grad: bool, *args, **kwargs)[source]
Parameters:
  • ctx – ctx in torch.autograd.Function

  • requires_grad (bool) – if any tensor in the forward params requires grad

  • args – tensors that need to be saved by ctx.save_for_backward

  • kwargs – items that need to be saved by ctx.xx = xx

Saves *args, **kwargs in ctx by ctx.save_for_backward(*args) and ctx.xx = xx for all xx in kwargs.items().
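
A hypothetical sketch of how ctx_save could be used inside a custom torch.autograd.Function (DummyATGF exists only for this illustration):

import torch
from spikingjelly.activation_based.auto_cuda.neuron_kernel import NeuronATGFBase

class DummyATGF(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x_seq: torch.Tensor):
        # tensors go to ctx.save_for_backward, keyword items become ctx.<name>
        NeuronATGFBase.ctx_save(ctx, x_seq.requires_grad, x_seq, blocks=1, threads=1024)
        return x_seq.clone()

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        x_seq = ctx.saved_tensors[0]
        threads = ctx.threads  # stored by ctx_save(..., threads=1024)
        return grad_out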

static pre_backward(ctx, grad_spike_seq: Tensor, grad_v_seq: Tensor)[source]
Parameters:
  • ctx – ctx in torch.autograd.Function

  • grad_spike_seq (Tensor) – the gradient of spike_seq

  • grad_v_seq (Tensor) – the gradient of v_seq

Returns:

backward_kernel, blocks, threads, py_dict

backward_kernel: NeuronBPTTKernel

The CUDA kernel used for backward. It should be provided in ctx.backward_kernel

blocks: int

CUDA param used in calling CUDA kernel. It should be provided in ctx.blocks

threads: int

CUDA param used in calling CUDA kernel. It should be provided in ctx.threads

Return type:

tuple

class spikingjelly.activation_based.auto_cuda.neuron_kernel.IFNodeATGF(*args, **kwargs)[source]

Bases: Function

static forward(ctx, x_seq: Tensor, v_init: Tensor, v_th: float, v_reset: Optional[float], forward_kernel: IFNodeFPTTKernel, backward_kernel: IFNodeBPTTKernel)[source]
static backward(ctx, grad_spike_seq: Tensor, grad_v_seq: Tensor)[source]
class spikingjelly.activation_based.auto_cuda.neuron_kernel.LIFNodeFPTTKernel(decay_input: bool, hard_reset: bool, dtype: str)[source]

Bases: NeuronFPTTKernel

neuronal_charge() → str[source]
class spikingjelly.activation_based.auto_cuda.neuron_kernel.LIFNodeBPTTKernel(decay_input: bool, surrogate_function: Callable, hard_reset: bool, detach_reset: bool, dtype: str)[source]

Bases: NeuronBPTTKernel

grad_h_next_to_v() → str[source]
grad_h_to_x() → str[source]
class spikingjelly.activation_based.auto_cuda.neuron_kernel.LIFNodeATGF(*args, **kwargs)[source]

Bases: Function

static forward(ctx, x_seq: Tensor, v_init: Tensor, v_th: float, v_reset: Optional[float], decay: float, forward_kernel: LIFNodeFPTTKernel, backward_kernel: LIFNodeBPTTKernel)[source]
static backward(ctx, grad_spike_seq: Tensor, grad_v_seq: Tensor)[source]
class spikingjelly.activation_based.auto_cuda.neuron_kernel.ParametricLIFNodeFPTTKernel(decay_input: bool, hard_reset: bool, dtype: str)[source]

Bases: NeuronFPTTKernel

neuronal_charge() → str[source]
class spikingjelly.activation_based.auto_cuda.neuron_kernel.ParametricLIFNodeBPTTKernel(decay_input: bool, surrogate_function: Callable, hard_reset: bool, detach_reset: bool, dtype: str)[source]

Bases: NeuronBPTTKernel

grad_h_next_to_v() → str[source]
grad_h_to_x() → str[source]
property head
property pre_core
property core
property tail
class spikingjelly.activation_based.auto_cuda.neuron_kernel.ParametricLIFNodeATGF(*args, **kwargs)[source]

Bases: Function

static forward(ctx, x_seq: Tensor, v_init: Tensor, v_th: float, v_reset: Optional[float], decay: Tensor, forward_kernel: ParametricLIFNodeFPTTKernel, backward_kernel: ParametricLIFNodeBPTTKernel)[source]
static backward(ctx, grad_spike_seq: Tensor, grad_v_seq: Tensor)[source]