TOPI (TVM Operator Inventory) is TVM's tensor operator library. It provides many common operators, such as conv2d and transpose.

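When TVM compiles a computational graph, it first lowers the graph from Relay IR to TIR, and then translates the TIR into code for the target (for example via LLVM, C, or CUDA). Between Relay IR and TIR, however, there is another intermediate language: Tensor Expression (TE), which is designed specifically for describing tensor computations. The vast majority of Relay IR operators are in fact implemented in Tensor Expression.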

https://tvm.apache.org/docs/reference/langref/relay_expr.html

An operator is a primitive operation, such as add or conv2d, not defined in the Relay language. Operators are declared in the global operator registry in C++. Many common operators are backed by TVM’s Tensor Operator Inventory.

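Relationship between TOPI and TE:

TE (Tensor Expression) is a functional language aimed at tensor computation.
TOPI is a tensor operator library implemented on top of TE.

The relationship is similar to the one between a programming language and its standard library.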
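The analogy can be made concrete with a pure-Python toy model. This is not TVM's API.

- The hypothetical `compute` plays the role of `te.compute`: it defines a tensor by a shape and a function from an index to a value.
- `add` and `relu` play the role of library operators written only in terms of `compute`, the way TOPI operators are written in TE.

One important difference: real TE builds a symbolic expression that is later scheduled and compiled. This sketch simply evaluates eagerly.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Index = Tuple[int, ...]

class Tensor:
    """Toy tensor: a shape plus a mapping from index tuples to values."""

    def __init__(self, shape: Index, data: Dict[Index, float]):
        self.shape = shape
        self.data = data

    def __getitem__(self, idx: Index) -> float:
        return self.data[idx]

def compute(shape: Index, fcompute: Callable[..., float]) -> Tensor:
    """Toy analogue of te.compute: element i of the result is fcompute(*i).
    Evaluated eagerly here; real TE only builds a symbolic expression."""
    indices = product(*(range(n) for n in shape))
    return Tensor(shape, {idx: fcompute(*idx) for idx in indices})

# "Standard library" operators, written only in terms of compute (like TOPI on TE)
def from_list(values: List[float]) -> Tensor:
    return compute((len(values),), lambda i: float(values[i]))

def add(a: Tensor, b: Tensor) -> Tensor:
    assert a.shape == b.shape, "this toy add does not broadcast"
    return compute(a.shape, lambda *i: a[i] + b[i])

def relu(x: Tensor) -> Tensor:
    return compute(x.shape, lambda *i: max(x[i], 0.0))

z = relu(add(from_list([-1, 2, -3]), from_list([1, 1, 1])))
print([z[(i,)] for i in range(3)])  # [0.0, 3.0, 0.0]
```

The "language" provides a single primitive (`compute`), and the "library" layers named operators on top of it, which mirrors the TE/TOPI split.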

1. TOPI operator list

Source directory: src/topi

src/topi merely registers the operators as TVM global functions through TVM_REGISTER_GLOBAL; it contains few implementation details.

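The implementations themselves are all inline functions under include/tvm/topi. Because the operators are written in Tensor Expression, their implementations ultimately call tvm::te::compute to define the computation.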

The four most common categories are broadcast, elemwise, nn, and reduction:

broadcast: topi.add, topi.subtract, topi.multiply, topi.divide, topi.floor_divide, topi.mod, topi.floor_mod, topi.maximum, topi.minimum, topi.power, topi.left_shift, topi.logical_and, topi.logical_or, topi.logical_xor, topi.bitwise_and, topi.bitwise_or, topi.bitwise_xor, topi.right_shift, topi.greater, topi.less, topi.equal, topi.not_equal, topi.greater_equal, topi.less_equal, topi.broadcast_to

elemwise: topi.acos, topi.acosh, topi.asin, topi.asinh, topi.atanh, topi.exp, topi.fast_exp, topi.erf, topi.fast_erf, topi.tan, topi.cos, topi.cosh, topi.sin, topi.sinh, topi.tanh, topi.fast_tanh, topi.atan, topi.sigmoid, topi.sqrt, topi.rsqrt, topi.log, topi.log2, topi.log10, topi.identity, topi.negative, topi.clip, topi.cast, topi.reinterpret, topi.elemwise_sum, topi.sign, topi.full, topi.full_like, topi.logical_not, topi.bitwise_not

nn: topi.nn.relu, topi.nn.leaky_relu, topi.nn.prelu, topi.nn.pad, topi.nn.space_to_batch_nd, topi.nn.batch_to_space_nd, topi.nn.nll_loss, topi.nn.dense, topi.nn.bias_add, topi.nn.dilate, topi.nn.flatten, topi.nn.scale_shift_nchw, topi.nn.scale_shift_nhwc, topi.nn.pool_grad, topi.nn.global_pool, topi.nn.adaptive_pool, topi.nn.adaptive_pool3d, topi.nn.pool1d, topi.nn.pool2d, topi.nn.pool3d, topi.nn.softmax, topi.nn.log_softmax, topi.nn.lrn, topi.nn.binarize_pack, topi.nn.binary_dense

reduction: topi.sum, topi.min, topi.max, topi.argmin, topi.argmax, topi.prod, topi.all, topi.any
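The categories differ mainly in how each output index maps to input indices. The pure-Python sketch below shows one representative of each category. It uses toy functions, not TOPI's API, and assumes 2-D inputs given as nested lists.

- A NumPy-style broadcast add: each dimension must match or be 1.
- An elementwise exp.
- An nn-category relu.
- A sum reduction over one axis.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def shape(a: Matrix) -> Tuple[int, int]:
    return (len(a), len(a[0]))

def broadcast_add(a: Matrix, b: Matrix) -> Matrix:
    """broadcast: each dimension must be equal or 1; size-1 dims are stretched."""
    (ar, ac), (br, bc) = shape(a), shape(b)
    for x, y in ((ar, br), (ac, bc)):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {shape(a)} with {shape(b)}")
    rows, cols = max(ar, br), max(ac, bc)
    # Along a dimension of size 1, index 0 is reused for every output position
    return [[a[i if ar > 1 else 0][j if ac > 1 else 0]
             + b[i if br > 1 else 0][j if bc > 1 else 0]
             for j in range(cols)] for i in range(rows)]

def elemwise_exp(a: Matrix) -> Matrix:
    """elemwise: out[i][j] depends only on a[i][j]."""
    return [[math.exp(x) for x in row] for row in a]

def relu(a: Matrix) -> Matrix:
    """nn: relu happens to be elementwise; other nn ops (dense, pool) mix elements."""
    return [[max(x, 0.0) for x in row] for row in a]

def reduce_sum(a: Matrix, axis: int) -> List[float]:
    """reduction: collapse one axis by summing over it."""
    if axis == 0:
        return [sum(row[j] for row in a) for j in range(len(a[0]))]
    return [sum(row) for row in a]

a = [[1.0, -2.0, 3.0], [-4.0, 5.0, -6.0]]  # shape (2, 3)
bias = [[10.0, 20.0, 30.0]]                # shape (1, 3), broadcast over rows
print(broadcast_add(a, bias))  # [[11.0, 18.0, 33.0], [6.0, 25.0, 24.0]]
print(relu(a))                 # [[1.0, 0.0, 3.0], [0.0, 5.0, 0.0]]
print(reduce_sum(a, axis=1))   # [2.0, -5.0]
```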

2. TOPI programming example

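Because every operator implemented in TOPI is ultimately registered as a TVM global function and exposed as a relay.op, we can call these operators directly when describing a computational graph. This is much like calling a compiler's builtin functions when writing C code.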

For example, the Relay operator corresponding to topi.add is relay.add; the others follow the same pattern.

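from tvm import relay
a1 = relay.var("a1", shape=(1,), dtype="float32")
c1 = relay.const(10, "float32")
c2 = relay.add(c1, a1)  # relay.add's compute rule is backed by topi::add
print(relay.Function([a1], c2))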

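Like operator fusion, constant folding is one of the most common optimizations in compilers. Put simply, it evaluates constant expressions ahead of time, during compilation, and emits the results as constants in the generated machine code. This reduces the amount of computation at run time and improves execution efficiency.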

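In practice, what most compilers call constant folding usually combines two optimization techniques: constant folding proper and constant propagation.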

1. Basic concepts

1.1. Constant folding

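Constant folding is a compiler optimization that simplifies code by evaluating compile-time constants and constant expressions during compilation. Take the following code as an example: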

i = 320 * 200 * 32;

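Here the compiler typically evaluates the expression while compiling. It computes the value of 320 * 200 * 32 (2048000) directly instead of emitting two multiplication instructions.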
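Below is a minimal sketch of constant folding. It uses Python's standard `ast` module as a stand-in IR; this is a toy and not how TVM's FoldConstant pass is implemented.

- Any binary operation whose two operands are literal constants is evaluated at "compile time" and replaced by its result.
- Children are folded before parents, so nested constant subexpressions collapse bottom-up.
- It requires Python 3.9+ for `ast.unparse`.

```python
import ast
import operator

# Binary operators this toy folder knows how to evaluate
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = _OPS.get(type(node.op))
        if op and isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                value = op(node.left.value, node.right.value)
            except ZeroDivisionError:
                return node  # keep it, so the error still happens at run time
            return ast.copy_location(ast.Constant(value=value), node)
        return node

def fold(source: str) -> str:
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(fold("i = 320 * 200 * 32"))  # i = 2048000
print(fold("y = x * (2 + 3)"))     # y = x * 5
```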

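There are also more sophisticated forms (I had not yet checked whether TVM supports them; to be verified later). For instance, when evaluating a complex expression, the constant operations inside it can be merged, which simplifies the expression.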

In the accompanying figure, each expression costs 8 FLOPs before the optimization (left) and only 2 FLOPs after it (right), a large gain in efficiency.
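Merging constants inside a larger expression is usually done by reassociation. The sketch below uses Python's `ast` module as a toy IR (Python 3.9+), and its input expression is hypothetical, not taken from the original figure.

- It rewrites `(e op c1) op c2` into `e op (c1 op c2)` for `+` and `*`, then evaluates the constant part.
- This is only exact for integers. With floating point, reassociation can change rounding, which is why C compilers typically perform it only under fast-math style flags.

```python
import ast
import operator

# Only associative operators are safe to regroup
_ASSOC = {ast.Add: operator.add, ast.Mult: operator.mul}

class Reassociate(ast.NodeTransformer):
    """Rewrite (e op c1) op c2 as e op (c1 op c2) for associative + and *."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # simplify inner expressions first
        fn = _ASSOC.get(type(node.op))
        if fn is None or not isinstance(node.right, ast.Constant):
            return node
        left = node.left
        if isinstance(left, ast.Constant):  # plain c1 op c2
            return ast.copy_location(ast.Constant(value=fn(left.value, node.right.value)), node)
        if (isinstance(left, ast.BinOp) and type(left.op) is type(node.op)
                and isinstance(left.right, ast.Constant)):
            merged = ast.Constant(value=fn(left.right.value, node.right.value))
            return ast.copy_location(ast.BinOp(left=left.left, op=node.op, right=merged), node)
        return node

def simplify(source: str) -> str:
    tree = Reassociate().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

# 4 operations before, 2 after
print(simplify("y = x * 2 * 3 + 1 + 4"))  # y = x * 6 + 5
```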

In practice, however, TVM does not go that far: its constant folding handles only fairly simple cases.

1.2. Constant propagation

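Constant propagation is likewise one of the most common compiler optimizations. During compilation, the compiler tracks which variables hold known constant values and substitutes those values wherever the variables are used. It then applies constant folding to simplify the code. For example: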

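// original code
int x = 14;
int y = 7 - x / 2;
return y * (28 / x + 2);

// after constant propagation (x is replaced by 14)
int x = 14;
int y = 7 - 14 / 2;
return y * (28 / 14 + 2);

// after constant folding (y folds to 0, which is then propagated into the return)
int x = 14;
int y = 0;
return 0;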
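The propagate-then-fold process above can be sketched on the same program in pure Python, using the `ast` module as a toy IR (Python 3.9+). The class and function names are illustrative.

- C's integer `/` is modelled with `//`, which matches it only for non-negative operands.
- The sketch handles straight-line code only. A real constant-propagation pass must also merge facts across branches and loops.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv}

class PropagateAndFold(ast.NodeTransformer):
    def __init__(self) -> None:
        self.known = {}  # variable name -> constant value assigned so far

    def visit_Name(self, node: ast.Name) -> ast.AST:
        # Constant propagation: replace reads of known constants with their value
        if isinstance(node.ctx, ast.Load) and node.id in self.known:
            return ast.copy_location(ast.Constant(value=self.known[node.id]), node)
        return node

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)
        op = _OPS.get(type(node.op))
        # Constant folding: evaluate operations whose operands are both literals
        if op and isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                value = op(node.left.value, node.right.value)
            except ZeroDivisionError:
                return node
            return ast.copy_location(ast.Constant(value=value), node)
        return node

    def visit_Assign(self, node: ast.Assign) -> ast.AST:
        node.value = self.visit(node.value)
        target = node.targets[0]
        if len(node.targets) == 1 and isinstance(target, ast.Name):
            if isinstance(node.value, ast.Constant):
                self.known[target.id] = node.value.value  # now a known constant
            else:
                self.known.pop(target.id, None)  # reassigned to a non-constant
        return node

def optimize(source: str) -> str:
    tree = PropagateAndFold().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

program = """
x = 14
y = 7 - x // 2
result = y * (28 // x + 2)
"""
print(optimize(program))
# x = 14
# y = 0
# result = 0
```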


Daily archive: June 10, 2023

TVM in Depth, Made Simple – (15) TVM Operator Inventory (TOPI)


TVM in Depth, Made Simple – (13) Relay Pass: Constant Folding
