
ML Compilers Part 1: High-Level Intermediate Representation

High-level intermediate representations (IRs) play a crucial role in machine learning (ML) compilers, enabling optimization and code generation for various hardware platforms. These IRs provide an abstract representation of machine learning models that can be transformed and optimized before being compiled into executable code. In this blog post, I will discuss the design objectives and the types of high-level IRs used by popular, real-world ML compilers.

Workflow of ML compilers

Content

    1. Design Objectives
    2. Types of High-level IR
    3. Examples of High-level IR
      1. Relay (TVM)
      2. Torch.Fx

Design Objectives for High-Level IR

Optimizations

Optimizing the input ML model to enhance its runtime efficiency and reduce resource consumption is one of the primary objectives of IRs. Unlike low-level compiler IRs, which are closer to the hardware, high-level IRs are hardware-agnostic and provide the abstraction needed for common transformations like kernel specialization, layout transformations, operator fusion to minimize memory access, shape inference for memory allocation optimization, and quantization to reduce numerical precision.
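To make one of these transformations concrete, here is a toy sketch of operator fusion expressed as a graph rewrite. The node structure and operator names are purely illustrative (not taken from any real compiler): two adjacent elementwise nodes are collapsed into a single fused node so the intermediate tensor never needs to be written back to memory.

# Hypothetical IR node: an op name plus its input nodes.
class Node:
    def __init__(self, op, inputs):
        self.op, self.inputs = op, inputs

def fuse_elementwise(node):
    """Collapse a chain of two elementwise ops into one fused node."""
    elementwise = {"add", "mul", "relu"}
    if node.op in elementwise and len(node.inputs) == 1 and node.inputs[0].op in elementwise:
        child = node.inputs[0]
        # The fused node computes child followed by node in a single kernel.
        return Node(f"fused_{child.op}_{node.op}", child.inputs)
    return node

# relu(add(x, y)) becomes a single fused_add_relu(x, y) node.
x, y = Node("input", []), Node("input", [])
out = fuse_elementwise(Node("relu", [Node("add", [x, y])]))
print(out.op)  # fused_add_relu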

Ease of Transformations

It should be easy to transform the user-written ML model into the high-level IR. Sometimes translating to the high-level IR is as simple as a one-to-one operator translation, but the process can be trickier. For example, Glow, an ML compiler developed by Meta, uses automatic code generation techniques (https://github.com/pytorch/glow/blob/master/docs/ClassGen.md) to generate the Nodes in its high-level IR based on the input ML model.
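In the simple one-to-one case, the translation can amount to little more than a lookup table from framework operators to IR node constructors. The sketch below is hypothetical; the operator names and IR node names are made up for illustration.

# Hypothetical mapping from framework op names to high-level IR node constructors.
OP_TABLE = {
    "conv2d": lambda inputs, attrs: ("ConvNode", inputs, attrs),
    "relu":   lambda inputs, attrs: ("ReluNode", inputs, attrs),
    "matmul": lambda inputs, attrs: ("GemmNode", inputs, attrs),
}

def lower_op(op_name, inputs, attrs):
    """One-to-one translation of a single framework operator to an IR node."""
    if op_name not in OP_TABLE:
        raise NotImplementedError(f"no IR node for operator {op_name!r}")
    return OP_TABLE[op_name](inputs, attrs)

print(lower_op("relu", ["x"], {}))  # ('ReluNode', ['x'], {})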

Support Dynamic Graphs

Supporting dynamic graphs or dynamic ML models in compilers is important because many real-world applications and scenarios require models with dynamic behaviors that can adapt to varying input sizes, shapes, or conditions. Dynamic graphs enable models to handle inputs of different lengths, dimensions, or resolutions without requiring extensive preprocessing or fixed-size padding.
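As an example of what this looks like in practice, here is a small sketch assuming TVM's relay.Any dynamic-shape support, where a dimension is left unknown at compile time instead of being fixed or padded:

from tvm import relay

# A batch dimension that is only known at runtime.
batch = relay.Any()
x = relay.var("x", shape=(batch, 128), dtype="float32")
w = relay.var("w", shape=(64, 128), dtype="float32")
y = relay.nn.dense(x, w)          # valid for any batch size
f = relay.Function([x, w], y)
print(f)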

Expressiveness

The high-level IR should be expressive enough to represent not only standard ML operations like Convolution or GEMM, but also user-defined, custom ML operators. Also, the ease with which the IR can be extended for user-defined operators is crucial for determining the usability of the ML compiler.

Types of High-level IRs

Types of High-level ML compiler IRs

DAG-based Symbolic or Imperative IR

DAG (Directed Acyclic Graph) is a graph structure where nodes represent operations or computations, and edges indicate data dependencies between these operations. A DAG-based IR represents a program’s computation flow using this graph structure and it can be either static, established at compile-time and unchanging during execution, or dynamic, evolving during runtime based on program behavior. Dynamic DAG-based representations are commonly used in just-in-time (JIT) compilation systems and some dynamic programming languages. They can enable more adaptive and context-sensitive optimizations, as the IR can be tailored to the specific data and execution paths encountered during runtime.

Additionally, DAG-based IRs can be either symbolic or imperative: a symbolic representation focuses on capturing the relationships and semantics of computations using symbolic expressions. On the other hand, an imperative DAG representation focuses on capturing the actual sequence of operations and their effects on data. It represents computations as a sequence of instructions that are executed step-by-step. MXNet provides both symbolic and imperative DAG-based IRs.
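A toy sketch of the difference, in plain Python rather than any real framework's API: the symbolic style first builds a graph of nodes and only evaluates it on demand, while the imperative style computes each value immediately as the statement runs.

# Symbolic style: build a DAG of nodes now, evaluate later.
class Sym:
    def __init__(self, op, args): self.op, self.args = op, args
    def __add__(self, other):     return Sym("add", [self, other])

def evaluate(node, env):
    if node.op == "var":
        return env[node.args[0]]
    return sum(evaluate(a, env) for a in node.args)  # only 'add' supported

graph = Sym("var", ["x"]) + Sym("var", ["y"])        # nothing computed yet
print(evaluate(graph, {"x": 2, "y": 3}))             # 5, computed on demand

# Imperative style: values are computed immediately, step by step.
x, y = 2, 3
z = x + y                                            # 5, computed right here
print(z)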

Here’s a really good article on the tradeoffs between symbolic and imperative IR: https://mxnet.apache.org/versions/1.9.1/api/architecture/program_model

The simplicity of DAG-based IRs makes it easier to implement automatic differentiation and compile for heterogeneous execution environments (e.g., executing parts of the graph on specialized hardware).

Examples of DAG-based IR: Torch.Fx, XLA HLO, ONNX Computation Graph, Glow High-level IR, DLVM.

Let-binding based

Let-binding is a mechanism used in programming languages to associate a name (variable) with a value or expression. It allows you to introduce local variables within a scope, making it easier to reuse and manage values in an organized manner. Let-binding is commonly found in functional programming languages and plays a role in creating modular and readable code. In a let-binding IR, every intermediate result is explicitly bound to a name with a well-defined scope; the compiler evaluates each bound expression, records the result in a variable map (environment), and then evaluates the body of the let expression by looking names up in that map.

This approach enhances code organization and comprehension, facilitates reuse of intermediate computations, reduces redundancy, and simplifies complex expressions. By encapsulating intermediate values within a local scope, let-bindings enable efficient optimization opportunities, such as common subexpression elimination (CSE) and constant folding.

Let-binding IRs are particularly useful for determining the scope of computation when the ML operator is control-flow heavy or contains closures. Example: https://tvm.apache.org/docs/arch/relay_intro.html#why-we-might-need-let-binding
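Here is a small sketch of what an explicit let-binding looks like, assuming TVM's relay.Let API (in the spirit of the TVM docs linked above): the intermediate value x + x is bound to a name before being used in the body.

from tvm import relay

x = relay.var("x", shape=(10, 1))
t = relay.var("t")                       # name for the intermediate value
# let t = x + x in t * t
body = relay.Let(t, relay.add(x, x), relay.multiply(t, t))
f = relay.Function([x], body)
print(f)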

Examples of compilers with let-binding-based IRs: MLton, SML/NJ

Domain-Specific or Hybrid IRs

Hybrid IRs like Relay (used in TVM) incorporate the benefits of both DAG-based and let-binding IRs. The TC (Tensor Comprehensions) ML compiler, on the other hand, uses a domain-specific IR tailored to tensor computations. While TC's IR is not explicitly classified as DAG-based or let-binding based, it combines domain-specific abstractions with mathematical expressions that capture tensor operations and their relationships, enabling high-level optimization and efficient code generation for GPUs and other accelerators.

Examples of High-level IRs

Relay (TVM)

Relay is a high-level IR used by the Apache TVM compiler. Contrary to traditional high-level IRs, Relay takes a PL (programming language) approach to designing an expressive, concise, and easy-to-optimize IR. Relay chooses to support both the dataflow form and let bindings, so it can be used both by folks coming from a dataflow background and by those familiar with functional programming and A-normal form.

Here’s what Relay code looks like:

from tvm import relay

# A Relay variable with shape (10, 1).
x = relay.var('x', shape=(10, 1))
w = relay.op.add(x, x)
print(w)
z = x + x  # operator overloading builds the same add node
print(z)
f = relay.Function([x], z)
print(f)

####### OUTPUTS ##########
---- # print(w)
free_var %x: Tensor[(10, 1), float32]
add(%x, %x)
---- # print(z)
free_var %x: Tensor[(10, 1), float32]
add(%x, %x)
---- # print(f)
fn (%x: Tensor[(10, 1), float32]) {
  add(%x, %x)
}

Torch.Fx

Torch.Fx is a symbolic, DAG-based IR in which a program is represented as a Graph object containing a linear series of Node objects, each representing an operation. Every Node has a string opcode describing what kind of operation it represents, and data dependencies between Nodes are represented as edges.

Here’s what the Torch.Fx IR looks like:

Torch.Fx IR
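As a concrete sketch using the standard torch.fx.symbolic_trace API (the module below is just an illustrative toy), tracing a small module produces a Graph whose Nodes carry opcodes such as placeholder, call_function, and output:

import torch
import torch.fx

class MyModule(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + x

# Symbolically trace the module into a GraphModule holding the fx Graph.
gm = torch.fx.symbolic_trace(MyModule())
print(gm.graph)  # placeholder -> call_function(relu) -> call_function(add) -> output
for node in gm.graph.nodes:
    print(node.op, node.name, node.args)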

Conclusion

To conclude, in this blog post we looked at the design objectives and the types of high-level IRs used by ML compilers. Different types of IRs offer different trade-offs between expressivity, optimizations, and ease of transformation and use.

