
Writing Your First LLVM Pass and Registering it in the Clang Toolchain

Several detailed tutorials on writing an LLVM pass already exist [1], so I won’t cover that part in much detail. However, as of today (May 2020), there is no detailed guide on registering an LLVM pass within the OPT and Clang toolchains, so this post focuses mainly on that.

Content:

  • Introduction
  • Types of LLVM passes
  • Writing a basic function pass in LLVM
  • Registering a pass within the
    • OPT toolchain
    • Clang toolchain

Introduction

LLVM is an extremely modular compiler infrastructure that provides back-end tools for code optimization and transformation. It works on an intermediate representation called LLVM IR. Clang, on the other hand, provides front-end tools for converting programs written in C, C++, and Objective-C to LLVM IR. It then calls the LLVM toolchain for optimization and transformation (“lowering”) from LLVM IR to architecture-dependent machine code. The optimizations and transformations in LLVM are organized as passes, and LLVM’s pass manager determines the order in which these passes are executed. With that in mind, we will now look at the different types of LLVM passes and a basic example of an LLVM function pass.

Note: LLVM IR is in SSA form, but not strictly so. Strict SSA requires that every variable is assigned exactly once. LLVM IR, like other SSA-based compiler IRs, applies this rule to its registers but does not require variables that live in memory to follow it. This is because, at compile time, it isn’t always possible to determine which memory location a particular statement will affect. An LLVM pass called mem2reg tries to promote memory references to register references, which brings the IR closer to strict SSA form. See [2][3].
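To make the “assigned only once” rule concrete, here is a minimal sketch in plain C++. It is not LLVM code: renameToSSA is a hypothetical helper I made up for illustration, and it only handles straight-line code (no branches, so no phi nodes). It gives each assignment target a fresh version number so that no name is ever assigned twice.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy illustration only (not an LLVM API): takes the left-hand sides of a
// sequence of straight-line assignments and returns a fresh, versioned name
// for each one, so that every resulting name is assigned exactly once.
std::vector<std::string> renameToSSA(const std::vector<std::string> &targets) {
  std::map<std::string, int> version; // how often each variable was assigned
  std::vector<std::string> renamed;
  for (const std::string &target : targets) {
    assert(!target.empty() && "an assignment needs a target name");
    int v = ++version[target];
    renamed.push_back(target + "." + std::to_string(v));
  }
  return renamed;
}
```

For the assignment sequence x, y, x this yields x.1, y.1, x.2: the second write to x becomes a new name instead of overwriting the old one.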

Types of LLVM passes

Following are various types of LLVM passes:

Module Pass

Module passes are used for whole-program analysis, transformation, and optimization, since they treat an entire module as one single unit. In other words, a module pass is called exactly once for a given module.

Function Pass

Function passes are used for function-level analysis, optimization, and transformation. A function pass treats each function as a unit and is called once for every function defined in the program.
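The difference in granularity between these two pass types can be sketched with a toy model. This is plain C++, not the LLVM API: ToyModule, ToyFunction, CallLog, and runPasses are hypothetical names standing in for the real classes and the pass manager.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for llvm::Function and llvm::Module (NOT the real classes).
struct ToyFunction { std::string name; };
struct ToyModule { std::vector<ToyFunction> functions; };

// Records every callback so the two granularities can be compared.
struct CallLog {
  int moduleCalls = 0;
  std::vector<std::string> functionCalls;
};

// A module-level pass sees the whole module in a single callback.
void runToyModulePass(const ToyModule &M, CallLog &log) {
  (void)M; // a real pass would inspect every function from here
  ++log.moduleCalls;
}

// A function-level pass gets one callback per function.
void runToyFunctionPass(const ToyFunction &F, CallLog &log) {
  assert(!F.name.empty() && "toy functions must be named");
  log.functionCalls.push_back(F.name);
}

// Toy driver: dispatches both passes over one module, roughly the way a
// pass manager would.
CallLog runPasses(const ToyModule &M) {
  CallLog log;
  runToyModulePass(M, log);
  for (const ToyFunction &F : M.functions)
    runToyFunctionPass(F, log);
  return log;
}
```

For a module with three functions, the log ends up with one module callback and three function callbacks.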

CallGraphSCCPass

This type of pass traverses the call graph bottom-up, that is, callees before callers. Since the call graph can contain cycles, we first need to identify its strongly connected components (SCCs) and then traverse the SCCs in reverse topological order. That’s why this pass is called CallGraphSCCPass.
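Here is a minimal sketch of that traversal order using Tarjan’s SCC algorithm, which happens to emit SCCs in reverse topological order. It is a standalone toy, not LLVM’s implementation: the CallGraph type (a vector where calls[f] lists the callees of function f) and the SCCFinder class are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy call graph: graph[f] lists the functions that function f calls.
using CallGraph = std::vector<std::vector<int>>;

// Tarjan's algorithm. SCCs are emitted callees-first (bottom-up).
class SCCFinder {
public:
  explicit SCCFinder(const CallGraph &graph)
      : graph(graph), index(graph.size(), -1), low(graph.size(), 0),
        onStack(graph.size(), false) {}

  std::vector<std::vector<int>> run() {
    for (int v = 0; v < static_cast<int>(graph.size()); ++v)
      if (index[v] == -1)
        visit(v);
    return sccs;
  }

private:
  void visit(int v) {
    index[v] = low[v] = nextIndex++;
    stack.push_back(v);
    onStack[v] = true;
    for (int w : graph[v]) {
      assert(w >= 0 && w < static_cast<int>(graph.size()));
      if (index[w] == -1) {        // callee not visited yet: recurse
        visit(w);
        low[v] = std::min(low[v], low[w]);
      } else if (onStack[w]) {     // back edge into the current SCC
        low[v] = std::min(low[v], index[w]);
      }
    }
    if (low[v] == index[v]) {      // v is the root of an SCC: pop it
      std::vector<int> scc;
      int w;
      do {
        w = stack.back();
        stack.pop_back();
        onStack[w] = false;
        scc.push_back(w);
      } while (w != v);
      std::sort(scc.begin(), scc.end());
      sccs.push_back(scc);
    }
  }

  const CallGraph &graph;
  std::vector<int> index, low;
  std::vector<bool> onStack;
  std::vector<int> stack;
  std::vector<std::vector<int>> sccs;
  int nextIndex = 0;
};
```

With function 0 calling 1, functions 1 and 2 calling each other, and 2 also calling the leaf 3, the SCCs come out as {3}, {1, 2}, {0}: the leaf first, then the mutually recursive pair as one unit, then the top-level caller.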

Loop Pass

A loop pass processes one loop at a time, independently of the other loops in the function. In the case of nested loops, the innermost loops are processed first.
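The innermost-first order is simply a post-order walk over the loop nest. The sketch below shows this with a toy loop tree in plain C++; ToyLoop and innermostFirst are hypothetical names for this example and are not part of LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy loop nest (NOT llvm::Loop): each loop owns the loops nested inside it.
struct ToyLoop {
  std::string name;
  std::vector<ToyLoop> subLoops;
};

// Post-order walk: visit all nested loops before the loop that contains them.
void collectInnermostFirst(const ToyLoop &loop, std::vector<std::string> &order) {
  assert(!loop.name.empty() && "toy loops must be named");
  for (const ToyLoop &sub : loop.subLoops)
    collectInnermostFirst(sub, order);
  order.push_back(loop.name);
}

// Returns the loop names in the order a loop pass would process them.
std::vector<std::string> innermostFirst(const ToyLoop &outermost) {
  std::vector<std::string> order;
  collectInnermostFirst(outermost, order);
  return order;
}
```

For an outer loop that contains a loop mid (which itself contains inner) followed by a loop sibling, the resulting order is inner, mid, sibling, outer.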

Region Pass

A region is a single-entry, single-exit part of a function. A region pass processes the regions one by one, with the outermost region processed last.

Machine Function Pass

Like a function pass, a machine function pass processes one function at a time. The only difference is that, in this case, the functions are in their machine-dependent representation [4].

Writing a basic function pass in LLVM

AIM: Write a basic function pass that prints the name of each function it runs on. It should accept an option, passed via the clang command line, that selects whether or not to print the function name. Moreover, it should only run at optimization level O2.

To write a function pass, we need to override the runOnFunction() function. Consider the following code:

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/Debug.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/InitializePasses.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"

namespace llvm{
  // Command-line option that enables NamePrinter; it is off by default.
  static cl::opt<bool> EnableNamePrinter("enable-name-printer", cl::init(false), cl::Hidden);
}

using namespace llvm;

namespace{
struct NamePrinter : public FunctionPass {

  static char ID;

  // Declared here and defined below, where it registers the pass.
  NamePrinter();

  // This is our pass running on function F.
  bool runOnFunction(Function &F) override {
     
    // Do nothing unless the enable-name-printer option was passed via the clang CLI.
    if (!llvm::EnableNamePrinter)
      return false;

    errs() << "In Name Printer Pass and this function is: " << F.getName() << "\n";

    // Return true only if this pass changed the IR; in this case, it never does.
    return false;
  }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
  // use this function to add pass dependencies. For example, if we want this
  // pass to run after the StackProtector pass then add the following statement:
  //    AU.addRequired<StackProtector>();
  }

}; // end of struct NamePrinter

}  // end of anonymous namespace


// The constructor registers the pass with LLVM's PassRegistry
NamePrinter::NamePrinter() : FunctionPass(ID) {
  llvm::initializeNamePrinterPass(*PassRegistry::getPassRegistry());
}

char NamePrinter::ID = 0;

INITIALIZE_PASS_BEGIN(NamePrinter, "NamePrinter",
                      "Prints the function name if enable-name-printer is passed with -O2",
                      false /* Only looks at CFG */, false /* Analysis Pass */)
// Add pass dependencies here:
INITIALIZE_PASS_DEPENDENCY(PromoteLegacyPass)

INITIALIZE_PASS_END(NamePrinter, "NamePrinter",
                    "Prints the function name if enable-name-printer is passed with -O2",
                    false /* Only looks at CFG */, false /* Analysis Pass */)

FunctionPass *llvm::createNamePrinterPass() { return new NamePrinter(); }

Put this code in llvm/lib/CodeGen/NamePrinter.cpp. To compile it, add NamePrinter.cpp to the list of source files in llvm/lib/CodeGen/CMakeLists.txt.

Registering the NamePrinter pass with OPT and Clang toolchains

Follow the steps below. Note that for our NamePrinter pass, the order in which passes are executed isn’t important. In other cases, you may need to take explicit care of the exact location, within the functions mentioned below, where you put the pass registration statement.

  • add initializeNamePrinterPass(Registry);  inside initializeCodeGen() in llvm/lib/CodeGen/CodeGen.cpp
  • add addPass(createNamePrinterPass());  inside TargetPassConfig::addISelPrepare() in llvm/lib/CodeGen/TargetPassConfig.cpp
  • add (void) llvm::createNamePrinterPass();  inside ForcePassLinking() in llvm/include/llvm/LinkAllPasses.h
  • add void initializeNamePrinterPass(PassRegistry&);  to llvm/include/llvm/InitializePasses.h
  • add FunctionPass *createNamePrinterPass();  to llvm/include/llvm/CodeGen/Passes.h
  • add initializeNamePrinterPass(Registry);  inside main() in llvm/tools/opt/opt.cpp
  • add if (OptLevel == 2) FPM.add(createNamePrinterPass());  inside PassManagerBuilder::populateFunctionPassManager() in llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

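That’s it! You are now ready to run your pass with Clang and OPT. To pass the option to your LLVM pass via the clang command line, use -mllvm -enable-name-printer. The -mllvm flag tells clang to forward the argument that follows it to LLVM’s own option parser.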
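If the -mllvm forwarding is unclear, the following toy model may help. It is plain C++ and does not use clang’s or LLVM’s real option-parsing code: collectForwardedArgs and namePrinterEnabled are hypothetical helpers, and test.c is just a placeholder file name. The only behaviour it models is that each -mllvm consumes the next argument and hands it to LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of the driver: every "-mllvm" consumes the next argument and
// forwards it to LLVM's option parser. Everything else stays with the driver.
std::vector<std::string> collectForwardedArgs(const std::vector<std::string> &argv) {
  assert(!argv.empty() && "argv[0] is expected to be the program name");
  std::vector<std::string> forwarded;
  for (std::size_t i = 1; i + 1 < argv.size(); ++i) {
    if (argv[i] == "-mllvm") {
      forwarded.push_back(argv[i + 1]);
      ++i; // skip the argument that was just forwarded
    }
  }
  return forwarded;
}

// Hypothetical helper: would the NamePrinter option be switched on?
bool namePrinterEnabled(const std::vector<std::string> &argv) {
  for (const std::string &arg : collectForwardedArgs(argv))
    if (arg == "-enable-name-printer")
      return true;
  return false;
}
```

In this model, the command line clang -O2 -mllvm -enable-name-printer test.c enables the option, while clang -O2 test.c leaves it at its default of false.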

 

Feel free to comment below if you have any problems following the above steps.

References

  1. https://llvm.org/docs/WritingAnLLVMPass.html
  2. https://stackoverflow.com/questions/9791528/why-optimizations-passes-doesnt-work-without-mem2reg
  3. https://wiki.aalto.fi/display/t1065450/LLVM+SSA
  4. https://llvm.org/docs/WritingAnLLVMPass.html#the-machinefunctionpass-class
  5. https://eli.thegreenplace.net/2013/09/16/analyzing-function-cfgs-with-llvm

 
