Compiler llvm clang source code conversion example

Posted by utpal on Mon, 03 Jan 2022 09:23:35 +0100

To obtain the source code of the LLVM project with git:

git clone https://github.com/llvm/llvm-project.git

After downloading the source code, enter the llvm-project directory. Its contents are:

[Image: listing of the llvm-project directory]

The llvm-project/llvm directory contains:

[Image: listing of the llvm-project/llvm directory]


Clang in practice

In practice, Clang can be used to build your own source-to-source code conversion tool.

Reference:

https://github.com/Ewenwan/llvm-clang-samples/blob/master/src_clang/tooling_sample.cpp

void foo(int* a, int *b) {
  if (a[0] > 1)
  {
    b[0] = 2;
  }
}

void bar(float x, float y); // just a declaration

The same code after comments are added automatically:

// Begin function foo returning void
void foo(int* a, int *b) {
  if (a[0] > 1) // the 'if' part
  {
    b[0] = 2;
  }
}
// End function foo

void bar(float x, float y); // just a declaration
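
The linked sample does this with Clang's AST visitor and Rewriter classes. The underlying idea, recording insertions against positions in the original buffer and applying them all at the end, can be sketched without Clang. The following is a toy illustration only: ToyRewriter, insertText, rewritten and annotateFoo are hypothetical names, the offsets come from a plain string search rather than from an AST, and none of it is Clang's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a source rewriter (hypothetical, NOT Clang's Rewriter API).
// Insertions are recorded against offsets in the ORIGINAL text, so an earlier
// insertion never shifts the position used by a later one.
class ToyRewriter {
public:
    explicit ToyRewriter(std::string source) : source_(std::move(source)) {}

    // Queue `text` for insertion before the character at `offset`.
    void insertText(std::size_t offset, std::string text) {
        assert(offset <= source_.size() && "offset must lie inside the buffer");
        edits_.push_back({offset, std::move(text)});
    }

    // Build the rewritten buffer; the original source is left untouched.
    std::string rewritten() const {
        std::vector<Edit> sorted = edits_;
        std::stable_sort(sorted.begin(), sorted.end(),
                         [](const Edit &a, const Edit &b) { return a.offset < b.offset; });
        std::string out;
        std::size_t pos = 0;
        for (const Edit &e : sorted) {
            out.append(source_, pos, e.offset - pos); // copy the untouched text
            out += e.text;                            // then the inserted text
            pos = e.offset;
        }
        out.append(source_, pos, std::string::npos);  // copy the remaining tail
        return out;
    }

private:
    struct Edit {
        std::size_t offset;
        std::string text;
    };
    std::string source_;
    std::vector<Edit> edits_;
};

// Reproduces the commented version of foo from the example above.
std::string annotateFoo() {
    const std::string src =
        "void foo(int* a, int *b) {\n"
        "  if (a[0] > 1)\n"
        "  {\n"
        "    b[0] = 2;\n"
        "  }\n"
        "}\n";
    const std::string cond = "if (a[0] > 1)";
    const std::size_t ifPos = src.find(cond);
    assert(ifPos != std::string::npos);

    ToyRewriter rw(src);
    rw.insertText(0, "// Begin function foo returning void\n");
    rw.insertText(ifPos + cond.size(), " // the 'if' part");
    rw.insertText(src.size(), "// End function foo\n");
    return rw.rewritten();
}
```

Calling annotateFoo() returns the text of foo with the three comments in place. A real tool would take the insertion points from the source locations of the AST nodes it visits instead of searching the text.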

LLVM in practice

Function signature

A function signature in C consists of the following parts:

  • Return type
  • Function name
  • Number and type of parameters

For example:

int add(int a, int b) {
    return a + b;
}

The signature of the add function in this C code is int add(int, int).
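
As a side illustration in C++ (the language of the project code later in this post), the type part of a signature can be inspected at compile time. This is a small sketch, not something the original project does: SignatureInfo, sub and applyBinary are hypothetical names introduced here. It shows that parameter names play no role in the signature, only the return type and the number and types of the parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

int add(int a, int b) {
    return a + b;
}

// Different name and parameter names, same return type and parameter types.
int sub(int x, int y) {
    return x - y;
}

// Hypothetical helper: splits a function type R(Args...) into its parts.
template <typename Fn>
struct SignatureInfo;

template <typename R, typename... Args>
struct SignatureInfo<R(Args...)> {
    using ReturnType = R;
    static constexpr std::size_t arity = sizeof...(Args);
};

// decltype(add) is the function type int(int, int); parameter names are not part of it.
static_assert(std::is_same<decltype(add), int(int, int)>::value,
              "add has type int(int, int)");
static_assert(std::is_same<decltype(add), decltype(sub)>::value,
              "add and sub share the same function type");
static_assert(std::is_same<SignatureInfo<decltype(add)>::ReturnType, int>::value,
              "return type is int");
static_assert(SignatureInfo<decltype(add)>::arity == 2, "two parameters");

// A function pointer is typed by the signature alone, so either function fits.
int applyBinary(int (*fn)(int, int), int lhs, int rhs) {
    assert(fn != nullptr && "callee must not be null");
    return fn(lhs, rhs);
}
```

Both applyBinary(add, 2, 3) and applyBinary(sub, 2, 3) compile, because the pointer type depends only on the return type and parameter types.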

C program code to be processed

#include <stdio.h>
#include <stdlib.h>
 
void keep() {
    printf("\n");
}
 
int add(int a, int b) {
    return a + b;
}
 
int* getArr(int n) {
    return (int*)malloc(sizeof(int) * n);
}
 
int main(int argc, char** argv) {
    return 0;
}

The result of running the project is:

[Image: program output listing the function signatures]


The code to be processed defines four functions, including main, yet the final result lists six. This is because the code calls printf and malloc from the C standard library. Their declarations are pulled in from <stdio.h> and <stdlib.h> during preprocessing, and because both functions are actually called, they also appear as declarations in the generated IR module.

Another point worth noting is that the types in the printed signatures are i32 and i8 rather than C types such as int and char. This is because the C code is first converted to LLVM IR bitcode, which the custom LLVM program then processes, so the printed types are LLVM IR types. On this 64-bit Linux platform, int corresponds to i32, char to i8 and long to i64. float and double keep the names float and double in LLVM IR; there are no f32 or f64 types. void is also spelled the same as in C, and with LLVM 9 pointer types are written with a trailing *, as in i8*, much like C. Later LLVM releases that use opaque pointers print every pointer type as ptr instead.
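
The integer naming rule can be made concrete with a short sketch. LLVM IR names integer types by bit width, so the IR spelling of a C integer type follows from its size on the target. The helpers below, irIntegerTypeName and irPointerTypeName, are hypothetical names for this illustration and are not LLVM functions. They also report the sizes of the machine the code is compiled on, whereas clang uses the sizes of the compilation target.

```cpp
#include <cassert>
#include <climits>
#include <string>
#include <type_traits>

// Hypothetical helper: the IR name of a C integer type is "i" followed by its
// width in bits. bool is excluded because its lowering is a special case.
template <typename T>
std::string irIntegerTypeName() {
    static_assert(std::is_integral<T>::value, "integer types only");
    static_assert(!std::is_same<T, bool>::value, "bool is not covered by this sketch");
    return "i" + std::to_string(sizeof(T) * CHAR_BIT);
}

// Typed-pointer spelling as printed by older LLVM releases such as 9.0:
// the pointee type name followed by '*'.
std::string irPointerTypeName(const std::string &pointee) {
    assert(!pointee.empty() && "pointee type name required");
    return pointee + "*";
}
```

For instance, the char** parameter of main would be spelled by nesting the pointer helper twice around irIntegerTypeName<char>(). Floating-point types are deliberately left out, since LLVM IR refers to them by name (float, double) rather than by bit width.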

Printing function signatures

The project code below prints each function signature in the following form:

Return type function name (parameter types)

// This program reads an LLVM IR file and prints the signature of each function in the IR
// The input IR file can be generated with clang
// For example: clang -emit-llvm -c test.c -o test.bc  (test.c is the test program)
// Command to compile this program:
// clang++ $(llvm-config --cxxflags --ldflags --libs) main.cpp -o main
// Run the program:
// ./main test.bc
 
// Include the required LLVM header files
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/Module.h>
#include <llvm/IRReader/IRReader.h>
#include <llvm/Support/SourceMgr.h>
#include <llvm/Support/CommandLine.h>
#include <llvm/Support/ManagedStatic.h>
#include <llvm/Support/raw_ostream.h>

#include <iterator>
#include <memory>
#include <string>

using namespace llvm;

// Global LLVM context
static ManagedStatic<LLVMContext> GlobalContext;

// Positional command-line argument: the file name of the LLVM IR bitcode to process
static cl::opt<std::string> InputFilename(cl::Positional, cl::desc("<filename>.bc"), cl::Required);

int main(int argc, char **argv) {
    // Diagnostic object that receives parse errors
    SMDiagnostic Err;
    // Parse the command-line arguments
    cl::ParseCommandLineOptions(argc, argv);
    // Read and parse the LLVM IR file; the result is an LLVM Module (the top-level container of LLVM IR)
    std::unique_ptr<Module> M = parseIRFile(InputFilename, Err, *GlobalContext);
    // Error handling
    if (!M) {
        Err.print(argv[0], errs());
        return 1;
    }
    // Traverse each Function in the Module
    for (Function &F : *M) { // C++ range-based for: F is a reference to each function in the IR module
        // Skip LLVM intrinsics, whose names begin with "llvm."
        if (!F.isIntrinsic()) {
            // Print the function return type
            outs() << *(F.getReturnType());
            // Print the function name; the IR name may differ from the source name (for example, C++ names are mangled)
            outs() << ' ' << F.getName() << '(';
            // Traverse each parameter of the function
            for (Function::arg_iterator it = F.arg_begin(), ie = F.arg_end(); it != ie; ++it) {
                // Print the parameter type
                outs() << *(it->getType());
                // Separate parameters with a comma, except after the last one
                if (std::next(it) != ie) {
                    outs() << ", ";
                }
            }
            outs() << ")\n";
        }
    }
    return 0;
}
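
The printing logic can also be tried without an LLVM installation. The sketch below models it in plain C++. FunctionInfo, formatSignature and definedSignatures are hypothetical names, and the records are filled in by hand rather than read from a bitcode file. It additionally shows one way to list only the functions defined in the module, which would exclude declarations such as printf and malloc. In LLVM itself the corresponding check should be F.isDeclaration(), although the program above does not use it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the few facts the program reads from llvm::Function.
struct FunctionInfo {
    std::string returnType;
    std::string name;
    std::vector<std::string> paramTypes;
    bool isDeclaration; // true when the module holds no body for the function
};

// Formats one record as: return-type name(type, type, ...)
std::string formatSignature(const FunctionInfo &fn) {
    assert(!fn.name.empty() && "function needs a name");
    std::string out = fn.returnType + ' ' + fn.name + '(';
    for (std::size_t i = 0; i < fn.paramTypes.size(); ++i) {
        if (i != 0) {
            out += ", "; // separator goes between parameters, never after the last
        }
        out += fn.paramTypes[i];
    }
    out += ')';
    return out;
}

// Returns the signatures of the functions that have a body, skipping declarations.
std::vector<std::string> definedSignatures(const std::vector<FunctionInfo> &functions) {
    std::vector<std::string> result;
    for (const FunctionInfo &fn : functions) {
        if (!fn.isDeclaration) {
            result.push_back(formatSignature(fn));
        }
    }
    return result;
}
```

Passing a record for add with two i32 parameters to formatSignature gives the same shape of line that the LLVM program prints for that function.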

Compiling and running the project

Before compiling the project, confirm the build and runtime environment:

  • Operating system: Ubuntu 18.04, 64-bit
  • LLVM version: 9.0.0
  • C program code file to be processed: test.c
  • Project code file: main.cpp

Then generate the LLVM IR bitcode for the C program to be processed:

clang -emit-llvm -c test.c -o test.bc

Next, compile the project code:

clang++ $(llvm-config --cxxflags --ldflags --libs) main.cpp -o main

Finally, run the program to obtain the results shown above:

./main test.bc

 

Reference links:

https://www.freesion.com/article/3548547366/

https://www.freesion.com/article/4240352588/

https://zhuanlan.zhihu.com/p/102270840

https://github.com/Ewenwan/llvm-clang-samples/blob/master/src_clang/tooling_sample.cpp