Compiler: LLVM and Clang source code conversion example
How to obtain the source code of the LLVM project from Git:
git clone https://github.com/llvm/llvm-project.git
After downloading the source code, enter the llvm-project directory, which includes the following contents:
The llvm-project/llvm directory includes the following contents:
Clang in practice
In practice, Clang can be used to build your own source-to-source code conversion tool.
Reference:
https://github.com/Ewenwan/llvm-clang-samples/blob/master/src_clang/tooling_sample.cpp
void foo(int* a, int *b) {
    if (a[0] > 1)
    {
        b[0] = 2;
    }
}
void bar(float x, float y); // just a declaration
The tool automatically adds comments, producing:
// Begin function foo returning void
void foo(int* a, int *b) {
    if (a[0] > 1) // the 'if' part
    {
        b[0] = 2;
    }
}
// End function foo
void bar(float x, float y); // just a declaration
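The linked tooling_sample.cpp builds on Clang's AST and rewriting facilities, which are not reproduced here. As a rough stand-in that uses only the C++ standard library, the sketch below shows the basic idea behind this kind of rewrite: record text insertions against offsets in the original buffer, then apply them while leaving the rest of the source untouched. The names Insertion and applyInsertions are hypothetical and are not part of Clang.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: insert `text` before the character at `offset`
// in the original source buffer.
struct Insertion {
    std::size_t offset;
    std::string text;
};

// Applies all insertions to a copy of `source`. Offsets refer to the
// original buffer, so the edits are applied from the highest offset to the
// lowest; applying one edit then never moves the position of the edits
// that are still waiting to be applied.
std::string applyInsertions(const std::string &source,
                            std::vector<Insertion> edits) {
    std::stable_sort(edits.begin(), edits.end(),
                     [](const Insertion &a, const Insertion &b) {
                         return a.offset > b.offset;
                     });
    std::string result = source;
    for (const Insertion &e : edits) {
        // Clamp offsets that point past the end of the original buffer.
        std::size_t pos = std::min(e.offset, source.size());
        result.insert(pos, e.text);
    }
    return result;
}
```

For instance, one insertion at offset 0 and one at the end of a function's text would wrap that function in "Begin" and "End" comments, much like the output shown above. A real tool would obtain the offsets from the AST rather than computing them by hand.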
LLVM in practice
Function signature
A function signature in C consists of the following parts:
- Return type
- Function name
- Number and type of parameters
For example:
int add(int a, int b) {
    return a + b;
}
The signature of the add function in this C code is int add(int, int). The parameter names a and b are not part of it.
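As a compile-time illustration, the sketch below checks that the type of add is made up of the return type and the parameter types only. It is written in C++ rather than C so that the standard type_traits header can be used, and the second declaration, sum, is a hypothetical function added just for the comparison.

```cpp
#include <type_traits>

// The function from the example above.
int add(int a, int b) {
    return a + b;
}

// Hypothetical declaration with different parameter names but the same types.
// It is only used inside decltype, so no definition is needed.
int sum(int x, int y);

// The function type of add is int(int, int): return type plus parameter types.
static_assert(std::is_same<decltype(add), int(int, int)>::value,
              "add has the function type int(int, int)");

// Parameter names do not affect the function type.
static_assert(std::is_same<decltype(add), decltype(sum)>::value,
              "add and sum share the same function type");
```

If either assertion were false, the file would fail to compile, so a successful build is the check.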
C program code to be processed
#include <stdio.h>
#include <stdlib.h>
void keep() {
    printf("\n");
}
int add(int a, int b) {
    return a + b;
}
int* getArr(int n) {
    return (int*)malloc(sizeof(int) * n);
}
int main(int argc, char** argv) {
    return 0;
}
The result of running the project is:
Four functions, including main, are defined in the code to be processed, but the final result lists six functions. This is because the code calls printf and malloc from the C standard library. Their declarations are pulled in from the included headers during preprocessing, and the compiler keeps a declaration in the IR for each external function that is actually called.
Another noteworthy point is that the types in the printed function signatures are not C types such as int and char but i32 and i8. This is because the C program is first compiled to LLVM IR bitcode, which the custom LLVM program then processes, so the printed types are LLVM IR types. On 64-bit Linux, int corresponds to i32, char to i8, and long to i64, while float and double keep the names float and double in LLVM IR. void is also spelled as in C, and in LLVM 9 a pointer type is written with a trailing *, as in C (for example i32* for int*); LLVM 15 and later print every pointer type as ptr.
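The correspondence between C scalar types and LLVM IR types can be written down as a small lookup table. The sketch below is only an illustration for 64-bit Linux (x86-64); the helper name llvmTypeFor is hypothetical, and the widths are target-dependent (for example, long is 32 bits wide on 64-bit Windows). Note that LLVM IR spells the floating-point types float and double.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: returns the LLVM IR type that Clang typically emits
// for a C scalar type on 64-bit Linux (x86-64), or "<unknown>" if the type
// is not in the table. This is an assumption-laden sketch, not an LLVM API.
std::string llvmTypeFor(const std::string &cType) {
    static const std::unordered_map<std::string, std::string> table = {
        {"void", "void"},   {"char", "i8"},     {"short", "i16"},
        {"int", "i32"},     {"long", "i64"},    {"long long", "i64"},
        {"float", "float"}, {"double", "double"},
    };
    auto it = table.find(cType);
    if (it == table.end()) {
        return "<unknown>";
    }
    return it->second;
}
```

With this table, the C signature int add(int, int) corresponds to the IR signature i32 add(i32, i32).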
The program below prints each function signature in the form:
return type function name(parameter types)
// This program reads an LLVM IR file and prints the signature of each function in the IR.
// The input IR file can be produced with clang, for example:
//   clang -emit-llvm -c test.c -o test.bc   (test.c is the test program)
// Command to compile this program:
//   clang++ $(llvm-config --cxxflags --ldflags --libs) main.cpp -o main
// Command to run it:
//   ./main test.bc
// Include the required LLVM header files
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/Module.h>
#include <llvm/IRReader/IRReader.h>
#include <llvm/Support/CommandLine.h>
#include <llvm/Support/ManagedStatic.h>
#include <llvm/Support/SourceMgr.h>
#include <llvm/Support/raw_ostream.h>
#include <memory>
#include <string>
using namespace llvm;
// Global LLVM context
static ManagedStatic<LLVMContext> GlobalContext;
// Global positional command-line argument: the file name of the LLVM IR bitcode to process
static cl::opt<std::string> InputFilename(cl::Positional, cl::desc("<filename>.bc"), cl::Required);
int main(int argc, char **argv) {
    // Diagnostic object that receives parse errors
    SMDiagnostic Err;
    // Parse the command-line arguments
    cl::ParseCommandLineOptions(argc, argv);
    // Read and parse the LLVM IR file and return an LLVM Module (the top-level container of LLVM IR)
    std::unique_ptr<Module> M = parseIRFile(InputFilename, Err, *GlobalContext);
    // Error handling
    if (!M) {
        Err.print(argv[0], errs());
        return 1;
    }
    // Traverse each Function in the Module
    for (Function &F : *M) { // C++ range-based for loop; F is a reference to each function in the IR module
        // Skip LLVM intrinsic functions, whose names start with "llvm."
        if (!F.isIntrinsic()) {
            // Print the return type
            outs() << *(F.getReturnType());
            // Print the function name
            outs() << ' ' << F.getName() << '('; // The name in the IR may differ from the one in the source file
            // Traverse each parameter of the function
            for (Function::arg_iterator it = F.arg_begin(), ie = F.arg_end(); it != ie; ++it) {
                // Separate parameters with a comma
                if (it != F.arg_begin()) {
                    outs() << ", ";
                }
                // Print the parameter type
                outs() << *(it->getType());
            }
            outs() << ")\n";
        }
    }
    return 0;
}
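The formatting step of the program can be looked at in isolation. The sketch below reproduces it with plain strings instead of LLVM types, so it compiles without LLVM; formatSignature is a hypothetical helper and not part of the LLVM API.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds "returnType name(type1, type2, ...)" from
// plain strings, mirroring the output format of the program above.
std::string formatSignature(const std::string &returnType,
                            const std::string &name,
                            const std::vector<std::string> &paramTypes) {
    std::ostringstream out;
    out << returnType << ' ' << name << '(';
    for (std::size_t i = 0; i < paramTypes.size(); ++i) {
        // Write the separator before every parameter except the first.
        if (i != 0) {
            out << ", ";
        }
        out << paramTypes[i];
    }
    out << ')';
    return out.str();
}
```

Writing the separator before every element except the first means the loop never has to work out where the last element is, and an empty parameter list simply yields "()".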
Compiling and running the project
Before compiling the project, you need to confirm the compilation and running environment:
Operating system: Ubuntu 18.04 64 bit
LLVM version: 9.0.0
C program code file to be processed: test.c
Project code file: main.cpp
Then obtain the LLVM IR bitcode of the C program to be processed:
clang -emit-llvm -c test.c -o test.bc
Next, compile the project code:
clang++ $(llvm-config --cxxflags --ldflags --libs) main.cpp -o main
Finally, run the program to obtain the results shown above:
./main test.bc
Reference link:
https://www.freesion.com/article/3548547366/
https://www.freesion.com/article/4240352588/
https://zhuanlan.zhihu.com/p/102270840
https://github.com/Ewenwan/llvm-clang-samples/blob/master/src_clang/tooling_sample.cpp