Software instrumentation with LLVM (total 30 points)

In this assignment, you will set up an LLVM environment and write passes for static and dynamic program profiling.

Preliminary

For this assignment, you will work in the MyLLVMPass module to create an LLVM pass that examines and transforms your input source code. You will be given simple test programs so you can confirm that your pass works correctly. Write your passes following the instructions below, and use the test programs to confirm that they are correct before you submit.

Setting up LLVM environment

To prepare, install the necessary LLVM components (LLVM, Clang, and MyLLVMPass) in your environment (an Ubuntu Linux system is recommended). Download the installation package and follow the instructions in LLVM Quick Guide.docx. The files are also posted on the eLearning website.

Running skeleton module pass

After you have installed and compiled LLVM, Clang, and the skeleton code for the module pass (MyLLVMPass), you are ready to modify the code for MyLLVMPass (/src/lib/Transforms/MyLLVMPass/MyLLVMPass.cpp).

To confirm that the installation works, add the following line to runOnModule() and build.

...
    virtual bool runOnModule(Module &M) {

        bool changed = false;

        errs() << "Hello! I am in the module" << "\n";  // Test code

        return changed;
    }
...

Note: errs() is used for printing debug outputs to standard error.

The build generates the updated shared object (MyLLVMPass.so) at ./build/Release+Asserts/lib/MyLLVMPass.so. For any given input LLVM bitcode, run the following shell scripts to confirm that the modified pass prints the message.

$ ./compile_myllvmpass.sh
-------------------------------------
llvm[0]: Compiling MyLLVMPass.cpp for Release+Asserts build (PIC)
llvm[0]: Linking Release+Asserts Loadable Module MyLLVMPass.so
-------------------------------------
-rwxrwxr-x 1 kjee kjee 14384 Nov  7 16:44 ./build/Release+Asserts/lib/MyLLVMPass.so
-------------------------------------
$ ./run_myllvmpass.sh input.bc output.bc
Hello! I am in the module

If your pass transforms code, the script writes the transformed output to the file given as the second argument (in this case, output.bc). The example code only prints a message and does not transform anything, so the output bitcode will be identical to the input bitcode.

Once you see the hello message, you are ready to add your code and create your own pass!

[Part 1] Static function call profiling (10 points)

In this part of the assignment, you will write a static function call profiler. It statically iterates over the functions in a module, the basic blocks in each function, and the LLVM IR instructions in each basic block. As covered in class, a module pass allows you to iterate over functions, basic blocks, and LLVM IR instructions.

Your pass will statically walk through LLVM functions and instructions to find the caller and callee relationships between functions. To do this, locate the CallInst instructions in each function and extract the called function and the call site from each one. Use the following test program, foobar.c, to check your pass; the expected output is shown after it.

// build 
// clang -O0 -emit-llvm -o foobar.bc -c foobar.c

#include <stdio.h>

int foo() {
    printf("im in foo\n");
    return 1;
}

int foobar() {
    printf("im in foobar\n");
    return 2;
}

int bar(int i) {
    printf("im in bar\n");
    if (i)  {
        int b = foo();
        return 1 + b;
    } else {
        int c = foobar();
        return 2 + c;
    }
}

int main() {
    int d;
    printf("input: ");
    scanf("%d", &d);
    printf("%d\n", bar(d));
    return 0;
}
$ ./run_myllvmpass.sh foobar.bc foobar2.bc

foo --> printf (@call)
foobar --> printf (@call)
bar --> printf (@call)
bar --> foo (@call1)
bar --> foobar (@call2)
main --> printf (@call)
main --> __isoc99_scanf (@call1)
main --> bar (@call2)
main --> printf (@call3)

NOTE:

[Part 2] Dynamic function call profiling (10 points)

In this part of the assignment, you will use code instrumentation to generate a dynamic function call trace. For a given source program (or bitcode), you will insert LLVM instructions that call the log_msg() function, which is already included in the original program source.

void log_msg(char* msg); // For part-2 

int main() {
...
}

void log_msg(char* msg) {
    fprintf(stderr, "%s", msg);
}

First, construct a string that represents the caller and callee relationship, as in Part 1 (e.g., “func1 --> func2”). Then, immediately before each call site where a CallInst exists, insert another call to log_msg() that passes the string as its argument. log_msg() prints its argument as-is, so end the string with a newline to get one line per call, as in the output below. The source (or bitcode) already includes log_msg(), so you only need to find it in the module and call it.

Do not instrument function calls inside log_msg(char* msg). If you do, log_msg() will call itself recursively, eventually overflow the call stack, and crash the program.

For the above test program, you can run the output bitcode with the lli command. Because log_msg() writes its message to standard error, redirect standard error to a separate file to see the result.

$ ./run_myllvmpass.sh foobar.bc foobar2.bc
$ lli ~/foobar2.bc 2> /tmp/errs
$ cat /tmp/errs
main --> printf (@call)
main --> __isoc99_scanf (@call1)
main --> bar (@call2)
bar --> printf (@call)
bar --> foo (@call1)
foo --> printf (@call)
main --> printf (@call3)

Although this output looks similar to Part 1's result, it contains only the subset of function calls that were actually invoked for the given input. Here the input was non-zero, so bar() called foo(). Calls from and to foobar() therefore do not appear in this output.

[Part 3] Function call counter (10 points)

In this part of the assignment, you will add a global variable that counts the total number of CallInst instructions executed at runtime. To do this, create a counter as a global variable (initialized to zero). Before each CallInst, insert a few LLVM instructions (for example, a load, an add, and a store) to increment it.

The global variable accumulates its value while the program runs. Print its final value by calling the log_call_count(int cnt) function before main() returns. The following helper functions (log_msg() and log_call_count()) are already included in the source code for you.

void log_msg(char* msg);
void log_call_count(int cnt);

int main() {
...
}

void log_msg(char* msg) {
    fprintf(stderr, "%s", msg);
}

void log_call_count(int cnt) {
    fprintf(stderr, "CallInst called %d times\n", cnt);
}

The expected output looks like the following.

$ lli ~/foobar2.bc 2> /tmp/errs
$ cat /tmp/errs
main --> printf (@call)
....
CallInst called 7 times

NOTES: In this part of the assignment, you need to add various LLVM IR operations for global variables, constant values, and so on. The LLVM reference documentation is always the best place to find such information. As a shortcut, you can look at the output of llc -march=cpp (LLVM's C++ backend, which prints C++ API code that builds the given IR) to see how to create a specific LLVM IR sequence to instrument (e.g., a global variable declaration or a function call).

Outputs to submit