LLVM Intermediate Representation (IR) is a form used to represent code inside a compiler. It is designed to support the analyses and transformations usually performed in the optimising phase of a compiler. LLVM IR provides a good platform for implementing specialized code analysis tools, such as those we want to develop: program crash analysis with coredumps or backtraces, crash security analysis, and SELinux rule analysis.
We are implementing a service that provides an IR representation of C/C++ binaries in the Fedora operating system. To ease the implementation of analysers, the service will provide the IR binaries with all dynamic libraries statically linked. The binaries will contain metadata about code origin and DWARF debugging information. The only remaining ways to run external code from such a binary are system calls, exec, and dlopen.
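One way to picture the static linking step is as a merge of bitcode modules with the llvm-link tool. The sketch below only assembles such a command line. The function name and all file names are hypothetical, and the actual service may link its binaries differently.

```python
from typing import List


def llvm_link_command(program_bc: str, library_bcs: List[str], output_bc: str) -> List[str]:
    """Build an llvm-link invocation that merges a program with its libraries.

    All paths are hypothetical examples; llvm-link takes the input bitcode
    files as positional arguments and the output file after -o.
    """
    return ["llvm-link", program_bc, *library_bcs, "-o", output_bc]


# Example: merge a program's bitcode with the bitcode of two libraries.
cmd = llvm_link_command("prog.bc", ["libfoo.bc", "libbar.bc"], "prog-static.bc")
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where llvm-link is installed.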
To provide as many C/C++ binaries as possible in the IR representation, an RPM spec file parser must be created that allows altering common build scripts and compiler invocations in RPM spec files. The altered build will use the LLVM Clang compiler to generate unoptimized IR files.
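As a rough illustration of what altering a compiler invocation could involve, the sketch below rewrites a single compile-only command so that Clang emits unoptimized bitcode. The compiler name mapping and the function name are assumptions made for this example, not the project's actual parser. Real spec files also route compilation through make, macros, and environment variables, which this sketch does not handle.

```python
import shlex

# Hypothetical mapping from GCC driver names to their Clang counterparts.
COMPILER_MAP = {"gcc": "clang", "cc": "clang", "g++": "clang++", "c++": "clang++"}


def rewrite_compile_command(command: str) -> str:
    """Rewrite one compile-only command to emit unoptimized LLVM bitcode."""
    args = shlex.split(command)
    # Leave anything that is not a recognized compile step (-c) untouched.
    if not args or args[0] not in COMPILER_MAP or "-c" not in args:
        return command
    rewritten = [COMPILER_MAP[args[0]]]
    for arg in args[1:]:
        # Drop optimization levels such as -O2 or -Os; -O0 is appended below.
        if arg.startswith("-O"):
            continue
        rewritten.append(arg)
    rewritten.append("-O0")
    if "-g" not in rewritten:
        rewritten.append("-g")  # keep DWARF debugging information
    rewritten.append("-emit-llvm")  # write bitcode instead of native code
    return shlex.join(rewritten)
```

With this sketch, a command such as "gcc -O2 -c foo.c -o foo.o" becomes "clang -c foo.c -o foo.o -O0 -g -emit-llvm", so the output file keeps its name but holds LLVM bitcode rather than a native object.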
The first part of the project is about getting LLVM IR bitcode from source RPM packages.
sloccount, ohcount, or cloc.
The second part of the project is about providing statically linked LLVM bitcode binaries to users.