Introduction to C++ Compilation on the Command Line

April 9, 2021


C++ Build Process

There may come a day when technology has advanced to the point that we never need to worry about which flags are set for compilation, or which files are being included. But it is not this day.

When using Unreal Engine 4 or another high level framework, the compilation of your C++ code is largely hidden from view. This is a major convenience when it works, but it becomes an obstacle when you need to change anything. The only way to avoid becoming slaves to our creations is to understand how they work in the first place.

With that in mind, this article will show how to use the command line to compile (very rudimentary) C++ code. This will give us a better understanding of the compilation process and the ways in which it can be configured. Subject to interest and time, I may eventually make this into a longer series about build systems and tooling. For now, let's get into it.

Know Your Compiler

A compiler transforms your C++ code into binary machine instructions and addresses that can be understood by your computer. It often fades into the background, but make no mistake: the compiler is the most important tool in your toolbox. Indeed, a compiler and a minimal text editor are all you need to write game-changing software. Other tools that are built on top (build systems, IDEs, etc.) are just there to make the drafting and compilation process more pleasant.

There are several compilers in active use today. However, the most common, by platform, are:

  • Visual C++ (aka MSVC) for Windows
  • GNU Compiler Collection (aka GCC) for Linux
  • Clang for Mac

This article will focus on MSVC and GCC, but the principles should translate well to Clang and other compilers.
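If you are ever unsure which of these compilers built a given program, one option is to test the macros that each compiler predefines. The sketch below is illustrative rather than exhaustive. compiler_name is a hypothetical helper, and the order of the checks matters because Clang also defines __GNUC__ (and _MSC_VER when it runs in its MSVC-compatible mode).

```cpp
// Hypothetical helper (not part of any library): reports which compiler
// built this translation unit, based on predefined macros.
const char* compiler_name()
{
#if defined(__clang__)
    return "Clang";    // checked first, because Clang also defines __GNUC__
#elif defined(_MSC_VER)
    return "MSVC";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";  // some other compiler
#endif
}
```

Printing the result of compiler_name() from main is enough to confirm which toolchain a given command prompt is actually using.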

MSVC

MSVC is Microsoft's flagship compiler. It was originally a standalone product, but it has since been folded into the Visual Studio package. To obtain a free copy, just install the latest version of Visual Studio and the compiler will be included. The compiler, named cl.exe, will be located in the install directory, somewhere similar to the following:

C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/{Version}/bin/{Host}/{Host}/cl.exe

You can also easily find it by opening a Developer Command Prompt, also included with Visual Studio, and entering the following command:

where cl

This works in the Developer Command Prompt because it has certain PATH entries, such as the binary directory for MSVC, that are not set in the generic Command Prompt. In other words, it knows where to look to find the appropriate compiler. For this reason, those using Windows are recommended to use the Developer Command Prompt to follow along.

GCC

GCC has been around since 1987 and is the compiler of choice for many Unix/Linux distributions. It comes preinstalled on many of them, typically in the /usr/bin directory. If you do not have it installed, run the following commands (shown for Debian-based distributions such as Ubuntu, which use apt-get):

sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install libc++abi-dev

You can then confirm that it has been installed, using the next command:

which gcc

For those familiar with Windows, getting started with Linux may seem like a chore. And, apparently, it used to be. It used to require either dual booting your machine, starting a virtual machine, or running a remote Linux server. With Windows 10, the process is much easier. Microsoft partnered with Canonical, the team behind Ubuntu, to deliver the Windows Subsystem for Linux, aka WSL. Just follow Microsoft's WSL setup instructions, so that you can follow along with both MSVC and GCC.

Compiling a Minimal Program

We are going to take this one very small step at a time. The smallest C++ program consists of just a single file, which we will call main.cpp. This is a source file, identified by the file extension .cpp or .cxx, and is different from a header file, identified by the file extension .h or .hpp. It contains just a few lines of code.

#include <iostream>

#define GOODBYE std::cout << "Goodbye World!" << std::endl;

int main()
{
    std::cout << "Hello World!" << std::endl;
    GOODBYE;
#ifdef KIDDING
    std::cout << "Just kidding!" << std::endl;
#endif
}

As you can see, this is a minimal program that prints Hello World to the console. To convert this source file into something that can run on our computer, it must be compiled in four steps: preprocessing, compilation, assembling, and linking.

Preprocessing

Preprocessing is the first step in the compilation process. The preprocessor is essentially a cut-and-paste robot that adds, removes, and replaces source material within each source file. It works by identifying preprocessor directives laced throughout your source text, like #include, #define, and #ifdef, and performing the corresponding transformation for each.

Our main.cpp file has a few different directives. The first is #include <iostream>. The #include directive looks for the file that follows it (in this case iostream) and replaces the directive with the full text of that file. It does this recursively, so one #include file may itself #include several other files.

The next directive is #define GOODBYE .... This allows us to define a macro identifier that refers to some text, in this case another print statement. We can then use that macro identifier in our source code, and it will be replaced by the referenced text during preprocessing. In our example, GOODBYE will be replaced by std::cout << "Goodbye World!" << std::endl; wherever it appears. Note that the macro text already ends in a semicolon, so GOODBYE; in main expands to the print statement followed by a harmless empty statement.
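Because macro replacement is purely textual, it can interact with the surrounding code in surprising ways. The following sketch uses hypothetical macro and function names (none of them appear in main.cpp) to show a plain text substitution alongside the classic operator-precedence pitfall of an unparenthesized macro.

```cpp
#include <string>

// Hypothetical macros, for illustration only.
#define GREETING "Goodbye World!"     // object-like macro: plain text replacement
#define UNSAFE_DOUBLE(x) x + x        // no parentheses around the expansion
#define SAFE_DOUBLE(x) ((x) + (x))    // parenthesized expansion

// After preprocessing this reads: return "Goodbye World!";
std::string greeting() { return GREETING; }

// After preprocessing this reads: return 3 * 2 + 2;  which is 8, not 12.
int unsafe_result() { return 3 * UNSAFE_DOUBLE(2); }

// After preprocessing this reads: return 3 * ((2) + (2));  which is 12.
int safe_result() { return 3 * SAFE_DOUBLE(2); }
```

The preprocessor never evaluates anything here. It only pastes text, and the compiler sees the pasted result.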

The last pair of directives is #ifdef KIDDING and #endif. If the KIDDING macro is defined, the source text in between will be included in the preprocessor output. Otherwise, it will be excluded. We could just define KIDDING within the file itself, but maybe we want to use this definition in several places. Instead, we can define KIDDING on the command line to make it #define-d for all source files.
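As a minimal sketch of the same idea outside of main.cpp, the functions below change their behavior depending on whether KIDDING was defined when the file was compiled. The names messages and kidding_enabled are hypothetical helpers introduced only for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects the lines the program would print.
std::vector<std::string> messages()
{
    std::vector<std::string> lines{"Hello World!", "Goodbye World!"};
#ifdef KIDDING
    // This line survives preprocessing only when KIDDING is defined,
    // for example via /DKIDDING (MSVC) or -D KIDDING (GCC).
    lines.push_back("Just kidding!");
#endif
    return lines;
}

// Hypothetical helper: exposes the compile-time switch as a value.
constexpr bool kidding_enabled()
{
#ifdef KIDDING
    return true;
#else
    return false;
#endif
}
```

Building the same file twice, once with and once without the define, produces two different programs from identical source text.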

Try running the following commands to see the output of the preprocessor for our unassuming twelve-line file. For MSVC, the /P switch tells the compiler to create just the preprocessor output, sending the content to main.i. For GCC, the -E switch does the same, but it prints the output to the console instead (that should be fun). The /D switch on MSVC and the -D switch on GCC allow you to define macros globally. Try toggling this switch to see how the KIDDING content appears and disappears.

REM MSVC (Developer Command Prompt)
cl main.cpp /P /DKIDDING
# GCC (Linux shell)
gcc main.cpp -E -D KIDDING

You may be surprised by the size of the output. With MSVC, preprocessing resulted in a transformed source file with over 60,000 lines. This transformed source file is what gets sent to the next phase: compilation. After preprocessing, the source file is also commonly referred to as a translation unit.

Compilation

Compilation is the second stage in the compilation process (apparently no one had a thesaurus on hand). During this stage, your preprocessed code is converted into a series of assembly instructions. These instructions are defined by your CPU's architecture. For example, a program designed to run on the x86 CPU architecture (i.e. targeted for that platform) must complete all of its work using instructions defined by that architecture. For reference, the x86 instruction set is documented in Intel's Software Developer's Manual.

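We can use the /FA switch on MSVC and the -S switch on GCC to generate the assembly code (.asm with MSVC and .s with GCC). On GCC, -S stops after this stage. On MSVC, /FA writes the assembly listing in addition to the normal output, so add /c if you do not want the compiler to go on and link an executable.

REM MSVC (Developer Command Prompt)
cl main.cpp /c /FA
# GCC (Linux shell)
gcc main.cpp -S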
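The assembly for main.cpp is dominated by iostream machinery, so it can help to try the same switches on something smaller first. Here is a minimal sketch, assuming a hypothetical file named add.cpp. The function name add_numbers is made up for this example, and extern "C" is used only to turn off C++ name mangling so that the function's label is easy to spot in the generated listing.

```cpp
// Hypothetical file: add.cpp
// extern "C" disables C++ name mangling for this function, so its label
// in the assembly output stays close to the name written here.
extern "C" int add_numbers(int a, int b)
{
    return a + b;  // typically compiles down to a handful of instructions
}
```

Running gcc -S add.cpp (or cl /c /FA add.cpp) on this file should yield a listing short enough to read in full, with a label containing add_numbers followed by the instructions that implement the addition.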

Assembly

Assembly is the third step in the compilation process. The assembler takes the assembly code generated during compilation and converts it into binary machine code, stored in an object file. For instance, a computer cannot understand the mnemonic ADD. But it does understand the operation codes that encode ADD, one of which is the byte 00000000 (on x86, the form that adds an 8-bit register to a register or memory operand). We can use the /c switch on MSVC and the -c switch on GCC to stop after this step and create standalone object files (.obj with MSVC and .o with GCC).

REM MSVC (Developer Command Prompt)
cl main.cpp /c
# GCC (Linux shell)
gcc main.cpp -c

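You can inspect the symbols in the resulting object files using dumpbin on Windows and nm on Linux. Without any options, dumpbin prints only a short summary, so pass /symbols to list the symbol table.

REM Windows (Developer Command Prompt)
dumpbin /symbols main.obj
# Linux shell
nm main.o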

Linking

The last step in compilation is linking. During linking, the linker strings the object file(s) together into a final package, which can be either an executable or a library. If an object file refers to functions or variables that are not defined within it, the linker looks at the other object files to try to find those definitions. If the definitions live in a separate library, you need to tell the compiler (or rather, the linker) about it.
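To make the idea of an unresolved reference concrete, here is a minimal sketch with hypothetical file and function names. It is shown as one listing so that it compiles as-is, but the point is clearer if you split it at the marked comment into two source files and compile each one separately with /c or -c.

```cpp
// ---- Hypothetical file: uses.cpp ----
// Declaration only: this tells the compiler that triple exists somewhere,
// without saying where. The object file records it as an undefined symbol.
int triple(int value);

int triple_plus_one(int value)
{
    return triple(value) + 1;  // call to be resolved by the linker
}

// ---- Hypothetical file: triple.cpp ----
// The definition the linker goes looking for.
int triple(int value)
{
    return 3 * value;
}
```

With the two halves in separate files, inspecting the object file for uses.cpp with nm should list triple as undefined (marked U). Linking that object file without the one built from triple.cpp should fail with an unresolved or undefined reference error, while passing both object files to the linker resolves the call.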

To link existing object files, use the link command (in the same folder as cl.exe) on MSVC. With GCC, pass the object files to g++, which invokes the linker for you; the -o switch names the output file.

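REM MSVC (Developer Command Prompt)
link main.obj
# GCC (Linux shell); linking may fail when using gcc instead of g++,
# because gcc does not link the C++ standard library by default
g++ -o main main.o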

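Finally, we have an executable file: main.exe with MSVC and main with GCC. Try running it (main on Windows, ./main on Linux), and you should see the following poetry appear. The last line is printed only if KIDDING was defined when main.cpp was compiled, for example by adding /DKIDDING or -D KIDDING to the compile command in the Assembly step: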

Hello World!
Goodbye World!
Just kidding!

Up Next

Having assessed each step in turn, we now have a better understanding of how compilation of a single source file works. Each step can be configured further, to add more files, to create debug information, and to do a host of other things. It's yours to go crazy with.

However, few meaningful programs can be written with a single source file. Indeed, modern software is defined by functionality that is split across packages which can be shared and reused. To that end, we next turn to creating a modular C++ project using static and shared libraries.



© 2021 Mustafa Moiz.