Complex C++ Compilation from the Command Line

April 13, 2021 · C++ Build Process

In the last article, we learned how to compile a single C++ source file from the command line. We also discussed the four stages of compilation: preprocessing, compilation, assembly, and linking. However, a program with only a single source file won't get us very far. Modern software is often composed of millions of lines of code spanning thousands of source files.

Understanding how to compile projects with multiple source files from the command line will give us insight into the structure of these larger projects and will also provide a foundation for understanding build systems, which simply provide a convenient interface for the behaviors we are about to discuss.

Units of Organization

A C++ program is built from three kinds of units: executables, static libraries, and shared libraries. Each unit provides distinct functionality.

Binaries: When downloading software, you often have a choice between source and binaries. A binary simply refers to the packaged version of the software, pre-compiled for your platform. Both executables and libraries can come in binary form. Obtaining the software as source means you need to compile it for your platform yourself.

Executable

The easiest unit of organization to understand is the executable, a file that (for lack of a better phrase) can be executed. Every program that runs on your device is, by definition, an executable. This includes games like Hades or League of Legends, applications like Microsoft Word, and services that run in the background like Discord.

The executable contains all of your instructions and data structures in binary format. When you run the executable, either from the command line or by double-clicking it, the operating system will load the program into memory, set the program counter to the program's entry point (runtime startup code that, in the case of a C++ program, eventually calls the main function), and execute the instructions until completion.

The binary code in an executable is typically laid out using relative addresses, which are converted into absolute memory addresses by the operating system upon load. This allows the compiler to be more flexible (it can provide the same relative layout to a computer that has 2KB of RAM vs. 2GB of RAM), and leaves the memory distribution of the program to the operating system, which is much better positioned to do so in an efficient manner (e.g., through the allocation of frames, paging, etc.).

This file can take different forms depending on the context in which it appears. On Windows, an executable typically has the .exe extension. On Linux and other Unix-like systems, there is no equivalent file extension. Instead, whether a file is executable depends on its file permissions. Specifically, a file must have the execute permission (a permission bit like read or write) to be run. That said, there is nothing stopping a Linux executable from having a file extension. Indeed, when no output name is given, GCC names the executable a.out.
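The permission check described above can be sketched in a few lines. This is a minimal illustration that assumes a POSIX system (it uses access() from unistd.h, which MSVC does not provide); the helper name has_execute_permission is hypothetical and not part of any library.

```cpp
#include <unistd.h>   // POSIX access() and X_OK (assumption: a Unix-like platform)

// Hypothetical helper: returns true if the calling user is allowed to
// execute the file at `path`, false if not (or if the file does not exist).
bool has_execute_permission(const char* path)
{
    return access(path, X_OK) == 0;
}
```

Calling has_execute_permission("./test") before and after running chmod +x on a file shows the permission bit changing the answer. Note that for a directory the same bit means "may be searched", so this check alone does not prove that a path is a program.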

A compiler will create an executable by default from the provided source files. For instance, when we compile the lone main.cpp from the previous article

cl  main.cpp                # MSVC
g++ main.cpp -o main.out    # GCC

we end up with main.exe with MSVC and main.out with GCC.

Static Library

The next unit of organization is the static library. A library is a distinct set of code that represents some thematic functionality. It is sometimes, but not always, designed to be reusable. Consider a technical project for which you have written several math functions like absolute, min, and max. These math functions may be useful in many contexts besides just the current project. In this case, you might split the math functions into their own math library. This makes it easy to use in other projects. Even when you don't want to reuse the code, splitting it off into a library can help organize your project into logical sections.
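As a rough sketch of what such a math library might contain, consider the code below. The namespace mathlib and the file names in the comments are assumptions made for illustration; the article itself only names the functions absolute, min, and max.

```cpp
// Declarations: these would live in a header such as mathlib.h (hypothetical name)
namespace mathlib
{
    int absolute(int value);
    int min(int a, int b);
    int max(int a, int b);
}

// Definitions: these would live in a source file such as mathlib.cpp (hypothetical name)
namespace mathlib
{
    int absolute(int value) { return value < 0 ? -value : value; }
    int min(int a, int b)   { return a < b ? a : b; }
    int max(int a, int b)   { return a > b ? a : b; }
}
```

Any project that includes the declarations and links against the compiled definitions can call mathlib::max(2, 5) without knowing how the functions are implemented.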

Once compiled, a static library is similar to an executable in that it contains functions and data structures in binary form. It is really just a pack of object files. The primary difference is that a static library cannot be run by itself. It must be linked into an executable to be useful. On Windows, a static library file has the extension .lib. On Linux, the file will have the extension .a. By convention, libraries on Linux are also prefixed with lib, as in libm instead of m for the standard math library.

It is not strictly true that a library cannot be used as an executable. Some libraries will define a main function, allowing them to be compiled and run as executables. For example, SDL defines main in its library and serves as a good starting point for understanding the practice of implementing entry on the library side rather than on the client side. However, exploration of this practice is beyond the scope of this article.

We can create a static library of main.cpp with MSVC in two steps. First create the object file. Then use the lib tool (in the same directory as cl) to create the static library.

cl  main.cpp /c      # Create object file without linking
lib main.obj         # Archive object file into static library

This will create main.lib, our static library file. Doing the same with GCC is a two-step process as well. First create the object file. Then use the ar (as in archive) tool to create the static library.

g++ main.cpp -c             # Create object file without linking
ar  rvs main.a main.o       # Archive object file into library (r = insert or replace, v = verbose, s = write symbol index)

This will create main.a, our static library file.

Shared Library

A shared (or dynamic) library is similar to a static library in terms of functionality and logical structure. The difference relates to the manner in which the library is linked to an executable.

The contents of a static library are baked into an executable. That is, the code of the static library ends up side by side with the loose code of the executable itself. This makes distribution very easy, as you simply need to distribute the executable. There is also a slight speed advantage to using a static library. However, because all of that code is packaged together, it can make for a very large executable, which the operating system must map into memory on program start.

By contrast, a shared library is not included in the executable. Rather, references to functions or types in the shared library are added to the executable during linking. When the program is started, the operating system's loader locates the shared library and maps it into memory; individual function references may be resolved lazily, when a function is first called. This has the additional benefit of allowing various programs to share the same library. For example, most services provided by Windows (like opening a window or outputting to console) are exposed as shared libraries so that multiple active programs can share them while minimizing memory footprint. The tradeoff is that any shared library that is not already installed on the target operating system must travel with the executable, which can make distribution more cumbersome.

On Windows, a shared library file has the extension .dll. On Linux, the file will have the extension .so. We can create a dynamic library of main.cpp with MSVC in two steps, as before, first creating the object file and then using the link tool with the /DLL switch.

cl   main.cpp /c     # Create object file without linking
link main.obj /DLL   # Link object file into shared library

This will create main.dll, our shared library file. Client code that links against the DLL also needs a main.lib, which is created only when our objects export functions or types. Because main.cpp does not export any functions that can be called by other projects, no .lib file is created here. We can create the shared library with GCC as well, using the -fpic switch while creating object files and the -shared switch when linking.

g++ -fpic -c main.cpp               # Create object file without linking
g++ -shared -o libmain.so main.o    # Link object file into shared library

This will create libmain.so, our shared library file.

You may be wondering why the MSVC process needs to output two files (.dll and .lib) while the GCC process needs to output only one (.so). The difference lies in the way that functions and types (collectively, symbols) are exported by libraries on different platforms. On Windows, you must explicitly specify which symbols to export (discussed later). The .dll file contains the compiled code from the object files while the .lib contains an export table that helps the linker resolve references to the shared library. Despite sporting the same extension, the .lib generated during shared library creation is different from the .lib created during static library creation. The former is merely a stop-gap until the code in the .dll can be found at run-time, while the latter contains the object files themselves.

In contrast, Linux exports all symbols by default. Accordingly, the .so file contains both the compiled code that will be referenced at run-time and the export data to be used by the linker at link time. See here for more details.

Note that libraries, like executables, can link to other libraries.

Putting the Pieces Together

Now that we have a theoretical understanding of how projects are chopped up, let's put that understanding to work by implementing a simple project that contains client code, a static library, and a shared library.

Make a Static Library

Create two files called static.h and static.cpp. The header file just contains a declaration for a function.

// static.h
#pragma once

void static_hello();

Because the object files in a static library are baked into an executable, we do not need to export the static_hello function.

The source file will implement this function.

// static.cpp
#include "static.h"
#include <iostream>

void static_hello()
{
    std::cout << "Hello static world!" << std::endl;   
}

Now build the static library with this source file.

# MSVC
cl  static.cpp /c
lib static.obj

# GCC
g++ static.cpp -c
ar rvs libstatic.a static.o

This will create static.lib with MSVC and libstatic.a with GCC.

The #pragma once directive is a non-standard (but widely supported) preprocessor directive that tells the compiler to include this header file only once in any given translation unit. So even if the source file #include-s static.h fifteen times, the text of static.h will only appear in the translation unit once. Every header file should contain this directive or an equivalent include guard.
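For comparison, the same protection can be had with a traditional include guard, which relies only on standard preprocessor directives. The sketch below pastes the header text into one file twice to imitate a repeated #include. The macro name STATIC_H and the function static_library_version are hypothetical and chosen only for illustration.

```cpp
// ---- First copy of the header text (what the first #include pastes in) ----
#ifndef STATIC_H          // true: STATIC_H has not been defined yet
#define STATIC_H

inline int static_library_version() { return 1; }

#endif // STATIC_H

// ---- Second copy (what a repeated #include would paste in) ----
#ifndef STATIC_H          // false: everything up to the matching #endif is skipped
#define STATIC_H

inline int static_library_version() { return 1; }

#endif // STATIC_H
```

Without the guard, the second copy would define static_library_version a second time in the same translation unit and compilation would fail with a redefinition error.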

Make a Dynamic Library

Create two files called shared.h and shared.cpp. As with the static library, the header file contains the function declaration.

// shared.h
#pragma once

#ifdef _WIN32
    #ifdef EXPORTING
        #define SHARED_API __declspec(dllexport)
    #else
        #define SHARED_API __declspec(dllimport)
    #endif
#else
    #define SHARED_API
#endif

SHARED_API void shared_hello();

This header file is a bit more complex than the one for the static library. As noted earlier, on Windows you need to explicitly export shared library functions that you want to call from client code. The SHARED_API macro helps us do this. When targeting Windows (in which case the compiler defines _WIN32), we define SHARED_API depending on whether symbols are being exported or imported. When compiling the shared library, we will define the EXPORTING macro, which means that SHARED_API will expand to __declspec(dllexport). This is a Windows-specific attribute that marks the function (or type) for exporting. When using the shared library in client code, we will not define the EXPORTING macro, and SHARED_API will expand to __declspec(dllimport).

When not targeting Windows, _WIN32 is not defined and SHARED_API expands to nothing. In other words, the macro has no effect during GCC/Linux compilation.
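The empty definition works because GCC exports every symbol by default. If a library were instead compiled with GCC's -fvisibility=hidden option, which hides symbols unless told otherwise, the same macro could carry GCC's visibility attribute. The following is only a sketch of that variation, not what this article's project uses; the function shared_answer is hypothetical, and EXPORTING is defined in the file here only so the sketch compiles on its own (normally it would come from the command line).

```cpp
// Normally passed on the command line (/DEXPORTING with MSVC); defined here
// only so that this sketch is self-contained.
#define EXPORTING

#if defined(_WIN32)
    #ifdef EXPORTING
        #define SHARED_API __declspec(dllexport)
    #else
        #define SHARED_API __declspec(dllimport)
    #endif
#elif defined(__GNUC__)
    // Only has a visible effect when the library is built with -fvisibility=hidden
    #define SHARED_API __attribute__((visibility("default")))
#else
    #define SHARED_API
#endif

// Hypothetical exported function, declared as it would be in the header
SHARED_API int shared_answer();

// Definition, as it would appear in the library's source file
int shared_answer() { return 42; }
```

With this arrangement, the set of exported symbols is stated explicitly on both platforms, at the cost of a slightly longer header.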

The source file implements the shared_hello function.

// shared.cpp
#include "shared.h"
#include <iostream>

void shared_hello()
{
    std::cout << "Hello shared world!" << std::endl;
}

Now build the shared library with this source file.

# MSVC
cl   shared.cpp /c /DEXPORTING
link shared.obj /DLL

# GCC
g++ -fpic -c shared.cpp
g++ -shared -o libshared.so shared.o

This will create shared.dll and shared.lib with MSVC and libshared.so with GCC.

Make a Client Executable

Create a single file called main.cpp and implement the main function.

// main.cpp
#include "static.h"
#include "shared.h"
#include <iostream>

void local_hello()
{
    std::cout << "Hello local world!" << std::endl;
}

int main()
{
    local_hello();
    static_hello();
    shared_hello();
}

A few different things are happening here. First, we #include the respective header files for the static and shared libraries. This is required by compilation proper (step 2 of the compilation process), so that it can validate your code. For example, we are asserting in main.cpp that static_hello is a function that takes zero parameters, but the compiler cannot confirm this unless we provide it with the original declaration of the function contained in static.h. The same goes for shared_hello.
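The role of a declaration can be seen in miniature within a single file. In this sketch, the hypothetical function add stands in for a library function: the declaration plays the part of the header, and the definition plays the part of the compiled library.

```cpp
// Declaration only: this is what a header such as static.h provides.
int add(int a, int b);

// The compiler can check this call (argument count and types) using the
// declaration alone, before it has seen any definition.
int call_add()
{
    return add(2, 3);
}

// Definition: in a real project this would be compiled separately and
// supplied by the library at link time.
int add(int a, int b)
{
    return a + b;
}
```

Removing the declaration on the first line makes the call inside call_add fail to compile, which is the same error main.cpp would produce without its #include lines.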

Next, we call three functions in the body of main: one defined locally, one defined in a static library, and one defined in a shared library.

Now build the executable, linking in the static and shared libraries.

# MSVC
cl   main.cpp /c
link main.obj static.lib shared.lib /OUT:test.exe

# GCC
g++ main.cpp -c
g++ main.o libstatic.a libshared.so -o test

This will create test.exe with MSVC and test with GCC. We are almost at the finish line. At this point, on Windows, you can run test.exe and it should just work. However, if you try to run test on Linux, you will get the following error:

./test: error while loading shared libraries: libshared.so: cannot open shared object file: No such file or directory

As noted above, when you run a program that relies on a shared library, the operating system needs to go out and find it. In this regard, each operating system looks in different places to find these shared libraries. Windows starts by searching in the directory where the executable resides. Because shared.dll is in the same directory as test.exe, Windows finds it with no problems and the program completes successfully. You can find the Windows shared library search order here.

On Linux, we need to set the shared library search path manually, so that it looks in the executable directory, as follows:

LD_LIBRARY_PATH=/path/where/executable/lives/   # directory that contains libshared.so
export LD_LIBRARY_PATH

Now it should run on Linux as well. Learn more about the Linux shared library search order here.
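LD_LIBRARY_PATH may hold more than one directory, separated by colons. To make that format concrete, here is a small sketch that splits such a value into its directories. The helper split_search_path is hypothetical, and it simplifies the real loader's behavior by dropping empty entries.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a colon-separated search path, such as the
// value of LD_LIBRARY_PATH, into individual directories.
// Simplification: empty entries are skipped.
std::vector<std::string> split_search_path(const std::string& value)
{
    std::vector<std::string> directories;
    std::istringstream stream(value);
    std::string directory;
    while (std::getline(stream, directory, ':'))
    {
        if (!directory.empty())
            directories.push_back(directory);
    }
    return directories;
}
```

Passing the result of std::getenv("LD_LIBRARY_PATH") (after checking it for a null pointer) would list the extra directories that will be searched for libshared.so.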

Keen readers may have noticed one more gap in our link process. Where does iostream come from? This header is part of the standard library, which provides core functionality for C++ in a cross-platform way. Like any other library, it needs to be linked into executables or libraries that use it. The standard library is unique, though, because it comes with the compiler. The compiler will typically take care of linking to the standard library for you.

Here's the final output.

Hello local world!
Hello static world!
Hello shared world!

On the Horizon

Et voilà! We have just created a small but enlightening program that demonstrates the modular structure that characterizes modern software. On top of that, working with the command line has given us an appreciation for some of the available options and enabled us to search for other options as necessary.

Though this project is more complex than the first one, each module still only contains a handful of files. While a project of any complexity can hypothetically be created from the command line, it can be tedious and error-prone to type out all of the arguments manually (e.g., a growing list of source files). Multiply this complexity by an order of magnitude when you are trying to maintain these arguments across platforms and compilers. Grappling with these issues is the domain of another set of tools built on top of the compiler: build systems. We turn to these tools next.



© 2021 Mustafa Moiz.