C/C++ Compilation process

by Hristo Iliev | Published on 2021-01-02
category C / C++

The compilation process of C and C++ projects and applications is a bit different from that of most other languages. First, for those who don't know: there are broadly two kinds of programming languages, interpreted and compiled.

Interpreted languages (aka scripting languages) are evaluated while they run. This is slower because the computer doesn't understand source text; it only understands specific instructions for operations and computations. The interpreter first has to read the text and translate it into those instructions, and only then can they be handed to the CPU for execution.

Compiled languages are translated ahead of time into instructions the processor can execute directly (machine code), or, in the case of languages such as Java and C#, into an intermediate bytecode. Assembly language is a human-readable form of those processor instructions: a fairly small set of operations on registers and memory. It is not convenient to write programs directly in assembly, so people created higher-level languages whose human-readable source text a compiler translates into those instructions.

How do C/C++ compilers work
If not understood correctly…
Headers and using external libraries
Header guards
Interlinked classes in C++
Conclusion

How do C/C++ compilers work

In this article I won't go into how each specific compiler approaches the process, so it will be purely theoretical. Compilers for many other languages are easier to work with because they more readily do what you expect. The C and C++ compilation process, on the other hand, is split into two main stages: compilation and linking. Compilation takes a single source file (after the preprocessor has handled its #include and #define directives) and produces an object file containing the CPU instructions for that file.

When you build an application, though, you rarely have a single file. After each file has been compiled on its own, all the resulting object files are combined into an executable. This step is called linking, and it is done by the linker.

If not understood correctly…

This process makes it very easy to make mistakes if you don't fully understand it. Because each file is compiled separately, anything a line of code uses (a function, a variable, a type) must already be known to the file being compiled at that point.

For example, if you have read my introduction to C++ article, you know that a function can have a declaration that specifies its signature, and a definition that describes the actual computations the function performs. The signature includes the function's name, what it returns, and the types of arguments it accepts. Functions are not the only things that can be declared and used separately: global variables, structs, classes, etc. can be as well. We refer to all of these as symbols. A symbol must be declared before it is used.

If we look at the example from the introduction to C++, you can move the function definition into a separate file. But for the file that calls that function to know of its existence, you also need to add a declaration to that file, because the files are compiled separately. The same applies to every kind of symbol, and a missing declaration is one of the most common errors.

The second thing you need to consider is the linking process. A file that contains only the declaration of a function still compiles: the resulting object file contains machine code that calls the function, but the call points to a placeholder, an unresolved reference to a symbol the compiler knows nothing else about. The linker's job is to find the definition of that symbol among all the compiled files and libraries and connect the two. There are two common errors here: an undefined symbol (no definition found) and a symbol defined multiple times (more than one definition found). You must be careful to avoid both, because either one stops the linker from producing an executable or a shared library.

Headers and using external libraries

When you build or download an external library and want to use it in your own project, you have to link against the library's compiled output (a static or shared library file). But to compile code that calls the library's functions, you also need declarations of those functions in your own code. In fact, even when your own project has multiple files, you could write the declarations by hand at the top of each .c or .cpp file that needs them.

This is the repetitive and error-prone way of doing things. Instead, the preprocessor can copy and paste a file's contents for you with the #include directive: you write #include "filepath" (or #include <filepath> for system and library headers), and the include line is replaced with the contents of that file before compilation. These included files are called header files. They usually have the extension .h or .hpp and are used to share declarations of symbols. In headers you should mainly write declarations. If a header contains definitions and is included in two or more translation units (two .c or .cpp files), you will get the "symbol defined multiple times" kind of error. Note that header guards, covered below, do not prevent this: they only stop a header from being pasted twice into the same translation unit, while every .c or .cpp file still gets its own copy of the definition.

The undefined-symbol error happens when you have declared and used a symbol, but its actual definition is missing from every library file and translation unit that you compiled and linked.

Header guards

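Another common error comes from a header being included more than once. You can include a header that itself includes another header you have already included, and end up with the same contents pasted into your file twice. Repeated function declarations are harmless, but defining the same class, struct, or enum twice in one translation unit is a compile error. This is why header guards exist. A header guard should surround all the code in your header like this: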

// ExampleHeader.h

#ifndef SOME_UNIQUE_NAME_HERE
#define SOME_UNIQUE_NAME_HERE
 
// your declarations (and certain types of definitions) here
 
#endif

This is the most portable way to guard your headers, since it relies only on standard preprocessor directives. Most compilers, including GCC, Clang, and MSVC, also support a non-standard alternative called #pragma once. It is put at the top of the header file:

// ExampleHeader.h
#pragma once

// your declarations (and certain types of definitions) here

Interlinked classes in C++

In C++ you can also have two classes that depend on each other. In a game, for example, an entity can have multiple components, and it needs to update them, query them, add them, remove them, etc.

But each component should also know about the entity that owns it. It could then, for example, call methods on the entity to find the other components it has. To do this, both classes have to know about each other. This is easiest to organize cleanly with each class in its own header file.

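// Entity.h
#pragma once
#include <vector>

// We need a forward declaration:
// it only states that the class exists, which is enough for pointers and references.
class Component;

class Entity {
public:
  // Pointers to an incomplete type are fine; Component is fully defined elsewhere.
  std::vector<Component*> Components;
};

// Component.h
#pragma once

// There is no problem including Entity.h directly here.
// But Entity.h must not include Component.h in return: with the guards in place the
// includes would not loop forever, but one class would be processed before the other
// is declared and compilation would fail. The forward declaration avoids that.
#include "Entity.h"

// Entity is a complete type here, so Component can refer to it freely.
class Component {
public:
  Entity* Owner = nullptr;  // the entity that owns this component
  // ...some component stuff...
};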

In the implementation files (.cpp) that you actually compile, you can then include both header files, and the member function definitions there will have full knowledge of both classes.

Conclusion

That's it for the compilation process. In languages such as C# or Java this is simpler: instead of textually including other files, the compiler reads the symbols directly from the other source files or compiled modules you reference, which eliminates the most common errors. In C++ it is a bit harder, but it is also more powerful. External libraries provide their own header files, so you don't have to declare all of their functions yourself. If you're going to write a library that will be used by other people, remember to put all of its public declarations in one or more public header files. You would then distribute those header files together with the compiled library.