Intro

When I first began programming, I was flooded by the inner workings of computing: software development, computer architecture, programming language abstraction, low-level systems engineering, etc. It was easy to push all of that aside and simply focus on developing computational thinking. After all, programming is:

  • Decomposition
  • Pattern recognition
  • Abstraction
  • Algorithm design

But as I learned new languages and my understanding grew, I began to ask: how do these different programs and languages work together?

// Mind you this was before I understood APIs, data serialization, different markup notations, and the various other ways they communicate.

Explanation

Some programming projects use different languages for different parts of the program, each completing a different task. This shows up in two forms.

Multi Process

These are projects that spin up two or more processes, each with its own address space in memory, that together make up the cohesive whole of the program. Think of web apps.

Many full stack frameworks are multi-language projects/tools because of the tasks they achieve: different parts move together, each serving a different objective. For example, Django uses Python for the app’s backend, but the client side code is still the typical JS suite you’d expect.
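To make "separate address spaces" concrete, here is a minimal sketch, assuming a POSIX system. The helper name fork_demo is made up for illustration. A child process changes its own copy of a variable and has to send the new value through a pipe, because the parent cannot see the child's memory. Both processes are C here only to keep the sketch in one language; the process on the other end of the pipe could just as well be written in something else.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that changes its own copy of
 * `counter`, then reports the parent's copy and the child's copy.
 * Returns 0 on success, -1 on failure. */
int fork_demo(int *parent_value, int *child_value)
{
    int fds[2];
    int counter = 1;

    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: this assignment only touches the child's address space. */
        close(fds[0]);
        counter = 42;
        ssize_t written = write(fds[1], &counter, sizeof counter);
        close(fds[1]);
        _exit(written == (ssize_t)sizeof counter ? 0 : 1);
    }

    /* Parent: the only way to learn the child's value is to read it
     * from the pipe. */
    close(fds[1]);
    ssize_t got = read(fds[0], child_value, sizeof *child_value);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    *parent_value = counter; /* untouched by the child's assignment */
    return got == (ssize_t)sizeof *child_value ? 0 : -1;
}
```

The bytes going through the pipe are what the two sides have to agree on, which is the job that APIs and data serialization do in a real web app.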

Single Process

What about programs and projects where the different languages work in tandem as a Single Process? They can’t all be part of the same binary because each has its own compiler, right?

To answer that, we must first discuss compilation.

Compilers

They don’t just convert code into executables. That is indeed the final result, but it’s not a direct, one-step process.

Steps

  • Pre-processing
  • Compiling
  • Assembling
  • Linking

Pre-processing

Readies the program’s source code for compilation. It does this by:

  • Removing comments
  • Expanding macros
  • Resolving conditional compilation, e.g. OS based conditions…
  • Resolving imports and includes

The output is still code that you can read, in the same language, but it has been massaged and is ready for the next step.
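A small C sketch of what the pre-processor works on. The macro and function names (SQUARE, PLATFORM_NAME, area_of_square, platform_name_length) are invented for illustration; the OS macros tested are the ones commonly predefined by compilers on those platforms.

```c
/* This comment never reaches the compiler proper. */
#include <string.h> /* include: the header's text is pasted in here */

/* Macro: every use is replaced textually before compilation. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one branch survives pre-processing. */
#if defined(_WIN32)
#define PLATFORM_NAME "windows"
#elif defined(__APPLE__)
#define PLATFORM_NAME "macos"
#elif defined(__linux__)
#define PLATFORM_NAME "linux"
#else
#define PLATFORM_NAME "unknown"
#endif

/* After pre-processing, the body reads: return ((side) * (side)); */
int area_of_square(int side)
{
    return SQUARE(side);
}

/* After pre-processing, PLATFORM_NAME is a plain string literal. */
size_t platform_name_length(void)
{
    return strlen(PLATFORM_NAME);
}
```

With GCC, running the pre-processor alone (gcc -E) on a file like this prints the expanded source, which is a good way to see each of the bullet points above in action.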

Compiling

The processed code is then translated into Assembly, the flavor and dialect of which will depend on several factors including the computer’s architecture.

Assembling

This is done by the assembler (another translator in the pipeline), which takes the assembly code, still human readable, and translates it to binary code for the CPU. The output is known as an object file, but it is not yet runnable as a binary.

Linking

The positions of the functions within the binary still need to be resolved. Your program may use functions from external libraries, which need to be compiled as well and linked (placed) in the correct position alongside your program’s object file before it can be run.

Static Linking

This is the easy way: take the machine code of each function and method used and copy it into place in the final binary.

Dynamic Linking

To avoid the duplication and bloat of saving the machine code of frequently used functions over and over again, libraries are precompiled into files known as dynamic shared libraries. // often a .so file on Unix systems and .dll on Windows.

They contain executable code for the library functions, but without a main entry point.

The linker won’t copy the library’s binary code into the executable. Instead:

  1. It inserts a reference to the machine code of that function.
  2. That code is then loaded by the OS into the address space of the process at run time.

This approach saves disk and memory space, since a single copy of a library can be shared by many programs. Either way, the final result is an executable file that can be run by your CPU.
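The "reference now, load later" idea can be seen by hand with dlopen and dlsym, which load a shared library and look up a function while the program is running. This is explicit run-time loading rather than what the linker sets up automatically, so treat it as a related sketch, not the same mechanism. It assumes a Linux system with glibc, where the math library is named libm.so.6; the helper name cos_via_shared_library is made up.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: loads the math library at run time and calls
 * its cos function through a pointer. Returns 0 on success, -1 if the
 * library or the symbol could not be found. */
int cos_via_shared_library(double x, double *result)
{
    /* Assumption: glibc's name for the math library on Linux. */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* Look up the address of "cos" inside the loaded library. The cast
     * through void ** is the usual way to store dlsym's result in a
     * function pointer. */
    double (*cosine)(double) = NULL;
    *(void **)(&cosine) = dlsym(handle, "cos");
    if (cosine == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = cosine(x); /* call into code this program never contained */
    dlclose(handle);
    return 0;
}
```

On older glibc versions this may need to be linked with -ldl.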

Compilers and the different stages

Some compilers like GCC can stop mid process, allowing developers to inspect the files of the intermediate stages. It can even take an assembly file as input, then assemble and link it to produce an executable. // Do you see where this is going?

Different parts of the program can be written in different languages based on the needs and specifications. For example, perhaps you trust Rust’s compiler more than Python’s interpreter for critical parts that must be performant, or you write part of a C program in Assembly to ensure it behaves the way you want it to.

You can then pass the files to the compiler, GCC for the second example, and it’ll manage compiling, assembling, and linking it all together into a single executable.
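As a rough map of those stopping points, here is a tiny C file with the matching GCC invocations in a comment. The flags -E, -S, and -c are standard GCC options; the file names (add.c, main.o, app) are hypothetical.

```c
/* add.c (hypothetical file name)
 *
 *   gcc -E add.c -o add.i      stop after pre-processing (expanded C)
 *   gcc -S add.c -o add.s      stop after compiling (assembly text)
 *   gcc -c add.c -o add.o      stop after assembling (object file)
 *   gcc add.o main.o -o app    link object files into an executable
 *
 * The last command assumes a main.o that defines main() and calls add().
 */
int add(int a, int b)
{
    return a + b;
}
```

Since the link step only sees object files, main.o does not have to come from C at all, which is the point the rest of this post builds on.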

Compilers’ structure

Compilers like these are actually a pipeline of different tools, each responsible for a task in this multi-stage process. These tools can be interchanged, plugging in different parts for different tasks handling different files. That’s why GCC handles C, C++, Objective-C, Fortran, Go…etc, and why GCC stands for GNU Compiler Collection.

When multiple high level languages form a single program, each may have its own compiler and assembler, but it all comes down to the Linker.

Linker’s role

The different languages and their compiled and/or assembled files don’t have to be from the same Compiler suite/toolchain.

Example: calling a Rust function from a C program.

  1. Compile the Rust code into a static or dynamic library.
  2. Declare and use it in the C code.
  3. Compile the C code.
  4. Link it all together using an appropriate Linker.
  5. Final result is a working executable.
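Step 2 above, seen from the C side, might look like the sketch below. The function name rust_add and its signature are hypothetical, and the Rust side would have to export a function with the C calling convention under exactly that name. To keep the sketch compilable without a Rust toolchain, a C stand-in definition is included; in the real setup that definition is deleted and the linker finds the symbol in the Rust library instead.

```c
#include <stdint.h>

/* Declaration only: this is all the C compiler needs to know about the
 * foreign function. `rust_add` is a hypothetical name. */
int32_t rust_add(int32_t a, int32_t b);

/* STAND-IN ONLY: takes the place of the Rust library so this sketch
 * links by itself. Remove it when linking against the real library. */
int32_t rust_add(int32_t a, int32_t b)
{
    return a + b;
}

/* Ordinary C code that uses the function without caring which language
 * produced its machine code. */
int32_t sum_three(int32_t a, int32_t b, int32_t c)
{
    return rust_add(rust_add(a, b), c);
}
```

The fixed-width int32_t is a deliberate choice: both languages have to agree on the exact size of every argument, and plain int leaves more room for mismatch.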

It's common for Rust programs to call on C libraries, since C is older and its ecosystem is well established.

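Application Binary Interface (ABI)

ABIs are the interfaces that define the rules for how the binaries of different high-level programming languages gel, talk, and interact with each other: how arguments are passed to functions, how data is laid out in memory, and how symbols are named.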
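One piece of an ABI that is easy to poke at from C is data layout. In this sketch (the struct and function names are made up), the compiler decides where each field sits and how much padding goes between them, following the platform's ABI. Any other language that wants to share this struct has to arrive at the same answers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct shared between two languages. */
struct Sample {
    uint8_t  tag;
    uint32_t value;
};

/* Total size in bytes, including any padding the ABI requires. */
size_t sample_size(void)
{
    return sizeof(struct Sample);
}

/* Byte offset of `value` from the start of the struct. On many
 * platforms this is larger than sizeof(uint8_t) because of alignment
 * padding, but the exact number is up to the ABI. */
size_t sample_value_offset(void)
{
    return offsetof(struct Sample, value);
}
```

If the C side and the other language disagree on either number, each reads the other's bytes at the wrong positions, even though both programs compile and link without complaint.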

