This video was sponsored by Let's Get Rusty. Today, we're going to cover some
low-level concepts that you probably never have to think about unless you're
working at the systems level. One of the most frequent video
suggestions I receive is to explain why some projects seem to involve multiple
programming languages in their development. Explaining this can be
either extremely easy or extremely difficult depending on the type of
project. Take a full-stack framework like Django, for example. Python is used
to handle the backend, which runs on the server, while HTML, CSS, and
JavaScript build the user interface displayed on the client side.
This is a multi-language project. But in this case, it's easy to understand
how everything works in production because we're essentially developing two
separate processes that communicate remotely at runtime using some form of
interprocess communication. But there are other types of projects
where components written in different programming languages are meant to run
together as a single process. So how are these kinds of projects even possible?
Hi friends, my name is George and this is Core Dumped.
If you follow this channel, I'm pretty sure you love low-level systems. And
that's exactly why I'm excited to say that this video is sponsored by Let's
Get Rusty. Let's be honest, Rust is no longer the language of the future. It's
the language of the present. Don't take my word for it. Just look at the
industry. Big companies are betting big on Rust for building critical systems.
Google, Microsoft, and even the Linux kernel itself are now integrating Rust into
their core systems. That's not hype. That's happening. And if you're thinking
about leveling up your Rust skills, whether for personal growth or to land a
job working on real systems, Let's Get Rusty is the go-to place for Rust
training. Created by a fellow YouTuber and one of
the most beloved names in the Rust online community, Let's Get Rusty has
helped thousands of developers, myself included, by the way, master the
language and break into systems programming. They're running a new
cohort very soon. And since spots are limited, now's a great time to check it
out. Visit letsgetrusty.com/startwithjorge
or just click the link in the pinned comment below. Big thanks to Let's Get
Rusty for supporting the channel. And now, let's get into today's video.
For simplicity, let's start by considering only programming languages
that compile down to machine code. Generally, each programming language has
its own dedicated compiler. So, we can't just take, say, a Rust file and compile
it using the Go compiler. That's where things start to get
interesting and a bit confusing. If most programming languages have separate
compilers, runtimes, and memory models, how can they possibly live inside the
same binary? What often confuses people is the common
oversimplification that compilers are just tools that turn source code
directly into executable files. Now, don't get me wrong, compilers do produce
executable files, but that's only the final result of a much more complex
multi-step process that we usually don't see.
To illustrate this, let's look at a simple C program. If you are not a C
developer, don't worry. This program simply prints a message, but the message
it prints changes depending on the operating system you're running it on.
On most GNU/Linux systems, the go-to compiler for C is GCC. It used to stand
for GNU C compiler, but that's no longer true. And in a moment, you'll understand
why. To compile and run our C program, we
usually just call GCC and pass it the file or files we want to compile. Then
an executable is generated. From our perspective, it's just two
simple steps. One to compile the program and another one to run it. But under the
hood, the compiler is doing a whole lot more. Internally, GCC goes through four
main phases to turn a C file into a working executable. Now, GCC deserves
its own deep dive, but I'll break it down quickly here.
The first step is pre-processing. This prepares the source code by doing things
like removing comments, expanding macros, resolving conditional
compilation, and, crucially, resolving includes. When you use #include, the C
pre-processor replaces that line with the contents of the header file and all
the headers it includes, effectively inserting that code into our file before
compilation begins. So, the output is still C code, but pre-processed for the
next step. Next comes compilation, but not directly
into machine code. Instead, the pre-processed code is translated into
assembly language, which consists of the instructions the computer will
execute, but written in a human-readable form.
So, here's our first myth busted. A compiler doesn't always convert source
code into machine code. In fact, many compilers convert source code into an
intermediate representation like assembly or even into another
programming language. The third step involves the assembler,
which is technically another compiler: it takes the human-readable assembly
code from the previous phase and translates it into machine code, the
ones and zeros your CPU understands. The result is called an object file.
But here's the catch. This object file isn't runnable yet. GCC still needs to
resolve where each function will be placed within the final binary.
In our simple example, we're just printing text to the console. But
remember, the actual implementation of the printf function lives in the C
standard library. So that library also needs to go through the same compilation
steps we just described. This brings us to the final step,
linking. At this stage, we may have multiple object files, some from our
code, others from external libraries we included during development. The
linker's job is to combine all these object files into a single
self-contained executable. There are two ways to do this. The easiest is to take
the machine code of each required function from the library and copy it
into the final executable. This is called static linking. All the
library functions our program needs are embedded directly into the output file.
Everything is self-contained and hence ready to run whenever we want.
But another option is dynamic linking. Think about how many programs on your
system use the printf function from the standard library. If every one of those
programs statically included its own copy of that function, you'd end up with
thousands of identical copies stored across your disk.
With dynamic linking, libraries are precompiled into a special type of file
called a dynamic shared library. On Unix-like systems, these libraries
have the .so file extension, while on Windows they are identified by the .dll
extension. These dynamic shared libraries are
similar to executable files in that they contain executable code for the
functions provided by the library. The key difference is that they don't
contain an entry point to start execution, which makes sense, as
libraries typically don't have a main function to start the
program from. When our program is compiled with dynamic linking, the linker won't
copy the functions from the library directly into the executable. Instead,
it will simply insert a reference to the library that contains the machine
instructions for that function. At runtime, if the program needs a
function from that dynamic library, the operating system will load the required
function into the program's address space so the program can use it as if it
were part of the executable. While this may sound a bit strange at
first, it's actually incredibly efficient. Instead of storing multiple
copies of the same function across different programs, the system only
stores the library once. Each program that needs it simply references the
shared library and loads it on demand at runtime. This
saves both disk space and memory. A huge advantage, especially on systems with
lots of programs that depend on common libraries. It's also more flexible since
you can update or patch a library without having to recompile every
program that uses it. Linking, both static and dynamic, is a deep topic that
honestly deserves its own video. If you're interested in learning more, let
me know in the comments, and I'll dedicate a full episode to explaining
how linking works, including the ins and outs of dynamic libraries.
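As a quick taste of what this looks like in practice on a GNU/Linux system with GCC, the build commands below (shown as comments) are real, while the file names are hypothetical:

```c
/* Building and using a dynamic shared library with GCC:
 *
 *   gcc -fPIC -shared mylib.c -o libmylib.so   // build the .so
 *   gcc main.c -L. -lmylib -o main             // link against it
 *   LD_LIBRARY_PATH=. ./main                   // OS loads the .so at runtime
 *
 * Static linking, by contrast, copies the code in at link time:
 *
 *   gcc -static main.c -o main
 *
 * A tiny function that could live in such a library: */
int mylib_double(int x) {
    return 2 * x;
}
```
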
Now, back to the compilation steps. You might be wondering, what's the point of
all this modularization? Why break the process into so many
phases if the compiler could just go directly from source to executable?
Well, the reason we don't normally see all these intermediate steps is because
compilers like GCC are configured by default to hide them. They just show you
the final result, the executable. But with the right flags, we can expose
all those phases. For example, using GCC, if you compile a
program and add the -save-temps flag, you'll get not just the final
executable, but also all the intermediate files.
We can even stop the process at a specific stage. For example, the -S flag
makes the process stop after generating assembly.
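The other stages have stopping points too. This sketch lists them as comments; the flags are standard GCC options, while the file names are just examples:

```c
/* Exposing GCC's pipeline stages from the command line:
 *
 *   gcc -E main.c -o main.i    // stop after pre-processing (still C code)
 *   gcc -S main.c              // stop after compilation (main.s, assembly)
 *   gcc -c main.c              // stop after assembling (main.o, object file)
 *   gcc main.c -o main         // run the full pipeline, including linking
 *   gcc -save-temps main.c     // run everything, but keep all intermediates
 */
enum { PREPROCESS = 1, COMPILE, ASSEMBLE, LINK }; /* the four phases, in order */
```
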
This is incredibly useful in educational settings where you might want to see how
high-level C code translates to assembly or machine code. In professional
environments, this is also used to inspect performance-critical code. You
can look at the generated assembly to verify whether the compiler is producing
efficient instructions. Even more interesting, we can start from
any phase in the pipeline. We can pass GCC the assembly file and simply tell it
to assemble and link it. This is huge because it means we can
write part of our code in assembly, pass it to the compiler at different stages,
and then the linker will take care of mixing them together into an executable
file. This already starts to answer our
original question. Let's walk through an example. Suppose we need to write a
program that calculates how many prime numbers exist between zero and a given
number. And we want it to be as fast as possible. We could write the whole thing
in C. But let's say we don't trust the compiler optimizations. So we decide to
write the heavy calculation function directly in assembly and just call it
from C. Then we pass both files to GCC which
will compile and assemble the C code, assemble the assembly code and link both
object files into a single executable file.
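To make this concrete, here's a sketch of the C side of such a project. The function name and the trial-division algorithm are my own illustration; in the setup just described, count_primes would live in a separate .s file, but here it's written in plain C so the sketch compiles on its own:

```c
#include <stdbool.h>

/* In the multi-language setup, this function would be implemented in
 * an assembly file (count_primes.s) and declared in the C code as:
 *
 *   extern int count_primes(int limit);
 *
 * Both files would then go through GCC together:
 *
 *   gcc main.c count_primes.s -o primes
 *
 * Plain-C stand-in using trial division: */
static bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

/* Counts how many primes exist between zero and limit, inclusive. */
int count_primes(int limit) {
    int count = 0;
    for (int n = 2; n <= limit; n++)
        if (is_prime(n)) count++;
    return count;
}
```
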
And voila, we've just compiled a multi-language project.
This technique is used by real-world systems like the Linux kernel, FFmpeg,
OpenSSL, and many embedded projects. They often contain C for most of the
logic, but fall back to assembly when performance really matters.
Now, here's a fact that a lot of you might have already concluded, but I'm
still going to mention anyway. What we casually call the C compiler, like GCC,
isn't just one compiler. It's actually a tool chain, a pipeline of tools that are
executed in sequence. Each stage consumes the output of the previous one.
And each of these tools is pluggable. We can replace parts of the tool chain or
feed in our own files at various points. This is why GCC doesn't just support C.
It also supports C++, Objective-C, Fortran, Ada, D, and even Go, depending on
how it's configured. Originally, GCC stood for GNU C Compiler, but over time
it evolved into a compiler suite that supports many programming languages
beyond C. Because of this expansion, the name GNU C Compiler became misleading,
so the acronym GCC was redefined to mean GNU Compiler Collection.
I think this is really important to understand. Every time someone casually
calls it the GNU C compiler, it can unintentionally reinforce the idea that
this whole system is a single black box that transforms C code into a runnable
file. But the truth is, it hasn't been just that for many years now.
Okay, but assembly isn't for everyone. And to be fair, since assembly is
already part of the compilation pipeline, using it feels a bit like
cheating. So what about mixing high-level languages instead? For example, instead
of writing a function in assembly, what if we implement part of our project in
Fortran? Well, this is totally possible and
actually more common than it might seem. In this case, however, we usually need
multiple steps: one to compile and assemble the Fortran file, another one to
compile and assemble the C file, and a third to link both object files into
a single executable. Unlike assembly, which is already
embedded in the C compilation pipeline, Fortran has its own pipeline, its own
compiler, and sometimes even its own runtime dependencies. And by now it
should be super clear that the answer to our original question, how can different
languages live inside a single executable, comes down to the linker. You
see, the different languages involved don't even need to come from the same
compiler suite as GCC. Take Rust for example. It has a completely different
tool chain from C. Different compiler, different build system, and a different
philosophy altogether. I could spend hours talking about the insane
engineering behind its compiler. But what we care about is that when it comes time
to produce the final binary, guess what? Rust, too, relies on a linker.
So if we want to call a Rust function from C, here's how we do it. We
implement the function in Rust. We compile the Rust code into a static or
dynamic library. We declare and use the function in the C code. Then we compile
the C code and link it with the Rust compiled library.
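Sketched in code, it could look like this. The Rust side is shown in comments; the function name is illustrative, and the exact link flags vary by platform:

```c
/* Rust side (lib.rs), compiled separately into a static library:
 *
 *     rustc --crate-type=staticlib lib.rs     // produces liblib.a
 *
 *     #[no_mangle]
 *     pub extern "C" fn rust_add(a: i32, b: i32) -> i32 {
 *         a + b
 *     }
 *
 * C side: declare the function, then link against the Rust library
 * (extra system libraries may be needed depending on the platform):
 *
 *     gcc main.c liblib.a -o main
 */
extern int rust_add(int a, int b);

/* Stand-in definition so this sketch compiles without the Rust library;
 * in the real setup, the linker would find the symbol in liblib.a. */
int rust_add(int a, int b) { return a + b; }
```
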
And of course, it works the other way, too. We can call C functions from Rust.
It all depends on what we're trying to achieve. In fact, it's more common to
call C code from Rust than the other way around. C is older and many mature
libraries and system APIs are written in C. Rust developers often need to hook
into that existing ecosystem, especially in areas like graphics, cryptography, or
operating system APIs. There are several reasons why you might want to mix
multiple programming languages in a single project. But another one that
comes to mind is performance. In many projects, the entire system
doesn't need to be blazing fast, just certain parts. So what a lot of
developers do is write most of the project in a high-level language for
convenience and development speed, and then implement only the
performance-critical components in a lower-level language like C.
Before we wrap up, there's one more really important point to understand.
Let's say we have two high-level languages, language A and language B. Just because
both of them have a final linking phase doesn't automatically mean they can be
correctly linked together into one executable. Here's a very simple
example. We've implemented a function in language B and we're calling it from
language A. Even if both compilers emit assembly for the same architecture, they
might make different assumptions about how data is passed between functions.
For example, the compiler for language A might pass the two function parameters
in registers zero and one, but the compiler for language B might expect the
parameters in registers one and two. Both are producing valid machine code,
but since their calling conventions differ, the result will be undefined
behavior at runtime. Language A will place arguments in the wrong place, and
language B will perform operations using incorrect data.
And it doesn't stop there. In this example, there's another problem. After
computing the result, the function writes it to register one and returns.
But language A expects the result to be in register zero. So, not only does
language B compute the wrong result, but language A doesn't even see or use that
result at all. The same example, but this time with two
languages X and Y. Here, both languages use registers zero and one to pass
and receive parameters. But let's say language X uses pass by reference for
all function arguments. It puts the addresses of variables in the registers
which is not the same thing as putting the values of those variables directly
in the registers. Meanwhile, language Y passes by value.
So, it expects the actual values in the registers. That's why it
immediately adds the contents of those registers as soon as it's called at
runtime. This mismatch causes language Y to interpret memory addresses as actual
values, adding those addresses instead of the values stored at those addresses,
leading to completely wrong behavior or even a crash.
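This mismatch can be simulated entirely in plain C. Here, add_values plays the role of language Y (expecting values) and the callers play language X; all the names are my own illustration:

```c
#include <stdint.h>

/* "Language Y": expects the actual values in its parameters. */
int add_values(int a, int b) {
    return a + b;
}

/* "Language X" caller, passing by reference: it hands over the
 * addresses of x and y, squeezed into integers. add_values then
 * blindly adds two memory addresses instead of 2 + 3. */
int mismatched_call(void) {
    int x = 2, y = 3;
    return add_values((int)(intptr_t)&x, (int)(intptr_t)&y); /* garbage */
}

/* The fix described in the video: dereference first, so the actual
 * values land in the parameters. */
int fixed_call(void) {
    int x = 2, y = 3;
    int *pa = &x, *pb = &y;
    return add_values(*pa, *pb); /* 2 + 3 */
}
```
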
So even though both compilers produce valid and executable assembly, the final
linked binary is inconsistent unless both sides agree on how to talk to each
other. These kinds of low-level rules are
defined by what's known as the application binary interface or ABI.
Just as an API defines functions at the application level, an ABI defines how
different components of binary code interact with each other through the
hardware. So when we're mixing two different
languages, it's not enough that they both produce object files. At least one
of them, or specifically the part of it that interacts with the other language,
must conform to the other's ABI expectations.
In our language X and language Y example, one way to make this work is by
modifying language Y to dereference the addresses it receives. That way, the function will first
fetch the data from the memory addresses provided, loading the actual values into
the registers before performing the addition. In this case, language Y is
being made to conform to the ABI expectations of language X.
But we could also take the opposite approach. Make language X conform to the
ABI expectations of language Y by simply loading the argument values directly
into the registers instead of their addresses. This way, the function in
language Y can immediately add the values when it's called. As I mentioned
at the beginning of this video, these are low-level details that we usually
don't have to think about unless working at the system level. The good news is
language designers know this. Modern languages provide tools, keywords, and
compiler flags to make this process easier. In C, you might declare an
external function using extern. In Rust, you'd use the extern keyword and the
#[no_mangle] attribute. In Fortran, you can use the bind(c)
attribute. In Go, you can use a special block of
comments placed directly above the import "C" line to include C header files. And
it even lets you write inline C code directly in your Go source files. Every
language has its own way of doing this. But at compile time, these declarations
all serve the same purpose. They tell the compiler, hey, this function will
interact with code written in another language. Please make sure the generated
assembly follows the expected ABI. And let's wrap things up for now. In the
next part, we will cover how to mix compiled languages with interpreted
languages. So, make sure to subscribe because you won't want to miss it.
Don't forget to check out Let's Get Rusty, linked in the pinned comment below. And
if you liked this video or learned something new, please hit the like
button. It's free and that would help me a lot. See you in the next one.