What is memory? Part 4: Stack allocations, dynamic allocations, and the heap
Welcome back to the "What is memory?" blog series. In our last post, we covered a lot of ground on some pretty low-level, complex topics. Let's do a quick review of what was discussed:
- We learned that a program is executed by something called a thread.
- A thread executes on a CPU by reading and writing to registers and memory. Registers are small pieces of storage located on the CPU, and which allow the thread to control program execution, and read and write to main memory.
- A thread executes instructions, which are bytes located in a readable, executable
.text
segment in a process's address space. In x86-64, the register%rip
stores the address of the currently executing instruction. When the instruction is executed, the CPU updates%rip
to point to the next instruction. For some instructions, this is just a simple increment to the next instruction in the function. For other instructions such ascall
, this can cause the CPU to update%rip
to point to an entirely different sequence of instruction bytes in the function being called. - A thread runs on a stack, which is a range of memory where all locally-scoped variables and state is stored. This can include local variables, the values of registers before a function is called or during a function's execution, and also frame pointers.
- Frame pointers are a mechanism that threads use to record the state necessary to return from function calls, and can also be used to walk a thread's call stack by tools such as debuggers and profilers. When a function is called, the current value of the base pointer
%rbp
is stored on the stack at%rsp
, and then the pointer to that stack address is stored in%rbp
so it can be accessed later when returning from the function, or when walking the thread's call stack. Frame pointers can be a bit confusing at first, so if that doesn't make sense, I suggest re-reading the last post or leaving a comment so I can further clarify.
We've learned a lot about memory so far in the series, but at a high level, there are two major contexts that we can point to as having covered:
- Statically allocated memory – This is memory that is allocated before the process even begins to run. For example, if we take another quick look back at our trusty lucky_number.c
example program, we can see an example of static memory allocation:
// lucky_number.c
#include <stdio.h>
static int lucky_number = 7;
int main() {
printf("My lucky number is %d!\n", lucky_number);
lucky_number = 2;
printf("My lucky number is now %d!\n", lucky_number);
return 0;
}
That static int lucky_number
variable is statically allocated. The dynamic linker / loader allocates space for that int in the new process's address space before it begins running. When the process is running, whenever it needs to read or write lucky_number
, it uses that statically allocated address.
- Stack allocated memory – This is memory that is allocated for storing state that is local to a specific function call or logical scope. Let's take a look at our other example program from the last post, local_scope.c
, to see what we mean:
// local_scope.c
#include <stdio.h>
static int add_eight(int num) {
int sum;
sum = num + 8;
return sum;
}
int main() {
int my_num = 8;
return add_eight(8);
}
When the add_eight()
function is called, the variable int sum
will be allocated on the stack. When add_eight()
returns, and %rsp
is incremented to point to an address above where sum was stored, that address is no longer safe to access as the thread may eventually call another function that uses the same address for another purpose.
Note that scope does not just mean function scope. A function itself can have smaller scopes as well. For example, a loop, or an if statement, both have their own smaller scope which may not be accessible to other parts of the function.
Stack allocations are only safe in scope-local context!
To drive the point home, let's take a look at an example of a buggy program, local_scope_unsafe.c
, that indicates how it can be unsafe to reference a stack memory in the wrong scope:
// local_scope_unsafe.c
#include <stdio.h>
static int add_eight(int num) {
int sum;
sum = num + 8;
return sum;
}
static int *add_eight_pointer(int num)
{
int sum, *ptr;
sum = num + 8;
/* This is pretty scary! */
ptr = ∑
return ptr;
}
int main() {
int *sum1_ptr, sum1_before, sum1_after, sum2;
sum1_ptr = add_eight_pointer(56);
sum1_before = *sum_ptr;
sum2 = add_eight(0);
sum1_after = *sum_ptr;
/*
* sum1_ptr should have contained the address of the value
* 64 on add_eight_pointer()’s stack, as the value was not
* yet corrupted.
*
* Expected output:
* BEFORE -- *sum1_ptr: 64
*/
printf(“BEFORE -- *sum1_ptr: %d\n”, sum1_before);
/*
* Now that we’ve called add_eight(), which used the
* stack for its own purposes, the value stored at the
* address in sum1_ptr could be anything.
*
* Expected output:
* AFTER -- *sum1_ptr: ??
*/
printf(“AFTER -- *sum1_ptr: %d\n”, sum1_after);
return *sum1_ptr + sum2;
}
Let’s see what happens if we try to compile and run the program:
$ gcc -o local_scope_unsafe -O0 local_scope_unsafe.c
BEFORE -- *sum1_ptr: 64
AFTER -- *sum1_ptr: 0
Interesting. The first line shows us what we expected. We called add_eight_pointer(56)
, which should have returned a pointer to 56 + 8 == 64
. The value at the address stored in the int *sum1_ptr
pointer therefore appears to be correct. After calling add_eight()
though, the value at that address has been changed to 0. Why? It's because when the thread calls add_eight()
, it ends up overwriting the contents stored at the stack address in sum1_ptr
with some other value.
Let’s see exactly what's going on by taking a look at the instructions of this program, starting with add_eight_pointer()
. Note that this is recapping some of what we covered in the last post when going over the assembly code for local_scope.c. This stuff can be a bit hard to wrap your head around, so going over another example just to be sure it really makes sense seems prudent. If any of this is confusing, try reading through the last post again, or leave a comment below with any questions.
With that, let’s look at add_eight_pointer()
:
000000000000116a <add_eight_pointer>:
116a: 55 push %rbp
116b: 48 89 e5 mov %rsp,%rbp
116e: 48 83 ec 18 sub $0x18,%rsp
1172: 48 89 7d e8 mov %rdi,-0x18(%rbp)
1176: 48 8b 45 e8 mov -0x18(%rbp),%rax
117a: 48 83 c0 08 add $0x8,%rax
117e: 48 89 45 f0 mov %rax,-0x10(%rbp)
1182: 48 8d 45 f0 lea -0x10(%rbp),%rax
1186: 48 89 45 f8 mov %rax,-0x8(%rbp)
118a: 48 8b 45 f8 mov -0x8(%rbp),%rax
118e: c9 leave
118f: c3 ret
Before we go over these instructions, let's start by defining the current value of %rsp
as %rsp_{init}
. In other words, %rsp_{init}
is the initial value of %rsp
before we invoke any instructions in the function. This will be a useful reference later when we're calculating stack addresses to see how the stack was corrupted. For now, let's move onto the first instruction:
116a: 55 push %rbp
This is the first of two instructions for storing the frame pointer for this function call. We decrement %rsp
by 8 (thus "allocating" 8 bytes of local storage on the stack), and store the value of %rbp
for the calling function in that 8 byte slot. Later on, when we return from the function call, we can recover the value of %rbp
by reading this 8 byte address on the stack. After this instruction, %rsp == %rsp_{init} - 0x8
.
116b: 48 89 e5 mov %rsp,%rbp
Now that we've stored the previous value of %rbp
on the stack, we can store the address of that previous %rbp
entry by copying %rsp
into %rbp
. Recovering the previous value of %rbp
is therefore simply a matter of dereferencing the 8-byte stack address stored in %rbp
.
116e: 48 83 ec 18 sub $0x18,%rsp
We're now decrementing %rsp
by 0x18
bytes, which in effect allocates another 0x18
bytes of stack space for us to use for the rest of the function. At this point, %rsp
is now 0x8 + 0x18 == 0x20
bytes below %rsp_{init}
.
1172: 48 89 7d e8 mov %rdi,-0x18(%rbp)
Recall that according to the x86-64 calling convention, %rdi
contains the first argument of a function. This is 56
for us, so this instruction is storing the value 56
at the address 0x18
bytes below %rbp
. Or, in other words, we're storing 56
at the address currently stored in %rsp
.
1176: 48 8b 45 e8 mov -0x18(%rbp),%rax
In the previous instruction, we stored the argument of the function, 56
, on the stack at -0x18(%rbp)
. This instruction therefore copies the value 56
into the %rax
register.
117a: 48 83 c0 08 add $0x8,%rax
The function we're analyzing is called add_eight_pointer()
, and this instruction is where we "add eight". This adjusts the value of %rax
from 56
to 64
. Recall from the last post that, according to x86-64 calling conventions, %rax
is the register that contains the return value of the function call. 64 == 56 + 8
is the correct sum, but because we're returning a pointer (i.e. a variable containing an address) from this function, this will not be the final value we store in %rax
. At some point later in the function, we'll instead store an address in %rax
that points to the sum we computed here.
117e: 48 89 45 f0 mov %rax,-0x10(%rbp)
And voila, the very next instruction is where we store that sum on the stack. We're writing the total sum, 64
, into the stack address that's located at %rbp - 0x10 == %rsp + 0x8
. This means that the final sum whose address we're returning to the caller is located at %rsp_{init} - 0x18
. Let’s call this value %rsp_{sum}
.
1182: 48 8d 45 f0 lea -0x10(%rbp),%rax
We just stored the computed sum on the stack, and we need to return that stack address to the caller by putting it in %rax
. This next instruction, mnemonically called "load effective address", accomplishes this. This reads the address in %rbp
, subtracts 0x10
, and then stores that final computed address in %rax
. This is the %rsp_{sum}
value that we'll eventually return to the caller. When we dereference sum1_ptr
and store the result in sum1_before
and sum1_after
, this is the address we're dereferencing (by the way, "dereference" means, "look up the value stored at that address").
1186: 48 89 45 f8 mov %rax,-0x8(%rbp)
118a: 48 8b 45 f8 mov -0x8(%rbp),%rax
These are completely unnecessary instructions that the compiler emits simply because we told gcc
that it should compile our program without any optimizations via the -O0
flag. We can just ignore these. The end effect is the same. %rax
holds the address of the sum of 64
on the stack at address %rsp_{sum}
.
118e: c9 leave
This restores %rbp
to contain the value that we stashed on the stack in the first instruction. We adjust %rsp
to point to %rbp
, then read the value stored on the stack at %rsp
, write it to %rsp
, and increment %rsp
by 8 bytes. At this point, %rsp == %rsp_{init}
. Let's hope that no instructions use the stack from here on out, or the value of the sum computed in add_eight_pointer()
, stored at %rsp_{sum}
, may be corrupted (foreshadowing!).
118f: c3 ret
This instruction does a popq
of the address of the instruction in main()
following the call
to add_eight_pointer()
from the stack, and updates %rip
to contain that next instruction. Recall that a popq
increments %rsp
by 8 bytes, so at this point, %rsp
points to %rsp_{init} + 0x8
.
After we return from add_eight_pointer()
, main()
does a couple of things:
- It first stashes the value stored in
%rax
, i.e. the pointer to the sum fromadd_eight_pointer()
, on the stack. All of these variables are locally scoped, so this is expected. Earlier inmain()
, before we issued acall
instruction foradd_eight_pointer()
, we decremented%rsp
so thatmain()
itself could also have some stack storage space.main()
stores the pointer returned byadd_eight_pointer()
, which contains the address where our sum is stored, in part of that stack space that it allocated whenmain()
was first called. - Next, it dereferences the address stored in the pointer to read the sum that was computed in
add_eight_pointer()
. It then stores this value elsewhere on the stack. Note that this stack allocation corresponds to the local variablesum1_before
. Because we haven't corrupted the stack yet by callingadd_eight()
,sum1_before
contains the correct sum value 64. main()
then loads the value0
into%rdi
(as it's the first argument toadd_eight()
), and then issues anothercall
instruction to invokeadd_eight()
. This call instruction decrements%rsp
by 8 bytes, records the address of the next instruction inmain()
, and then updates%rip
to the first instruction inadd_eight()
. At this point,%rsp == %rsp_{init}
, and we're about to executeadd_eight()
. Let's take a look at the assembly output foradd_eight()
:
0000000000001139 <add_eight>:
1139: 55 push %rbp
113a: 48 89 e5 mov %rsp,%rbp
113d: 48 83 ec 28 sub $0x28,%rsp
1141: 48 89 7d d8 mov %rdi,-0x28(%rbp)
1145: 48 c7 45 f8 08 00 00 movq $0x8,-0x8(%rbp)
114c: 00
114d: 48 8b 45 d8 mov -0x28(%rbp),%rax
1151: 48 89 45 f0 mov %rax,-0x10(%rbp)
1155: 48 8b 55 f8 mov -0x8(%rbp),%rdx
1159: 48 8b 45 f0 mov -0x10(%rbp),%rax
115d: 48 01 d0 add %rdx,%rax
1160: 48 89 45 e8 mov %rax,-0x18(%rbp)
1164: 48 8b 45 e8 mov -0x18(%rbp),%rax
1168: c9 leave
1169: c3 ret
Going over the instructions, we have:
1139: 55 push %rbp
113a: 48 89 e5 mov %rsp,%rbp
This is the same as the preamble for add_eight_pointer()
. We're storing the frame pointer for add_eight()
's stack frame. Before these instructions, %rsp == %rsp_{init}
. After these instructions, %rsp == %rbp == %rsp_{init} - 0x8
. Recall that the address of our sum from add_eight_pointer()
is at %rsp_{sum} == %rsp_{init} - 0x18
, so at this point, %rsp_{sum} == %rbp - 0x10
. Let’s keep an eye out for any writes to -0x10(%rbp)
!
113d: 48 83 ec 28 sub $0x28,%rsp
We're now subtracting an additional 0x28
bytes from %rsp
. This brings %rsp
down to %rsp_{init} - 0x30
, and effectively allocates 0x28
bytes in the stack for local allocations.
1141: 48 89 7d d8 mov %rdi,-0x28(%rbp)
%rdi
contains the argument to the function, which was 0. So we're storing the 8-byte value 0 at %rbp - 0x28
. This is the same as %rsp_{sum} - 0x10
, so we haven't corrupted the value of sum from add_eight_pointer()
quite yet. Let's keep going:
1145: 48 c7 45 f8 08 00 00 movq $0x8,-0x8(%rbp)
add_eight()
adds 8 to whatever parameter was passed by the user. This highly unoptimized compiler output is just storing the value 8 on the stack, so it can later be stored in a register, and used in an add
instruction. This is an example of the compiler being extremely non-optimal actually being confusing because of how pointless it is. Anyways, moving on, we've now stored the value 8 at %rsp_{sum} + 0x8
. We're getting close, but the value of the sum from add_eight_pointer()
on the stack is still not yet corrupted.
114c: 00
114d: 48 8b 45 d8 mov -0x28(%rbp),%rax
We stored %rdi
, the argument to the function, at this address on the stack two instructions above. This instruction therefore moves the value 0 into %rax
.
1151: 48 89 45 f0 mov %rax,-0x10(%rbp)
Ding, ding, ding! This stores the value 0 (contained in %rax
), into the address %rbp - 0x10
. But recall that %rsp_{sum} == %rbp - 0x10
. Voila! We’ve found our stack corruption. Now, next time we deference sum1_ptr
, it will return 0 instead of 64.
I'll let you walk through the remainder of the instructions on your own, as we've gone over all of these before. Note, however, that it doesn’t really matter what these instructions do. No matter what, the value of sum1_ptr
will always be corrupted. It was overwritten, and can never again be retrieved. This program is unequivocally broken.
1155: 48 8b 55 f8 mov -0x8(%rbp),%rdx
1159: 48 8b 45 f0 mov -0x10(%rbp),%rax
115d: 48 01 d0 add %rdx,%rax
1160: 48 89 45 e8 mov %rax,-0x18(%rbp)
1164: 48 8b 45 e8 mov -0x18(%rbp),%rax
1168: c9 leave
1169: c3 ret
After we've returned from add_eight()
, main()
again dereferences sum1_ptr
, and this time stores it into sum1_after
. We already know that the value on the stack at %rsp_{sum}
is 0, so sum1_after
will have the value 0, despite our best intentions.
Just to recap: the value in sum1_ptr
is a stack address, and as a thread executes, it uses that stack only for storage that is visible to the current execution scope. Once we'd returned from add_eight_pointer()
, it was completely unsafe and incorrect to trust the address stored in sum1_ptr
, because any other function call in a separate scope could and did overwrite the value.
As a side note: what do you think would happen if we got rid of sum1_after
and instead just dereferenced sum1_ptr
directly in our second printf call? The answer? Who knows – it depends completely on what the printf()
implementation does.
One more thing I want to note. The example we gave above showed how it can be unsafe to use stack pointers once you've left the context where the stack pointer was allocated. That doesn't mean, however, that it's never safe to use stack pointers. You just have to make sure that the pointer refers to an address that is within scope. Take a look at this example which shows how to use stack pointers safely:
// local_scope_safe.c
#include <stdio.h>
static void add_eight_pointer(long *arg)
{
*arg += 8;
}
int main() {
long arg1 = 0, arg2 = 48, sum;
add_eight_pointer(&arg1);
add_eight_pointer(&arg2);
sum = arg1 + arg2;
printf("arg1: %ld, arg2: %ld:, sum %ld\n", arg1, arg2, sum);
return 0;
}
If we run this program, we get the following output as expected:
$ ./local_scope_safe
arg1: 8, arg2: 56:, sum 64
Why is this program safe? We're passing pointers to arg1
and arg2
to add_eight_pointer()
, but this time they're never being overwritten. I encourage you to think about why it's safe on your own, and let me know if you need any help in the comments.
Non-scope local memory can be dynamically allocated using the heap
Ok, so we now understand how stack allocation works, and why it's completely unsafe for us to reference an address to stack-allocated memory once we've returned from the scope where that stack allocation was made. That begs the completely reasonable question: how can we make allocations that persist across scopes? In other words, how can we safely return a pointer / address from a function that is safe to dereference elsewhere in the program?
The answer is that we use the third major type of memory in a process, heap memory:
Dynamically allocated and managed heap memory is memory that is allocated dynamically by a process at runtime, and is stored in a segment of its address space called the heap.
The heap is a single readable, writable, and resizable region of memory in a process. The size of the heap changes throughout the runtime of a program according to how much memory it needs, which means that it can be used to make allocations whose size is not known at compilation time. For example, let's go back to our trusty lucky_number.c
program:
// lucky_number.c
#include <stdio.h>
static int lucky_number = 7;
int main() {
printf("My lucky number is %d!\n", lucky_number);
lucky_number = 2;
printf("My lucky number is now %d!\n", lucky_number);
return 0;
}
We know that this program has exactly one lucky number, so we statically allocate it via the static int lucky_number
variable when the program is linked and loaded. We could add more lucky numbers if we wanted to. If we had two lucky numbers, the program might look like this:
// lucky_numbers.c
#include <stdio.h>
static int lucky_number[2] = {7, 11};
int main() {
printf("My lucky numbers are %d and %d!\n",
lucky_number[0], lucky_number[1]);
lucky_number[0] = 2;
lucky_number[0] = 777;
printf("My lucky numbers are now %d and %d!\n",
lucky_number[0], lucky_number[1]);
return 0;
}
We knew that we had exactly two lucky numbers, so we could write the program to statically allocate two integers at compile time. But what if we didn't know how many lucky numbers we'd need until the program ran? It's not possible for us to statically allocate memory for that without potentially being wasteful by allocating more than we need, or not allocating enough. We also probably can't use stack allocations for this. If someone wants to store 100,000,000 lucky numbers, we could end up causing a stack overflow if we tried to stack-allocate an array of 100,000,000 32 bit integers.
The heap solves this problem for us by allowing us to make dynamic allocations. Let's take a look at an example program that uses dynamic allocations to solve our problem of arbitrarily large lucky numbers:
// lucky_number_dynamic.c
#include <stdio.h>
#include <stdlib.h>
int main() {
int i, num_lucky_numbers, *lucky_numbers;
/*
* Allow the user to specify how many lucky numbers
* we should collect by entering it on the command line.
*/
printf("How many lucky numbers are we collecting? ");
fflush(stdout);
scanf("%d", &num_lucky_numbers);
if (num_lucky_numbers <= 0) {
fprintf(stderr, "Please specify a positive number\n");
return -1;
}
// Freed below.
lucky_numbers = malloc(num_lucky_numbers * sizeof(int));
if (lucky_numbers == NULL) {
fprintf(stderr, "Failed to allocate %d lucky numbers\n",
num_lucky_numbers);
return -1;
}
printf("\n");
for (i = 0; i < num_lucky_numbers; i++) {
/* Let the user specify each lucky number, one by one. */
printf("Specify lucky number %d: ", i + 1);
fflush(stdout);
scanf("%d", &lucky_numbers[i]);
}
printf("\n");
printf("Printing lucky numbers!\n");
for (i = 0; i < num_lucky_numbers; i++)
printf("Lucky number %d: %d\n", i + 1, lucky_numbers[i]);
// Allocated above.
free(lucky_numbers);
return 0;
}
The program is pretty simple:
- Ask the user to specify the number of lucky numbers we should collect.
- Verify that the user provided a positive number, and then perform a dynamic heap allocation of an array of that many integers. This allocation was made using the
malloc()
libc library function. - Check that we were able to perform the allocation by verifying that the pointer that was returned to us is non-NULL.
malloc()
might ask the operating system to give the process more memory, so it's possible that the operating system could deny our request if none is available. Any good system software needs to always check for these corner cases, or you will inevitably end up with bugs that are annoying and difficult to find. - Ask the user to specify all of the lucky numbers in a loop. Store the value for each lucky number in the dynamically allocated array.
- Iterate over all of the lucky numbers that the user provided, and print them out.
- Release the dynamic allocation by calling
free()
on the pointer that was returned to us bymalloc()
. Note that because this pointer does not point to locally-scoped memory, we are responsible for manually freeing it. This also means that we could pass the pointer to other functions if we wanted to. Scope is not important here. The pointer is global, and anyone in the entire process can read it, as long as the memory has not been freed.
This is why the C language is often called a "manage your own memory" (MYOM) language. If you ask for dynamically allocated memory, you are responsible for freeing it exactly once to ensure that it does not leak (i.e. is not wasted for the runtime of the program). You're also responsible for ensuring that nobody tries to read the memory after it was freed, that it was only freed once, etc. As one might imagine, this is an endless source of bugs, and people are constantly writing tools such as sanitizers to try and make it easier to finds such memory corruption bugs.
Let's see some example output from our program:
$ ./lucky_number_dynamic
How many lucky numbers are we collecting? 3
Specify lucky number 1: 2
Specify lucky number 2: 11
Specify lucky number 3: 777
Printing lucky numbers!
Lucky number 1: 2
Lucky number 2: 11
Lucky number 3: 777
Great! Our program works exactly as we expected thanks to dynamically allocated memory. Try playing with the program with various sizes of input. Experimenting like this is often a great way to learn. For example, say you tried this:
$ ./lucky_number_dynamic
How many lucky numbers are we collecting? 9999999999999999999999999999999999999999999
Please specify a positive number
For some reason, the program thinks we're specifying a negative number even though what we specified is positive. This is a phenomenon called integer overflow, where we can’t represent the number in the number of bits available to us. This is another common source of bugs in system software, so it's good to be aware of.
Moving on, let's talk a little bit more about what exactly the heap really is.
The heap is a dynamically sized region of writable memory
Recall from What is memory? (part 2): The anatomy of a process, that a .data segment in a process is a readable and writable region of memory. Such segments are statically sized and allocated by the dynamic linker / loader according to the information contained in the process's ELF file.
Conceptually, the heap is a single region of memory which functions as a dynamically sized .data segment for the process. Unlike a thread's stack, which starts at a high address and grows downwards as the thread executes and calls functions, the heap starts at a base address, and grows upwards as more allocations are made and more memory is required from the operating system to satisfy allocation requests. Here's a visual representation:
As you can see, the thread stacks are close to the top of a process's address space, and grow down as a thread executes and requires more stack space. There can be more than two threads, but only two are shown for brevity. As more threads are spawned in the process, more and more memory would be used from the top of the address space. As a fun side note, notice that the kernel is actually located at the top of the process's address space. We'll talk more about this in another post, but for now, just marvel at the beauty of how all of this works together.
On the other side of the diagram, the statically allocated memory resides closer to the bottom of the process's address space, with the heap being located above that. The heap grows upwards towards the center of the process's address space as more and more dynamic allocations are made. This arrangement makes sense: the heap must be located above all of the statically sized segments in the process's address space, or it wouldn't have space to dynamically grow.
This clarifies what direction the heap grows in, but how and when does this growth actually occur? Just like with a stack, we don't want to just statically reserve the entire middle region of this diagram, as we might not need all of it. If we did that, we'd only have enough memory for one or two processes on the entire system. On the other hand, we don't want to reserve too small of a portion, or we may run out of heap memory when we could have actually reserved more. Furthermore, some processes may only need a tiny bit of heap memory, while others may need a lot more. To address this, processes have what is called a program break, which specifies the current address at the top of the process's heap. In other words, the currently allocated portion of the heap is defined as the region between its base address, and its program break. If the bottom of a process's heap is at address 0x10000
, and the program break is at address 0x11000
, the process currently has 0x11000 - 0x10000 == 0x1000
bytes of memory available in the heap.
So how is the program break (and thus the size of the heap) adjusted? Threads in a process may dynamically adjust the location of the program break using the brk()
and sbrk()
system calls. As the documentation page for those system calls explains, brk()
sets the hard-coded address of the program break, and sbrk()
specifies the number of bytes that the program break should be incremented by. sbrk()
returns a pointer to the previous value of the program break, so an allocation / free could be implemented as follows:
void *prev_break = sbrk(0x1000);
brk(prev_break);
To drive home the point, let's take another look at our example lucky_number_dynamic.c
program, but update it so that we manage the heap on our own rather than relying on malloc()
. Note that unlike our two-line code snippet above, we actually do our program error-case checking in this program:
// lucky_number_sbrk.c
#include <stdio.h>
#include <unistd.h>
int main() {
int i, num_lucky_numbers, *lucky_numbers, err;
printf("How many lucky numbers are we collecting? ");
fflush(stdout);
scanf("%d", &num_lucky_numbers);
if (num_lucky_numbers <= 0) {
fprintf(stderr, "Please specify a positive number\n");
return -1;
}
// Freed below.
lucky_numbers = sbrk(num_lucky_numbers * sizeof(int));
if (lucky_numbers == (void *)-1) {
fprintf(stderr, "Failed to increase the size of the heap\n");
return -1;
}
printf("\n");
for (i = 0; i < num_lucky_numbers; i++) {
printf("Specify lucky number %d: ", i + 1);
fflush(stdout);
scanf("%d", &lucky_numbers[i]);
}
printf("\n");
printf("Printing lucky numbers!\n");
for (i = 0; i < num_lucky_numbers; i++)
printf("Lucky number %d: %d\n", i + 1, lucky_numbers[i]);
// Allocated above.
err = brk(lucky_numbers);
if (err)
fprintf(stderr, "Failed to reset the program break!\n");
return 0;
}
If we compile and run our program, the output is as expected:
$ ./lucky_number_sbrk
How many lucky numbers are we collecting? 2
Specify lucky number 1: 10
Specify lucky number 2: 20
Printing lucky numbers!
Lucky number 1: 10
Lucky number 2: 20
So why use malloc()
at all? Why not just use sbrk()
ourselves rather than using these other APIs that manage the heap on your behalf? There are a number of reasons, but they mostly all boil down to: memory allocation is a really hard problem. We'll touch on these problems briefly, and then end the article.
Memory allocators manage the heap on behalf of a program
So why exactly is memory allocation hard? It seems like all we need to do is increment sbrk()
when necessary, use that memory, and then free it with brk()
. Well, that might be true for our little example program, but it's most definitely not true for real world programs. Here are a few things to keep in mind:
- Memory is often allocated concurrently from multiple threads. We don't want one thread to call
brk()
to reset the program break while another thread is still using the memory they had allocated withsbrk()
. That would result in a crash, or as it's more commonly known, a use after free. - Dynamic allocations are very often not one-time-use. You're typically doing something many, many times in a row. Consider a web server as an example. Every time it gets a request, it probably has to do a dynamic allocation. We don't want to have to ask the kernel for memory every single time we need it, as that's slow and can cause latency spikes. It would be better to ask for some memory one time, and then reuse it for multiple separate allocations.
- A common problem encountered when managing memory is dealing with memory fragmentation. The basic idea here is that the amount of memory allocated from the kernel with
brk()
orsbrk()
may not be used optimally. Consider, for example, if you allocated0x40
bytes for an allocation, and then when you no longer needed it, decided that you'd reuse the same pointer to fulfill an0x18
byte allocation in another context. You could certainly do this, but then you're leaving0x28
bytes unused from that original0x40
sized chunk. As memory is allocated and freed, this fragmentation can get worse and worse, with more and more memory going to waste. One of the goals of a memory allocator is trying to minimize this fragmentation.
These are just a few of the many, many challenges that are encountered when trying to efficiently use dynamically allocated memory. There's a lot more to say about this, but this article has already gotten quite long so I'll stop it here. In the next post, we'll learn about malloc()
, the most common, standard memory allocator provided by most libc implementations. By the end of it, we'll know enough to write and use our own allocators!
For now, let’s do a quick recap of this post:
- Static allocations are allocations in a process's address space which are made by the dynamic linker / loader before a process starts running.
- Stack allocations are allocations made by a thread on its stack to store local state. Stack allocations are only safe to reference within the scope that they were allocated, because anything on a stack can be overwritten later after execution has left that scope.
- Dynamic allocations are allocations made using a process's heap. The heap is a readable and writable region of memory in a process which can be dynamically resized as the process runs, and which allows the process to share pointers across multiple scopes throughout its runtime.
The stack of a process grows downwards as a thread executes and calls functions. As more threads are spawned in a process, lower and lower regions of a process's address space are used to store more and more thread stacks. The heap, conversely, is a single readable and writable region of memory in a process's address space that grows upwards as more and more dynamically allocated memory is required.
The size of the heap is controlled by the process's program break. The entire region of memory between the base of the heap and the program break is available for use by the process. The program break can be adjusted using the brk()
and sbrk()
system calls, though this is almost never done in practice. Rather, programs will use memory allocators that manage access to the heap on their behalf, and which provide APIs such as malloc()
to provide an ergonomic and simple model for allocating memory.
As always, thanks for reading! Please feel free to leave questions in the comments if anything is unclear, and have a great day.
Comments
Sign in or become a Byte Lab member to join the conversation.