Custom Memory Allocator in C

This project is my introductory dive into manual memory management, something I had no prior experience with. Having previously worked with languages that use garbage collectors, managing memory and working with pointers were entirely new concepts for me. This project showcases my attempt to understand these concepts and implement a memory allocator inspired by C's built-in allocator.

The allocator is designed to handle dynamic memory requests from the program and manage memory efficiently. By tracking memory blocks through a linked list, the allocator can dynamically allocate, free, and reallocate memory while minimizing fragmentation.

For more information and to explore the codebase, please refer to the GitHub repository.

Project Overview

The memory allocator is designed to manage a single large pool of memory, dividing it into blocks as needed by the program. It dynamically allocates, frees, and reallocates memory by implementing C's well-known functions: malloc, free and realloc. To organize the memory, a linked list is used to track both free and used blocks.

In the subsequent sections we will inspect the key components of the allocator, including how it utilizes the memory for organizing metadata, how linked lists are used to track memory blocks, and interesting code snippets from the codebase.

Key Components

The project is structured around several core components.

Allocator Object (Allocator): The allocator source file contains key function implementations, such as allocator_malloc, allocator_free, and allocator_realloc, along with important auxiliary functions that ensure the allocator works properly. In create_allocator, there is a single call to C's malloc to obtain the initial memory pool that the allocator will manage. This pool is then divided into two parts: a user pool and a reserved pool.
The user pool contains blocks of allocated data, while the reserved pool holds metadata related to the allocator, such as the linked list, nodes, and member variables. Inspired by the heap-stack relationship, the reserved pool grows downwards, and the user pool grows upwards.
Linked List for Memory Blocks (LinkedList): A linked list is utilized to keep track of memory blocks that are free or in use within the user pool.
During allocator creation, the linked list instantiates a single node, which contains the entire user pool set to free. When the allocator_malloc function is called, depending on the search strategy, the allocator will go through the linked list to find a node with an available memory block. If found, it will split the node into two: one containing the allocated memory block, and the other containing the remaining free memory.
Data Object (MemoryData): Since the node implementation is generic, the payload of a node is a MemoryData object. This object contains all the necessary information about the memory block the node holds, such as the start address of the memory block, its size, and whether it is free or allocated.

Static Variable (current_alloc): To avoid the need to pass a pointer to the allocator object as an argument when using the allocator functions, the static variable current_alloc is used instead. By using the set_allocator and release_allocator functions, any allocator function that requires the allocator object can retrieve it from current_alloc.

Code Overview

In this section, we will highlight essential parts of the code that make up the memory allocator.

`create_allocator.c` Script

The create_allocator function is responsible for initializing the allocator, allocating the heap, and setting up the initial metadata. To set up the metadata, we must be diligent as we deal with pointer arithmetic. As previously mentioned, the reserved pool grows downward, starting from the highest memory. This is reflected in the following pseudocode of the create_allocator function. When we instantiate an object in the reserved pool, we first have to make room for it by increasing the size of the reserved pool.

The align_size function is simply turning the integer value into a multiple of eight to accommodate best practices in 64-bit systems, which organize memory in blocks of eight.

Allocator* create_allocator(size_t heap_size) {
    // Realign to a factor of 8 for memory efficency
    heap_size = align_size(heap_size);

    // Utilize the built-in C malloc to acquire the managed heap
    char* heap_start = (char*) malloc(heap_size);
    if (heap_start == NULL) {
        // Memory error from malloc()
        return NULL;
    }

    // Initialize Allocator object at the end of the heap
    char* heap_end = heap_start + heap_size;
    Allocator* alloc = (Allocator*) (heap_end - align_size(sizeof(Allocator)));

    // Set Allocator member variables (pseudocode)
    // alloc->member_variable = ...;

    // Increase the reserved pool to accommodate for the LinkedList
    alloc->reserved_pool_border -= align_size(sizeof(LinkedList));
    alloc->reserved_pool_size += align_size(sizeof(LinkedList));

    // Initialize LinkedList object
    LinkedList* list = (LinkedList*) alloc->reserved_pool_border;

    // Set LinkedList member variables (pseudocode)
    // list->head = ...;

    // Set Allocator member variable
    alloc->list = list;

    // Initialize a Node referencing the entire user memory pool
    void* memory_start = heap_start;

    // Increase the reserved pool to accommodate for the MemoryData
    alloc->reserved_pool_border -= align_size(sizeof(MemoryData));
    alloc->reserved_pool_size += align_size(sizeof(MemoryData));

    MemoryData* data = (MemoryData*) alloc->reserved_pool_border;

    // Set MemoryData member variables (pseudocode)
    // data->member_variable = ...;

    // Increase the reserved pool to accommodate for the Node
    alloc->reserved_pool_border -= align_size(sizeof(Node));
    alloc->reserved_pool_size += align_size(sizeof(Node));

    // Create Node containing the MemoryData
    Node* node = (Node*) alloc->reserved_pool_border;

    // Set Node member variables (pseudocode)
    // node->member_variable = ...;

    /*
     * The Node is a generic structure that can carry different types of data.
     * In this line, we set the payload of the Node to the previously created
     * MemoryData, which stores information about the memory block.
     */
    node->data = data;


    // Add the newly created Node to the LinkedList
    add(list, node);

    return alloc;
}

`allocator_malloc.c` Script

The allocator_malloc function can be considered as having two parts. The first is concerned with finding a node that has an available memory block. Multiple search strategies are available, such as first-fit, best-fit, and next-fit. Once a suitable node has been found, we enter the second part. Here, we determine how to utilize the found node. In the implementation, if the block is larger than needed, it is split. The block is divided, allocating the required portion and leaving the remainder as a free block.

The following is the pseudocode for the allocator_malloc function.

void* allocator_malloc(size_t required_size) {
    // Realign to a factor of 8 for memory efficency
    required_size = align_size(required_size);

    // Search for a Node with an available memory block
    Node* available_node = naive_search(required_size);

    if (available_node == NULL) {
        /*
         * No available Node was found, perform memory
         * pool cleansing (reduce fragmentation, remove unused metadata).
         */
        cleanse_user_pool();
        cleanse_reserved_pool();

        // Try again to find an available memory block
        available_node = naive_search(required_size);

        if (available_node == NULL) {
            // The managed heap is full
            return NULL;
        }
    }

    /*
     * A Node has been found. Inspect the Node to see how to split
     * up into allocated memory Node and residual memory Node.
     */

    // Retrieve Node memory block data
    MemoryData* available_data = (MemoryData*) available_node->data;
    size_t node_block_size = available_data->block_size;

    /* 
    * Logic for splitting up the Node into allocated memory
    * and residual memory Nodes (pseudocode).
    */
    // ...

    // Return the pointer to the start of the allocated memory
    return available_data->memory_start;
}

`allocator_free.c` Script

Similar to the allocator_malloc function, the allocator_free function consists of two parts. Firstly, it verifies that the input pointer ptr was previously allocated by the allocator. Thereafter, if a corresponding node is found, the function marks it as free and tries to merge the node with adjacent nodes that are also free in order to reduce memory fragmentation.

The following is the pseudocode for the allocator_malloc function.

void allocator_free(void* ptr) {
    if (current_alloc == NULL) {
        // There is no Allocator object to process
        return;
    }

    LinkedList* list = current_alloc->list;
    LinkedListIterator iter;
    iter.current = get_head(list);

    // Search for the Node corresponding to 'ptr' in the LinkedList
    Node* matched_node = NULL;
    while (has_next(&iter)) {
        Node* node = next(&iter);
        MemoryData* data = (MemoryData*) node->data;

        if (data->memory_start == ptr && !data->is_free) {
            matched_node = node;  // Node found
            break;
        }
    }

    if (!matched_node) {
        // The Allocator did not allocate this pointer
        return;
    }

    // Mark the memory block as free
    MemoryData* matched_data = (MemoryData*) matched_node->data;
    matched_data->is_free = true;

    // Reset the iterator to look for adjacent free Nodes
    iter.current = get_head(list);
    while (has_next(&iter)) {
        Node* node = next(&iter);

        // Skip if its the matched Node itself
        if (node->id == matched_node->id) { continue; }

        MemoryData* data = (MemoryData*) node->data;

        /*
         * Check if the Node is adjacent to the freed memory block,
         * and if it is free, merge the Nodes (pseudocode).
         */
        if (data->is_free && is_adjacent(matched_data, data)) {
            merge_meta_data_nodes(list, matched_node, node);

            // Update the matched Node and memory boundaries (pseudocode)
            // matched_memory_start = ...;
            // matched_memory_end = ...;
        }
    }
}

Conclusion

This project serves as a starting point for exploring concepts such as manual memory management, pointer manipulation, and the structure of a simple memory allocator. In creating efficient allocators, the process can be viewed as a game of placement: how can we place allocated data most efficiently, with the constraint that once placed, it cannot be moved, only freed when no longer needed. Memory allocators focus on two important aspects: designing good search strategies, with the simplest being the naive first-fit search, and creating a solid allocator design that allows for efficient defragmentation. In this project, we touched on both aspects, making it a valuable entry point into memory allocation.

Future improvements could include developing a more specialized linked list and node structure, rather than relying on generic implementations. We could also use system calls like mmap, which would allow us to completely detach from C's built-in allocator, giving us full control. Furthermore, implementing segregated linked lists, where free nodes and nodes in use are kept in separate lists could improve efficiency.