Learning-based memory allocation for C++ server workloads summary

3 min readOct 13, 2020


# introduction
c++ servers have memory footprints that vary over time, causing heap fragmentation. c++ memory managers cannot move objects
To handle multiple requests simultaneously, current os tend to use huge pages
The paper tries to solve the problem that reducing fragmentation with huge pages

get the prediction of lifetime of the objects based on the object generating context

But we also want to avoid profiling in deployment because its overhead is too high compared to normal memory allocation.

# background

feeding data consists of allocation’s full stack trace, a timestamp of the allocation, object size, alignment and the stack and processer ID and deallocation timestamp.

data preprocessing
classify data into 7 classes 10ms, 100ms, 1s, infinite

learning model
LSTM is used in natural language processing field, capturing long term dependencies. stack trace structure can be viewed as a sequence of tokens.
each stack frame token is represented as looking up embedding vector that is stored in a matrix A. A will map tokens with similar meanings close together.

#lifetime aware allocator design

  1. assign each active huge page with lifetime classes, seperated by at least an order of magnitude. So we can give a prediction of the lifetime of allocation object.
  2. huge page management.
    each page has 3 states: open, active, free. The first allocation into a huge page makes it open and determines its LC and the allocation of the following object on this page will have the same LC.
    each page is divided into multiple blocks 8kb size.
  3. limiting fragmentation by recycling blocks.
    prefers to use free blocks from a longer-lived active huge page. because if the predicator is accurate or overestimates the lifetime class, the program with high probability will free shorter-lived objects onrecycled blocks before it frees residual blocks with the same LC as the huge page. the allocator may reuse these blocks many times while the longer lived objects on the huge page are in use, reducing the maximum heap footprint. If the predicator underestimates lifetime, the objects will have more time to be freed.
  4. tolerating prediction errors.
    llama tolerates mispredictions by tracking block and huge page lifetime using deadlines. It promotes huge pages with under-predicted objects lifetime to the next longer LC and huge pages with over-predicted objects to the next shorter lifetime class.

# low latency and accurate prediction

traditional malloc is quick ns. running neural network is slow microseconds. cache predictions. At each allocation, we compute a hash of the return address, stack and object size, and index a thread-local hashmap. temporal locality. lookup will hit L1 cache.

my thoughts

  1. any good points
    shed some light on how to integrate ML method into the memory allocation algorithm to reduce memory fragmentation. How we can tolerate misprediction.
    based on the frequency of access of the KV pair in the LSM tree. when we do KV substitution in the merging process. we can use neural network to help us determine which one will be move down to the next level? naive and make no sense.
  2. any shortages
    we can use some other ML methods to do the prediction. like GCN to capture graph information.

we can dynamically allocate memory to NVM or DRAM based on the prediction of frequency of usage of the memory. that doesn’t make sense.




Software engineer at microsoft, do research on computer system software