
A template metaprogrammed memory pool allocator

Submitted by lorien on

As I said elsewhere I've been trying to make heap allocation almost as fast as stack allocation for a realtime scripting language I'm making. Here is the allocator code.

I was going to make this a tutorial, but instead I'm going to try and get something about this published somewhere like the C++ Users Journal.

For some info on metaprogramming have a look at http://osl.iu.edu/~tveldhui/papers/Template-Metaprograms/meta-art.html and/or do some googling. There is a fair amount of info out there for such an exotic technique, and there are even some libraries like the boost mpl.
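If you've never seen the technique before, the whole trick is that the compiler evaluates recursive template instantiations while it compiles, so you can make it do calculations for you. Here's a minimal illustration (nothing to do with the allocator itself, just to show the idea):

[code]
//The compiler computes 2^N while instantiating templates, so
//PowerOf2<10>::value is an ordinary compile-time constant.
template <int N>
struct PowerOf2
{
    enum { value = 2 * PowerOf2<N-1>::value };
};

//specialisation that terminates the recursion
template <>
struct PowerOf2<0>
{
    enum { value = 1 };
};

const int oneK = PowerOf2<10>::value; //1024, no runtime work at all
[/code]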

Enjoy [:)]

[code]
#ifndef __METAPOOL_HH__
#define __METAPOOL_HH__

/********************************************************************************
** A cross-platform template metaprogrammed memory pool allocation system. **
** **
** Copyright (C) 2003-2005 Lorien Dunn and The Applied Computing Research **
** Institute, La Trobe University. **
** **
** Released at http://sumea.com.au/forum/topic.asp?TOPIC_ID=2716 under **
** the Boost license (reproduced below) on the 7th of Feb 2005. The API for **
** normal use is at the end of this file. **
** **
** In addition to the boost license I ask that bug fixes, optimisations **
** and other improvements be posted on sumea at the URL given above. **
** And if you use this and meet me perhaps you could buy me a beer :) **
** **
** This system manages a variable number of pools, each of which caters for **
** a different size allocation. The metaprogramming is used to find which **
** pool best fits a particular size allocation request at compile time. **
** A runtime search is also provided (for when a size can't be known at **
** compile time). **
** **
** Pool chunkSizes are arranged like this: **
** 16 24 32 48 64 96 128 192 or **
** 24 32 48 64 96 128 192 256 **
** depending on whether the value defined by METAPOOL_MIN_CHUNK_SIZE **
** is a power of two or a power of two + 1/2 times the same power of two. **
** **
** Compared to previous versions this release vastly reduces memory **
** wastage and improves scalability. METAPOOL_HEAP_SIZE_IN_K can be set to 4 **
** megs and METAPOOL_MAX_CHUNK_SIZE to 2 megs without problem. There is a **
** small runtime penalty for this scalability, and synthetic benchmarks that **
** allocate tons of memory probably won't perform hugely better than malloc. **
** It will perform much better in real programs. **
** **
** It should be 64-bit clean, providing you change one macro, but I've not **
** been able to verify this yet. It will not work on 16-bit CPUs. **
** **
** Note pools with chunkSizes that are multiples of 16 bytes will return **
** 16-byte aligned allocations, making them suitable for use with SSE vector **
** operations. **
** **
** Normal use API appears at the end of this file. It is very similar to **
** the c standard library: **
** **
** template<size_t Size> void* alloc(); **
** template<typename T> void* alloc(); **
** void* alloc(size_t size); **
** template<size_t Size> void* realloc(void* mem); **
** void* realloc(void* mem, size_t newSize); **
** void free(void* mem); **
** **
** The Allocator classes from the previous versions are gone, but a base **
** class allocator for use with the "curiously recursive template pattern" **
** is provided, as are preprocessor macros. **
** **
** Forget about using this with MSVC 6.0 (even with SP 5). 6.0 is way too **
** broken (particularly in templates), and the headers are missing the memory **
** alignment functions. AFAIK the only version of MSVC this code will compile **
** with is .net 2003 because it requires partial template specialisation. **
** If you are using MinGW make sure it is >= 3.4.0 because the memory **
** alignment functions are missing from the headers in 3.3.x **
** Works fine with Intel C++ 7.1 on win32 with MSVC6 headers, but have to hack **
** alignedAlloc and alignedFree to use normal malloc/free :( **
** I have no idea if it works with Borland compilers, and to be honest I **
** don't care- I've had nothing but trouble with them. **
** **
********************************************************************************/

/********************************************************************************
** Boost Software License - Version 1.0 - August 17th, 2003 **
** **
** Permission is hereby granted, free of charge, to any person or organization **
** obtaining a copy of the software and accompanying documentation covered by **
** this license (the "Software") to use, reproduce, display, distribute, **
** execute, and transmit the Software, and to prepare derivative works of the **
** Software, and to permit third-parties to whom the Software is furnished to **
** do so, all subject to the following: **
** **
** The copyright notices in the Software and this entire statement, including **
** the above license grant, this restriction and the following disclaimer, **
** must be included in all copies of the Software, in whole or in part, and **
** all derivative works of the Software, unless such copies or derivative **
** works are solely in the form of machine-executable object code generated by **
** a source language processor. **
** **
** THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR **
** IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, **
** FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT **
** SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE **
** FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, **
** ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER **
** DEALINGS IN THE SOFTWARE. **
********************************************************************************/

#include <stdlib.h>
#include <new> //for std::bad_alloc
//not using the <cstdlib> style headers because they cause
//compatibility issues when including C header files.
#include <string.h> //for memcpy
#include <assert.h>
#include <vector>

//Re-implementation of the Loki library static assert.
//Makes a compile error if the condition is false.
template <bool> struct CTAssert;
template <> struct CTAssert<true> {};
#define CTASSERT(expr,msg) CTAssert<((expr) != false)> ERROR_##msg;

//All memory will be aligned on 16 byte boundaries. Do not change.
#define BYTE_ALIGNMENT 16

//Define METAPOOL_USE_MALLOC to make alignedAlloc/alignedFree fall back
//to plain malloc/free (losing the 16-byte alignment guarantee) on
//compilers whose headers lack the aligned allocation functions.
#ifdef METAPOOL_USE_MALLOC
#warning "METAPOOL_USE_MALLOC overridden"
#endif //METAPOOL_USE_MALLOC

//change this to 0 on a 64 bit system (untested atm). I wish
//sizeof would work in the preprocessor!
#ifdef THIRTY_TWO_BIT
#warning "THIRTY_TWO_BIT overridden"
#else
#define THIRTY_TWO_BIT 1
#endif //THIRTY_TWO_BIT

//You might want to change these according to your needs.
//Chunk sizes should be powers of two or memory alignment
//will likely be completely wrong, and chunk sizes smaller than
//16 bytes on 32-bit systems (32 bytes on 64-bit systems)
//will mean more space wasted than allocated for the Pools
//with the smaller chunkSizes.
//add warnings if overridden
#ifdef METAPOOL_MAX_SPARE_HEAPS
#warning "METAPOOL_MAX_SPARE_HEAPS overridden"
#else
#define METAPOOL_MAX_SPARE_HEAPS 20
#endif //METAPOOL_MAX_SPARE_HEAPS

#if THIRTY_TWO_BIT == 1
#ifdef METAPOOL_HEAP_SIZE_IN_K
#warning "METAPOOL_HEAP_SIZE_IN_K overridden"
#else
#define METAPOOL_HEAP_SIZE_IN_K 1024
#endif //METAPOOL_HEAP_SIZE_IN_K

#ifdef METAPOOL_MIN_CHUNK_SIZE
#warning "METAPOOL_MIN_CHUNK_SIZE overridden"
#else
#define METAPOOL_MIN_CHUNK_SIZE 16
#endif //METAPOOL_MIN_CHUNK_SIZE
#else //64-bit
#ifdef METAPOOL_HEAP_SIZE_IN_K
#warning "METAPOOL_HEAP_SIZE_IN_K overridden"
#else
#define METAPOOL_HEAP_SIZE_IN_K 2048
#endif //METAPOOL_HEAP_SIZE_IN_K

#ifdef METAPOOL_MIN_CHUNK_SIZE
#warning "METAPOOL_MIN_CHUNK_SIZE overridden"
#else
#define METAPOOL_MIN_CHUNK_SIZE 32
#endif //METAPOOL_MIN_CHUNK_SIZE
#endif //THIRTY_TWO_BIT == 1

#ifdef METAPOOL_MAX_CHUNK_SIZE
#warning "METAPOOL_MIN_CHUNK_SIZE overridden"
#else
#define METAPOOL_MAX_CHUNK_SIZE ((METAPOOL_HEAP_SIZE_IN_K*1024)/2)
#endif //METAPOOL_MAX_CHUNK_SIZE

//do a little preprocessor sanity checking.
//Real power-of-2 and power-of-2 + 1/2-power-of-2 sanity
//checking is done with a template metaprogram.

#if METAPOOL_HEAP_SIZE_IN_K % 2 != 0
#error "METAPOOL_HEAP_SIZE_IN_K must be even"
#endif //METAPOOL_HEAP_SIZE_IN_K % 2 == 0

#if METAPOOL_MAX_CHUNK_SIZE < METAPOOL_MIN_CHUNK_SIZE
#error "METAPOOL_MAX_CHUNK_SIZE < METAPOOL_MIN_CHUNK_SIZE"
#endif //METAPOOL_MAX_CHUNK_SIZE < METAPOOL_MIN_CHUNK_SIZE

#if METAPOOL_MIN_CHUNK_SIZE % 2 != 0
#error "METAPOOL_MIN_CHUNK_SIZE must be even"
#endif //METAPOOL_MIN_CHUNK_SIZE % 2 != 0

#if METAPOOL_MAX_CHUNK_SIZE % 2 != 0
#warning "METAPOOL_MAX_CHUNK_SIZE must be even"
#endif //METAPOOL_MAX_CHUNK_SIZE % 2 != 0

#if METAPOOL_MAX_CHUNK_SIZE / 1024 > METAPOOL_HEAP_SIZE_IN_K
#error "METAPOOL_MAX_CHUNK_SIZE / 1024 > HEAP_SIZE"
#endif //METAPOOL_MAX_CHUNK_SIZE / 1024 > METAPOOL_HEAP_SIZE_IN_K

#if THIRTY_TWO_BIT == 1
#if METAPOOL_MIN_CHUNK_SIZE < 16
#warning "minimum chunk size is less than housekeeping size"
#endif //METAPOOL_MIN_CHUNK_SIZE < 16
#else //64 bit
#if METAPOOL_MIN_CHUNK_SIZE < 32
#warning "minimum chunk size is less than housekeeping size"
#endif //METAPOOL_MIN_CHUNK_SIZE < 32
#endif //THIRTY_TWO_BIT

#define MAX_CHUNKS_PER_HEAP ((METAPOOL_HEAP_SIZE_IN_K*1024)/METAPOOL_MIN_CHUNK_SIZE)

//You should probably get rid of this and use a
//better Singleton class because static storage
//is rather slow to access.
//I plan on using a thread local storage singleton,
//so each thread gets its own MetaPool. This will
//make everything threadsafe without using any
//mutexes. Providing the same thread that allocates
//a given piece of memory also frees it...
template <typename T>
class Singleton
{
public:
inline static T* instance()
{
static T instance;
return &instance;
}
};

//Portability functions for allocating memory aligned on a
//particular bytes boundary. This allocation system always
//uses 16-byte alignment.

#ifdef unix
#include <stdlib.h> //cstdlib doesn't have posix_memalign!

//WARNING: arg order reversed from prev versions
inline void* alignedAlloc(size_t bytes, size_t align=BYTE_ALIGNMENT)
{
void *rtn = NULL;
#ifdef METAPOOL_USE_MALLOC
rtn = malloc(bytes);
#else
posix_memalign(&rtn,align,bytes);
#endif
if(!rtn) throw std::bad_alloc();
return rtn;
}

inline void alignedFree(void *mem)
{
assert(mem);
free(mem);
}

#elif defined WIN32
#include <malloc.h>
//WARNING: arg order reversed from prev versions
inline void* alignedAlloc(size_t bytes,size_t align=BYTE_ALIGNMENT)
{
#ifdef METAPOOL_USE_MALLOC
void* rtn = malloc(bytes);
#else
void* rtn = _aligned_malloc(bytes,align);
#endif
if(!rtn) throw std::bad_alloc();
return rtn;
}

inline void alignedFree(void *mem)
{
assert(mem);
#ifdef METAPOOL_USE_MALLOC
free(mem);
#else
_aligned_free(mem);
#endif
}

#else
#error "Either unix or WIN32 was not defined, or you have an unsupported system"
#endif //unix

namespace Memory
{
class MetaPool;

namespace Details
{

class Heap;

////////////////////////////////////////////////////////////////////////
// //
// Struct Node. A header for a chunk and also a linked //
// list node. The list structure eliminates searching for //
// free chunks. //
// //
////////////////////////////////////////////////////////////////////////

struct Node
{
//Holds the actual size of the allocation to speed up realloc
//and test if free
size_t mAllocSize;
//next node in this heap
Node* mNext;
//heap that owns this node (BIG freeChunk speedup)
Heap* mHeap;
//To be used for reference counted smart pointers.
//Also ensures sizeof(Node) is 16 (32 on 64-bit systems) bytes to
//eliminate memory alignment issues
size_t mRefCount;
};

CTASSERT(sizeof(char)==1,CharSizeIsNot1Byte);
#if THIRTY_TWO_BIT == 1
CTASSERT(sizeof(Node)==16,NodeSizeIsNot16Bytes)
#else
CTASSERT(sizeof(Node)==32,NodeSizeIsNot32Bytes);
#endif //THIRTY_TWO_BIT

/*
** NB I use c-style casts in places because it is much
** easier to read. IMHO the reinterpret_cast syntax
** was never designed for such crazy casting.
*/

//Finds the mHeap member in the node that this memory
//belongs to.
inline Heap* getOwnerHeap(void* mem)
{
// *(Heap**) (((char*)mem) - HeapOffsetIntoNodeInBytes);
return *(Heap**) (((char*)mem) - sizeof(size_t) - sizeof(Heap*));
}

//returns the Node that owns this memory.
inline Node* getNode(void* mem)
{
return (Node*)(((char*)mem) - sizeof(Node));
}

class Pool;

////////////////////////////////////////////////////////////////////////
// //
// Class Heap. Uses a very strange and (eventually) tangled //
// linked list of nodes and chunks that get allocated as //
// one block of ram, arranged like this: //
// //
// |--------------| The tangle comes from nodes and chunks //
// | Node | being reused, and the linked list means //
// |--------------| there is no searching through the heap //
// | Allocated | for a free node/chunk combination. //
// | chunk | The nodes are headers for the chunks //
// |--------------| //
// | Next Node | Strangely this class becomes much slower //
// |--------------| if I make this block of memory part of //
// |Next Allocated| the instance rather than dynamically //
// | chunk | allocating it. //
// |--------------| //
// //
////////////////////////////////////////////////////////////////////////

class Heap
{
friend class Pool;
//the mem that gets split into Nodes and chunks
void* mMemory;
//sizeof(Node) + chunkSize
size_t mNodeSize;
//the next unused node
Node* mNextFree;
//the last node in the heap
Node* mLastNode;
//the size of the mem chunks this heap contains
size_t mChunkSize;
//the noOfChunks in this heap that have been allocated
size_t mAllocCount;
//the max allocs this heap can hold
size_t mMaxAllocs;
//used in cleanup phase
Pool* mParent;
//the requested heapSize
const size_t mSizeInBytes;
//size of the heap + sizeof the max possible no of nodes
const size_t mRealSize;

inline void setup(Pool* parent, size_t chunkSize)
{
mParent = parent;
mChunkSize = chunkSize;
mNodeSize = chunkSize + sizeof(Node);
mMaxAllocs = mRealSize / mNodeSize;
mAllocCount = 0;

//Set all the pointers. This loop will cause a little
//lag for small chunkSize heaps, but it can't be avoided :(
Node *n;
for(int i=0;i<mMaxAllocs;i++)
{
n = (*this)[i];
n->mHeap = this;
n->mNext = (*this)[i+1]; //this will go past the end, but it gets
//set to NULL on the last Node below
n->mAllocSize = 0;
}

// set pointer to first free node.
mNextFree = reinterpret_cast<Node*>(mMemory);
// set pointer to last free Node.
mLastNode = (*this)[mMaxAllocs-1];
mLastNode->mAllocSize = 0;
mLastNode->mHeap = this;
mLastNode->mNext = NULL;
}

public:

//make absolutely sure this thing throws bad_alloc
//independent of compiler stupidities
inline void* operator new(size_t size)
{
void* mem = malloc(size);
if(!mem)
throw std::bad_alloc();
return mem;
}

inline void operator delete(void* mem)
{
assert(mem);
::free(mem);
}

inline Heap(Pool* parent, size_t chunkSize, size_t size) :
mChunkSize(chunkSize),
mMemory(NULL),
mSizeInBytes(size),
mRealSize(size+(MAX_CHUNKS_PER_HEAP*sizeof(Node)))
{
mMemory = alignedAlloc(mRealSize);
setup(parent, chunkSize);
}

inline ~Heap()
{
if(mMemory)
alignedFree(mMemory);
}

//makes a node and chunk appear as one block of memory,
//and a heap as an array of them. mNodeSize is sizeof(Node)
//plus the size of the chunk.
inline Node* operator[](size_t idx) const
{
return reinterpret_cast<Node*>((char*)mMemory + (idx*mNodeSize));
}

inline size_t chunkSize() const { return mChunkSize; }

inline size_t sizeInBytes() const { return mSizeInBytes; }

inline size_t allocCount() const { return mAllocCount; }

inline size_t maxAllocs() const { return mMaxAllocs; }

inline bool full() const { return !mNextFree; }

inline bool empty() const { return mAllocCount == 0; }

//Returns a free chunk from this heap and sets up the next
//allocation. This method will never be called if mNextFree
//is NULL
inline void* allocChunk(size_t allocSize)
{
void *chunk = NULL;
mNextFree->mAllocSize = allocSize;
mNextFree->mRefCount = 0;
chunk = (void*)(((char*)mNextFree) + sizeof(Node));
mNextFree = mNextFree->mNext;
mAllocCount++;
return chunk;
}

//Frees a chunk of memory and sets up all the pointers so no
//searching for free chunks is needed. This is what causes
//the tangles in the list. When the heap this belongs to is
//empty it gets passed to the parent pool for re-use.
//Implemented after Pool because it needs to call a Pool method.
inline void freeChunk(void *mem);
};

//A wrapper function for alignedAlloc that places a Node structure
//prior to the usable memory. Used when the requested allocation
//size will not fit in a Pool. The Node structure makes the allocation
//able to be freed and realloced by Memory::free and Memory::realloc.
inline void* bigAlloc(size_t size)
{
void* mem = alignedAlloc(size+sizeof(Details::Node));
Details::Node* n = reinterpret_cast<Details::Node*>(mem);
n->mAllocSize = size;
n->mNext = NULL;
n->mHeap = NULL;
n->mRefCount = 0;
return (void*)((char*)mem+sizeof(Details::Node));
}

////////////////////////////////////////////////////////////////////////
// //
// Class Pool. Manages a vector of (equal chunkSize) heaps //
// for allocation, also facilitates recycling of heaps both //
// within itself and though MetaPool. //
// //
////////////////////////////////////////////////////////////////////////

class Pool
{
protected:
friend class Heap;
friend class MetaPool;
Heap* mCurrentHeap;
Heap* mSpareHeap; //to avoid calling Heap::setup when there is no need
size_t mChunkSize;
MetaPool* mParent;
const size_t mDefaultHeapSize;
typedef std::vector<Heap*> HeapVector;
HeapVector mHeaps;

//When a heap is completely empty try putting it in the
//mSpareHeap member (so setup doesn't have to be called).
//If mSpareHeap is already used, add the heap to a LIFO
//in MetaPool for re-use elsewhere.
void cleanup(Heap* h);

public:

inline Pool(MetaPool* parent, size_t chunkSize, size_t defaultHeapSize) :
mCurrentHeap(NULL),
mSpareHeap(NULL),
mChunkSize(chunkSize),
mParent(parent),
mDefaultHeapSize(defaultHeapSize)
{
mHeaps.reserve(100);
}

inline ~Pool()
{
HeapVector::iterator i;
for(i=mHeaps.begin();i!=mHeaps.end();i++)
{
delete *i;
}
}

//Find a heap with a reasonable number of free slots. If none
//are suitable it allocates a new heap as a last resort, hence could
//throw std::bad_alloc. Implemented after MetaPool.
void findHeap();

//Returns a free chunk. Can throw std::bad_alloc.
//Contains the only branch I can't remove when memory is available.
//Will trigger plenty more branches if mCurrentHeap is full though.
inline void *allocChunk(size_t allocSize)
{
if(!mCurrentHeap || mCurrentHeap->full())
findHeap();
return mCurrentHeap->allocChunk(allocSize);
}

inline size_t chunkSize() const { return mChunkSize; }
inline size_t noOfHeaps() const { return mHeaps.size(); }
inline size_t defaultHeapSize() const { return mDefaultHeapSize; }
};

/*
** This Heap method is here because it has to call a Pool
** method.
*/

//Frees a chunk of memory and sets up all the pointers so no
//searching for free chunks is needed. This is what causes
//the tangles in the list. When the heap this belongs to is
//empty it gets passed to the parent pool for re-use.
inline void Heap::freeChunk(void *mem)
{
Node* n = getNode(mem);
n->mRefCount = 0;
n->mAllocSize = 0;
//set up the pointers
Node* prevLast = mLastNode;
mLastNode = n;
prevLast->mNext = mLastNode;
mLastNode->mNext = NULL;
if(!mNextFree)
mNextFree = mLastNode;
mAllocCount--;
if(mAllocCount == 0)
mParent->cleanup(this);
}

/***********************************************************************
** **
** Begin template metaprogramming **
** **
***********************************************************************/

//////////////////////////////////////////////////////////////////////
// A compile-time loop to test if a number is a power of 2 //
//////////////////////////////////////////////////////////////////////
template <int N>
struct PowerOf2Test
{
enum
{
value = !(N%2) ? PowerOf2Test<(int)((N/2.0)+0.5)>::value : false
};
};
//it is a power of two
template <>
struct PowerOf2Test<2>
{
enum { value = true };
};
//it is not a power of two
template <>
struct PowerOf2Test<0>
{
enum { value = false };
};

//////////////////////////////////////////////////////////////////////
// A compile-time test for whether a number is a power of 2 plus //
// half of that power of 2. It works by figuring out that the //
// next power of two up from (for example) 48 is 64. //
//////////////////////////////////////////////////////////////////////
template <int Size>
struct HalfwayTest
{
enum
{
//perhaps this first PowerOf2Test is redundant, but icc
//complains without it
value = PowerOf2Test<Size>::value ? false : PowerOf2Test<(int)(Size+((Size/3.0)+0.5))>::value
};
};

//////////////////////////////////////////////////////////////////////
// A metaprog "if" statement //
//////////////////////////////////////////////////////////////////////
template <bool Condition, typename TCase, typename FCase> struct If;

//the true case
template <typename TCase, typename FCase>
struct If<true, TCase, FCase>
{
typedef TCase result;
};

//the false case
template <typename TCase, typename FCase>
struct If<false, TCase, FCase>
{
typedef FCase result;
};

//////////////////////////////////////////////////////////////////////
// True and False boolean wrapper types //
//////////////////////////////////////////////////////////////////////
struct True
{
enum { value = 1 };
};

struct False
{
enum { value = 0 };
};

#define MIN_CHUNKSIZE METAPOOL_MIN_CHUNK_SIZE
#define MAX_CHUNKSIZE METAPOOL_MAX_CHUNK_SIZE
///////////////////////////////////////////////////////////////////////
//Compile time sanity checks for heapSize, and min and max chunkSizes//
///////////////////////////////////////////////////////////////////////

typedef If<PowerOf2Test<METAPOOL_HEAP_SIZE_IN_K>::value,
True, HalfwayTest<METAPOOL_HEAP_SIZE_IN_K> >::result HeapSizeCondition;
CTASSERT(HeapSizeCondition::value, BadHeapSize);

typedef If<PowerOf2Test<MIN_CHUNKSIZE>::value,
True, HalfwayTest<MIN_CHUNKSIZE> >::result MinChunkCondition;
CTASSERT(MinChunkCondition::value, BadMinChunkSize);

typedef If<PowerOf2Test<MAX_CHUNKSIZE>::value,
True, HalfwayTest<MAX_CHUNKSIZE> >::result MaxChunkCondition;
CTASSERT(MaxChunkCondition::value, BadMaxChunkSize);

//////////////////////////////////////////////////////////////////////
// Calculate (compile time) the number of pools required //
//////////////////////////////////////////////////////////////////////

template <int Size>
struct PoolCount
{
enum
{
value = PoolCount<Size/2>::value + 1
};
};

//These terminate the above loop
template <>
struct PoolCount<MIN_CHUNKSIZE>
{
enum { value = 1 };
};

template <>
struct PoolCount<(MIN_CHUNKSIZE+(MIN_CHUNKSIZE/2))>
{
enum { value = 1 };
};

template <>
struct PoolCount<(MIN_CHUNKSIZE+(MIN_CHUNKSIZE/3))>
{
enum { value = 1 };
};

//provides easy access to the compile time calculated number of
//pools
struct NoOfPools
{
enum
{
max = MAX_CHUNKSIZE,
maxIsPowerOf2 = PowerOf2Test<max>::value,
pwrOf2 = PoolCount<(maxIsPowerOf2 ? max : (max-(max/3)))>::value,
notPwrOf2 = PoolCount<(maxIsPowerOf2 ? (max-(max/4)) : max)>::value,
value = pwrOf2 + notPwrOf2
};
};

///////////////////////////////////////////////////////////////////////
// Calculate (compile time) the chunkSize of the pool at Idx //
///////////////////////////////////////////////////////////////////////
template <int Idx>
class ChunkSize
{
struct MinChunkPowerOf2
{
enum
{
powerOf2 = !(Idx%2),
value = powerOf2 ? ChunkSize<Idx+1>::value - ChunkSize<Idx+1>::value/3 : ChunkSize<Idx+1>::value - ChunkSize<Idx+1>::value/4
};
};
struct MinChunkNotPowerOf2
{
enum
{
powerOf2 = Idx%2,
value = powerOf2 ? ChunkSize<Idx+1>::value - ChunkSize<Idx+1>::value/3 : ChunkSize<Idx+1>::value - ChunkSize<Idx+1>::value/4
};
};
public:
enum
{
value = If<PowerOf2Test<MIN_CHUNKSIZE>::value, MinChunkPowerOf2, MinChunkNotPowerOf2>::result::value
};
};

//This terminates the above loop
template<>
class ChunkSize<NoOfPools::value>
{
struct MaxChunkPowerOf2
{
enum { value = MAX_CHUNKSIZE + (MAX_CHUNKSIZE/2) };
};
struct MaxChunkNotPowerOf2
{
enum { value = MAX_CHUNKSIZE + (MAX_CHUNKSIZE/3) };
};
public:
enum
{
value = If<PowerOf2Test<MAX_CHUNKSIZE>::value, MaxChunkPowerOf2, MaxChunkNotPowerOf2>::result::value
};
};

///////////////////////////////////////////////////////////////////////
// Calculate (compile time) the best pool Idx to allocate Size bytes //
///////////////////////////////////////////////////////////////////////
template <size_t Size, int Idx = 0>
struct PoolIdxSearch
{
enum
{
//will the requested alloc fit in a chunk?
inRange = Size <= MAX_CHUNKSIZE? true : false,
//does the alloc fit in a pool, and it will fit in the current pool?
fitsHereToo = (inRange && (Size < ChunkSize<Idx>::value)) ? true : false,
//if it does return the current index, otherwise keep recursing
value = fitsHereToo ? Idx : PoolIdxSearch<Size, Idx+1>::value
};
};

//terminate the loop at NoOfPools::value, which means we have to use
//the run-time allocator because there isn't a pool large enough
template <size_t Size>
struct PoolIdxSearch<Size, NoOfPools::value>
{
enum { value = -1 };
};

#undef MIN_CHUNKSIZE
#undef MAX_CHUNKSIZE

/***********************************************************************
** **
** End template metaprogramming **
** **
***********************************************************************/

} //namespace Details

////////////////////////////////////////////////////////////////////////
// //
// Class MetaPool. Encapsulates multiple pools of different //
// chunkSizes, handles runtime searching for the best size //
// pool for an alloc, and manages recylcling of heaps between //
// pools with different chunkSizes. //
// //
////////////////////////////////////////////////////////////////////////
class MetaPool
{
friend class Details::Pool;

typedef Details::NoOfPools NoOfPools;

//holds all the pools
Details::Pool* mPools[NoOfPools::value];

//some space to store empty heaps for re-use between
//pools. Using a vector as a LIFO because of constant-time
//pop_back() and ability to reserve space (which deque can't)
typedef std::vector<Details::Heap*> HeapVector;
HeapVector mSpareHeaps;

inline void addSpareHeap(Details::Heap* h)
{
if(mSpareHeaps.size() > METAPOOL_MAX_SPARE_HEAPS)
{
delete h;
return;
}
mSpareHeaps.push_back(h);
}

inline Details::Heap* findSpareHeap(size_t chunkSize)
{
Details::Heap* h = NULL;
if(mSpareHeaps.empty())
{
for(int i=0;i<NoOfPools::value;i++)
{
if(mPools[i]->mSpareHeap && !(mPools[i]->mSpareHeap->sizeInBytes() % chunkSize))
{
h = mPools[i]->mSpareHeap;
mPools[i]->mSpareHeap = NULL;
return h;
}
}
return NULL;
}

HeapVector::iterator it = mSpareHeaps.begin();
for(;it!=mSpareHeaps.end();it++)
{
if(!((*it)->sizeInBytes() % chunkSize))
{
h = (*it);
mSpareHeaps.erase(it);
return h;
}
}
return NULL;
}

public:

inline Details::Pool* pool(int idx)
{
return mPools[idx];
}

MetaPool()
{
const size_t heapSizeInK = METAPOOL_HEAP_SIZE_IN_K;
const size_t minChunkSize = METAPOOL_MIN_CHUNK_SIZE;
const size_t maxChunkSize = METAPOOL_MAX_CHUNK_SIZE;
//make sure there is plenty of room so the vector
//doesn't have to realloc
mSpareHeaps.reserve(METAPOOL_MAX_SPARE_HEAPS);
bool minChunkPwrOf2 = Details::PowerOf2Test<METAPOOL_MIN_CHUNK_SIZE>::value;
bool maxChunkPwrOf2 = Details::PowerOf2Test<METAPOOL_MAX_CHUNK_SIZE>::value;
bool heapSizePwrOf2 = Details::PowerOf2Test<METAPOOL_HEAP_SIZE_IN_K>::value;
size_t pwrOf2HeapSize = heapSizePwrOf2 ? heapSizeInK : heapSizeInK-(heapSizeInK/3);
size_t notPwrOf2HeapSize = heapSizePwrOf2 ? heapSizeInK-(heapSizeInK/4) : heapSizeInK;
pwrOf2HeapSize *= 1024;
notPwrOf2HeapSize *= 1024;
size_t currentSize = minChunkSize;
if(minChunkPwrOf2)
{
//even pool indices hold the power-of-two chunkSizes
for(int i=0;i<NoOfPools::value;i++)
{
bool pwrOf2Chunk = !(i%2);
mPools[i] = new Details::Pool(this,currentSize,
pwrOf2Chunk ? pwrOf2HeapSize : notPwrOf2HeapSize);
currentSize += pwrOf2Chunk ? currentSize/2 : currentSize/3;
}
}
else
{
//odd pool indices hold the power-of-two chunkSizes
for(int i=0;i<NoOfPools::value;i++)
{
bool pwrOf2Chunk = (i%2) != 0;
mPools[i] = new Details::Pool(this,currentSize,
pwrOf2Chunk ? pwrOf2HeapSize : notPwrOf2HeapSize);
currentSize += pwrOf2Chunk ? currentSize/2 : currentSize/3;
}
}
}

//Allocates size bytes from the best fitting pool, falling back to
//bigAlloc for sizes larger than the biggest chunkSize. Uses a runtime
//search over the pools. Can throw std::bad_alloc.
inline void* alloc(size_t size)
{
if(size > METAPOOL_MAX_CHUNK_SIZE)
return Details::bigAlloc(size);
for(int i=0;i<NoOfPools::value;i++)
{
if(size <= mPools[i]->chunkSize())
return mPools[i]->allocChunk(size);
}
throw std::bad_alloc();
return NULL;
}

//Copies an allocation to a larger one and frees the old alloc.
//Can throw std::bad_alloc.
inline void* realloc(void *mem, size_t newSize)
{
assert(mem);
void* newMem = NULL;
Details::Node* n = Details::getNode(mem);
Details::Heap* h = n->mHeap;
if(!h)
{
if(newSize > n->mAllocSize || newSize < (n->mAllocSize/2))
{
newMem = alloc(newSize);
memcpy(newMem,mem,newSize < n->mAllocSize ? newSize : n->mAllocSize);
free(mem);
return newMem;
}
return mem;
}
if(newSize > h->chunkSize() || newSize < (h->chunkSize()/2))
{
void *newChunk = alloc(newSize);
memcpy(newChunk,mem,newSize < n->mAllocSize ? newSize : n->mAllocSize);
h->freeChunk(mem);
return newChunk;
}
return mem;
}

//Free an allocation. This WILL crash if you pass it memory
//that was not allocated through this allocation system.
inline void free(void *mem)
{
assert(mem);
Details::Heap *h = Details::getOwnerHeap(mem);
if(!h)
{
alignedFree(Details::getNode(mem));
return;
}
h->freeChunk(mem);
}
};

/*
** These Pool methods are here because they rely on calling
** MetaPool methods
*/
//When a heap is completely empty try putting it in the
//mSpareHeap member (so setup doesn't have to be called).
//If mSpareHeap is already used, add the heap to a list
//in MetaPool for re-use elsewhere (though setup will
//have to be called).
void Details::Pool::cleanup(Details::Heap* h)
{
if(h==mCurrentHeap && !mSpareHeap)
{
mSpareHeap = h;
mCurrentHeap = NULL;
return;
}
if(h!=mCurrentHeap)
{
mParent->addSpareHeap(h);
//need to remove h from mHeaps!!!
assert(!mHeaps.empty());

HeapVector::iterator it = mHeaps.end();
it--;
while(it!=mHeaps.begin())
{
if((*it) == h)
{
mHeaps.erase(it);
return;
}
it--;
}
assert(mHeaps[0]==h);
mHeaps.erase(it);
}
}

//Find a heap with a reasonable number of free chunks. If none are
//suitable it allocates a new heap, hence could throw std::bad_alloc.
void Details::Pool::findHeap()
{
if(mCurrentHeap)
{
mHeaps.push_back(mCurrentHeap);
mCurrentHeap = NULL;
}
if(mSpareHeap)
{
mCurrentHeap = mSpareHeap;
mSpareHeap = NULL;
return;
}

//Fallback to looking for a recent Heap with at least threshhold free chunks.
//Looping backwards to exploit any short-lived objects
//that have been freed.
int size = mHeaps.size();
int threshhold = 1;
int stopAt = 0;
if(size)
{
Heap* h;
stopAt = size > 5 ? 5 : size;
for(int i=size-1;i>=stopAt;i--)
{
h = mHeaps[i];
threshhold = h->maxAllocs() - (h->maxAllocs() / 4);
if(mHeaps[i]->allocCount() <= threshhold)
{
mCurrentHeap = mHeaps[i];
return;
}
}
}

//Fallback to asking MetaPool for a spare Heap and calling setup if found
Heap* fallback = mParent->findSpareHeap(mChunkSize);
if(fallback)
{
fallback->setup(this,mChunkSize);
mCurrentHeap = fallback;
return;
}
//Fallback to allocating a new one
mCurrentHeap = new Heap(this,mChunkSize,mDefaultHeapSize);
}

typedef Singleton<MetaPool> MetaSingleton;

//This struct will be the result of an If template if the
//requested size will fit in a pool.
template <int Idx>
struct CTAllocHelper
{
inline static void* alloc(size_t size)
{
return MetaSingleton::instance()->pool(Idx)->allocChunk(size);
}
};

//This struct will be the result of an If template if the
//requested size will not fit in a pool.
template <size_t Size>
struct RTAllocHelper
{
inline static void* alloc(size_t size)
{
return Details::bigAlloc(size);
}
};

/*
** Normal use API follows
*/

template <size_t Size>
inline void* alloc()
{
const int idx = Details::PoolIdxSearch<Size>::value;
return Details::If<(idx != -1), CTAllocHelper<idx>, RTAllocHelper<Size> >::result::alloc(Size);
}

template <typename T>
inline void* alloc()
{
return alloc<sizeof(T)>();
}

//Allocate size bytes (from a pool if possible). Throws std::bad_alloc
//if the request cannot be fulfilled. Uses a runtime search for the
//best pool.
inline void* alloc(size_t size)
{
return MetaSingleton::instance()->alloc(size);
}

template <size_t Size>
inline void* realloc(void* mem)
{
void* newMem = NULL;
Details::Node* n = Details::getNode(mem);
Details::Heap* h = n->mHeap;

if(!h)
{
if(Size > n->mAllocSize || Size < (n->mAllocSize/2))
{
newMem = alloc<Size>();
memcpy(newMem,mem,Size < n->mAllocSize ? Size : n->mAllocSize);
Memory::free(mem);
return newMem;
}
return mem;
}

if(Size > h->chunkSize() || Size < (h->chunkSize()/2))
{
newMem = alloc<Size>();
memcpy(newMem,mem,Size < n->mAllocSize ? Size : n->mAllocSize);
Memory::free(mem);
return newMem;
}
return mem;
}

//Copies an allocation to a larger one and frees the old alloc.
//Will never shrink an allocation. Uses pools if possible.
//Throws std::bad_alloc if the request cannot be fulfilled.
//Uses a runtime search for the best pool.
inline void* realloc(void *mem, size_t newSize)
{
return MetaSingleton::instance()->realloc(mem, newSize);
}

//Free memory allocated with this system only. Will crash if used on
//memory allocated with malloc/calloc/realloc or the default new
//operators.
inline void free(void* mem)
{
MetaSingleton::instance()->free(mem);
}
//class Test : public AddAllocator<Test>
template <typename T>
struct AddAllocator
{
inline void* operator new(size_t rtSize)
{
assert(sizeof(T)==rtSize);
return Memory::alloc<sizeof(T)>();
}
inline void operator delete(void* mem)
{
Memory::free(mem);
}

inline void* operator new[](size_t size)
{
return Memory::alloc(size);
}
inline void operator delete[](void* mem)
{
Memory::free(mem);
}
};

#define METAPOOL_CT_SINGLE_NEW(klass) inline void* operator new(size_t rtSize){ assert(sizeof(klass)==rtSize); return Memory::alloc<sizeof(klass)>();}

#define METAPOOL_RT_SINGLE_NEW inline void* operator new(size_t size){ return Memory::alloc(size);}

#define METAPOOL_SINGLE_DELETE inline void operator delete(void* mem){ Memory::free(mem);}

#define METAPOOL_ARRAY_NEW inline void* operator new[](size_t size){ return Memory::alloc(size);}

#define METAPOOL_ARRAY_DELETE inline void operator delete[](void* mem){ Memory::free(mem);}

#define METAPOOL_CT_ALLOCATORS(klass) METAPOOL_CT_SINGLE_NEW(klass) METAPOOL_SINGLE_DELETE METAPOOL_ARRAY_NEW METAPOOL_ARRAY_DELETE

#define METAPOOL_RT_ALLOCATORS METAPOOL_RT_SINGLE_NEW METAPOOL_SINGLE_DELETE METAPOOL_ARRAY_NEW METAPOOL_ARRAY_DELETE
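
//Example use of the allocator base class and the macros above
//(illustrative only; Foo and Bar are made-up class names):
//
// class Foo : public Memory::AddAllocator<Foo>
// {
//     //single and array new/delete for Foo now come from the pools
// };
//
// class Bar
// {
// public:
//     METAPOOL_CT_ALLOCATORS(Bar) //compile-time searched single allocs
// };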

}//namespace Memory

//clean up the macros
#undef CTASSERT
#undef BYTE_ALIGNMENT
#undef MAX_CHUNKS_PER_HEAP

#endif //__METAPOOL_HH__

[/code]

Submitted by lorien on Tue, 08/02/05 - 12:58 AM Permalink

And here is a completely stupid and synthetic little benchmark prog

[code]
#define _XOPEN_SOURCE 600

#include <iostream>
#include "MetaPool.h"

/*
** millisecond accurate timer code for win32
*/

#ifdef WIN32
#include <windows.h>
#ifdef __GNUC__
#ifndef INT64
#define INT64 long long
#endif //INT64
#endif //__GNUC__

#define LARGE_INTEGER_PTR LARGE_INTEGER*

#ifdef _MSC_VER
#define LARGE_INTEGER_PTR LARGE_INTEGER*
#endif //_MSC_VER
static INT64 gDivision;
static INT64 gStartTime;

size_t time()
{
INT64 time;
QueryPerformanceCounter((LARGE_INTEGER_PTR)&time);
time -= gStartTime;
time /= gDivision;
return (unsigned) time;
}
#endif //WIN32

/*
** millisecond accurate timer code for unix
** on x86 Linux gettimeofday is as accurate as
** QueryPerformanceCounter on win32.
*/

#ifdef unix
#include <sys/time.h>
#include <time.h>
#include <unistd.h>
#include <stdlib.h>
timeval gStartTime;
timeval gSleepTime;

size_t time()
{
size_t time;
struct timeval tv;
gettimeofday(&tv,NULL);
time = (tv.tv_sec - gStartTime.tv_sec) * 1000;
time += ((tv.tv_usec - gStartTime.tv_usec) / 1000);
return time;
}
#endif //unix

void alignedAllocTest(const int runs, int size)
{
size_t start = time();
void** test = (void**)Memory::alloc(runs*sizeof(void*));
for(int repeats=0;repeats<10;repeats++)
{
int i;
for(i=0;i<runs;i++)
test[i] = alignedAlloc(size);
for(i=0;i<runs;i++)
alignedFree(test[i]);
}
std::cout<<"alignedAlloc/alignedFree: "<<time()-start<<" ms"<<std::endl;
Memory::free(test);
}

template <size_t Size>
void ctPoolTest(const int runs)
{
void** test = (void**)Memory::alloc(runs*sizeof(void*));
size_t start = time();
for(int repeats=0;repeats<10;repeats++)
{
int i;
for(i=0;i<runs;i++)
test[i] = Memory::alloc<Size>();
for(i=0;i<runs;i++)
Memory::free(test[i]);
}
std::cout<<"pool alloc/free (compile-time search): "<<time()-start<<" ms"<<std::endl;
Memory::free(test);
}

std::cout<<"

";

const int t2Runs=100000;
const int t2Size=10;
std::cout<<"Test2: runs "<(t2Runs);

std::cout<<"

";

const int t3Runs=50000;
const int t3Size=1023;
std::cout<<"Test3: runs "<(t3Runs);

std::cout<<"

";

const int tvRuns=3000;
const int tvLower=16;
const int tvUpper=1024;
std::cout<<"Var size test: runs "<

Submitted by Daemin on Tue, 08/02/05 - 9:23 AM Permalink

Erm, it looks "neat"... now I just have to look over it properly when I am concentrating in some form of IDE.

Submitted by lorien on Wed, 09/02/05 - 3:21 AM Permalink

What are you guys (Hazard and Barry) doing in the programmer discussion area then? [:)] This thing will increase framerates.

I'm just fixing a bug btw: the alignedAlloc function for windows doesn't throw a std::bad_alloc if an allocation fails (doh!)

And another Doh!, an "#include " made it in somehow. Both these things are fixed now.

Submitted by mcdrewski on Wed, 09/02/05 - 4:21 AM Permalink

...so there's a sanity check that the MIN_CHUNK_SIZE and MAX_CHUNK_SIZE are even (your error message saying "must be power of two"), but what actually happens at compile time if they're not powers of two? Do we simply end up with one of those unreadable template error messages, or does it infinitely recurse?

It certainly looks funky - but I'm still confused as to how Pools, Heaps and Chunks really work together. If I'm reading it right, You have 'n' pools, each managing a set of Heaps, and each Heap manages a number of fixed size block of memory (Chunks) ranging from MIN_CHUNK_SIZE to MAX_CHUNK_SIZE in powers of two.

Why the level of indirection using Heaps? What benefit does that give over simply a number of pools of chunks? Perhaps a nice high-level diagram of the data structure would explain what it's achieving...

Submitted by lorien on Fri, 11/02/05 - 12:17 AM Permalink

If they are not powers of two everything should still work (I'll do some destructive testing, thanks), but the 16-byte alignment of every allocation will be thrown off, making it useless for vector operations (I'm doing audio synthesis using SSE intrinsics, and the realtime scripting language, whilst general purpose and pure OO, is designed for audio). The next chunksize will still be calculated by 2* currentChunkSize. Also power of two is a good idea because (on the x86) double precision floats are very slow if not aligned on 8-byte boundaries, and some other architectures just crash without things being aligned properly.

I could do a real compile-time test for power-of-two-ness using another metaprogrammed loop. I probably will if it goes boom when they aren't.
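For what it's worth, a compile-time power-of-two test doesn't even need a loop, because a power of two has exactly one bit set. Something along these lines would slot straight into the CTASSERT macro from the header (IsPowerOf2 is just a name picked for this sketch):

[code]
//true when N is positive and has exactly one bit set
template <int N>
struct IsPowerOf2
{
    enum { value = (N > 0) && ((N & (N - 1)) == 0) };
};

//e.g. CTASSERT(IsPowerOf2<METAPOOL_MIN_CHUNK_SIZE>::value, MinChunkSizeNotAPowerOf2);
[/code]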

I'm afraid I'm going to have to save the nice diagrams for an academic journal- I need to try to properly publish this because it is very close to a solution to an "impossible" problem of computer science (avoiding the runtime search in memory allocators). I wish I could make it work with languages other than C++ too [:)]
The relation is:
* Heaps manage Nodes, which are headers for chunks. Chunks are where you put your data and they don't exist as a class- they are just ram waiting to be used. Heaps have a fixed chunksize, and by making all Heaps equal in size they can be easily re-used by different Pools when they are empty. This is why the Heap class exists: to pass unused memory around between Pools.
* Pools manage Heaps. Pools also have a fixed chunksize. The Pool class also can ask MetaPool for a spare heap or allocate a new one if the current Heap is full.
* MetaPool manages an array of Pools, each of which has different size chunks.

The goal with this arrangement is to avoid calling malloc or new like the plague. They can have highly variable latency, which is a Bad Thing (tm) for realtime software.
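To put the same relation in code form, the ownership looks roughly like this (a stripped-down sketch with simplified members, not the real class definitions):

[code]
#include <vector>
#include <cstddef>

class Heap;

struct Node                       //header sitting directly in front of every chunk
{
    std::size_t allocSize;
    Node*       next;
    Heap*       owner;
};

class Heap                        //one aligned block, carved into Node+chunk pairs
{
    void*       memory;
    Node*       nextFree;
    std::size_t chunkSize;
};

class Pool                        //owns any number of Heaps, all with the same chunkSize
{
    std::vector<Heap*> heaps;
    std::size_t        chunkSize;
};

class MetaPool                    //one Pool per chunkSize, plus spare Heaps for recycling
{
    std::vector<Pool*> pools;
    std::vector<Heap*> spareHeaps;
};
[/code]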

I'm just posting another update, this one has some improved comments, and my CTAssert class was stuffed, it just compiled whether or not the condition was false. Now it prints nice errors. Also I #undef the macros I've defined now.

One more thing, malloc and alignedAlloc functions seem to perform much better on AMD CPUs, it's on P4s that I've seen well over 10 fold performance increase. I suspect it's the stupidly long pipelines in P4s and failed branch predictions. On Athlons the max increase I've seen is around 6x.

Submitted by lorien on Fri, 11/02/05 - 12:23 AM Permalink

Do people think this should be sticky?

Submitted by davidcoen on Fri, 11/02/05 - 5:00 AM Permalink

i think it should be sticky, i need to re-read it a few times, but with your later concept explanation post, i think i understand it. very, very nice.

Submitted by lorien on Sat, 12/02/05 - 12:01 AM Permalink

Thanks. I'll have another small update later today.

Submitted by lorien on Sat, 12/02/05 - 1:22 AM Permalink

Update done, changes are the copyright (apparently I share it with the uni, despite much of this code being written in early 2003 before I started), a typo (CTAssert instead of CTASSERT in a 64-bit section), replaced #errors with #warnings if the chunkSizes are not powers of two, replaced #warn (an invalid preprocessor instruction) with #warning, and a few little code style and comment cleanups.

mcdrewski: it does work with non-power of two chunkSizes, but as I said above it's probably a bad idea to use them.

The Heap class serves another purpose btw: it allows Pools to expand without having to realloc. Without the Heap class a Pool would have to hold one big chunk of contiguous memory, and the whole thing would probably need to be copied during a realloc when it filled up. Instead another Heap* is added to a vector inside Pool, and the vector already has a fair bit of space (for Heap*s) reserved to avoid calling realloc or whatever your STL allocator does to resize a vector.

David: I find this code pretty brain bending too, and I find template metaprogramming a little perverse...

Submitted by Grover on Sat, 12/02/05 - 9:17 AM Permalink

That's pretty damn handy stuff lorien.
I have seen similarly developed ideas for console memory allocation, though none of them cater for the runtime new/alloc problem.
I have seen solutions in Game Programming Gems which use a couple of differing methods, but again, none that cover general purpose use - really great stuff.

I would imagine this would be especially useful for consoles, where their alloc routines are very slow - PS2 is an example I can think of. I'll also try it out on NGage and GBA, and see how it goes.

Submitted by lorien on Tue, 15/02/05 - 1:59 AM Permalink

Another update with a little more metaprogramming. It's documented in the comments. It goes faster :)

Thanks Grover, and thanks Souri for making it sticky.

I actually didn't think it would be good for consoles because of the memory wasted by the pools. Though of course if a near constant frame-rate is important this should be great.

I don't think I can optimise it much more with my current knowledge of c++ and hardware, so further updates are likely to be bug fixes. If you port it to another platform and are not prevented from doing so by an evil NDA please post those ports here and I'll add them to the main code- providing they don't break it for normal computers.

BTW the only "normal" textbook I've seen that mentions template metaprogramming and similar high end techniques is the 2nd edition of Thinking In C++ by Bruce Eckel. It is available for free in electronic form from http://mindview.net/Books

Submitted by kalin on Tue, 15/02/05 - 8:59 PM Permalink

I'm not sure exactly what you mean by 'normal' textbook, but
two good metaprogramming books come to mind;
Modern C++ Design, by Alexei Alexandrescu.
C++ Template Metaprogramming, by David Abrahams and Aleksey Gurtovoy.

Modern C++ design takes a lot of traditional design patterns, and
using heavy template code, makes them very safe, flexible, and
easy to use. Alexandrescu's code has problems with a lot of compilers
though, it's just too much for them.

C++ Template Metaprogramming is generally a walk-through of the
Boost.MPL library, and showing ways to use it, and how it works.
It's probably good to get a fundamental understanding on how to
actually use metaprogramming techniques. Also, the Boost.MPL library
is quite well supported by a range of compilers, despite its
complexity. (Quite a feat!)

[edit]

Also, the Boost.Pool library may be of interest in this thread.
http://www.boost.org/libs/pool/doc/index.html

Submitted by lorien on Tue, 15/02/05 - 11:43 PM Permalink

I don't call either of those books "normal", I call them "crazy", but only in the nicest of ways [:)]. Loki (the norse god of mischief) is a perfect name for Alexei's library...
The Boost Metaprogramming library has to be the most complex and innovative library I've ever seen, and I don't even pretend to have anything other than the barest glimmering of understanding of what can be done with it. I've got the book on order.

Be careful asking questions about David Abrahams work on the boost mailing list, I haven't dared for years after copping a real flaming from him- in front of many of the most full-on c++ gurus in the world. I was just asking questions which the docs to his boost::python library didn't make clear.

*****

Sigh, I was completely blind in that update yesterday... There were two conditional branches in the instruction stream for a compile-time searched allocation when memory was available.

I have reduced it to one, and I simply can't get rid of it :(

These branches are important because they are slow if the CPU's prediction algorithm fails.

Fortunately this one should be quite predictable. It is in Pool::allocChunk().

Malloc/free (which is the same as new/delete) algorithms will commonly have hundreds of branches, many of which are not at all easy to predict. My synthetic benchmark should be showing close to best case for malloc (and the alignedAlloc functions use the same algorithms).

Just posting the update.

Submitted by lorien on Wed, 16/02/05 - 2:26 AM Permalink

This thing I've made is a lot more sophisticated than the boost memory pools: they are single pools for fixed size allocations. This is a general purpose library made up of multiple pools with around the same speed increase as the single pool allocator.

Submitted by lorien on Sat, 19/02/05 - 12:23 AM Permalink

I dug out my copy of Modern C++ Design again, and found the author's name to be Andrei, not Alexei. There seems to be some confusion here, I was sure Alexei was wrong previously, but a google search returned Alexei so that is what I wrote above.
The cover of the book has Andrei printed on it.

I'll post a fix to a nasty bug shortly: having the Singleton::instance method inlined could lead to multiple instances of MetaPool in different translation units :(

Submitted by lorien on Sat, 19/02/05 - 2:29 AM Permalink

I'll have to do some research on the implications of static local variables in inline functions to be sure. The problem could still be there if the compiler decides to inline the method anyway.
The issue code is:
[code]
template <typename T>
class Singleton
{
public:
static T* instance();
};

template <typename T>
T* Singleton<T>::instance()
{
static T instance;
return &instance;
}
[/code]

Submitted by lorien on Sat, 19/02/05 - 2:55 AM Permalink

From some testing it doesn't seem to be a problem, even in inline functions. It is hard to find info about this possible issue (perhaps it isn't an issue- imho it shouldn't be, but inline functions and static local variables are an interesting mix).

Submitted by kalin on Sat, 19/02/05 - 11:45 AM Permalink

Alexei, oops. I guess I was tired that day, I had the name printed on a book no more than 40cm from me. -_-

As for the static variables and inline functions.. I don't think it would ever be a problem. Since the object is in static-storage space, it would be impossible to compile the program if it generated a separate additional copy per call, because it would break the ODR (One Definition Rule) for the linker. (Any workarounds would just be an impossible nightmare - what would their mangled link-names be?)

Submitted by Blitz on Tue, 22/02/05 - 4:20 AM Permalink

Local statics in inline functions are a problem when using DLL's. It will create one copy inside the DLL, and another copy for any access outside of the DLL. I don't think it's a problem anytime you are using normal static (early?) binding. So static libs and just compiling into your executable should be fine.
CYer, Blitz

Submitted by lorien on Fri, 25/02/05 - 12:16 AM Permalink

Just posted a huge update. Please use this one because there was a showstopper crash bug in all previous versions.

Docs in this major revision will improve with time.

Perhaps I should make a custom non-template Singleton for this- having separate memory management between dlls will really stink.
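A sketch of what that could look like, with instance() deliberately compiled into a single .cpp so every module links against the one definition (file names are made up, and the dllexport/dllimport decoration you'd need on Windows is left out):

[code]
//MetaPoolInstance.h (sketch)
namespace Memory
{
    class MetaPool;
    MetaPool* metaPoolInstance(); //not inline, not a template
}

//MetaPoolInstance.cpp (sketch)
#include "MetaPool.h"
namespace Memory
{
    MetaPool* metaPoolInstance()
    {
        static MetaPool instance; //exactly one definition, in one translation unit
        return &instance;
    }
}
[/code]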

Updated the little stupid and synthetic benchmark too.

This version of the allocator is not designed to give fantastic results in stupid benchmarks. It is designed for much better efficiency in real-world progs.

Submitted by lorien on Fri, 25/02/05 - 2:01 AM Permalink

Just made a very minor update: it didn't build with msvc.net because I'd left out one "template<>" before a full specialisation. Both g++ and icc didn't complain.

Apparently the msvc.net preprocessor doesn't support "#warning", it tells me "invalid preprocessor command". This means my macro overloading stuff doesn't work on msvc.

Submitted by redwyre on Thu, 03/03/05 - 11:48 AM Permalink

For those wondering what mysterious invisible comment I made, I actually unstickied this thread.

Submitted by lorien on Fri, 04/03/05 - 3:46 AM Permalink

I should probably find a better home for this thing anyway: it is becoming multi file.

Submitted by wicked on Tue, 08/11/05 - 10:29 AM Permalink

lorien, please elaborate on how your memory allocator is better than the boost pool libraries.

I once wrote a custom memory allocator class based on the bulk allocator of Bartosz Milewski's 1997 classic C++ In Action. [url]http://www.relisoft.com/book/tech/9new.html[/url]

I can't provide the code as it's owned by my previous employer, but it's basically a single class encapsulating the described behaviour and templatized with an integer which tells it which size objects it can allocate.

The classes which you want to allocate space in bulk for are then provided with an overloaded operator new and delete which calls into the allocator class. This way no other code had to be changed in the 600k line code base.
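For anyone who hasn't seen that pattern, it goes roughly like this - a generic sketch of a fixed-size free-list allocator with per-class operator new/delete, not the actual code from that book or code base:

[code]
#include <cstdlib>
#include <new>

//Generic sketch: a free list of fixed-size blocks, refilled in bulk.
template <std::size_t BlockSize, std::size_t BlocksPerChunk = 256>
class FixedAllocator
{
    struct FreeNode { FreeNode* next; };
    FreeNode* mFreeList;

    void refill()
    {
        char* chunk = static_cast<char*>(std::malloc(BlockSize * BlocksPerChunk));
        if(!chunk) throw std::bad_alloc();
        for(std::size_t i = 0; i < BlocksPerChunk; ++i)
        {
            FreeNode* n = reinterpret_cast<FreeNode*>(chunk + i * BlockSize);
            n->next = mFreeList;
            mFreeList = n;
        }
    }
public:
    FixedAllocator() : mFreeList(0) {}

    void* allocate()
    {
        if(!mFreeList) refill();
        FreeNode* n = mFreeList;
        mFreeList = n->next;
        return n;
    }
    void deallocate(void* p)
    {
        FreeNode* n = static_cast<FreeNode*>(p);
        n->next = mFreeList;
        mFreeList = n;
    }
};

//A class opts in by overloading operator new/delete to call the allocator:
class Bullet
{
public:
    void* operator new(std::size_t) { return sAllocator.allocate(); }
    void  operator delete(void* p)  { sAllocator.deallocate(p); }
private:
    static FixedAllocator<sizeof(void*) * 8> sAllocator; //block size comfortably >= sizeof(Bullet)
};
FixedAllocator<sizeof(void*) * 8> Bullet::sAllocator;
[/code]

As wicked says, nothing else in the code base has to change - only the classes that opt in get the overloaded operators.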

However, since Boost has come up with a pool allocator library, I'd be likely to check that one before writing my own again. Particularly since it's thread safe and portable.

Submitted by lorien on Tue, 08/11/05 - 1:00 PM Permalink

It's different to the boost pools. I think it's better in the ways that are important to my projects, but I'm not going to say "my code is better than boost", I'm not so stupid :)
quote:
Boost C++ Libraries
"...one of the most highly regarded and expertly designed C++ library projects in the world." - Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

This is off the top of my head, please correct anything about boost if I'm mistaken...

The boost pools are single pools. They are not a generic allocation system like this.

The boost pools have a very roughly similar linked list structure to the one I'm using in my Heap class. The difference is that in the boost pools the header structure for an alloc and the alloc itself are separate pieces of memory. Have a look at my Heap and Node classes: the Heap allocates a single block of memory, which it splits into Nodes and Chunks (chunks are what I call the actual memory you use) using some nutty pointer maths.
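The pointer maths boils down to fixed offsets within one block - something like this simplified sketch (the real Node stores a Heap* and a refcount; this just shows the shape of the arithmetic):

[code]
#include <cstddef>

//Simplified stand-in for the real Node header
struct Node
{
    std::size_t allocSize;
    Node*       next;
    void*       ownerHeap;
    std::size_t refCount;
};

//Slot i inside the heap's single block is a Node immediately followed
//by chunkSize bytes of user memory.
inline void* chunkAt(void* heapMemory, std::size_t chunkSize, std::size_t i)
{
    std::size_t slotSize = sizeof(Node) + chunkSize;
    Node* node = reinterpret_cast<Node*>(static_cast<char*>(heapMemory) + i * slotSize);
    return reinterpret_cast<char*>(node) + sizeof(Node);   //what the caller gets
}

//Freeing walks straight back from the user pointer to its header.
inline Node* headerOf(void* chunk)
{
    return reinterpret_cast<Node*>(static_cast<char*>(chunk) - sizeof(Node));
}
[/code]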

The boost pools let you choose your own memory alignment. Mine is pretty well set to 16-bytes, though it can be hacked.

This allocator is designed to be an insanely fast and reasonably memory efficient full replacement for new and delete, one that takes advantage of the compiler's innate knowledge of the sizes of all your classes (the metaprogramming). The number of conditional branches in the allocation path is absolutely minimised.

The boost pools are designed to be special purpose allocators for particular types of objects.

My code has changed somewhat since the posts above, I think the above might have a nasty crash bug still... Will release on sourceforge when I'm good and ready, though just ask if anyone wants a pre-release. It's part of a much larger low level library with a bunch of smart pointers, STL allocators, singletons, lockless queues etc, etc. Some of the guts of a realtime (low latency audio), multithreaded scripting language split out into a shared library.

One of the big differences between the code above and my current code is I'm using a thread-local singleton to hold the MetaPool class (a thread-local singleton means one instance per-thread).

Making it into a per-thread allocator means there is not a single mutex in the allocation path, and it's completely threadsafe. But you can't allocate memory in 1 thread then free it in another. This condition is asserted for. I gather a lot of JVMs use this optimisation.
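A per-thread singleton along those lines can be built on plain pthreads TLS - roughly like this sketch (not the actual class; error handling left out):

[code]
#include <pthread.h>

//Sketch of a per-thread singleton: each thread that asks for the
//object gets its own instance, so the allocation path needs no mutex.
template <typename T>
class ThreadLocalSingleton
{
    static pthread_key_t  sKey;
    static pthread_once_t sOnce;

    static void destroy(void* p) { delete static_cast<T*>(p); }
    static void makeKey()        { pthread_key_create(&sKey, destroy); }
public:
    static T* instance()
    {
        pthread_once(&sOnce, makeKey);
        T* obj = static_cast<T*>(pthread_getspecific(sKey));
        if(!obj)
        {
            obj = new T;
            pthread_setspecific(sKey, obj);
        }
        return obj;
    }
};

template <typename T> pthread_key_t  ThreadLocalSingleton<T>::sKey;
template <typename T> pthread_once_t ThreadLocalSingleton<T>::sOnce = PTHREAD_ONCE_INIT;
[/code]

Usage would just be ThreadLocalSingleton<MetaPool>::instance() in place of the ordinary singleton, with the freeing-thread restriction mentioned above.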

This allocator puts a whole bunch of pools together in a vector. The run-time search involves iterating over the vector until the best-fit pool is found. The metaprogramming does this at compile time.
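In use the difference is just which overload you call - the compile-time version bakes the pool choice into the generated code, while plain alloc(size_t) walks the pools. An illustrative snippet against the header as posted above (MyClass and bytesNeeded are made up):

[code]
#include "MetaPool.h" //the header posted above

struct MyClass { float x, y, z, w; };

void example(size_t bytesNeeded)
{
    //Compile-time path: the metaprogram picks the best-fit pool index,
    //so these are direct calls into one pool with no search.
    void* a = Memory::alloc<24>();       //size is a compile-time constant
    void* b = Memory::alloc<MyClass>();  //sizeof(MyClass) is too

    //Runtime path: the size isn't known until now, so MetaPool loops
    //over its pools looking for the first chunkSize that fits.
    void* c = Memory::alloc(bytesNeeded);

    Memory::free(a);
    Memory::free(b);
    Memory::free(c);
}
[/code]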

For anyone at a uni here, if you're interested in memory allocators, go to your library's website and find the database collection. Look for IEEE Xplore and start searching. It will bring back full journal papers talking about the issues involved in memory allocators. Maybe add "real-time" to the search too.

As far as I've been able to find this is a unique memory allocator.

The library is under the ZLib license, which is basically "do what you want with this, except don't claim you own it". Sourceforge doesn't have the boost license as an option.

Just ask if you'd like more detail, and I'm going to be talking about this at the thing La Trobe runs instead of the AGDC, which is also on Dec the 2nd. Send me a PM if you'd like an invite, and we feed you too [:)] We don't claim it's a conference, it's a chance to see some postgrad and undergrad student work and (if you want) hire some graduates. There will be people from other unis there too.

Edit: it may be dec 3, I'll check tomorrow, though plenty have invites.

Posted by lorien on
Forum

As I said elsewhere I've been trying to make heap allocation almost as fast as stack allocation for a realtime scripting language I'm making. Here is the allocator code.

I was going to make this a tutorial, but instead I'm going to try and get something about this published in somewhere like the C++ Users Journal.

For some info on metaprogramming have a look at http://osl.iu.edu/~tveldhui/papers/Template-Metaprograms/meta-art.html and/or do some googling. There is a fair amount of info out there for such an exotic technique, and there are even some libraries like the boost mpl.

Enjoy [:)]

[code]
#ifndef __METAPOOL_HH__
#define __METAPOOL_HH__

/********************************************************************************
** A cross-platform template metaprogrammed memory pool allocation system. **
** **
** Copyright (C) 2003-2005 Lorien Dunn and The Applied Computing Research **
** Institute, La Trobe University. **
** **
** Released at http://sumea.com.au/forum/topic.asp?TOPIC_ID=2716 under **
** the Boost license (repreoduced below) on the 7th of Feb 2005. The API for **
** normal use is at the end of this file. **
** **
** In addition to the boost license I ask that bug fixes, optimisations **
** and other improvements be posted on sumea at the URL given above. **
** And if you use this and meet me perhaps you could buy me a beer :) **
** **
** This system manages a variable number of pools, each of which caters for **
** a different size allocation. The metaprogramming is used to find which **
** pool bests fits a particular size allocation request at compile time. **
** A runtime search is also provided (for when a size can't be known at **
** compile time). **
** **
** Pool chunkSizes are arranged like this: **
** 16 24 32 48 64 96 128 192 or **
** 24 32 48 64 96 128 192 256 **
** depending on whether the value defined by METAPOOL_MIN_CHUNK_SIZE **
** is a power of two or a power of two + 1/2 times the same power of two. **
** **
** Compared to previous versions this release vastly improves memory **
** wastage and scalability. METAPOOL_HEAP_SIZE_IN_K can be set to 4 **
** megs and METAPOOL_MAX_CHUNK_SIZE to 2 megs without problem. There is a **
** small runtime penalty for this scalabilty, and synthetic benchmarks that **
** allocate tons of memory probably won't perform hugely better than malloc. **
** It will perform much better in real programs. **
** **
** It should be 64-bit clean, providing you change one macro, but I've not **
** been able to verify this yet. It will not work on 16-bit CPUs. **
** **
** Note pools with a chunkSizes that are multiples of 16 bytes will return **
** 16 byte allocations, making them suitable for use with SSE vector **
** operations. **
** **
** Normal use API appears at the end of this file. It is very similar to **
** the c standard library: **
** **
** void* alloc(); **
** void* alloc(); **
** void* alloc(size_t size); **
** void* realloc(void* mem); **
** void* realloc(void* mem, size_t newSize); **
** void free(void* mem); **
** **
** The Allocator classes from the previous versions are gone, but a base **
** class allocator for use with the "curiously recursive template pattern" **
** is provided, as are preprocessor macros. **
** **
** Forget about using this with MSVC 6.0 (even with SP 5). 6.0 is way too **
** broken (particularly in templates), and the headers are missing the memory **
** alignment functions. AFAIK the only version of MSVC this code will compile **
** with is .net 2003 because it requires partial template specialisation. **
** If you are using MinGW make sure it is >= 3.4.0 because the memory **
** alignment functions are missing from the headers in 3.3.x **
** Works fine with Intel C++ 7.1 on win32 with MSVC6 headers, but have to hak **
** alignedAlloc and alignedFree to use normal malloc/free :( **
** I have no idea if it works with Borland compilers, and to be honest I **
** don't care- I've had nothing but trouble with them. **
** **
********************************************************************************/

/********************************************************************************
** Boost Software License - Version 1.0 - August 17th, 2003 **
** **
** Permission is hereby granted, free of charge, to any person or organization **
** obtaining a copy of the software and accompanying documentation covered by **
** this license (the "Software") to use, reproduce, display, distribute, **
** execute, and transmit the Software, and to prepare derivative works of the **
** Software, and to permit third-parties to whom the Software is furnished to **
** do so, all subject to the following: **
** **
** The copyright notices in the Software and this entire statement, including **
** the above license grant, this restriction and the following disclaimer, **
** must be included in all copies of the Software, in whole or in part, and **
** all derivative works of the Software, unless such copies or derivative **
** works are solely in the form of machine-executable object code generated by **
** a source language processor. **
** **
** THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR **
** IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, **
** FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT **
** SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE **
** FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, **
** ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER **
** DEALINGS IN THE SOFTWARE. **
********************************************************************************/

#include <assert.h>
#include <new> //for std::bad_alloc
//not using the <cstring> and <cstdlib> style headers because they cause
//compatibility issues when including c header files.
#include <string.h> //for memcpy
#include <vector>

//Re-implementation of the Loki library static assert.
//Makes a compile error if the condition is false.
template <bool> struct CTAssert;
template <> struct CTAssert<true> {};
#define CTASSERT(expr,msg) CTAssert<((expr) != false)> ERROR_##msg;

//All memory will be aligned on 16 byte boundaries. Do not change.
#define BYTE_ALIGNMENT 16

#ifdef METAPOOL_USE_MALLOC
#warning "METAPOOL_USE_MALLOC overridden"
#else
#define METAPOOL_USE_MALLOC
#endif //METAPOOL_USE_MALLOC

//change this to 0 on a 64 bit system (untested atm). I wish
//sizeof would work in the preprocessor!
#ifdef THIRTY_TWO_BIT
#warning "THIRTY_TWO_BIT overridden"
#else
#define THIRTY_TWO_BIT 1
#endif //THIRTY_TWO_BIT

//You might want to change these according to your needs.
//Chunk sizes should be powers of two or memory alignment
//will likely be completely wrong, and smaller than 16 bytes
//on 32-bit systems and 32 bytes on 64-bit systems
//will mean more space wasted than allocated for the Pools
//with the smaller chunkSizes.
//add warnings if overridden
#ifdef METAPOOL_MAX_SPARE_HEAPS
#warning "METAPOOL_MAX_SPARE_HEAPS overridden"
#else
#define METAPOOL_MAX_SPARE_HEAPS 20
#endif //METAPOOL_MAX_SPARE_HEAPS

#if THIRTY_TWO_BIT == 1
#ifdef METAPOOL_HEAP_SIZE_IN_K
#warning "METAPOOL_HEAP_SIZE_IN_K overridden"
#else
#define METAPOOL_HEAP_SIZE_IN_K 1024
#endif //METAPOOL_HEAP_SIZE_IN_K

#ifdef METAPOOL_MIN_CHUNK_SIZE
#warning "METAPOOL_MIN_CHUNK_SIZE overridden"
#else
#define METAPOOL_MIN_CHUNK_SIZE 16
#endif //METAPOOL_MIN_CHUNK_SIZE
#else //64-bit
#ifdef METAPOOL_HEAP_SIZE_IN_K
#warning "METAPOOL_HEAP_SIZE_IN_K overridden"
#else
#define METAPOOL_HEAP_SIZE_IN_K 2048
#endif //METAPOOL_HEAP_SIZE_IN_K

#ifdef METAPOOL_MIN_CHUNK_SIZE
#warning "METAPOOL_MIN_CHUNK_SIZE overridden"
#else
#define METAPOOL_MIN_CHUNK_SIZE 32
#endif //METAPOOL_MIN_CHUNK_SIZE
#endif //THIRTY_TWO_BIT == 1

#ifdef METAPOOL_MAX_CHUNK_SIZE
#warning "METAPOOL_MIN_CHUNK_SIZE overridden"
#else
#define METAPOOL_MAX_CHUNK_SIZE ((METAPOOL_HEAP_SIZE_IN_K*1024)/2)
#endif //METAPOOL_MAX_CHUNK_SIZE

//do a little preprocessor sanity checking.
//Real power-of-2 and power-of-2 + 1/2-power-of-2 sanity
//checking is done with a template metaprogram.

#if METAPOOL_HEAP_SIZE_IN_K % 2 != 0
#error "METAPOOL_HEAP_SIZE_IN_K must be even"
#endif //METAPOOL_HEAP_SIZE_IN_K % 2 == 0

#if METAPOOL_MAX_CHUNK_SIZE < METAPOOL_MIN_CHUNK_SIZE
#error "METAPOOL_MAX_CHUNK_SIZE < METAPOOL_MIN_CHUNK_SIZE"
#endif //METAPOOL_MAX_CHUNK_SIZE < METAPOOL_MIN_CHUNK_SIZE

#if METAPOOL_MIN_CHUNK_SIZE % 2 != 0
#error "METAPOOL_MIN_CHUNK_SIZE must be even"
#endif //METAPOOL_MIN_CHUNK_SIZE % 2 != 0

#if METAPOOL_MAX_CHUNK_SIZE % 2 != 0
#warning "METAPOOL_MAX_CHUNK_SIZE must be even"
#endif //METAPOOL_MAX_CHUNK_SIZE % 2 != 0

#if METAPOOL_MAX_CHUNK_SIZE / 1024 > METAPOOL_HEAP_SIZE_IN_K
#error "METAPOOL_MAX_CHUNK_SIZE / 1024 > HEAP_SIZE"
#endif //METAPOOL_MAX_CHUNK_SIZE / 1024 > METAPOOL_HEAP_SIZE_IN_K

#if THIRTY_TWO_BIT == 1
#if METAPOOL_MIN_CHUNK_SIZE < 16
#warning "minimum chunk size is less than housekeeping size"
#endif //METAPOOL_MIN_CHUNK_SIZE < 16
#else //64 bit
#if METAPOOL_MIN_CHUNK_SIZE < 32
#warning "minimum chunk size is less than housekeeping size"
#endif //METAPOOL_MIN_CHUNK_SIZE < 32
#endif //THIRTY_TWO_BIT

#define MAX_CHUNKS_PER_HEAP ((METAPOOL_HEAP_SIZE_IN_K*1024) / METAPOOL_MIN_CHUNK_SIZE)

//You should probably get rid of this and use a
//better Singleton class because static storage
//is rather slow to access.
//I plan on using a thread local storage singleton,
//so each thread gets its own MetaPool. This will
//make everything threadsafe without using any
//mutexes. Providing the same thread that allocates
//a given piece of memory also frees it...
template <typename T>
class Singleton
{
public:
inline static T* instance()
{
static T instance;
return &instance;
}
};

//Portability functions for allocating memory aligned on a
//particular bytes boundary. This allocation system always
//uses 16-byte alignment.

#ifdef unix
#include <stdlib.h> //cstdlib doesn't have posix_memalign!

//WARNING: arg order reversed from prev versions
inline void* alignedAlloc(size_t bytes, size_t align=BYTE_ALIGNMENT)
{
void *rtn = NULL;
#ifdef METAPOOL_USE_MALLOC
posix_memalign(&rtn,align,bytes);
#else
rtn = malloc(bytes);
#endif
if(!rtn) throw std::bad_alloc();
return rtn;
}

inline void alignedFree(void *mem)
{
assert(mem);
free(mem);
}

#elif defined WIN32
#include <malloc.h> //for _aligned_malloc/_aligned_free
//WARNING: arg order reversed from prev versions
inline void* alignedAlloc(size_t bytes,size_t align=BYTE_ALIGNMENT)
{
#ifdef METAPOOL_USE_MALLOC
void* rtn = malloc(bytes);
#else
void* rtn = _aligned_malloc(bytes,align);
#endif
if(!rtn) throw std::bad_alloc();
return rtn;
}

inline void alignedFree(void *mem)
{
assert(mem);
#ifdef METAPOOL_USE_MALLOC
free(mem);
#else
_aligned_free(mem);
#endif
}

#else
#error "Either unix or WIN32 was not defined, or you have an unsupported system"
#endif //unix

namespace Memory
{
class MetaPool;

namespace Details
{

class Heap;

////////////////////////////////////////////////////////////////////////
// //
// Struct Node. A header for a chunk and also a linked //
// list node. The list structure eliminates searching for //
// free chunks. //
// //
////////////////////////////////////////////////////////////////////////

struct Node
{
//Holds the actual size of the allocation to speed up realloc
//and test if free
size_t mAllocSize;
//next node in this heap
Node* mNext;
//heap that owns this node (BIG freeChunk speedup)
Heap* mHeap;
//To be used for reference counted smart pointers.
//Also ensures sizeof(Node) is 16 (32 on 64-bit systems) bytes to
//eliminate memory alignment issues
size_t mRefCount;
};

CTASSERT(sizeof(char)==1,CharSizeIsNot1Byte);
#if THIRTY_TWO_BIT == 1
CTASSERT(sizeof(Node)==16,NodeSizeIsNot16Bytes)
#else
CTASSERT(sizeof(Node)==32,NodeSizeIsNot32Bytes);
#endif //THIRTY_TWO_BIT

/*
** NB I use c-style casts in places because it is much
** easier to read. IMHO the reinterpret_cast syntax
** was never designed for such crazy casting.
*/

//Finds the mHeap member in the node that this memory
//belongs to.
inline Heap* getOwnerHeap(void* mem)
{
// *(Heap**) (((char*)mem) - HeapOffsetIntoNodeInBytes);
return *(Heap**) (((char*)mem) - sizeof(size_t) - sizeof(Heap*));
}

//returns the Node that owns this memory.
inline Node* getNode(void* mem)
{
return (Node*)(((char*)mem) - sizeof(Node));
}

class Pool;

////////////////////////////////////////////////////////////////////////
// //
// Class Heap. Uses a very strange and (eventually) tangled //
// linked list of nodes and chunks that get allocated as //
// one block of ram, arranged like this: //
// //
// |--------------| The tangle comes from nodes and chunks //
// | Node | being reused, and the linked list means //
// |--------------| there is no searching through the heap //
// | Allocated | for a free node/chunk combination. //
// | chunk | The nodes are headers for the chunks //
// |--------------| //
// | Next Node | Strangely this class becomes much slower //
// |--------------| if I make this block of memory part of //
// |Next Allocated| the instance rather than dynamically //
// | chunk | allocating it. //
// |--------------| //
// //
////////////////////////////////////////////////////////////////////////

class Heap
{
friend class Pool;
//the mem that gets split into Nodes and chunks
void* mMemory;
//sizeof(Node) + chunkSize
size_t mNodeSize;
//the next unused node
Node* mNextFree;
//the last node in the heap
Node* mLastNode;
//the size of the mem chunks this heap contains
size_t mChunkSize;
//the noOfChunks in this heap that have been allocated
size_t mAllocCount;
//the max allocs this heap can hold
size_t mMaxAllocs;
//used in cleanup phase
Pool* mParent;
//the requested heapSize
const size_t mSizeInBytes;
//size of the heap + sizeof the max possible no of nodes
const size_t mRealSize;

inline void setup(Pool* parent, size_t chunkSize)
{
mParent = parent;
mChunkSize = chunkSize;
mNodeSize = chunkSize + sizeof(Node);
mMaxAllocs = mRealSize / mNodeSize;
mAllocCount = 0;

//Set all the pointers. This loop will cause a little
//lag for small chunkSize heaps, but it can't be avoided :(
Node *n;
for(int i=0;i<(int)mMaxAllocs;i++)
{
n = (*this)[i];
n->mHeap = this;
n->mNext = (*this)[i+1]; //this will go past the end, but it gets
//set to NULL on the last Node below
n->mAllocSize = 0;
}

// set pointer to first free node.
mNextFree = reinterpret_cast<Node*>(mMemory);
// set pointer to last free Node.
mLastNode = (*this)[mMaxAllocs-1];
mLastNode->mAllocSize = 0;
mLastNode->mHeap = this;
mLastNode->mNext = NULL;
}

public:

//make absolutely sure this thing throws bad_alloc
//independent of compiler stupidities
inline void* operator new(size_t size)
{
void* mem = malloc(size);
if(!mem)
throw std::bad_alloc();
return mem;
}

inline void operator delete(void* mem)
{
assert(mem);
::free(mem);
}

inline Heap(Pool* parent, size_t chunkSize, size_t size) :
mChunkSize(chunkSize),
mMemory(NULL),
mSizeInBytes(size),
mRealSize(size+(MAX_CHUNKS_PER_HEAP*sizeof(Node)))
{
mMemory = alignedAlloc(mRealSize);
setup(parent, chunkSize);
}

inline ~Heap()
{
if(mMemory)
alignedFree(mMemory);
}

//makes a node and chunk appear as one block of memory,
//and a heap as an array of them. mNodeSize is sizeof(Node)
//plus the size of the chunk.
inline Node* operator[](size_t idx) const
{
return reinterpret_cast<Node*>((char*)mMemory + (idx*mNodeSize));
}

inline size_t chunkSize() const { return mChunkSize; }

inline size_t sizeInBytes() const { return mSizeInBytes; }

inline size_t allocCount() const { return mAllocCount; }

inline size_t maxAllocs() const { return mMaxAllocs; }

inline bool full() const { return !mNextFree; }

inline bool empty() const { return mAllocCount == 0; }

//Returns a free chunk from this heap and sets up the next
//allocation. This method will never be called if mNextFree
//is NULL
inline void* allocChunk(size_t allocSize)
{
void *chunk = NULL;
mNextFree->mAllocSize = allocSize;
mNextFree->mRefCount = 0;
chunk = (void*)(((char*)mNextFree) + sizeof(Node));
mNextFree = mNextFree->mNext;
mAllocCount++;
return chunk;
}

//Frees a chunk of memory and sets up all the pointers so no
//searching for free chunks is needed. This is what causes
//the tangles in the list. When the heap this belongs to is
//empty it gets passed to the parent pool for re-use.
//Implemented after Pool because it needs to call a Pool method.
inline void freeChunk(void *mem);
};

//A wrapper function for alignedAlloc that places a Node structure
//prior to the usable memory. Used when the requested allocation
//size will not fit in a Pool. The Node structure makes the allocation
//able to be freed and realloced by Memory::free and Memory::realloc.
inline void* bigAlloc(size_t size)
{
void* mem = alignedAlloc(size+sizeof(Details::Node));
Details::Node* n = reinterpret_cast<Details::Node*>(mem);
n->mAllocSize = size;
n->mNext = NULL;
n->mHeap = NULL;
n->mRefCount = 0;
return (void*)((char*)mem+sizeof(Details::Node));
}

////////////////////////////////////////////////////////////////////////
// //
// Class Pool. Manages a vector of (equal chunkSize) heaps //
// for allocation, also facilitates recycling of heaps both //
// within itself and though MetaPool. //
// //
////////////////////////////////////////////////////////////////////////

class Pool
{
protected:
friend class Heap;
friend class MetaPool;
Heap* mCurrentHeap;
Heap* mSpareHeap; //to avoid calling Heap::setup when there is no need
size_t mChunkSize;
MetaPool* mParent;
const size_t mDefaultHeapSize;
typedef std::vector<Heap*> HeapVector;
HeapVector mHeaps;

//When a heap is completely empty try putting it in the
//mSpareHeap member (so setup doesn't have to be called).
//If mSpareHeap is already used, add the heap to a LIFO
//in MetaPool for re-use elsewhere.
void cleanup(Heap* h);

public:

inline Pool(MetaPool* parent, size_t chunkSize, size_t defaultHeapSize) :
mCurrentHeap(NULL),
mSpareHeap(NULL),
mChunkSize(chunkSize),
mParent(parent),
mDefaultHeapSize(defaultHeapSize)
{
mHeaps.reserve(100);
}

inline ~Pool()
{
HeapVector::iterator i;
for(i=mHeaps.begin();i!=mHeaps.end();i++)
{
delete *i;
}
}

//Find a heap with a reasonable number of free slots. If none
//are suitable it allocates a new heap as a last resort, hence could
//throw std::bad_alloc. Implemented after MetaPool.
void findHeap();

//Returns a free chunk. Can throw std::bad_alloc.
//Contains the only branch I can't remove when memory is available.
//Will trigger plenty more branches if mCurrentHeap is full though.
inline void *allocChunk(size_t allocSize)
{
if(!mCurrentHeap || mCurrentHeap->full())
findHeap();
return mCurrentHeap->allocChunk(allocSize);
}

inline size_t chunkSize() const { return mChunkSize; }
inline size_t noOfHeaps() const { return mHeaps.size(); }
inline size_t defaultHeapSize() const { return mDefaultHeapSize; }
};

/*
** This Heap method is here because it has to call a Pool
** method.
*/

//Frees a chunk of memory and sets up all the pointers so no
//searching for free chunks is needed. This is what causes
//the tangles in the list. When the heap this belongs to is
//empty it gets passed to the parent pool for re-use.
inline void Heap::freeChunk(void *mem)
{
Node* n = getNode(mem);
n->mRefCount = 0;
n->mAllocSize = 0;
//set up the pointers
Node* prevLast = mLastNode;
mLastNode = n;
prevLast->mNext = mLastNode;
mLastNode->mNext = NULL;
if(!mNextFree)
mNextFree = mLastNode;
mAllocCount--;
if(mAllocCount == 0)
mParent->cleanup(this);
}

/***********************************************************************
** **
** Begin template metaprogramming **
** **
***********************************************************************/

//////////////////////////////////////////////////////////////////////
// A compile-time loop to test if a number is a power of 2 //
//////////////////////////////////////////////////////////////////////
template <int N>
struct PowerOf2Test
{
enum
{
value = !(N%2) ? PowerOf2Test<(int)((N/2.0)+0.5)>::value : false
};
};
//it is a power of two
template <>
struct PowerOf2Test<2>
{
enum { value = true };
};
//it is not a power of two
template <>
struct PowerOf2Test<0>
{
enum { value = false };
};

//////////////////////////////////////////////////////////////////////
// A compile-time test for whether a number is a power of 2 plus //
// (itself divided by two). This figures out that the next power //
// of two from 48 is 64, for example. //
//////////////////////////////////////////////////////////////////////
template <int Size>
struct HalfwayTest
{
enum
{
//perhaps this first PowerOf2Test is redundant, but icc
//complains without it
value = PowerOf2Test<Size>::value ? false : PowerOf2Test<(int)(Size+((Size/3.0)+0.5))>::value
};
};

//////////////////////////////////////////////////////////////////////
// A metaprog "if" statement //
//////////////////////////////////////////////////////////////////////
template <bool Condition, typename TCase, typename FCase> struct If;

//the true case
template <typename TCase, typename FCase>
struct If<true, TCase, FCase>
{
typedef TCase result;
};

//the false case
template <typename TCase, typename FCase>
struct If<false, TCase, FCase>
{
typedef FCase result;
};

//////////////////////////////////////////////////////////////////////
// True and False boolean wrapper types //
//////////////////////////////////////////////////////////////////////
struct True
{
enum { value = 1 };
};

struct False
{
enum { value = 0 };
};

#define MIN_CHUNKSIZE METAPOOL_MIN_CHUNK_SIZE
#define MAX_CHUNKSIZE METAPOOL_MAX_CHUNK_SIZE
///////////////////////////////////////////////////////////////////////
//Compile time sanity checks for heapSize, and min and max chunkSizes//
///////////////////////////////////////////////////////////////////////

typedef If<PowerOf2Test<METAPOOL_HEAP_SIZE_IN_K>::value,
True, HalfwayTest<METAPOOL_HEAP_SIZE_IN_K> >::result HeapSizeCondition;
CTASSERT(HeapSizeCondition::value, BadHeapSize);

typedef If<PowerOf2Test<MIN_CHUNKSIZE>::value,
True, HalfwayTest<MIN_CHUNKSIZE> >::result MinChunkCondition;
CTASSERT(MinChunkCondition::value, BadMinChunkSize);

typedef If<PowerOf2Test<MAX_CHUNKSIZE>::value,
True, HalfwayTest<MAX_CHUNKSIZE> >::result MaxChunkCondition;
CTASSERT(MaxChunkCondition::value, BadMaxChunkSize);

//////////////////////////////////////////////////////////////////////
// Calculate (compile time) the number of pools required //
//////////////////////////////////////////////////////////////////////

template <int Size>
struct PoolCount
{
enum
{
value = PoolCount<Size/2>::value + 1
};
};

//These terminate the above loop
template <>
struct PoolCount<MIN_CHUNKSIZE>
{
enum { value = 1 };
};

template <>
struct PoolCount<(MIN_CHUNKSIZE+(MIN_CHUNKSIZE/2))>
{
enum { value = 1 };
};

template <>
struct PoolCount<(MIN_CHUNKSIZE+(MIN_CHUNKSIZE/3))>
{
enum { value = 1 };
};

//provides easy access to the compile time calculated number of
//pools
struct NoOfPools
{
enum
{
max = MAX_CHUNKSIZE,
maxIsPowerOf2 = PowerOf2Test<max>::value,
pwrOf2 = PoolCount<(maxIsPowerOf2 ? max : (max-(max/3)))>::value,
notPwrOf2 = PoolCount<(maxIsPowerOf2 ? (max-(max/4)) : max)>::value,
value = pwrOf2 + notPwrOf2
};
};

///////////////////////////////////////////////////////////////////////
// Calculate (compile time) the chunkSize of the pool at Idx //
///////////////////////////////////////////////////////////////////////
template <int Idx>
class ChunkSize
{
struct MinChunkPowerOf2
{
enum
{
powerOf2 = !(Idx%2),
value = powerOf2 ? ChunkSize<Idx+1>::value - ChunkSize<Idx+1>::value/3 : ChunkSize<Idx+1>::value - ChunkSize<Idx+1>::value/4
};
};
struct MinChunkNotPowerOf2
{
enum
{
powerOf2 = Idx%2,
value = powerOf2 ? ChunkSize<Idx+1>::value - ChunkSize<Idx+1>::value/3 : ChunkSize<Idx+1>::value - ChunkSize<Idx+1>::value/4
};
};
public:
enum
{
value = If<PowerOf2Test<MIN_CHUNKSIZE>::value, MinChunkPowerOf2, MinChunkNotPowerOf2>::result::value
};
};

//This terminates the above loop
template<>
class ChunkSize<NoOfPools::value>
{
struct MaxChunkPowerOf2
{
enum { value = MAX_CHUNKSIZE + (MAX_CHUNKSIZE/2) };
};
struct MaxChunkNotPowerOf2
{
enum { value = MAX_CHUNKSIZE + (MAX_CHUNKSIZE/3) };
};
public:
enum
{
value = If<PowerOf2Test<MAX_CHUNKSIZE>::value, MaxChunkPowerOf2, MaxChunkNotPowerOf2>::result::value
};
};

///////////////////////////////////////////////////////////////////////
// Calculate (compile time) the best pool Idx to allocate Size bytes //
///////////////////////////////////////////////////////////////////////
template <size_t Size, int Idx>
struct PoolIdxSearch
{
enum
{
//will the requested alloc fit in a chunk?
inRange = Size <= MAX_CHUNKSIZE? true : false,
//does the alloc fit in a pool at all, and will it fit in the current pool?
fitsHereToo = (inRange && (Size < ChunkSize<Idx>::value)) ? true : false,
//if it does return the current index, otherwise keep recursing
value = fitsHereToo ? Idx : PoolIdxSearch<Size, Idx+1>::value
};
};

//terminate the loop at NoOfPools::value, which means we have to use
//the run-time allocator because there isn't a pool large enough
template <size_t Size>
struct PoolIdxSearch<Size, NoOfPools::value>
{
enum { value = -1 };
};

#undef MIN_CHUNKSIZE
#undef MAX_CHUNKSIZE

/***********************************************************************
** **
** End template metaprogramming **
** **
***********************************************************************/

} //namespace Details

////////////////////////////////////////////////////////////////////////
// //
// Class MetaPool. Encapsulates multiple pools of different //
// chunkSizes, handles runtime searching for the best size //
// pool for an alloc, and manages recylcling of heaps between //
// pools with different chunkSizes. //
// //
////////////////////////////////////////////////////////////////////////
class MetaPool
{
friend class Details::Pool;

typedef Details::NoOfPools NoOfPools;

//holds all the pools
Details::Pool* mPools[NoOfPools::value];

//some space to store empty heaps for re-use between
//pools. Using a vector as a LIFO because of constant-time
//pop_back() and ability to reserve space (which deque can't)
typedef std::vector<Details::Heap*> HeapVector;
HeapVector mSpareHeaps;

inline void addSpareHeap(Details::Heap* h)
{
if(mSpareHeaps.size() > METAPOOL_MAX_SPARE_HEAPS)
{
delete h;
return;
}
mSpareHeaps.push_back(h);
}

inline Details::Heap* findSpareHeap(size_t chunkSize)
{
Details::Heap* h = NULL;
if(mSpareHeaps.empty())
{
for(int i=0;i<NoOfPools::value;i++)
{
if(mPools[i]->mSpareHeap && !(mPools[i]->mSpareHeap->sizeInBytes() % chunkSize))
{
h = mPools[i]->mSpareHeap;
mPools[i]->mSpareHeap = NULL;
return h;
}
}
return NULL;
}

HeapVector::iterator it = mSpareHeaps.begin();
for(;it!=mSpareHeaps.end();it++)
{
if(!((*it)->sizeInBytes() % chunkSize))
{
h = (*it);
mSpareHeaps.erase(it);
return h;
}
}
return NULL;
}

public:

inline Details::Pool* pool(int idx)
{
return mPools[idx];
}

MetaPool()
{
const size_t heapSizeInK = METAPOOL_HEAP_SIZE_IN_K;
const size_t minChunkSize = METAPOOL_MIN_CHUNK_SIZE;
const size_t maxChunkSize = METAPOOL_MAX_CHUNK_SIZE;
//make sure there is plenty of room so the vector
//doesn't have to realloc
mSpareHeaps.reserve(METAPOOL_MAX_SPARE_HEAPS);
bool minChunkPwrOf2 = Details::PowerOf2Test<minChunkSize>::value;
bool maxChunkPwrOf2 = Details::PowerOf2Test<maxChunkSize>::value;
bool heapSizePwrOf2 = Details::PowerOf2Test<heapSizeInK>::value;
size_t pwrOf2HeapSize = heapSizePwrOf2 ? heapSizeInK : heapSizeInK-(heapSizeInK/3);
size_t notPwrOf2HeapSize = heapSizePwrOf2 ? heapSizeInK-(heapSizeInK/4) : heapSizeInK;
pwrOf2HeapSize *= 1024;
notPwrOf2HeapSize *= 1024;
size_t currentSize = minChunkSize;
if(minChunkPwrOf2)
{
int i;
int j = 0;
for(i=0;i<NoOfPools::value;i++)
{
//even indices get the power of two chunkSizes, odd ones the halfway sizes
if(!(i%2)) { mPools[i] = new Details::Pool(this,currentSize,pwrOf2HeapSize); j = currentSize/2; }
else { mPools[i] = new Details::Pool(this,currentSize,notPwrOf2HeapSize); j = currentSize/3; }
currentSize += j;
}
}
else //METAPOOL_MIN_CHUNK_SIZE is a halfway size, so the two families swap places
{
int i;
int j = 0;
for(i=0;i<NoOfPools::value;i++)
{
if(i%2) { mPools[i] = new Details::Pool(this,currentSize,pwrOf2HeapSize); j = currentSize/2; }
else { mPools[i] = new Details::Pool(this,currentSize,notPwrOf2HeapSize); j = currentSize/3; }
currentSize += j;
}
}
}

//Allocates size bytes from the best fitting pool (runtime search), or
//falls back to bigAlloc for anything larger than the biggest chunkSize.
//Can throw std::bad_alloc.
inline void* alloc(size_t size)
{
if(size > METAPOOL_MAX_CHUNK_SIZE)
return Details::bigAlloc(size);
for(int i=0;i<NoOfPools::value;i++)
{
if(size <= mPools[i]->chunkSize())
return mPools[i]->allocChunk(size);
}
throw std::bad_alloc();
return NULL;
}

//Copies an allocation to a larger one and frees the old alloc.
//Can throw std::bad_alloc.
inline void* realloc(void *mem, size_t newSize)
{
assert(mem);
void* newMem = NULL;
Details::Node* n = Details::getNode(mem);
Details::Heap* h = n->mHeap;
if(!h)
{
if(newSize > n->mAllocSize || newSize < (n->mAllocSize/2))
{
newMem = alloc(newSize);
memcpy(newMem,mem,newSize < n->mAllocSize ? newSize : n->mAllocSize);
//h is NULL here (this came from bigAlloc), so hand it back through free()
free(mem);
return newMem;
}
return mem;
}
if(newSize > h->chunkSize() || newSize < (h->chunkSize()/2))
{
void *newChunk = alloc(newSize);
memcpy(newChunk,mem,newSize < n->mAllocSize ? newSize : n->mAllocSize);
h->freeChunk(mem);
return newChunk;
}
return mem;
}

//Free an allocation. This WILL crash if you pass it memory
//that was not allocated through this allocation system.
inline void free(void *mem)
{
assert(mem);
Details::Heap *h = Details::getOwnerHeap(mem);
if(!h)
{
alignedFree(Details::getNode(mem));
return;
}
h->freeChunk(mem);
}
};

/*
** These Pool methods are here because they rely on calling
** MetaPool methods
*/
//When a heap is completely empty try putting it in the
//mSpareHeap member (so setup doesn't have to be called).
//If mSpareHeap is already used, add the heap to a list
//in MetaPool for re-use elsewhere (though setup will
//have to be called).
void Details::Pool::cleanup(Details::Heap* h)
{
if(h==mCurrentHeap && !mSpareHeap)
{
mSpareHeap = h;
mCurrentHeap = NULL;
return;
}
if(h!=mCurrentHeap)
{
mParent->addSpareHeap(h);
//need to remove h from mHeaps!!!
assert(!mHeaps.empty());

HeapVector::iterator it = mHeaps.end();
it--;
while(it!=mHeaps.begin())
{
if((*it) == h)
{
mHeaps.erase(it);
return;
}
it--;
}
assert(mHeaps[0]==h);
mHeaps.erase(it);
}
}

//Find a heap with reasonable number of free chunks. If none are suitable
//allocates a new heap, hence could throw std::bad_alloc.
void Details::Pool::findHeap()
{
if(mCurrentHeap)
{
mHeaps.push_back(mCurrentHeap);
mCurrentHeap = NULL;
}
if(mSpareHeap)
{
mCurrentHeap = mSpareHeap;
mSpareHeap = NULL;
return;
}

//Fallback to looking for a recent Heap with at least threshold free chunks.
//Looping backwards to exploit any short-lived objects
//that have been freed.
int size = mHeaps.size();
int threshold = 1;
int stopAt = 0;
if(size)
{
Heap* h;
stopAt = size > 5 ? 5 : size;
for(int i=size-1;i>=stopAt;i--)
{
h = mHeaps[i];
threshold = h->maxAllocs() - (h->maxAllocs() / 4);
if(mHeaps[i]->allocCount() <= threshold)
{
mCurrentHeap = mHeaps[i];
return;
}
}
}

//Fallback to asking MetaPool for a spare Heap and calling setup if found
Heap* fallback = mParent->findSpareHeap(mChunkSize);
if(fallback)
{
fallback->setup(this,mChunkSize);
mCurrentHeap = fallback;
return;
}
//Fallback to allocating a new one
mCurrentHeap = new Heap(this,mChunkSize,mDefaultHeapSize);
}

typedef Singleton<MetaPool> MetaSingleton;

//This struct will be the result of an If template if the
//requested size will fit in a pool.
template <int Idx>
struct CTAllocHelper
{
inline static void* alloc(size_t size)
{
return MetaSingleton::instance()->pool(Idx)->allocChunk(size);
}
};

//This struct will be the result of an If template if the
//requested size will not fit in a pool.
template <size_t Size>
struct RTAllocHelper
{
//the ignored size_t parameter keeps the signature compatible with
//CTAllocHelper::alloc, so the compile time dispatch in alloc<Size>()
//below can call either helper the same way
inline static void* alloc(size_t)
{
return Details::bigAlloc(Size);
}
};

/*
** Normal use API follows
*/

template <size_t Size>
inline void* alloc()
{
const int idx = Details::PoolIdxSearch<Size,0>::value;
return Details::If<idx != -1, CTAllocHelper<idx>, RTAllocHelper<Size> >::result::alloc(Size);
}

template <typename T>
inline void* alloc()
{
return alloc<sizeof(T)>();
}

//Allocate size bytes (from a pool if possible). Throws std::bad_alloc
//if the request cannot be fulfilled. Uses a runtime search for the
//best pool.
inline void* alloc(size_t size)
{
return MetaSingleton::instance()->alloc(size);
}

template <size_t Size>
inline void* realloc(void* mem)
{
void* newMem = NULL;
Details::Node* n = Details::getNode(mem);
Details::Heap* h = n->mHeap;

if(!h)
{
if(Size > n->mAllocSize || Size < (n->mAllocSize/2))
{
newMem = alloc<Size>();
memcpy(newMem,mem,Size < n->mAllocSize ? Size : n->mAllocSize);
Memory::free(mem);
return newMem;
}
return mem;
}

if(Size > h->chunkSize() || Size < (h->chunkSize()/2))
{
newMem = alloc<Size>();
memcpy(newMem,mem,Size < n->mAllocSize ? Size : n->mAllocSize);
Memory::free(mem);
return newMem;
}
return mem;
}

//Copies an allocation to a larger one and frees the old alloc.
//Will never shrink an allocation. Uses pools if possible.
//Throws std::bad_alloc if the request cannot be fulfilled.
//Uses a runtime search for the best pool.
inline void* realloc(void *mem, size_t newSize)
{
return MetaSingleton::instance()->realloc(mem, newSize);
}

//Free memory allocated with this system only. Will crash if used on
//memory allocated with malloc/calloc/realloc or the default new
//operators.
inline void free(void* mem)
{
MetaSingleton::instance()->free(mem);
}
//class Test : public AddAllocator
template <typename T>
struct AddAllocator
{
inline void* operator new(size_t rtSize)
{
assert(sizeof(T)==rtSize);
return Memory::alloc<T>();
}
inline void operator delete(void* mem)
{
Memory::free(mem);
}

inline void* operator new[](size_t size)
{
return Memory::alloc(size);
}
inline void operator delete[](void* mem)
{
Memory::free(mem);
}
};

#define METAPOOL_CT_SINGLE_NEW(klass) inline void* operator new(size_t rtSize){ assert(sizeof(klass)==rtSize); return Memory::alloc<klass>();}

#define METAPOOL_RT_SINGLE_NEW inline void* operator new(size_t size){ return Memory::alloc(size);}

#define METAPOOL_SINGLE_DELETE inline void operator delete(void* mem){ Memory::free(mem);}

#define METAPOOL_ARRAY_NEW inline void* operator new[](size_t size){ return Memory::alloc(size);}

#define METAPOOL_ARRAY_DELETE inline void operator delete[](void* mem){ Memory::free(mem);}

#define METAPOOL_CT_ALLOCATORS(klass) METAPOOL_CT_SINGLE_NEW(klass) METAPOOL_SINGLE_DELETE METAPOOL_ARRAY_NEW METAPOOL_ARRAY_DELETE

#define METAPOOL_RT_ALLOCATORS METAPOOL_RT_SINGLE_NEW METAPOOL_SINGLE_DELETE METAPOOL_ARRAY_NEW METAPOOL_ARRAY_DELETE

}//namespace Memory

//clean up the macros
#undef CTASSERT
#undef BYTE_ALIGNMENT
#undef MAX_CHUNKS_PER_HEAP

#endif //__METAPOOL_HH__

[/code]


Submitted by lorien on Tue, 08/02/05 - 12:58 AM Permalink

And here is a completely stupid and synthetic little benchmark prog

[code]
#define _XOPEN_SOURCE 600

#include <iostream>
#include "MetaPool.h"

/*
** millisecond accurate timer code for win32
*/

#ifdef WIN32
#include <windows.h>
#ifdef __GNUC__
#ifndef INT64
#define INT64 long long
#endif //INT64
#endif //__GNUC__

#include
#include
#define LARGE_INTEGER_PTR LARGE_INTEGER*

#ifdef _MSC_VER
#define LARGE_INTEGER_PTR LARGE_INTEGER*
#endif //_MSC_VER
static INT64 gDivision;
static INT64 gStartTime;

size_t time()
{
INT64 time;
QueryPerformanceCounter((LARGE_INTEGER_PTR)&time);
time -= gStartTime;
time /= gDivision;
return (unsigned) time;
}
#endif //WIN32

/*
** millisecond accurate timer code for unix
** on x86 Linux gettimeofday is as accurate as
** QueryPerformanceCounter on win32.
*/

#ifdef unix
#include <sys/time.h>
#include
#include
#include
timeval gStartTime;
timeval gSleepTime;

size_t time()
{
size_t time;
struct timeval tv;
gettimeofday(&tv,NULL);
time = (tv.tv_sec - gStartTime.tv_sec) * 1000;
time += ((tv.tv_usec - gStartTime.tv_usec) / 1000);
return time;
}
#endif //unix

void alignedAllocTest(const int runs, int size)
{
size_t start = time();
void** test = (void**)Memory::alloc(runs*sizeof(void*));
for(int repeats=0;repeats<10;repeats++)
{
int i;
for(i=0;i
void ctPoolTest(const int runs)
{
void** test = (void**)Memory::alloc(runs*sizeof(void*));
size_t start = time();
for(int repeats=0;repeats<10;repeats++)
{
int i;
for(i=0;i();
}
for(i=0;i(t1Runs);

std::cout<<"

";

const int t2Runs=100000;
const int t2Size=10;
std::cout<<"Test2: runs "<(t2Runs);

std::cout<<"

";

const int t3Runs=50000;
const int t3Size=1023;
std::cout<<"Test3: runs "<(t3Runs);

std::cout<<"

";

const int tvRuns=3000;
const int tvLower=16;
const int tvUpper=1024;
std::cout<<"Var size test: runs "<

Submitted by Daemin on Tue, 08/02/05 - 9:23 AM Permalink

Erm, it looks "neat"... now I just have to look over it properly when I am concentrating in some form of IDE.

Submitted by lorien on Wed, 09/02/05 - 3:21 AM Permalink

What are you guys (Hazard and Barry) doing in the programmer discussion area then? [:)] This thing will increase framerates.

I'm just fixing a bug btw: the alignedAlloc function for windows doesn't throw a std::bad_alloc if an allocation fails (doh!)

And another Doh!: an "#include " made it in somehow. Both these things are fixed now.

Submitted by mcdrewski on Wed, 09/02/05 - 4:21 AM Permalink

...so there's a sanity check that the MIN_CHUNK_SIZE and MAX_CHUNK_SIZE are even (your error message saying "must be power of two"), but what actually happens at compile time if they're not powers of two? Do we simply end up with one of those unreadable template error messages, or does it infinitely recurse?

It certainly looks funky - but I'm still confused as to how Pools, Heaps and Chunks really work together. If I'm reading it right, you have 'n' pools, each managing a set of Heaps, and each Heap manages a number of fixed size blocks of memory (Chunks) ranging from MIN_CHUNK_SIZE to MAX_CHUNK_SIZE in powers of two.

Why the level of indirection using Heaps? What benefit does that give over simply a number of pools of chunks? Perhaps a nice high-level diagram of the data structure would explain what it's achieving...

Submitted by lorien on Fri, 11/02/05 - 12:17 AM Permalink

If they are not powers of two everything should still work (I'll do some destructive testing, thanks), but the 16-byte alignment of every allocation will be thrown off, making it useless for vector operations (I'm doing audio synthesis using SSE intrinsics, and the realtime scripting language, whilst general purpose and pure OO, is designed for audio). The next chunksize will still be calculated by 2* currentChunkSize. Also power of two is a good idea because (on the x86) double precision floats are very slow if not aligned on 8-byte boundaries, and some other architectures just crash without things being aligned properly.
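
If you want to convince yourself about the alignment, a quick check like this should hold for any pool whose chunkSize is a multiple of 16 (Vec4 is just a made-up example type for the snippet, not part of the allocator):

[code]
#include <xmmintrin.h> //SSE intrinsics
#include <assert.h>
#include "MetaPool.h"

struct Vec4 { float v[4]; }; //16 bytes

void alignmentCheck()
{
    Vec4* p = (Vec4*)Memory::alloc<Vec4>();
    //chunks from a multiple-of-16 chunkSize pool should land on a 16 byte boundary...
    assert(((size_t)p % 16) == 0);
    //...which is exactly what _mm_load_ps/_mm_store_ps demand (they fault
    //on misaligned addresses)
    __m128 reg = _mm_load_ps(p->v);
    _mm_store_ps(p->v, reg);
    Memory::free(p);
}
[/code]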

I could do a real compile-time test for power-of-two-ness using another metaprogrammed loop. I probably will if it goes boom when they aren't.

I'm afraid I'm going to have to save the nice diagrams for an academic journal- I need to try to properly publish this because it is very close to a solution to an "impossible" problem of computer science (avoiding the runtime search in memory allocators). I wish I could make it work with languages other than C++ too [:)]
The relation is:
* Heaps manage Nodes, which are headers for chunks. Chunks are where you put your data and they don't exist as a class- they are just ram waiting to be used. Heaps have a fixed chunksize, and by making all Heaps equal in size they can be easily re-used by different Pools when they are empty. This is why the Heap class exists: to pass unused memory around between Pools.
* Pools manage Heaps. Pools also have a fixed chunksize. The Pool class also can ask MetaPool for a spare heap or allocate a new one if the current Heap is full.
* MetaPool manages an array of Pools, each of which has different size chunks.

The goal with this arrangement is to avoid calling malloc or new like the plague. They can have highly variable latency, which is a Bad Thing (tm) for realtime software.
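
If it helps, normal use looks something like this (Particle is just an example class I made up for the snippet, everything else is the API from the header above):

[code]
#include "MetaPool.h"

class Particle : public Memory::AddAllocator<Particle> //CRTP base supplies new/delete
{
    float pos[3];
    float vel[3];
};

void example()
{
    //compile time searched allocations (size known to the compiler)
    void* a = Memory::alloc<64>();       //by size
    void* b = Memory::alloc<Particle>(); //by type
    //runtime searched allocation (size only known at runtime)
    void* c = Memory::alloc(200);
    //operator new on Particle goes through the pools as well
    Particle* p = new Particle;

    //grow the runtime allocation; it may move to a bigger pool
    c = Memory::realloc(c, 1024);

    delete p;
    Memory::free(a);
    Memory::free(b);
    Memory::free(c);
}
[/code]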

I'm just posting another update, this one has some improved comments, and my CTAssert class was stuffed, it just compiled whether or not the condition was false. Now it prints nice errors. Also I #undef the macros I've defined now.

One more thing: malloc and the alignedAlloc functions seem to perform much better on AMD CPUs; it's on P4s that I've seen well over a 10-fold performance increase. I suspect it's the stupidly long pipelines in P4s and failed branch predictions. On Athlons the max increase I've seen is around 6x.

Submitted by lorien on Fri, 11/02/05 - 12:23 AM Permalink

Do people think this should be sticky?

Submitted by davidcoen on Fri, 11/02/05 - 5:00 AM Permalink

i think it should be sticky, i need to re-read it a few times, but with your later concept explanation post, i think i understand it. very, very nice.

Submitted by lorien on Sat, 12/02/05 - 12:01 AM Permalink

Thanks. I'll have another small update later today.

Submitted by lorien on Sat, 12/02/05 - 1:22 AM Permalink

Update done, changes are the copyright (apparently I share it with the uni, despite much of this code being written in early 2003 before I started), a typo (CTAssert instead of CTASSERT in a 64-bit section), replaced #errors with #warnings if the chunkSizes are not powers of two, replaced #warn (an invalid preprocessor instruction) with #warning, and a few little code style and comment cleanups.

mcdrewski: it does work with non-power of two chunkSizes, but as I said above it's probably a bad idea to use them.

The Heap class serves another purpose btw: it allows Pools to expand without having to realloc. Without the Heap class a Pool would have to hold one big chunk of contiguous memory, and the whole thing would probably need to be copied during a realloc when it filled up. Instead another Heap* is added to a vector inside Pool, and the vector already has a fair bit of space (for Heap*s) reserved to avoid calling realloc or whatever your STL allocator does to resize a vector.

David: I find this code pretty brain bending too, and I find template metaprogramming a little perverse...

Submitted by Grover on Sat, 12/02/05 - 9:17 AM Permalink

That's pretty damn handy stuff lorien.
I have seen similarly developed ideas for console memory allocation, though none of them cater for the runtime new/alloc problem.
I have seen solutions in Game Programming Gems which use a couple of differing methods, but again, none that cover general purpose use - really great stuff.

I would imagine this would be especially useful for consoles, where their alloc routines are very slow - PS2 is an example I can think of. I'll also try it out on NGage and GBA, and see how it goes.

Submitted by lorien on Tue, 15/02/05 - 1:59 AM Permalink

Another update with a little more metaprogramming. It's documented in the comments. It goes faster :)

Thanks Grover, and thanks Souri for making it sticky.

I actually didn't think it would be good for consoles because of the memory the pools waste. Though of course if a near constant frame-rate is important this should be great.

I don't think I can optimise it much more with my current knowledge of c++ and hardware, so further updates are likely to be bug fixes. If you port it to another platform and are not prevented from doing so by an evil NDA please post those ports here and I'll add them to the main code- providing they don't break it for normal computers.

BTW the only "normal" textbook I've seen that mentions template metaprogramming and similar high end techniques is the 2nd edition of Thinking In C++ by Bruce Eckel. It is available for free in electronic form from http://mindview.net/Books

Submitted by kalin on Tue, 15/02/05 - 8:59 PM Permalink

I'm not sure exactly what you mean by 'normal' textbook, but
two good metaprogramming books come to mind:
Modern C++ Design, by Alexei Alexandrescu.
C++ Template Metaprogramming, by David Abrahams and Aleksey Gurtovoy.

Modern C++ Design takes a lot of traditional design patterns, and
using heavy template code, makes them very safe, flexible, and
easy to use. Alexandrescu's code has problems with a lot of compilers
though, it's just too much for them.

C++ Template Metaprogramming is generally a walk-through of the
Boost.MPL library, and showing ways to use it, and how it works.
It's probably good to get a fundamental understanding on how to
actually use metaprogramming techniques. Also, the Boost.MPL library
is quite well supported by a range of compilers, despite its
complexity. (Quite a feat!)

[edit]

Also, the Boost.Pool library may be of interest in this thread.
http://www.boost.org/libs/pool/doc/index.html

Submitted by lorien on Tue, 15/02/05 - 11:43 PM Permalink

I don't call either of those books "normal", I call them "crazy", but only in the nicest of ways [:)]. Loki (the norse god of mischief) is a perfect name for Alexei's library...
The Boost Metaprogramming library has to be the most complex and innovative library I've ever seen, and I don't even pretend to have anything other than the barest glimmering of understanding of what can be done with it. I've got the book on order.

Be careful asking questions about David Abrahams work on the boost mailing list, I haven't dared for years after copping a real flaming from him- in front of many of the most full-on c++ gurus in the world. I was just asking questions which the docs to his boost::python library didn't make clear.

*****

Sigh, I was completely blind in that update yesterday... There were two conditional branches in the instruction stream for a compile time searched allocation when memory was available.

I have reduced it to one, and I simply can't get rid of it :(

These branches are important because they are slow if the CPU's prediction algorithm fails.

Fortunately this one should be quite predictable. It is in Pool::allocChunk().

Malloc/free (which is the same as new/delete) algorithms will commonly have hundreds of branches, many of which are not at all easy to predict. My synthetic benchmark should be showing close to best case for malloc (and the alignedAlloc functions use the same algorithms).

Just posting the update.

Submitted by lorien on Wed, 16/02/05 - 2:26 AM Permalink

This thing I've made is a lot more sophisticated than the boost memory pools: they are single pools for fixed size allocations. This is a general purpose library made up of multiple pools, with around the same speed increase as the single pool allocator.
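
For contrast, using a boost pool looks roughly like this (written from memory of the Boost.Pool docs, so treat the details as approximate):

[code]
#include <boost/pool/pool.hpp>
#include <boost/pool/object_pool.hpp>

struct Particle { float pos[3]; float vel[3]; };

void boostPoolSketch()
{
    //a raw pool: every allocation it hands out is sizeof(Particle) bytes
    boost::pool<> raw(sizeof(Particle));
    void* mem = raw.malloc();
    raw.free(mem);

    //object_pool also constructs and destroys the objects
    boost::object_pool<Particle> objects;
    Particle* p = objects.construct();
    objects.destroy(p);
} //both pools release their memory when they go out of scope
[/code]

One pool, one size - which is exactly the difference: there's no notion of picking the best pool for an arbitrary sized request, at compile time or otherwise.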

Submitted by lorien on Sat, 19/02/05 - 12:23 AM Permalink

I dug out my copy of Modern C++ Design again, and found the author's name to be Andrei, not Alexei. There seems to be some confusion here, I was sure Alexei was wrong previously, but a google search returned Alexei so that is what I wrote above.
The cover of the book has Andrei printed on it.

I'll post a fix to a nasty bug shortly: having the Singleton::instance method inlined could lead to multiple instances of MetaPool in different translation units :(

Submitted by lorien on Sat, 19/02/05 - 2:29 AM Permalink

I'll have to do some research on the implications of static local variables in inline functions to be sure. The problem could still be there if the compiler decides to inline the method anyway.
The issue code is:
[code]
template <typename T>
class Singleton
{
public:
static T* instance();
};

template <typename T>
T* Singleton<T>::instance()
{
static T instance;
return &instance;
}
[/code]

Submitted by lorien on Sat, 19/02/05 - 2:55 AM Permalink

From some testing it doesn't seem to be a problem, even in inline functions. It is hard to find info about this possible issue (perhaps it isn't an issue- imho it shouldn't be, but inline functions and static local variables are an interesting mix).

Submitted by kalin on Sat, 19/02/05 - 11:45 AM Permalink

Alexei, oops. I guess I was tired that day, I had the name printed on a book no more than 40cm from me. -_-

As for the static variables and inline functions... I don't think it would ever be a problem. Since the object is in static-storage space, it would be impossible to compile the program if it generated a separate additional copy per call, because it would break the ODR (One Definition Rule) for the linker. (Any workarounds would just be an impossible nightmare - what would their mangled link-names be?)

Submitted by Blitz on Tue, 22/02/05 - 4:20 AM Permalink

Local statics in inline functions are a problem when using DLL's. It will create one copy inside the DLL, and another copy for any access outside of the DLL. I don't think it's a problem anytime you are using normal static (early?) binding. So static libs and just compiling into your executable should be fine.
CYer, Blitz

Submitted by lorien on Fri, 25/02/05 - 12:16 AM Permalink

Just posted a huge update. Please use this one because there was a showstopper crash bug in all previous versions.

Docs in this major revision will improve with time.

Perhaps I should make a custom non-template Singleton for this - having separate memory management between DLLs will really stink.

Updated the little stupid and synthetic benchmark too.

This version of the allocator is not designed to give fantastic results in stupid benchmarks. It is designed for much better efficiency in real-world progs.

Submitted by lorien on Fri, 25/02/05 - 2:01 AM Permalink

Just made a very minor update: it didn't build with msvc.net because I'd left out one "template<>" before a full specialisation. Both g++ and icc didn't complain.

Apparently the msvc.net preprocessor doesn't support "#warning", it tells me "invalid preprocessor command". This means my macro overloading stuff doesn't work on msvc.

Submitted by redwyre on Thu, 03/03/05 - 11:48 AM Permalink

For those wondering what mysterious invisible comment I made, I actually unstickied this thread.

Submitted by lorien on Fri, 04/03/05 - 3:46 AM Permalink

I should probably find a better home for this thing anyway: it is becoming multi file.

Submitted by wicked on Tue, 08/11/05 - 10:29 AM Permalink

lorien, please elaborate on how your memory allocator is better than the boost pool libraries.

I once wrote a custom memory allocator class based on the bulk allocator of Bartosz Milewski's 1997 classic C++ In Action. [url]http://www.relisoft.com/book/tech/9new.html[/url]

I can't provide the code as it's owned by my previous employer, but it's basically a single class encapsulating the described behaviour and templatized with an integer which tells it which size objects it can allocate.

The classes which you want to allocate space in bulk for are then provided with an overloaded operator new and delete which calls into the allocator class. This way no other code had to be changed in the 600k line code base.
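
I can't post the real thing, but the shape of it is something like this (names invented for the illustration):

[code]
#include <stdlib.h>
#include <new>

//grabs blocks of one fixed size in bulk and recycles them through a free list
template <size_t BlockSize>
class BulkAllocator
{
    struct FreeNode { FreeNode* next; };
    FreeNode* mFree;

    void refill()
    {
        //each block must be big enough to hold the free list link
        const size_t size = BlockSize < sizeof(FreeNode) ? sizeof(FreeNode) : BlockSize;
        const size_t count = 32; //grab 32 blocks at a time
        char* block = static_cast<char*>(malloc(size * count));
        if(!block) throw std::bad_alloc();
        for(size_t i = 0; i < count; ++i)
        {
            FreeNode* n = reinterpret_cast<FreeNode*>(block + i * size);
            n->next = mFree;
            mFree = n;
        }
    }

public:
    BulkAllocator() : mFree(0) {}

    void* allocate()
    {
        if(!mFree) refill();
        void* mem = mFree;
        mFree = mFree->next;
        return mem;
    }

    void deallocate(void* mem)
    {
        FreeNode* n = static_cast<FreeNode*>(mem);
        n->next = mFree;
        mFree = n;
    }
};

class Enemy
{
public:
    //the only change made to the class: route new/delete through the allocator
    void* operator new(size_t) { return allocator().allocate(); }
    void operator delete(void* mem) { allocator().deallocate(mem); }
private:
    static BulkAllocator<sizeof(Enemy)>& allocator()
    {
        static BulkAllocator<sizeof(Enemy)> instance;
        return instance;
    }
    int hp;
    float pos[3];
};
[/code]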

However, since Boost has come up with a pool allocator library, I'd be likely to check that one before writing my own again. Particularly since it's thread safe and portable.

Submitted by lorien on Tue, 08/11/05 - 1:00 PM Permalink

It's different to the boost pools. I think it's better in the ways that are important to my projects, but I'm not going to say "my code is better than boost", I'm not so stupid :)
quote:
Boost C++ Libraries
"...one of the most highly regarded and expertly designed C++ library projects in the world." -- Herb Sutter and Andrei Alexandrescu, C++ Coding Standards

This is off the top of my head, please correct anything about boost if I'm mistaken...

The boost pools are single pools. They are not a generic allocation system like this.

The boost pools have a very roughly similar linked list structure to the one I'm using in my Heap class. The difference is that in the boost pools the header structure for an alloc and the alloc itself are different pieces of memory. Have a look at my Heap and Node classes: the Heap allocates a single block of memory, which it splits into Nodes and Chunks (chunks are what I call the actual memory you use) using some nutty pointer maths.

The boost pools let you choose your own memory alignment. Mine is pretty well set to 16-bytes, though it can be hacked.

This allocator is designed to be an insanely fast and reasonably memory efficient full replacement for new and delete that takes advantage of the compiler's innate knowledge of the sizes of all your classes (the metaprogramming). The number of conditional branches in the allocation path is absolutely minimised.

The boost pools are designed to be special purpose allocators for particular types of objects.

My code has changed somewhat since the posts above - I think the above might have a nasty crash bug still... Will release on sourceforge when I'm good and ready, though just ask if anyone wants a pre-release. It's part of a much larger low level library with a bunch of smart pointers, STL allocators, singletons, lockless queues etc, etc. Some of the guts of a realtime (low latency audio), multithreaded scripting language split out into a shared library.

One of the big differences between the code above and my current code is I'm using a thread-local singleton to hold the MetaPool class (a thread-local singleton means one instance per-thread).

Making it into a per-thread allocator means there is not a single mutex in the allocation path, and it's completely threadsafe. But you can't allocate memory in one thread then free it in another. This condition is asserted for. I gather a lot of JVMs use this optimisation.
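
The per-thread singleton isn't in the code posted above, but the idea is roughly this (sketched with pthread keys - the real one could just as easily use __thread or __declspec(thread), and the names here are mine, not the library's):

[code]
#include <pthread.h>
#include "MetaPool.h"

//one instance of T per calling thread; the instance is created on first use
//and destroyed when its thread exits
template <typename T>
class ThreadLocalSingleton
{
    static pthread_key_t sKey;
    static pthread_once_t sOnce;

    static void destroy(void* instance) { delete static_cast<T*>(instance); }
    static void makeKey() { pthread_key_create(&sKey, destroy); }

public:
    static T* instance()
    {
        pthread_once(&sOnce, makeKey);
        T* inst = static_cast<T*>(pthread_getspecific(sKey));
        if(!inst)
        {
            inst = new T;
            pthread_setspecific(sKey, inst);
        }
        return inst;
    }
};

template <typename T> pthread_key_t ThreadLocalSingleton<T>::sKey;
template <typename T> pthread_once_t ThreadLocalSingleton<T>::sOnce = PTHREAD_ONCE_INIT;

//every thread that allocates gets its own MetaPool, so there's no locking,
//but memory has to be freed by the thread that allocated it
typedef ThreadLocalSingleton<Memory::MetaPool> MetaPoolTLS;
[/code]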

This allocator puts a whole bunch of pools together in a vector. The run-time search involves iterating over the vector until the best-fit pool is found. The metaprogramming does this at compile time.
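
A stripped down illustration of that difference (toy doubling sizes, nothing to do with the real chunkSize maths):

[code]
#include <stddef.h>

//the same size table twice: once as data for the runtime search...
const size_t toyChunkSizes[4] = { 16, 32, 64, 128 };

//...and once as a metafunction for the compile time search
template <int Idx> struct ToyChunkSize { enum { value = 16 << Idx }; };

//runtime: a loop and a branch per pool, executed on every allocation
inline int runtimePoolIdx(size_t size)
{
    for(int i = 0; i < 4; i++)
        if(size <= toyChunkSizes[i])
            return i;
    return -1; //doesn't fit in any pool
}

//compile time: the "loop" runs inside the compiler and the result is just a
//constant folded into the call site - no searching while the program runs
template <size_t Size, int Idx>
struct ToyPoolIdx
{
    enum
    {
        fits = Size <= (size_t)ToyChunkSize<Idx>::value,
        value = fits ? Idx : ToyPoolIdx<Size, Idx+1>::value
    };
};

template <size_t Size>
struct ToyPoolIdx<Size, 4> //ran off the end of the table
{
    enum { value = -1 };
};

//ToyPoolIdx<100,0>::value is the constant 3; ToyPoolIdx<4096,0>::value is -1
[/code]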

For anyone at a uni here, if you're interested in memory allocators, go to your library's website and find the database collection. Look for IEEE XPlore and start searching. It will bring back full journal papers talking about the issues involved in memory allocators. Maybe add "real-time" to the search too.

As far as I've been able to find this is a unique memory allocator.

The library is under the ZLib license, which is basically "do what you want with this, except don't claim you own it". Sourceforge doesn't have the boost license as an option.

Just ask if you'd like more detail, and I'm going to be talking about this at the thing La Trobe runs instead of the AGDC, which is also on Dec the 2nd. Send me a PM if you'd like an invite, and we feed you too [:)] We don't claim it's a conference, it's a chance to see some postgrad and undergrad student work and (if you want) hire some graduates. There will be people from other unis there too.

Edit: it may be dec 3, I'll check tomorrow, though plenty have invites.