tcmalloc tag description
Introduction
TCMalloc (Thread-Caching malloc) is a malloc (memory allocation) library developed by Google. It is part of the gperftools (Google Performance Tools) project. Other tools in the same project include a heap checker (detecting memory leaks), a heap profiler (getting statistics for memory usage) and a CPU profiler (getting statistics for CPU usage).
Official Introduction by Sanjay Ghemawat
TCMalloc is faster than the glibc 2.3 malloc (available as a separate library called ptmalloc2) and other mallocs that I have tested. ptmalloc2 takes approximately 300 nanoseconds to execute a malloc/free pair on a 2.8 GHz P4 (for small objects). The TCMalloc implementation takes approximately 50 nanoseconds for the same operation pair. Speed is important for a malloc implementation because if malloc is not fast enough, application writers are inclined to write their own custom free lists on top of malloc. This can lead to extra complexity, and more memory usage unless the application writer is very careful to appropriately size the free lists and scavenge idle objects out of the free list.
TCMalloc also reduces lock contention for multi-threaded programs. For small objects, there is virtually zero contention. For large objects, TCMalloc tries to use fine grained and efficient spinlocks. ptmalloc2 also reduces lock contention by using per-thread arenas but there is a big problem with ptmalloc2's use of per-thread arenas. In ptmalloc2 memory can never move from one arena to another. This can lead to huge amounts of wasted space. For example, in one Google application, the first phase would allocate approximately 300MB of memory for its URL canonicalization data structures. When the first phase finished, a second phase would be started in the same address space. If this second phase was assigned a different arena than the one used by the first phase, this phase would not reuse any of the memory left after the first phase and would add another 300MB to the address space. Similar memory blowup problems were also noticed in other applications.
Another benefit of TCMalloc
is space-efficient representation of small objects. For example, N 8-byte objects can be allocated while using space approximately 8N * 1.01 bytes. I.e., a one-percent space overhead. ptmalloc2
uses a four-byte header for each object and (I think) rounds up the size to a multiple of 8 bytes and ends up using 16N bytes.
Links
- Google Performance Tools (gperftools) Official Site
- Google Performance Tools (gperftools) Documentation
- TCMalloc: Thread-Caching Malloc Documentation
- C dynamic memory allocation - Wikipedia