Class Diagram of Counters |
High Level Language Virtual Machines is a core topic of interest for the researchers who are into virtual execution environments. As an open source virtual machine released to 16 universities, as early as 2001, Jikes RVM has been a major drive for many researches. Garbage Collectors for the Jikes RVM are constructed by the framework known as the Memory Management Toolkit (MMTk). Key Compositional Elements in MMTk are utilities, policies, and plans. Apart from these, package org.mmtk.harness contains the MMTk harness test cases. Sanity checks are included in the package org.mmtk.harness.sanity. GcSanity is the collection sanity checker, that takes snapshots before and after a collection to ensure that the live objects after a collection are the same that exist before the collection. Further options are implemented in the classes of org.mmtk.harness.options. We have looked at policies and plans. In this post, we will look at the MMTk utility package, which implements the MMTk mechanisms.
MMTk comes with the utility classes that provides mechanism for the memory management, across multiple policies and plans that use those policies. An ideal example showing the usage of the utility package is, the interface Constants. All the classes either implement the Constants interface, or are sub classes of the classes that implement the interface.
1. Allocation
The
package org.mmtk.utility.alloc
handles the allocation. Allocator is the base
class providing the basis for the processor-local allocation. This
provides the retry mechanism, to prevent the slow-path allocation
causing a garbage collection, violating the assumption of
uninterruptibility. Allocator also ensures that requests are aligned
according to requests. This class is very crucial in garbage
collection as the base class for all the allocator algorithms.
Improper handling of this will make it hard to trace the bugs, where
allocations may cause a garbage collection, or a garbage collection
may immediately follow an allocation.
The method alignAllocation() accepts a region to be aligned, and aligns up the request, according to the requested alignment and offset, optimizing with the known alignment for the allocator algorithms. This method returns the aligned up address.
Consecutive allocation failure for the threads are counted by determineCollectionAttempts(). fillAlignmentGap() gets the start and end addresses, and fills the region with the alignment value. The minimum size guaranteeing the allocation of a given number of bytes at the specified alignment is calculated by getMaximumAlignedSize().
All allocators should use the final method, allocSlowInline() for the slow path allocation. This method attempts the protected method allocSlowOnce() multiple times, which is defined in the subclasses. This method ensures safe execution by taking care of the thread/mutator context affinity changes, whilst allowing the collection to occur.
2. Bump Pointer Allocation
This is implemented by
the BumpPointer class,
which extends the abstract class Allocator. Bump Pointer scans
through the allocated objects linearly. To achieve parallelism, this
class maintains a header at a region of 1 or more blocks granularity.
The minimum region size is 32678 bytes. Hence the 3 or 4 word
overhead is less than 0.05% of all space. BumpPointer is initialized
by providing the space to bump point into, indicating whether the
linear scanning is allowed in the region. The method linearScan()
performs a linear scan through
the objects allocated by the bump pointer, and scanRegion()
scans through a single
contiguous region.
Intra-block allocation requires only a load, addition comparison and store, and hence is fast. The allocator will request more memory, if a block boundary is encountered. The scanned objects maintain affinity with the thread which allocated the objects in the region. This class relies on the supporting virtual machine implementing the getNextObject and related operations.
Space is allocated for a new object, by calling alloc(). This method is frequently executed, and is sensitive to the optimizing compiler. Whenever the bump pointer reaches the internal limit, allocSlow() is called. This method should never be inlined by the optimizing compiler, and hence is annotated with @NoInline, to force out of line.
Bump pointer can be re-associated to a different space by calling rebind(), providing the new space to which the pointer to be associated to. The bump pointer will be reset such that it will use the new space for the next call to alloc().
Address is a stub implementation of an Address type, used by the runtime system and collector to denote the machine addresses. An allocation unit is denoted as a card, which is marked by an address that lies within the card. Providing the address of the object creating a new card, the address that lies within the card, and the size of the pending allocation in bytes, createCardAnchor() creates a record, where the start of the card is relative to the start of the object. The start of the card corresponding to the given address can be retrieved by calling getCard(). Similarly, the address of the card metadata can be retrieved by providing an address from the card, calling getCardMetaData().
Next region from the linked list of regions can be retrieved using getNextRegion(). Similarly, setNextRegion() is used to set the next region in the linked list. clearNextRegion() clears the next region pointer in the linked list.
The DATA_END address from the region header can be retrieved using getDataEnd(), by providing the bump pointer region address. setDataEnd() is used to set the new DATA_END address from the header. Similarly, getRegionLimit() and setRegionLimit() return or store the end address of the given region, respectively. The lowest address where the data can be stored can be retrieved by getDataStart(), for the given region. updateMetaData() is used to update the metadata, reflecting the addition of a new region. Where a bump pointer region has been consumed, but the contiguous region is available, consumeNextRegion() consumes it and returns the start of the region satisfying the outstanding allocation request.
3. Block Allocation
Blocks are a unit of storage of 2 ^ n bytes, that are
non-shared (thread-local) and coarse-grained. Virtual memory resource
provides virtual memory space. Here, pages consumed by blocks are
accountable for the memory resource. BlockAllocator implements
various sizes of Block data structure. alloc() allocates
a block and returns the first usable bytes of the block. A block is
freed by calling free(). If
the block is completely free, the block is returned to the virtual
memory resource. Otherwise, if the block is just a sub-page block,
the block is added to the free list.
4. GCspy Integration
GCspy 2.0 is a
tool that helps to analyse the heap, that often is used to understand
the memory usage and effectiveness of garbage collectors in our
project. The development of GCspy however lags behind that of the
Jikes RVM core [10]. The data for GCspy is gathered using
gcspyGatherData() in the
classes. The package
org.mmtk.utility.gcspy contains
the classes for the GCspy integration, and
org.mmtk.utility.gcspy.drivers contains
the GCspy drivers for the MMTk collectors.
5. Treadmill
Each
instance of the Treadmill
is
a doubly linked list, where each node is a piece of memory. The
words of each node,
[Forward
Link | Backward Link | Treadmill | Payload ----->]
|
The Treadmill object must not be moved. Depending on the
constructor argument, access to the instances may be synchronized.
This assumes that the implementation language, and the
language being implemented are the same. This works well with Jikes
RVM, as these are Java. However, for using with other VM
implementations in other languages, the doubly linked list class,
upon which Treadmill depends, should be rewritten.
If the created instance
is to be shared between the threads, which is decided by the boolean
parameter “shared”
provided as a parameter in the constructor, the access will be
synchronized with locks. A node is added to the treadmill during the
allocation, using addToTreadmill().
6. Large Object Allocation
This is implemented in
LargeObjectLocal, which
extends the abstract class LargeObjectAllocator.
Each instance provides a fast unsynchronized access to the treadmill,
and bound to a single CPU. Hence this should not be shared across
different CPUs, as they provide truly concurrent threads.
Given c CPUs, and t Treadmill spaces, there will be c *
t instances of this class, for each {CPU, Treadmill} pair.
7. Page Resource
Allocation of pages for
a space is managed by the abstract class PageResource.
When a space request a page, page budget and the use of
virtual address space are checked. If the request can not be
satisfied, garbage collection is triggered. MonotonePageResource,
which is a subclass of this class handles the monotonic space usage.
Similarly, the other sub class, FreeListPageResource handles
the ad hoc usage. Copying collectors allocate space monotonically
before freeing the entire space. Hence, MonotonePageResource is
useful for them. Though the MonotonePageResource is more restrictive,
it is easier to manage.
8. Heap Growth Management
HeapGrowthManager
observes the heap utilization and GC load, and grows and shrinks the
heap size accordingly. This class, and all the other classes
in the package org.mmtk.utility.heap contains
the heap related mechanisms.
9. Sanity Checker
Sanity checks for the
simple collectors are handled by the classes in the package
org.mmtk.utility.sanitychecker.
SanityChecker is the
major class handling the sanity checking. SanityDataTable
implements a simple hashtable to
store and retrieve per object information on sanity checks.
10. Statistics
The package
org.mmtk.utility.statistics
contains a number of counters,
implementing the abstract class Counter for
multiple counting purposes. SizeCounter, aggregates
two EventCounter
objects, to count the
number of events and the volume. Xml class
writes the output in XML format.
11. Deque
Classes extending Deque |
The package
org.mmtk.utility.deque defines
the doubly-linked, double-ended queue (deque). Though the double
linking slightly increases the space demand, this is a worth trade
off, as this provides an efficient buffer and operations such as
sorting.
LocalSSB implements
a local unsynchronized sequential store buffer. This is used in
critical code such as garbage collection work queue and the write
buffer used by many collectors. Hence, this is implemented as
efficient as possible in space and time. Each instance has a bump
pointer and a pointer to the sharedDeque. This class follows FIFO,
though it doesn't implement dequeing. TraceBuffer supports
enqueuing and dequeuing of tracing data and bulk processing of the
buffer.
12.
Log
Log class is
used for trace and error logging. Message buffer size is
intentionally kept large (3000 characters), as Lock class of
Jikes RVM logs a considerable amount of information during a
potential GC deadlock.
No comments:
Post a Comment
You are welcome to provide your opinions in the comments. Spam comments and comments with random links will be deleted.