Friday, June 21, 2013

MMTk utility classes

Class Diagram of Counters
High Level Language Virtual Machines is a core topic of interest for the researchers who are into virtual execution environments. As an open source virtual machine released to 16 universities, as early as 2001, Jikes RVM has been a major drive for many researches. Garbage Collectors for the Jikes RVM are constructed by the framework known as the Memory Management Toolkit (MMTk). Key Compositional Elements in MMTk are utilities, policies, and plans. Apart from these, package org.mmtk.harness contains the MMTk harness test cases. Sanity checks are included in the package org.mmtk.harness.sanity. GcSanity is the collection sanity checker, that takes snapshots before and after a collection to ensure that the live objects after a collection are the same that exist before the collection. Further options are implemented in the classes of org.mmtk.harness.options. We have looked at policies and plans. In this post, we will look at the MMTk utility package, which implements the MMTk mechanisms.

MMTk comes with the utility classes that provides mechanism for the memory management, across multiple policies and plans that use those policies. An ideal example showing the usage of the utility package is, the interface
Constants. All the classes either implement the Constants interface, or are sub classes of the classes that implement the interface.

1. Allocation
The package org.mmtk.utility.alloc handles the allocation. Allocator is the base class providing the basis for the processor-local allocation. This provides the retry mechanism, to prevent the slow-path allocation causing a garbage collection, violating the assumption of uninterruptibility. Allocator also ensures that requests are aligned according to requests. This class is very crucial in garbage collection as the base class for all the allocator algorithms. Improper handling of this will make it hard to trace the bugs, where allocations may cause a garbage collection, or a garbage collection may immediately follow an allocation.

The method alignAllocation() accepts a region to be aligned, and aligns up the request, according to the requested alignment and offset, optimizing with the known alignment for the allocator algorithms. This method returns the aligned up address.

Consecutive allocation failure for the threads are counted by determineCollectionAttempts(). fillAlignmentGap() gets the start and end addresses, and fills the region with the alignment value. The minimum size guaranteeing the allocation of a given number of bytes at the specified alignment is calculated by getMaximumAlignedSize().

All allocators should use the final method,
allocSlowInline() for the slow path allocation. This method attempts the protected method allocSlowOnce() multiple times, which is defined in the subclasses. This method ensures safe execution by taking care of the thread/mutator context affinity changes, whilst allowing the collection to occur.

2. Bump Pointer Allocation
This is implemented by the BumpPointer class, which extends the abstract class Allocator. Bump Pointer scans through the allocated objects linearly. To achieve parallelism, this class maintains a header at a region of 1 or more blocks granularity. The minimum region size is 32678 bytes. Hence the 3 or 4 word overhead is less than 0.05% of all space. BumpPointer is initialized by providing the space to bump point into, indicating whether the linear scanning is allowed in the region. The method linearScan() performs a linear scan through the objects allocated by the bump pointer, and scanRegion() scans through a single contiguous region.

Intra-block allocation requires only a load, addition comparison and store, and hence is fast. The allocator will request more memory, if a block boundary is encountered. The scanned objects maintain affinity with the thread which allocated the objects in the region. This class relies on the supporting virtual machine implementing the getNextObject and related operations.

Space is allocated for a new object, by calling alloc(). This method is frequently executed, and is sensitive to the optimizing compiler. Whenever the bump pointer reaches the internal limit, allocSlow() is called. This method should never be inlined by the optimizing compiler, and hence is annotated with @NoInline, to force out of line.

Bump pointer can be re-associated to a different space by calling
rebind(), providing the new space to which the pointer to be associated to. The bump pointer will be reset such that it will use the new space for the next call to alloc().

Address
is a stub implementation of an Address type, used by the runtime system and collector to denote the machine addresses. An allocation unit is denoted as a card, which is marked by an address that lies within the card. Providing the address of the object creating a new card, the address that lies within the card, and the size of the pending allocation in bytes, createCardAnchor() creates a record, where the start of the card is relative to the start of the object. The start of the card corresponding to the given address can be retrieved by calling getCard(). Similarly, the address of the card metadata can be retrieved by providing an address from the card, calling getCardMetaData().

Next region from the linked list of regions can be retrieved using
getNextRegion(). Similarly, setNextRegion() is used to set the next region in the linked list. clearNextRegion() clears the next region pointer in the linked list.

The DATA_END address from the region header can be retrieved using
getDataEnd(), by providing the bump pointer region address. setDataEnd() is used to set the new DATA_END address from the header. Similarly, getRegionLimit() and setRegionLimit() return or store the end address of the given region, respectively. The lowest address where the data can be stored can be retrieved by getDataStart(), for the given region. updateMetaData() is used to update the metadata, reflecting the addition of a new region. Where a bump pointer region has been consumed, but the contiguous region is available, consumeNextRegion() consumes it and returns the start of the region satisfying the outstanding allocation request.

3. Block Allocation
Blocks are a unit of storage of 2 ^ n bytes, that are non-shared (thread-local) and coarse-grained. Virtual memory resource provides virtual memory space. Here, pages consumed by blocks are accountable for the memory resource. BlockAllocator implements various sizes of Block data structure. alloc() allocates a block and returns the first usable bytes of the block. A block is freed by calling free(). If the block is completely free, the block is returned to the virtual memory resource. Otherwise, if the block is just a sub-page block, the block is added to the free list.

4. GCspy Integration
GCspy 2.0 is a tool that helps to analyse the heap, that often is used to understand the memory usage and effectiveness of garbage collectors in our project. The development of GCspy however lags behind that of the Jikes RVM core [10]. The data for GCspy is gathered using gcspyGatherData() in the classes. The package org.mmtk.utility.gcspy contains the classes for the GCspy integration, and org.mmtk.utility.gcspy.drivers contains the GCspy drivers for the MMTk collectors.

5. Treadmill
Each instance of the Treadmill is a doubly linked list, where each node is a piece of memory. The words of each node,
[Forward Link | Backward Link | Treadmill | Payload ----->]
The Treadmill object must not be moved. Depending on the constructor argument, access to the instances may be synchronized.
 
This assumes that the implementation language, and the language being implemented are the same. This works well with Jikes RVM, as these are Java. However, for using with other VM implementations in other languages, the doubly linked list class, upon which Treadmill depends, should be rewritten.
 
If the created instance is to be shared between the threads, which is decided by the boolean parameter “shared” provided as a parameter in the constructor, the access will be synchronized with locks. A node is added to the treadmill during the allocation, using addToTreadmill().

6. Large Object Allocation
This is implemented in LargeObjectLocal, which extends the abstract class LargeObjectAllocator. Each instance provides a fast unsynchronized access to the treadmill, and bound to a single CPU. Hence this should not be shared across different CPUs, as they provide truly concurrent threads.
Given c CPUs, and t Treadmill spaces, there will be c * t instances of this class, for each {CPU, Treadmill} pair.

7. Page Resource
Allocation of pages for a space is managed by the abstract class PageResource. When a space request a page, page budget and the use of virtual address space are checked. If the request can not be satisfied, garbage collection is triggered. MonotonePageResource, which is a subclass of this class handles the monotonic space usage. Similarly, the other sub class, FreeListPageResource handles the ad hoc usage. Copying collectors allocate space monotonically before freeing the entire space. Hence, MonotonePageResource is useful for them. Though the MonotonePageResource is more restrictive, it is easier to manage.

8. Heap Growth Management
HeapGrowthManager observes the heap utilization and GC load, and grows and shrinks the heap size accordingly. This class, and all the other classes in the package org.mmtk.utility.heap contains the heap related mechanisms.

9. Sanity Checker
Sanity checks for the simple collectors are handled by the classes in the package org.mmtk.utility.sanitychecker. SanityChecker is the major class handling the sanity checking. SanityDataTable implements a simple hashtable to store and retrieve per object information on sanity checks.

10. Statistics
The package org.mmtk.utility.statistics contains a number of counters, implementing the abstract class Counter for multiple counting purposes. SizeCounter, aggregates two EventCounter objects, to count the number of events and the volume. Xml class writes the output in XML format.

11. Deque
Classes extending Deque
The package org.mmtk.utility.deque defines the doubly-linked, double-ended queue (deque). Though the double linking slightly increases the space demand, this is a worth trade off, as this provides an efficient buffer and operations such as sorting.
 
LocalSSB implements a local unsynchronized sequential store buffer. This is used in critical code such as garbage collection work queue and the write buffer used by many collectors. Hence, this is implemented as efficient as possible in space and time. Each instance has a bump pointer and a pointer to the sharedDeque. This class follows FIFO, though it doesn't implement dequeing. TraceBuffer supports enqueuing and dequeuing of tracing data and bulk processing of the buffer.
12. Log
Log class is used for trace and error logging. Message buffer size is intentionally kept large (3000 characters), as Lock class of Jikes RVM logs a considerable amount of information during a potential GC deadlock. 

No comments:

Post a Comment

You are welcome to provide your opinions in the comments. Spam comments and comments with random links will be deleted.