Garbage Collector in JAVA

  • Garbage Collector (GC) Algorithm:
    Almost all programming languages have a notion of a heap, or a free-store, from which the programmer can request memory. Historically the heap was managed explicitly by the programming by using allocate and release function calls in a library (such as malloc/free in C and new/delete in C++). Explicit memory management has been a significant source of software bugs, and programmer pain (For example using memory after it has been freed, freeing it twice or forgetting to free it at all).
    Starting around 1960, algorithms began to be studied for automatic memory management (McCarthy 1960) [Collins 1960]. With automatic memory management, a programmer can request space from the heap (ex. Instantiating a new object with a statement like “new MyObject()” in C# and JAVA). GC avoids the dangling reference problem because an object referenced somewhere will never be garbage collected and so will not be considered free. GC also solves the space leak and automatically frees all memory no longer referenced.

    Mark-Sweep GC:
    The earliest and most basic garbage collection algorithm is mark – sweep garbage collection. [McCarthy 1960] and modern algorithms are variant on it. Mark – sweep is a “Stop - the - world” collector, which means that at some point when the program request memory and none is available, the program is stopped and a full garbage collection is performed to free up space. In mark – sweep each object has a “mark – bit” which is used during the collection process to track whether the object has been visited. Here is an algorithm for mark & sweep garbage collection implemented on top of some underlying explicit memory management routines, in which free regions of the heap are also considered objects with mark-bits and a known size.

    Algorithm for mark-sweep:
    Also called as tracing garbage collector because it traces out entire collection of objects accessible by the program.

    Two phases:
    1. Finds and marks all accessible objects. Also called as Mark phase.
    2. Scans heap and reclaims all unmarked objects. Also called as Sweep phase.

    State of an object is recorded to distinguish live objects from GC. All objects are unmarked when created by default so the marked field of object is initially false.

    if(!p.marked)
    p.marked=true;
    //For each object q referenced by p
    mark(q);
    hence it ends by marking all accessible objects

    Second phase reclaims storage from all the unmarked objects and at the same time the mark field of live object is set back to false for next invocation.

    void sweep()
    for each object in heap
    if(p.marked)
    p.marked=false;
    else
    heap.release(p);

    The mark-sweep algorithm operates in time linear in the size of the heap i.e. O(N) this doesn’t directly tell how much overhead it imposes on a program, because it must be invoked whenever an allocation fails, and so the overhead depends on parameters such as how big the heap is, and how much memory has become unreachable since the last GC. In practice, the overhead, as well as the pause time, of mark-sweep collectors is high compared to other algorithms.
    Mark-sweep does however have the advantage of freeing all unused memory, but this free memory easily becomes fragmented (limiting the availability of larger contiguous regions). There are technically space over-head for the mark-bit, but in practice a bit is usually re-purposed from some other run-time data. Structure, since it’s only needed when the program is not running.

    Mark & Copy GC:
    1.     Collect objects with a short and short-to-medium lifetime.
    2.     Fast algorithm but requires more space.
    3.     Enables efficient allocation afterwards.
    4.     Frequent and short pauses (minor GC).

    Mark & Compact GC:
    1.     Collect objects with a medium-to-long and long lifetime.
    2.     Slow algorithm but requires less space.
    3.     Avoids fragmentation and enables efficient allocation.

    What is elastic memory for JAVA?
    The Java Virtual Machine (JVM) manages memory automatically for applications, storing objects in heap memory pre-allocated to it by the operating system, and freeing the memory when objects are no longer needed. However it takes a lazy approach, allowing unreferenced objects to remain in heap memory until it is necessary to free up space to store additional objects. This lazy approach to memory management works fine when java is running directly on physical hardware. However, it can lead to performance problems if you are trying to leverage certain key benefits of visualization. For java applications running on ESXi that require consistent, fast response times, the traditional best practice advice has been to reserve 100% of the virtual machine.



No comments:

Post a Comment