Overview

LiMiT -- Lightweight MonItoring Toolkit -- is a Linux kernel patch plus a userland library that together enable direct userspace access to Intel's hardware performance counters, allowing lightweight, precise performance measurements. A paper detailing the implementation of LiMiT and several case studies was presented at ISCA 2011.

LiMiT -- A tool for reading out performance counters in nanoseconds


[Figure: Locking instrumentation overheads with various counter access methods]

Current Linux tools for accessing performance counters, such as Perfmon2 (and therefore PAPI), require a system call for every counter read. System calls can take microseconds to execute and can perturb the execution being measured. LiMiT reads the counters directly in userspace, reducing a read to 5 instructions that take about 37 cycles on a modern x86 processor, which is in the nanosecond range.
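
To get a feel for the gap between a system call and a plain userspace access, the following minimal sketch times a trivial system call (getppid, issued through syscall(2) so every iteration enters the kernel) against a volatile memory load. The helper names now_ns, avg_syscall_ns and avg_load_ns are hypothetical, and the memory load is only a stand-in: it is not LiMiT's actual counter-read sequence. Absolute numbers will vary with the CPU, kernel version and security mitigations in use.

```c
#define _GNU_SOURCE           // for syscall() and SYS_getppid
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Monotonic time in nanoseconds.
static uint64_t now_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

// Average cost (ns) of one trivial system call. Calling syscall(2) directly
// avoids any libc-level caching, so each iteration really enters the kernel.
double avg_syscall_ns(long iters) {
    assert(iters > 0);
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++)
        (void)syscall(SYS_getppid);
    return (double)(now_ns() - start) / (double)iters;
}

// Average cost (ns) of one userspace memory load; volatile keeps the
// compiler from hoisting the load out of the loop or removing it.
double avg_load_ns(long iters) {
    assert(iters > 0);
    static volatile uint64_t value = 42;
    uint64_t sink = 0;
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++)
        sink += value;
    uint64_t end = now_ns();
    (void)sink;
    return (double)(end - start) / (double)iters;
}
```

Calling avg_syscall_ns(100000) and avg_load_ns(100000) from a main function and comparing the two averages should show the system call costing far more than the userspace load.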

The chart shows an example of the overheads of instrumenting pthread locking calls in MySQL. In a direct comparison benchmark (3 counters with 10e7 reads each and no other code), we measured the following:

Time          PAPI-C    perf_event    LiMiT    Speedup over PAPI-C    Speedup over perf_event
Wall Time     7.87s     31.44s        0.34s    23.1x                  92x
User Time     0.53s     1.26s         0.34s    1.56x                  3.7x
Kernel Time   7.30s     30.10s        0.0s     n/a                    n/a
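
Each speedup figure is a baseline time divided by LiMiT's time; for example, 31.44 s / 0.34 s ≈ 92x and 7.87 s / 0.34 s ≈ 23.1x. A small helper (hypothetical, not part of LiMiT) makes the calculation explicit and treats LiMiT's 0.0 s kernel time as undefined rather than dividing by zero:

```c
#include <assert.h>

// Speedup of LiMiT relative to a baseline tool: baseline time / LiMiT time.
// Returns 0.0 to signal "undefined" when LiMiT's time is zero, as in the
// kernel-time row where LiMiT spends 0.0 s in the kernel.
double speedup(double baseline_s, double limit_s) {
    assert(baseline_s >= 0.0 && limit_s >= 0.0);
    return limit_s > 0.0 ? baseline_s / limit_s : 0.0;
}
```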

Usage


LiMiT has several usage modes; the most useful is the C API. In this mode, you include limit.h in your source code and link against the LiMiT library (-llimit). The API provides functions to set up and close hardware performance counters as well as to read them. The read routines are inlined, reducing a read to 5 instructions.
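
The inlined read routines support a simple "sample before, take the delta after" pattern. The sketch below illustrates only that pattern: read_counter and read_delta are hypothetical names, and read_counter uses CLOCK_MONOTONIC as a stand-in because reading real PMU counters from userspace requires the LiMiT kernel patch. It does not reproduce LiMiT's lprof/lprofd routines, whose exact semantics are defined in limit.h.

```c
#define _POSIX_C_SOURCE 199309L  // for clock_gettime and CLOCK_MONOTONIC
#include <assert.h>
#include <stdint.h>
#include <time.h>

// Hypothetical stand-in for one hardware counter: CLOCK_MONOTONIC in ns.
// With LiMiT, this would instead be a userspace read of a real PMU counter.
static inline uint64_t read_counter(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

// Hypothetical delta helper: returns the count accumulated since *last and
// stores the new reading in *last, so back-to-back calls measure consecutive
// intervals without a separate "sample before" step.
static inline uint64_t read_delta(uint64_t *last) {
    uint64_t now = read_counter();
    uint64_t delta = now - *last;
    *last = now;
    return delta;
}
```

Typical use: take uint64_t last = read_counter(); before the code of interest and call read_delta(&last) right after it. Keeping both helpers static inline mirrors the design goal of avoiding a function call (and certainly a system call) on every read.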

Usage Example: Measuring the branch misprediction rate of a function call.

// Compile with: gcc -O3 -o hello hello.c -llimit -ldl
#include <limit.h>
#include <ctype.h>     // isupper, islower
#include <inttypes.h>  // PRIu64
#include <stdint.h>
#include <stdio.h>

#define str1 "Hello World, Hello World, Hello World"
#define str2 "HELLOWORLDHELLOWORLDHELLOWORLDHELLOWO"

const char* testStr = str1;  // Also try str2 to change the branch pattern

uint64_t uppersFound = 0, lowersFound = 0;

// Classify each character; the data-dependent if/else is the branch of interest.
void function_to_watch(void) {
    const char* c = testStr;

    while (*c != 0) {
      if (isupper((unsigned char)*c))
        uppersFound++;
      else if (islower((unsigned char)*c))
        lowersFound++;
      c++;
    }
}

int main(void) {
    uint64_t br_last, brm_last, c, br, brm;
    size_t i;

    // Counter 1 = cycles, counter 2 = branches, counter 3 = branch misses
    lprof_init(3, EV_CYCLES, EV_BRANCH, EV_BRANCH_MISS);

    for (i = 1; i <= 30; i++) {
      // Optional: sample the branch counters just before the call
      lprof(2, br_last);
      lprof(3, brm_last);

      function_to_watch(); // Code under measurement

      lprof(1, c);               // Cumulative cycles
      lprofd(2, br, br_last);    // Branches executed since br_last
      lprofd(3, brm, brm_last);  // Mispredicted branches since brm_last

      printf("At Cycle: %7" PRIu64 ", Br Misprediction: %.2f%%\n",
             c, br ? 100.0 * (double)brm / (double)br : 0.0);
    }

    lprof_close();
    return 0;
}
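
The example suggests trying str2 instead of str1. As a rough, hypothetical illustration of why the results differ, the helper branch_outcome_flips counts how often the outcome of the isupper() test in function_to_watch() changes from one character to the next: for str2 it never flips, while for str1 it flips 11 times. This is a crude proxy that ignores the history-based patterns real branch predictors learn, so only the EV_BRANCH_MISS counts measured by LiMiT are authoritative. mispredict_pct (also hypothetical) performs the same percentage calculation as the example, guarded against a zero branch count.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: counts how often the outcome of the isupper() branch
// changes between consecutive characters of s.
size_t branch_outcome_flips(const char *s) {
    assert(s != NULL);
    size_t flips = 0;
    int prev = -1;  // no previous outcome yet
    for (; *s != '\0'; s++) {
        int taken = isupper((unsigned char)*s) != 0;
        if (prev != -1 && taken != prev)
            flips++;
        prev = taken;
    }
    return flips;
}

// Misprediction rate in percent; returns 0.0 when no branches were counted.
double mispredict_pct(unsigned long long misses, unsigned long long branches) {
    return branches ? 100.0 * (double)misses / (double)branches : 0.0;
}
```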

Download & Installation