.\" $OpenBSD: pctr.4,v 1.30 2013/07/16 16:05:49 schwarze Exp $
.\"
.\" Pentium performance counter driver for OpenBSD.
.\" Copyright 1996 David Mazieres <dm@lcs.mit.edu>.
.\"
.\" Modification and redistribution in source and binary forms is
.\" permitted provided that due credit is given to the author and the
.\" OpenBSD project by leaving this copyright notice intact.
.\"
.Dd $Mdocdate: July 16 2013 $
.Dt PCTR 4 i386
.Os
.Sh NAME
.Nm pctr
.Nd driver for CPU performance counters
.Sh SYNOPSIS
.Cd "pseudo-device pctr 1"
.Sh DESCRIPTION
The
.Nm
device provides access to the performance counters on AMD and Intel brand
processors, and to the TSC on others.
.Pp
Intel processors have two 40-bit performance counters which can be
programmed to count events such as cache misses, branch target buffer hits,
TLB misses, dual-issues, interrupts, pipeline flushes, and more.
While AMD processors have four 48-bit counters, their precision is decreased
to 40 bits.
.Pp
There is one
.Em ioctl
call to read the status of all counters, and one
.Em ioctl
call to program the function of each counter.
All require the following includes:
.Bd -literal -offset indent
#include <sys/types.h>
#include <machine/cpu.h>
#include <machine/pctr.h>
.Ed
.Pp
The current state of all counters can be read with the
.Dv PCIOCRD
.Em ioctl ,
which takes an argument of type
.Dv "struct pctrst" :
.Bd -literal -offset indent
#define PCTR_NUM	4
struct pctrst {
	u_int pctr_fn[PCTR_NUM];
	pctrval pctr_tsc;
	pctrval pctr_hwc[PCTR_NUM];
};
.Ed
.Pp
In this structure,
.Em pctr_fn
contains the functions of the counters, as previously set by the
.Dv PCIOCS0 ,
.Dv PCIOCS1 ,
.Dv PCIOCS2
and
.Dv PCIOCS3
ioctls (see below).
.Em pctr_hwc
contains the actual value of the hardware counters.
.Em pctr_tsc
is a free-running, 64-bit cycle counter.
.Pp
The functions of the counters can be programmed with ioctls
.Dv PCIOCS0 ,
.Dv PCIOCS1 ,
.Dv PCIOCS2
and
.Dv PCIOCS3
which require a writeable file descriptor and take an argument of type
.Dv "unsigned int" . \&
The meaning of this integer is dependent on the particular CPU.
.Ss Time stamp counter
The time stamp counter is available on most of the AMD K6, Intel Pentium
and higher class CPUs, as well as on some 486s and non-Intel CPUs.
It is set to zero at boot time, and then increments with each cycle.
Because the counter is 64 bits wide, it does not overflow in practice.
.Pp
The time stamp counter can be read directly from user-mode using
the
.Fn rdtsc
macro, which returns a 64-bit value of type
.Em pctrval .
The following example illustrates a simple use of
.Fn rdtsc
to measure the execution time of a hypothetical subroutine called
.Fn functionx :
.Bd -literal -offset indent
void
time_functionx(void)
{
	pctrval tsc;

	tsc = rdtsc();
	functionx();
	tsc = rdtsc() - tsc;
	printf("Functionx took %llu cycles.\en", tsc);
}
.Ed
.Pp
The value of the time stamp counter is also returned by the
.Dv PCIOCRD
.Em ioctl ,
so that one can get an exact timestamp on readings of the hardware
event counters.
.Ss Intel Pentium counters
The Intel Pentium counters are programmed with a 9-bit function.
The top three bits contain the following flags:
.Bl -tag -width P5CTR_C
.It Dv P5CTR_K
Enables counting of events that occur in kernel mode.
.It Dv P5CTR_U
Enables counting of events that occur in user mode.
You must set at least one of
.Dv P5CTR_U
and
.Dv P5CTR_K
to count anything.
.It Dv P5CTR_C
When this flag is set, the counter attempts to count the number of
cycles spent servicing a particular event, rather than simply the
number of occurrences of that event.
.El
.Pp
The bottom 6 bits set the particular event counted.
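.Pp
The layout described above can be sketched in C as follows.
This is a minimal illustration rather than part of the driver: the helper
.Fn p5_function
and the
.Dv EVENT_MASK
name are hypothetical, and the numeric flag values are assumptions chosen
to match the described layout (three flag bits above a 6-bit event number).
The authoritative definitions are those in
.In machine/pctr.h .
.Bd -literal -offset indent
```c
/*
 * Assumed values, mirroring the 9-bit layout described in the text.
 * The guard lets the definitions from <machine/pctr.h> take
 * precedence when that header has already been included.
 */
#ifndef P5CTR_K
#define P5CTR_K	0x040u	/* count events in kernel mode */
#define P5CTR_U	0x080u	/* count events in user mode */
#define P5CTR_C	0x100u	/* count cycles, not occurrences */
#endif

/* Hypothetical name for the bottom 6 bits (the event number). */
#define EVENT_MASK	0x03fu

/*
 * Hypothetical helper: combine the flag bits with an event number
 * into a single counter function.  Bits outside the three flags
 * and the 6-bit event field are discarded.
 */
unsigned int
p5_function(unsigned int flags, unsigned int event)
{
	unsigned int f = flags & (P5CTR_K | P5CTR_U | P5CTR_C);

	return f | (event & EVENT_MASK);
}
```
.Ed
.Pp
Under these assumptions,
.Fn p5_function
called with
.Dv P5CTR_U | P5CTR_K
and a placeholder event number of 0x03 yields 0xc3, which is the kind of
.Dv "unsigned int"
value that the
.Dv PCIOCS0
ioctl takes.
The event number here is only a stand-in, not a documented event.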
A list of possible event functions can be obtained by running the
.Xr pctr 1
command with the
.Fl l
option.
.Ss "Counters for AMD K6, Intel Pentium Pro and newer CPUs"
Unlike the Pentium counters, these counters can be read
directly from user-mode without need to invoke the kernel.
The macro
.Fn rdpmc ctr
takes 0, 1, 2 or 3 as an argument to specify a counter, and returns that
counter's 40-bit value (which will be of type
.Em pctrval ) .
This is generally preferable to making a system call as it introduces
less distortion in measurements.
.Pp
Counter functions supported by these CPUs contain several parts.
The most significant byte (an 8-bit integer shifted left by
.Dv PCTR_CM_SHIFT )
contains a
.Em "counter mask" .
If non-zero, this sets a threshold for the number of times an event
must occur in one cycle for the counter to be incremented.
The
.Em "counter mask"
can therefore be used to count cycles in which an event
occurs at least some number of times.
The next byte contains several flags:
.Bl -tag -width PCTR_EN
.It Dv PCTR_U
Enables counting of events that occur in user mode.
.It Dv PCTR_K
Enables counting of events that occur in kernel mode.
You must set at least one of
.Dv PCTR_K
and
.Dv PCTR_U
to count anything.
.It Dv PCTR_E
Counts edges rather than cycles.
For some functions this allows you
to get an estimate of the number of events rather than the number of
cycles occupied by those events.
.It Dv PCTR_EN
Enable counters.
This bit must be set in the function for counter 0
in order for either of the counters to be enabled.
This bit should probably be set in counter 1 as well.
.It Dv PCTR_I
Inverts the sense of the
.Em "counter mask" . \&
When this bit is set, the counter only increments on cycles in which
there are no
.Em more
events than specified in the
.Em "counter mask" .
.El
.Pp
The next byte (shifted left by
.Dv PCTR_UM_SHIFT )
contains flags specific to the event being counted, also known as the
.Em "unit mask" .
.Pp
For events dealing with the L2 cache, the following flags are valid
on Intel brand processors:
.Bl -tag -width PCTR_UM_M
.It Dv PCTR_UM_M
Count events involving modified cache coherency state lines.
.It Dv PCTR_UM_E
Count events involving exclusive cache coherency state lines.
.It Dv PCTR_UM_S
Count events involving shared cache coherency state lines.
.It Dv PCTR_UM_I
Count events involving invalid cache coherency state lines.
.El
.Pp
To measure all L2 cache activity, all these bits should be set.
They can be set with the macro
.Dv PCTR_UM_MESI
which contains the bitwise OR of all of the above.
.Pp
For event types dealing with bus transactions, there is another flag
that can be set in the
.Em "unit mask" :
.Bl -tag -width PCTR_UM_A
.It Dv PCTR_UM_A
Count all appropriate bus events, not just those initiated by the
processor.
.El
.Pp
Events marked
.Em (MESI)
require the
.Dv PCTR_UM_[MESI]
bits in the
.Em "unit mask" . \&
Events marked
.Em (A)
can take the
.Dv PCTR_UM_A
bit.
.Pp
Finally, the least significant byte of the counter function is the
event type to count.
A list of possible event functions can be obtained by running the
.Xr pctr 1
command with the
.Fl l
option.
.Sh FILES
.Bl -tag -width /dev/pctr -compact
.It Pa /dev/pctr
.El
.Sh ERRORS
.Bl -tag -width "[ENODEV]"
.It Bq Er ENODEV
An attempt was made to set the counter functions on a CPU that does
not support counters.
.It Bq Er EINVAL
An invalid counter function was provided as an argument to the
.Dv PCIOCSx
.Em ioctl .
.It Bq Er EPERM
An attempt was made to set the counter functions, but the device was
not open for writing.
.El
.Sh SEE ALSO
.Xr pctr 1 ,
.Xr ioctl 2
.Sh HISTORY
A
.Nm
device first appeared in
.Ox 2.0
but was subsequently extended to support AMD and newer Intel CPUs in
.Ox 4.3 .
.Sh AUTHORS
The
.Nm
device was written by
.An David Mazieres Aq Mt dm@lcs.mit.edu .
.Sh BUGS
Not all counter functions are completely accurate.
Some of the functions may not make any sense at all.
Also, be aware that an interrupt between invocations of
.Fn rdpmc
and/or
.Fn rdtsc
can decrease the accuracy of measurements.