Preface
1. Document Conventions
1.1. Typographic Conventions
1.2. Pull-quote Conventions
1.3. Notes and Warnings
2. We Need Feedback!
1. Introduction
1.1. Documentation Goals
1.2. SystemTap Capabilities
2.
SystemTap Beginners Guide
4.3.1. Counting Function Calls Made
4.3.2. Call Graph Tracing
4.3.3. Determining Time Spent in Kernel and User Space
4.3.4. Monitoring Polling Applications
4.3.5. Tracking Most Frequently Used System Calls
4.3.6.
Preface
1. Document Conventions
This manual uses several conventions to highlight certain words and phrases and draw attention to specific pieces of information. In PDF and paper editions, this manual uses typefaces drawn from the Liberation Fonts set. The Liberation Fonts set is also used in HTML editions if the set is installed on your system. If not, alternative but equivalent typefaces are displayed.
Choose System > Preferences > Mouse from the main menu bar to launch Mouse Preferences. In the Buttons tab, click the Left-handed mouse check box and click Close to switch the primary mouse button from the left to the right (making the mouse suitable for use in the left hand).
Output sent to a terminal is set in mono-spaced roman and presented thus:
books Desktop documentation drafts photos stuff
books_tests Desktop1 downloads images notes scripts svgs
Source-code listings are also set in mono-spaced roman but add syntax highlighting as follows:
package org.jboss.book.jca.ex1;...
2. We Need Feedback!
If you find a typographical error in this manual, or if you have thought of a way to make this manual better, we would love to hear from you! Please submit a report in Bugzilla: http://bugzilla.redhat.com/ against the product Red_Hat_Enterprise_Linux 5.
Chapter 1. Introduction
SystemTap is a tracing and probing tool that allows users to study and monitor the activities of the operating system (particularly, the kernel) in fine detail. It provides information similar to the output of tools like netstat, ps, top, and iostat; however, SystemTap is designed to provide more filtering and analysis options for collected information.
[...] developmental efforts of the SystemTap community are geared towards improving SystemTap's user-space probing capabilities.
Chapter 2. Using SystemTap
This chapter explains how to install SystemTap and introduces how to run SystemTap scripts.
2.1. Installation and Setup
To deploy SystemTap, you need to install the SystemTap packages along with the corresponding set of -devel, -debuginfo and -debuginfo-common packages for your kernel.
uname -r
For example, if you wish to use SystemTap on kernel version 2.6.18-53.el5 on an i686 machine, then you would need to download and install the following RPMs:
• kernel-debuginfo-2.6.18-53.1.13.el5.i686.rpm
• kernel-debuginfo-common-2.6.18-53.1.13.el5.i686.rpm
• kernel-devel-2.6.18-53.1.13.el5.i686.rpm
Important
The version, variant, and architecture of the -devel, -debuginfo and -debuginfo-common packages must match the kernel you wish to probe with SystemTap exactly.
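The RPM names listed above follow a fixed pattern, so they can be derived from the kernel release string. The following Python sketch is not part of SystemTap; the helper name required_rpms is hypothetical, and it assumes the base kernel variant and a release string in the form that uname -r prints, with the architecture supplied separately.

```python
def required_rpms(kernel_release, arch):
    """Return the RPM file names needed to probe the given kernel.

    Assumption: base kernel variant only. Other variants may use
    different package names and are not handled here.
    """
    packages = ("kernel-debuginfo", "kernel-debuginfo-common", "kernel-devel")
    return ["%s-%s.%s.rpm" % (name, kernel_release, arch) for name in packages]


# Reproduce the list from the example above.
print("\n".join(required_rpms("2.6.18-53.1.13.el5", "i686")))
```

Because all three names are built from the same release string and architecture, the exact-match requirement stated in the Important note is satisfied by construction.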
Generating Instrumentation for Other Computers
Pass 1: parsed user script and 45 library script(s) in 340usr/0sys/358real ms.
Pass 2: analyzed script: 1 probe(s), 1 function(s), 0 embed(s), 0 global(s) in 290usr/260sys/568real ms.
Pass 3: translated to C into "/tmp/stapiArgLX/stap_e5886fa50499994e6a87aacdc43cd392_399.c" in 490usr/430sys/938real ms.
• target kernel: the kernel of the target system. This is the kernel on which you wish to load/run the instrumentation module.
Procedure 2.1. Configuring a Host System and Target Systems
1. Install the systemtap-runtime RPM on each target system.
2. Determine the kernel running on each target system by running uname -r on each target system.
Important
The host system must be the same architecture and running the same distribution of Linux as the target system in order for the built instrumentation module to work.
2.3. Running SystemTap Scripts
SystemTap scripts are run through the command stap. stap can run SystemTap scripts from standard input or from a file.
[...] SystemTap (as in stap script -c /bin/cp). For more information about target(), refer to Functions.
-e 'script'
Use the script string rather than a file as input for the SystemTap translator.
You can also instruct stap to run scripts from standard input using the switch -. To illustrate:
echo "probe timer.s(1) {exit()}"...
Chapter 3. Understanding How SystemTap Works
SystemTap allows users to write and reuse simple scripts to deeply examine the activities of a running Linux system. These scripts can be designed to extract data, filter it, and summarize it quickly (and safely), enabling the diagnosis of complex performance (or even functional) problems.
Note
An event and its corresponding handler are collectively called a probe. A SystemTap script can have multiple probes. A probe's handler is commonly referred to as a probe body.
In terms of application development, using events and handlers is similar to instrumenting the code by inserting diagnostic print statements in a program's sequence of commands.
[...] refer to Chapter 4, Useful SystemTap Scripts; each section therein provides a detailed explanation of the script, its events, handlers, and expected output.
3.2.1. Event
SystemTap events can be broadly classified into two types: synchronous and asynchronous.
Synchronous Events
A synchronous event occurs when any process executes an instruction at a particular location in kernel code.
probe module("ext3").function("*") { }
probe module("ext3").function("*").return { }
Example 3.2. moduleprobe.stp
The first probe in Example 3.2, “moduleprobe.stp” points to the entry of all functions for the ext3 module. The second probe points to the exits of all functions for that same module; the use of the .return suffix is similar to kernel.function().
Systemtap Handler/Body
Important
SystemTap supports the use of a large collection of probe events. For more information about supported events, refer to man stapprobes. The SEE ALSO section of man stapprobes also contains links to other man pages that discuss supported events for specific subsystems and components.
probe syscall.open
{
  printf ("%s(%d) open\n", execname(), pid())
}
Example 3.5. variables-in-printf-statements.stp
Example 3.5, “variables-in-printf-statements.stp” instructs SystemTap to probe all entries to the system call open; for each event, it prints the current execname() (a string with the executable name) and pid() (the current process ID number), followed by the word open.
The generic data included in the returned string includes a timestamp (number of microseconds since the first call to thread_indent() by the thread), a process name, and the thread ID. This allows you to identify what functions were called, who called them, and the duration of each function call.
name
Identifies the name of a specific system call. This variable can only be used in probes that use the event syscall.system_call.
target()
Used in conjunction with stap script -x process ID or stap script -c command. If you want to specify a script to take an argument of a process ID or command, use target() as the variable in the script to refer to it.
Conditional Statements
[...]
  hz=(1000*count_jiffies) / count_ms
  printf ("jiffies:ms ratio %d:%d => CONFIG_HZ=%d\n",
    count_jiffies, count_ms, hz)
  exit ()
}
Example 3.8. timer-jiffies.stp
Example 3.8, “timer-jiffies.stp” computes the CONFIG_HZ setting of the kernel by using timers that count jiffies and milliseconds, then computing the ratio between the two counts. The global statement allows the variables count_jiffies and count_ms (set in their own respective probes) to be shared with probe timer.ms(12345).
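The arithmetic in the handler above can be checked outside SystemTap. The following Python sketch models it under stated assumptions: both counters are incremented once per 100 units (100 jiffies and 100 ms respectively), which is not visible in the excerpt, and the function names estimate_hz and simulate_counts are hypothetical.

```python
def estimate_hz(count_jiffies, count_ms):
    # Same integer arithmetic as hz=(1000*count_jiffies) / count_ms
    return (1000 * count_jiffies) // count_ms


def simulate_counts(true_hz, run_ms=12345, period=100):
    """Model two counters, one ticking every `period` jiffies and one
    every `period` milliseconds, over a run of `run_ms` milliseconds."""
    jiffies_elapsed = run_ms * true_hz // 1000
    return jiffies_elapsed // period, run_ms // period


for hz in (1000, 250):
    jiffies, ms = simulate_counts(hz)
    print(hz, jiffies, ms, estimate_hz(jiffies, ms))
```

In this model a 1000 Hz kernel is recovered exactly, while a 250 Hz kernel yields 243, because both counters only advance in whole steps. The printed ratio is therefore best read as an approximation of CONFIG_HZ.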
[...]
  else
    countnonread ++
}
probe timer.s(5) { exit() }
probe end
{
  printf("VFS reads total %d\n VFS writes total %d\n", countread, countnonread)
}
Example 3.9. ifelse.stp
Example 3.9, “ifelse.stp” is a script that counts how many virtual file system reads (vfs_read) and writes (vfs_write) the system performs within a 5-second span.
3.3.3. Command-Line Arguments
You can also allow a SystemTap script to accept simple command-line arguments using a $ or @ immediately followed by the number of the argument on the command line. Use $ if you are expecting the user to enter an integer as a command-line argument, and @ if you are expecting a string.
Important
All associative arrays must be declared as global, regardless of whether the associative array is used in one or multiple probes.
3.5. Array Operations in SystemTap
This section enumerates some of the most commonly used array operations in SystemTap.
3.5.1.
Incrementing Associated Values
The construct in Example 3.13, “Using Array Values in Simple Computations” computes a value for the variable delta by subtracting the associated value of the key tid() from the current gettimeofday_s(). The construct does this by reading the value of tid() from the array. This particular construct is useful for determining the time between two events, such as the start and completion of a read operation.
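The timestamp-delta pattern described above can be modelled with a dictionary keyed by thread ID. This is a Python sketch, not SystemTap code; the names start_times, on_start and on_complete are hypothetical, and the timestamps are passed in explicitly where a script would call gettimeofday_s().

```python
start_times = {}  # hypothetical array: thread ID -> start time in seconds


def on_start(tid, now_s):
    # Store the current time as the value associated with the thread ID.
    start_times[tid] = now_s


def on_complete(tid, now_s):
    # delta = current time minus the value stored under the same key
    return now_s - start_times[tid]


on_start(4021, 1000)
print(on_complete(4021, 1003))  # prints 3
```

Keying by thread ID keeps overlapping operations from different threads separate, so each thread's delta is computed against its own start time.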
[...]
  reads[execname()] ++
}
probe timer.s(3)
{
  foreach (count in reads)
    printf("%s : %d \n", count, reads[count])
}
Example 3.15. cumulative-vfsreads.stp
In the second probe of Example 3.15, “cumulative-vfsreads.stp”, the foreach statement uses the variable count to reference each iteration of a unique key in the array reads. The reads[count] array statement in the same probe retrieves the associated value of each unique key.
Using Arrays in Conditional Statements
[...]
  foreach (count in reads)
    printf("%s : %d \n", count, reads[count])
  delete reads
}
Example 3.16. noncumulative-vfsreads.stp
In Example 3.16, “noncumulative-vfsreads.stp”, the second probe prints the number of VFS reads each process made within the probed 3-second period only. The delete reads statement clears the reads array within the probe.
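The effect of clearing the array after each report can be seen in a small model. The Python sketch below is not SystemTap code; run_intervals is a hypothetical helper, and each inner list stands for the process names that performed a read during one interval.

```python
def run_intervals(intervals, reset_each_interval):
    """Return the counts that would be reported at the end of each interval."""
    reads = {}
    reports = []
    for names in intervals:
        for name in names:
            reads[name] = reads.get(name, 0) + 1  # reads[execname()] ++
        reports.append(dict(reads))               # what the timer probe prints
        if reset_each_interval:
            reads.clear()                         # delete reads
    return reports


activity = [["cat", "cat", "dd"], ["cat"]]
print(run_intervals(activity, reset_each_interval=False))
print(run_intervals(activity, reset_each_interval=True))
```

Without the reset, the second report still includes the reads counted in the first interval; with it, each report covers only its own interval.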
[...]
  reads[execname()] ++
}
probe timer.s(3)
{
  printf("=======\n")
  foreach (count in reads-)
    if (reads[count] >= 1024)
      printf("%s : %dkB \n", count, reads[count]/1024)
    else
      printf("%s : %dB \n", count, reads[count])
}
Example 3.17. vfsreads-print-if-1kb.stp
Every three seconds, Example 3.17, “vfsreads-print-if-1kb.stp” prints out a list of all processes, along with how many times each process performed a VFS read.
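The sorting and conditional formatting in the handler above can be expressed as the following Python sketch. It is a model rather than SystemTap code, and format_reads is a hypothetical name; descending order by value corresponds to the trailing - in foreach (count in reads-).

```python
def format_reads(reads):
    """Format per-process values, largest first, switching to kB at 1024."""
    lines = []
    for name in sorted(reads, key=reads.get, reverse=True):
        value = reads[name]
        if value >= 1024:
            # Integer division, as with the / operator on integers in the script
            lines.append("%s : %dkB" % (name, value // 1024))
        else:
            lines.append("%s : %dB" % (name, value))
    return lines


print("\n".join(format_reads({"cat": 10, "dd": 4096, "sshd": 1500})))
```

Note that integer division truncates, so a value of 1500 is shown as 1kB.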
Computing for Statistical Aggregates
To add a value to a statistical aggregate, use the operator <<< value.
global reads
probe vfs.read
{
  reads[execname()] <<< count
}
Example 3.19. stat-aggregates.stp
In Example 3.19, “stat-aggregates.stp”, the operator <<< count stores the amount returned by count to the associated value of the corresponding execname() in the reads array. Remember, these values are stored;...
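One way to picture a statistical aggregate is as a collection of recorded values from which summaries are extracted later. The Python sketch below models that idea only; it keeps every sample in a list, which is an assumption made for clarity and not a description of how SystemTap stores aggregates. The names record and extract are hypothetical.

```python
from collections import defaultdict

reads = defaultdict(list)  # process name -> recorded values


def record(name, value):
    # Models: reads[execname()] <<< value
    reads[name].append(value)


def extract(values):
    # Models the integer extractors @count, @sum, @min, @max and @avg
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) // len(values),
    }


for size in (100, 300, 200):
    record("cat", size)
print(extract(reads["cat"]))
```

The point of the model is that <<< adds a sample without replacing earlier ones, which is why extractors are needed to turn the aggregate back into a single number.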
[...]
  reads[execname(),pid()] <<< 1
}
probe timer.s(3)
{
  foreach([var1,var2] in reads)
    printf("%s (%d) : %d \n", var1, var2, @count(reads[var1,var2]))
}
Example 3.20. Multiple Array Indexes
In Example 3.20, “Multiple Array Indexes”, the first probe tracks how many times each process performs a VFS read.
Chapter 4. Useful SystemTap Scripts
This chapter enumerates several SystemTap scripts you can use to monitor and investigate different subsystems. All of these scripts are available at /usr/share/systemtap/testsuite/systemtap.examples/ once you install the systemtap-testsuite RPM.
4.1. Network
The following sections showcase scripts that trace network-related functions and build a profile of network activity.
[...]
  delete ifxmit
  delete ifrecv
  delete ifmerged
}
probe timer.ms(5000), end, error
{
  print_activity()
}
Note that function print_activity() uses the following expressions:
n_xmit ? @sum(ifxmit[pid, dev, exec, uid])/1024 : 0
n_recv ? @sum(ifrecv[pid, dev, exec, uid])/1024 : 0
These expressions are if/else conditionals.
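The two expressions above have the shape condition ? value-if-true : value-if-false. A Python equivalent, given here as a sketch with the hypothetical name kb_or_zero, makes the guard explicit: the sum is only computed and converted to KB when at least one packet was counted, and 0 is used otherwise.

```python
def kb_or_zero(packet_count, byte_samples):
    # Same shape as: n_xmit ? @sum(ifxmit[...])/1024 : 0
    return sum(byte_samples) // 1024 if packet_count else 0


print(kb_or_zero(0, []))            # prints 0
print(kb_or_zero(2, [1024, 2048]))  # prints 3
```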
[...]
Example 4.2. socket-trace.stp Sample Output
Example 4.2, “socket-trace.stp Sample Output” contains a 3-second excerpt of the output for socket-trace.stp. For more information about the output of this script as provided by thread_indent(), refer to SystemTap Functions Example 3.6, “thread_indent.stp”.
#! /usr/bin/env stap
############################################################
# Dropwatch.stp
# Author: Neil Horman <nhorman@redhat.com>
# An example script to mimic the behavior of the dropwatch utility
# http://fedorahosted.org/dropwatch
############################################################
# Array to hold the list of drop points we find...
To make the location of packet drops more meaningful, refer to the /boot/System.map-`uname -r` file. This file lists the starting addresses for each function, allowing you to map the addresses in the output of Example 4.4, “dropwatch.stp Sample Output” to a specific function name.
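The lookup described above amounts to finding the last symbol whose starting address is not greater than the address of interest. The Python sketch below shows one way to do that. It assumes each System.map line has the form "address type name" with a hexadecimal address; the sample lines, symbol names and helper names (load_symbols, lookup) are hypothetical.

```python
import bisect


def load_symbols(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, address):
    """Return the name of the symbol that starts at or before `address`."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    return symbols[index][1] if index >= 0 else None


# Hypothetical sample data, not taken from a real System.map file.
sample = [
    "c0100000 T func_a",
    "c0100040 T func_b",
    "c0100100 t func_c",
]
table = load_symbols(sample)
print(lookup(table, 0xC0100050))  # prints func_b
```

In practice the lines would be read from the System.map file for the running kernel, and the address would come from the script's output.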
Summarizing Disk Read/Write Traffic
probe vfs.write.return {
  if ($return>0) {
    if (devname!="N/A") { /*skip update cache*/
      io_stat[pid(),execname(),uid(),ppid(),"W"] += $return
      device[pid(),execname(),uid(),ppid(),"W"] = devname
      write_bytes += $return
    }
  }
}
probe timer.ms(5000) {
  /* skip non-read/write disk */
  if (read_bytes+write_bytes) {
    printf("\n%-25s, %-8s%4dKb/sec, %-7s%6dKb, %-7s%6dKb\n\n",
      ctime(gettimeofday_s()),
      "Average:", ((read_bytes+write_bytes)/1024)/5,
      "Read:",read_bytes/1024,...
• BYTES: the amount of data read from or written to disk.
The time and date in the output of disktop.stp is returned by the functions ctime() and gettimeofday_s(). ctime() derives calendar time in terms of seconds passed since the Unix epoch (January 1, 1970).
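The summary line printed by disktop.stp combines two small computations: an average rate over the 5-second interval, and a calendar time derived from seconds since the epoch. The Python sketch below models both. The function names are hypothetical, and rendering the time in UTC is an assumption made so that the example is reproducible.

```python
import time


def average_kb_per_sec(read_bytes, write_bytes, interval_s=5):
    # Same arithmetic as ((read_bytes+write_bytes)/1024)/5 in the script
    return ((read_bytes + write_bytes) // 1024) // interval_s


def calendar_time(seconds_since_epoch):
    # Python analogue of ctime(gettimeofday_s()), rendered in UTC here
    return time.asctime(time.gmtime(seconds_since_epoch))


print(average_kb_per_sec(40960, 10240))  # prints 10
print(calendar_time(0))
```

Because both divisions are integer divisions, intervals with less than 5 KB of total traffic report an average of 0.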
Tracking I/O Time For Each File Read or Write
[...]
  filenames[pid()] = user_string($filename)
}
probe syscall.open.return {
  if ($return != -1) {
    filehandles[pid(), $return] = filenames[pid()]
    fileread[pid(), $return] = 0
    filewrite[pid(), $return] = 0
  } else {
    printf("%d %s access %s fail\n", timestamp(), proc(), filenames[pid()])
  }
  delete filenames[pid()]
}
probe syscall.read {
  if ($count >...
This section describes how to track the cumulative amount of I/O to the system.
traceio.stp
#! /usr/bin/env stap
# traceio.stp
# Copyright (C) 2007 Red Hat, Inc., Eugene Teo <eteo@redhat.com>
# Copyright (C) 2009 Kai Meyer <kai@unixlords.com>
#   Fixed a bug that allows this to run longer
#   And added the humanreadable function
# This program is free software;...
Track Cumulative IO
# published by the Free Software Foundation.
global reads, writes, total_io
probe vfs.read.return {
  reads[pid(),execname()] += $return
  total_io[pid(),execname()] += $return
}
probe vfs.write.return {
  writes[pid(),execname()] += $return
  total_io[pid(),execname()] += $return
}
function humanreadable(bytes) {
  if (bytes > 1024*1024*1024) {
    return sprintf("%d GiB", bytes/1024/1024/1024)
  } else if (bytes >...
[...]
Example 4.8. traceio2.stp Sample Output
4.2.5. Monitoring Reads and Writes to a File
This section describes how to monitor reads from and writes to a file in real time.
inodewatch-simple.stp
probe vfs.write, vfs.read
{
  dev_nr = $file->f_dentry->d_inode->i_sb->s_dev
  inode_nr = $file->f_dentry->d_inode->i_ino
  if (dev_nr == ($1 <<...
4.2.6. Monitoring Changes to File Attributes
This section describes how to monitor, in real time, whether any processes are changing the attributes of a targeted file.
inodewatch2-simple.stp
global ATTR_MODE = 1
probe kernel.function("inode_setattr") {
  dev_nr = $inode->i_sb->s_dev
  inode_nr = $inode->i_ino
  if (dev_nr == ($1 <<...
Call Graph Tracing
# stap functioncallcount.stp "*@mm/*.c"
probe kernel.function(@1).call { # probe functions listed on commandline
  called[probefunc()] <<< 1 # add a count efficiently
}
global called
probe end {
  foreach (fn in called-) # Sort by call count (in decreasing order)
  # (fn+ in called) # Sort by function name
    printf("%s %d\n", fn, @count(called[fn]))
}
para-callgraph-simple.stp
function trace(entry_p) {
  if(tid() in trace)
    printf("%s%s%s\n",thread_indent(entry_p),
      (entry_p>0?"->":"<-"),
      probefunc())
}
global trace
probe kernel.function(@1).call {
  if (execname() == "stapio") next # skip our own helper process
  trace[tid()] = 1
  trace(1)
}
probe kernel.function(@1).return {
  trace(-1)
  delete trace[tid()]
}
probe kernel.function(@2).call { trace(1) }
probe kernel.function(@2).return { trace(-1) }
function trace(entry_p) {...
[...] CPU usage and power savings.
timeout.stp
#! /usr/bin/env stap
# Copyright (C) 2009 Red Hat, Inc.
# Written by Ulrich Drepper <drepper@redhat.com>
# Modified by William Cohen <wcohen@redhat.com>...
Monitoring Polling Applications
global process, timeout_count, to
global poll_timeout, epoll_timeout, select_timeout, itimer_timeout
global nanosleep_timeout, futex_timeout, signal_timeout
probe syscall.poll, syscall.epoll_wait {
  if (timeout) to[pid()]=timeout
}
probe syscall.poll.return {
  p = pid()
  if ($return == 0 && to[p] > 0 ) {
    poll_timeout[p]++
    timeout_count[p]++
    process[p] = execname()
    delete to[p]...
global syscalls_count
probe syscall.* {
  syscalls_count[name]++
}
function print_systop () {
  printf ("%25s %10s\n", "SYSCALL", "COUNT")
  foreach (syscall in syscalls_count- limit 20) {
    printf("%25s %10d\n", syscall, syscalls_count[syscall])
  }
  delete syscalls_count
}
probe timer.s(5) {
  print_systop ()
  printf("--------------------------------------------------------------\n")
}
topsys.stp lists the top 20 system calls used by the system per 5-second interval.
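The reporting step of topsys.stp (sort by count in decreasing order, keep at most 20 entries, then reset) can be modelled with a Counter. This Python sketch is not SystemTap code; it returns the report lines instead of printing them so that the result is easy to inspect, and the sample counts are made up.

```python
from collections import Counter


def print_systop(syscalls_count, limit=20):
    """Build the report lines, then reset the counts for the next interval."""
    lines = ["%25s %10s" % ("SYSCALL", "COUNT")]
    # most_common(limit) models: foreach (syscall in syscalls_count- limit 20)
    for syscall, count in syscalls_count.most_common(limit):
        lines.append("%25s %10d" % (syscall, count))
    syscalls_count.clear()  # delete syscalls_count
    return lines


counts = Counter({"read": 3, "write": 5, "open": 1})
print("\n".join(print_systop(counts, limit=2)))
```

Clearing the counts after each report is what makes every 5-second listing independent of the previous one.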
Tracking System Call Volume Per Process
[...] (Section 4.3.4, “Monitoring Polling Applications”). Monitoring the volume of system calls made by each process provides more data for investigating your system for polling processes and other resource hogs.
syscalls_by_proc.stp
#! /usr/bin/env stap
# Copyright (C) 2006 IBM Corp.
If you prefer the output to display the process IDs instead of the process names, use the following script instead.
syscalls_by_pid.stp
#! /usr/bin/env stap
# Copyright (C) 2006 IBM Corp.
# This file is part of systemtap, and is free software. You can
# redistribute it and/or modify it under the terms of the GNU General
# Public License (GPL);...
Identifying Contended User-Space Locks
futexes.stp
#! /usr/bin/env stap
# This script tries to identify contended user-space locks by hooking
# into the futex system call.
global thread_thislock # short
global thread_blocktime #
global FUTEX_WAIT = 0 /*, FUTEX_WAKE = 1 */
global lock_waits # long-lived stats on (tid,lock) blockage elapsed time
global process_names # long-lived pid-to-execname mapping
probe syscall.futex {...
Chapter 5. Understanding SystemTap Errors
This chapter explains the most common errors you may encounter while using SystemTap.
5.1. Parse and Semantic Errors
These types of errors occur while SystemTap attempts to parse and translate the script into C, before the result is converted into a kernel module.
semantic error: unresolved type for identifier 'foo'
The identifier (e.g. a variable) was used, but no type (integer or string) could be determined. This occurs, for instance, if you use a variable in a printf statement while the script never assigns a value to the variable.
Run Time Errors and Warnings
semantic error: libdwfl failure
There was a problem processing the debugging information. In most cases, this error results from the installation of a kernel-debuginfo RPM whose version does not match the probed kernel exactly. The installed kernel-debuginfo RPM itself may have some consistency or correctness problems.
semantic error: cannot find foo debuginfo
SystemTap could not find a suitable kernel-debuginfo at all.
Chapter 6. References
This chapter enumerates other references for more information about SystemTap. Refer to these sources when writing advanced probes and tapsets.
SystemTap Wiki
The SystemTap Wiki is a collection of links and articles related to the deployment, usage, and development of SystemTap.
Appendix A. Revision History
Revision 2.0 Mon Jul 20 2009 Don Domingo ddomingo@redhat.com
includes 5.4 minor updates and additional script "dropwatch.stp"
Revision 1.0 Wed Jun 17 2009 Don Domingo ddomingo@redhat.com
Building+pushing to RHEL...