NAME
collect - command used to collect program performance data
SYNOPSIS
collect collect-arguments target target-arguments
collect
collect -V
collect -R
DESCRIPTION
The collect command runs the target process and records performance
data and global data for the process. Performance data is collected
using profiling or tracing techniques. The data can be examined with a
GUI program (analyzer) or a command-line program (er_print). The data
collection software run by the collect command is referred to here as
the Collector.
The data from a single run of the collect command is called an experi‐
ment. The experiment is represented in the file system as a directory,
with various files inside that directory.
The target is the path name of the executable, Java(TM) .jar file, or
Java .class file for which you want to collect performance data. (For
more information about Java profiling, see JAVA PROFILING, below.)
Executables that are targets for the collect command can be compiled
with any level of optimization, but must use dynamic linking. If a
program is statically linked, the collect command prints an error mes‐
sage. In order to see annotated source using analyzer or er_print,
targets should be compiled with the -g flag, and should not be
stripped.
In order to enable dataspace profiling, executables must be compiled
with the -xhwcprof -xdebugformat=dwarf -g flags. These flags are valid
for the C, C++ and Fortran compilers, but only on SPARC[R] platforms.
See the section "DATASPACE PROFILING", below.
The collect command uses the following strategy to find its target:
- If a file with the specified target name exists, has execute permis‐
sion set, and is an ELF executable, the collect command verifies that
it can run on the current machine and then runs it. If the file is
not an ELF executable, the collect command assumes it is a script,
and runs it.
- If a file with the specified target name exists, but does not have
execute permission, collect checks whether the file is a Java[TM] jar
file (target name ends in .jar) or class file (target name ends in
.class). If the file is a jar file or class file, collect inserts the
Java[TM] virtual machine (JVM) software as the target, with any nec‐
essary flags, and collects data on that JVM machine. (The terms
"Java virtual machine" and "JVM" mean a virtual machine for the
Java[TM] platform.) See the section on "JAVA PROFILING", below.
- If a file with the specified target name is not found, collect
searches your path to find an executable; if an executable file is
found, collect verifies it as described above.
- If a file of the target name is also not found in your path, the com‐
mand looks for a file with that name and the string .class appended;
if a file with the class name is found, collect inserts the JVM
machine with the appropriate flags, as above.
- If none of these procedures can find the target, the command fails.
OPTIONS
If invoked with no arguments, collect prints a usage summary, including
the default configuration of the experiment.
If invoked with only the -h argument, collect prints hardware counter
information. If the processor supports hardware counter overflow pro‐
filing, collect prints two lists containing information about hardware
counters. The first list contains "aliased" hardware counters; the
second list contains raw hardware counters. For more details, see the
"Hardware Counter Overflow Profiling" section below.
Data Specifications
-p option
Collect clock-based profiling data. The allowed values of
option are:
Value Meaning
off Turn off clock-based profiling
on Turn on clock-based profiling with the default profil‐
ing interval of approximately 10 milliseconds.
lo[w] Turn on clock-based profiling with the low-resolution
profiling interval of approximately 100 milliseconds.
hi[gh] Turn on clock-based profiling with the high-resolution
profiling interval of approximately 1 millisecond.
n Turn on clock-based profiling with a profiling
interval of n. The value n can be an integer or a
floating-point number, with a suffix of u for values
in microseconds, or m for values in milliseconds. If
no suffix is used, assume the value to be in millisec‐
onds.
If the value is smaller than the clock profiling mini‐
mum, set it to the minimum; if it is not a multiple of
the clock profiling resolution, round down to the
nearest multiple of the clock resolution. If it
exceeds the clock profiling maximum, report an error.
If it is negative or zero, report an error. If
invoked with no arguments, report the clock-profiling
intervals.
On Linux systems, clock-profiling of multithreaded applications may
report inaccurate data for threads. The profile signal is not
always delivered by the kernel to each thread at the specified
interval; sometimes the signal is delivered to the wrong thread. If
available, HW counter profiling using the cycle counter will give
more accurate per-thread results.
An optional + can be prepended to the clock-profiling interval,
specifying that collect should also capture dataspace data. It does
so by backtracking one instruction; if that instruction is a memory
instruction, it assumes that the delay is attributable to that
instruction and records the event, including the virtual and physical
addresses of the memory reference.
Caution must be used in interpreting clock-based dataspace data; the
delay might be completely unrelated to the memory instruction that
happened to precede the instruction with the clock-profile hit; for
example, if a memory instruction hits in the cache, but is in a loop
executed many times, high counts on that instruction might appear to
indicate memory stall delays, but they do not. This situation can
be disambiguated by examining the disassembly around the instruction
indicating the stall. If the surrounding instructions also have high
clock-profiling metrics, the memory delay is likely to be spurious.
Clock-based dataspace profiling should be used only on machines that
do not support hardware counter profiling on memory-based counters.
See the section "DATASPACE AND MEMORYSPACE PROFILING", below.
If no explicit -p off argument is given, and neither hardware
counter overflow profiling, nor count data, nor race-detection or
deadlock data is specified, turn on clock-based profiling.
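For example, to turn on clock-based profiling with a 5-millisecond interval for a hypothetical target named a.out, a command of the following form could be used:
collect -p 5m a.out
The same interval could equally be given in microseconds as -p 5000u.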
-h ctr_def...[,ctr_n_def]
Collect hardware counter overflow profiles. The number of
counter definitions, (ctr_def through ctr_n_def) is processor-
dependent. You can determine the maximum number of hardware
counters definitions for profiling on the current machine, and
see the full list of available hardware counters, by running the
collect command with only the -h argument on the current
machine.
This option is now available on systems running the Linux OS.
On Linux systems running a kernel newer than 2.6.32, hardware-
counter profiling uses the PerfEvents framework, and requires no
kernel patch.
For earlier Linux systems that use the perfctr framework, you
are responsible for installing the required perfctr patch on the
system. You can find the patch by searching the Web for "perfctr
patch." Instructions for installation are contained within a
tar file at the patch download location. The Collector searches
for user-level libperfctr.so libraries using LD_LIBRARY_PATH,
and then in /usr/local/lib, /usr/lib/, and /lib/ for the 32-bit
versions, or /usr/local/lib64, /usr/lib64/, and /lib64/ for the
64-bit versions.
Each counter definition takes one of the following forms,
depending on whether attributes for hardware counters are sup‐
ported on the processor:
1. [+]ctr[/reg#][,interval]
2. [+]ctr[~attr=val]...[~attrN=valN][/reg#][,interval]
The meanings of the counter definition options are as follows:
Value Meaning
+ Optional parameter that can be applied to memory-
related counters. Causes collect to collect dataspace
data by backtracking to find the instruction that
triggered the overflow, and to find the virtual and
physical addresses of the memory reference. Back‐
tracking works on SPARC processors, and only with
counters of type load, store, or load-store, as dis‐
played in the counter list obtained by running the
collect -h command without any other command-line
arguments. See the section "DATASPACE AND MEMORYSPACE
PROFILING", below.
ctr Processor-specific counter name. You can ascertain the
list of counter names by running the collect -h com‐
mand without any other command-line arguments. On
most systems, even if a counter is not listed, it can
still be specified by a numeric value, either in hexa‐
decimal (0x1234) or decimal. Drivers for older chips
do not support numeric values, but drivers for more
recent chips do.
attr=val On some processors, attribute options can be associ‐
ated with a hardware counter. If the processor sup‐
ports attribute options, running the collect -h command
without any other command-line arguments shows the
counter definitions in the second form listed above,
and provides a list of the attribute names to use for
attr. The value val can be in decimal or hexadecimal
format; hexadecimal numbers use the C format, prefixed
with a zero and a lower-case x (0xhex_number).
reg# Hardware register to use for the counter. If not spec‐
ified, collect attempts to place the counter into the
first available register and as a result, might be
unable to place subsequent counters due to register
conflicts. If you specify more than one counter, the
counters must use different registers. The list of
allowable register numbers can be ascertained by run‐
ning the collect -h command without any other command-
line arguments.
interval Sampling frequency, set by defining the counter over‐
flow value. Valid values are as follows:
Value Meaning
on Select the default rate, which can be deter‐
mined by running the collect -h command
without any other command-line arguments.
Note that the default value for all raw
counters is the same, and might not be the
most suitable value for a specific counter.
hi Set interval to approximately 10 times
shorter than on.
lo Set interval to approximately 10 times
longer than on.
value Set interval to a specific value, specified
in decimal or hexadecimal format.
An experiment can specify both hardware counter overflow profiling
and clock-based profiling. If hardware counter overflow profiling
is specified, but clock-based profiling is not explicitly specified,
turn off clock-based profiling.
For more information on hardware counters, see the "Hardware Counter
Overflow Profiling" section below.
-s option
Collect synchronization tracing data.
The minimum delay threshold for tracing events is set using
option. The allowed values of option are:
Value Meaning
on Turn on synchronization delay tracing and set the
threshold value by calibration at runtime
calibrate Same as on
off Turn off synchronization delay tracing
n Turn on synchronization delay tracing with a threshold
value of n microseconds; if n is zero, trace all
events
all Turn on synchronization delay tracing and trace all
synchronization events
By default, turn off synchronization delay tracing.
Record synchronization events for Java monitors, but not for native
synchronization within the JVM machine.
On Solaris, the following functions are traced: mutex_lock,
rw_rdlock, rw_wrlock, cond_wait, cond_timedwait, cond_reltimedwait,
thr_join, sema_wait, pthread_mutex_lock, pthread_rwlock_rdlock,
pthread_rwlock_wrlock, pthread_cond_wait, pthread_cond_timedwait,
pthread_cond_reltimedwait_np, pthread_join, and sem_wait.
On Linux, the following functions are traced: pthread_mutex_lock,
pthread_cond_wait, pthread_cond_timedwait, pthread_join, and
sem_wait.
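For example, to trace only synchronization events whose delay exceeds 100 microseconds in a hypothetical target a.out:
collect -s 100 a.out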
-H option
Collect heap trace data. The allowed values of option are:
Value Meaning
on Turn on tracing of memory allocation requests
off Turn off tracing of memory allocation requests
By default, turn off heap tracing.
Record heap-tracing events for any native calls. Treat calls to mmap
as memory allocations.
Heap profiling is not supported for Java programs. Specifying it is
treated as an error.
Note that heap tracing might produce very large experiments. Such
experiments are very slow to load and browse.
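For example, to trace memory allocation requests made by a hypothetical target a.out:
collect -H on a.out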
-M option
Specify collection of an MPI experiment. (See MPI PROFILING,
below.) The target of collect should be mpirun, and its argu‐
ments should be separated from the user target (that is the pro‐
grams that are to be run by mpirun) by an inserted -- argument.
The experiment is named as usual, and is referred to as the
"founder experiment"; its directory contains subexperiments for
each of the MPI processes, named by rank. It is recommended that
the -- argument always be used with mpirun, so that an experi‐
ment can be collected by prepending collect and its options to
the mpirun command line.
The allowed values of option are:
Value Meaning
MPI-version
Turn on collection of an MPI experiment, assuming the
MPI version named. The recognized versions of MPI are
printed when you type collect with no arguments, or in
response to an unrecognized version specified with -M.
off Turn off collection of an MPI experiment
By default, turn off collection of an MPI experiment. When an MPI
experiment is turned on, the default setting for -m (see below) is
changed to on.
-m option
Collect MPI tracing data. (See MPI PROFILING, below.)
The allowed values of option are:
Value Meaning
on Turn on MPI tracing information
off Turn off MPI tracing information
By default, turn off MPI tracing, except if the -M flag is enabled,
in which case MPI tracing is turned on by default. Normally, MPI
experiments are collected with -M, and no user control of MPI trac‐
ing is needed. If you want to collect an MPI experiment, but not
collect MPI trace data, you can use the explicit flags:
-M MPI-version -m off.
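For example, assuming the Oracle Message Passing Toolkit (OMPT) version of MPI and a hypothetical target a.out run on four ranks, an MPI experiment without MPI trace data could be collected with:
collect -M OMPT -m off mpirun -np 4 -- a.out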
-c option
Collect count data, using bit(1) instrumentation. This option
is available only on Solaris systems. The allowed values of
option are:
Value Meaning
on Turn on count data
static Turn on simulated count data, based on the assumption
that every instruction was executed exactly once.
off Turn off count data
By default, turn off count data. Count data cannot be collected with
any other type of data. For count data or simulated count data, the
executable and any shared-objects that are instrumented and stati‐
cally linked are counted; for count data, but not simulated count
data, dynamically loaded shared objects are also instrumented and
counted.
In order to collect count data, the executable must be compiled with
the -xbinopt=prepare flag.
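For example, a sketch of collecting count data for a hypothetical program built from a.c (compiler and options other than -xbinopt=prepare depend on your build):
cc -O -xbinopt=prepare -o a.out a.c
collect -c on a.out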
-I directory
Specify a directory for bit(1) instrumentation. This option is
available only on Solaris systems, and is meaningful only when
-c is specified.
-N libname
Specify a library to be excluded from bit(1) instrumentation,
whether the library is linked into the executable, or loaded
with dlopen. This option is available only on Solaris systems,
and is meaningful only when -c is also specified. Multiple -N
options can be specified.
-r option
Collect data for data race detection or deadlock detection for
the Thread Analyzer.
The allowed values of option are:
Value Meaning
race Collect data for detecting data races.
deadlock Collect data for detecting deadlocks and potential
deadlocks.
all Collect data for detecting data races, deadlocks, and
potential deadlocks. Can also be specified as
race,deadlock.
off Turn off data collection for data races, deadlocks,
and potential deadlocks.
on Collect data for detecting data races (same as race).
By default, turn off collection of all Thread Analyzer data.
Thread Analyzer data cannot be collected with any tracing data, but
can be collected in conjunction with clock- or hardware counter pro‐
filing data. Thread Analyzer data significantly slows down the exe‐
cution of the target, and profiles might not be meaningful as
applied to the user code.
Thread Analyzer experiments can be examined with either analyzer or
with tha. The latter displays a simplified list of default tabs,
but is otherwise identical.
In order to enable data-race detection, executables must be instru‐
mented, either at compile time, or by invoking a postprocessor. If
the target is not instrumented, and none of the shared objects on
its library list is instrumented, a warning is displayed, but the
experiment is run. Other Thread Analyzer data do not require
instrumentation.
See the tha(1) man page or the Thread Analyzer User's Guide for more
detail.
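For example, to collect data-race detection data on a hypothetical instrumented target a.out and then examine the resulting experiment (test.1.er is the default experiment name when no -o or -g option is given):
collect -r race a.out
tha test.1.er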
-S interval
Collect periodic samples at the interval specified (in seconds).
Record data samples from the process, and include a timestamp
and execution statistics from the kernel, among other things.
The allowed values of interval are:
Value Meaning
off Turn off periodic sampling
on Turn on periodic sampling with the default sampling
interval (1 second)
n Turn on periodic sampling with a sampling interval of
n in seconds; n must be positive.
By default, turn on periodic sampling.
If no data specification arguments are supplied, collect clock-based
profiling data, using the default resolution.
If clock-based profiling is explicitly disabled, and neither hard‐
ware counter overflow profiling nor any kind of tracing is enabled,
display a warning that no function-level data is being collected,
then execute the target and record global data.
Experiment Controls
-L size
Limit the amount of profiling and tracing data recorded to size
megabytes. The limit applies to the sum of all profiling data
and tracing data, but not to sample points. The limit is only
approximate, and can be exceeded. When the limit is reached,
stop profiling and tracing data, but keep the experiment open
and record samples until the target process terminates. The
allowed values of size are:
Value Meaning
unlimited or none
Do not impose a size limit on the experiment.
n Impose a limit of n megabytes. The value of n must be
greater than zero.
By default, there is no limit on the amount of data recorded.
-F option
Control whether or not descendant processes should have their
data recorded. The allowed values of option are:
Value Meaning
on Record experiments on descendant processes from fork
and exec
all Record experiments on all descendant processes
off Do not record experiments on descendant processes
=<regex> Record experiments on all descendant processes whose
executable name (a.out name) or lineage matches the reg‐
ular expression.
By default, record descendant processes from fork and exec. For
more details, read the sections "FOLLOWING DESCENDANT PROCESSES",
and "PROFILING SCRIPTS" below.
-A option
Control whether or not load objects used by the target process
should be archived or copied into the recorded experiment. The
allowed values of option are:
Value Meaning
on Archive load objects into the experiment.
off Do not archive load objects into the experiment.
copy Copy and archive load objects (the target and any
shared objects it uses) into the experiment.
If you copy experiments onto a different machine, or read the exper‐
iments from a different machine, specify -A copy. Doing so will
consume more disk space but allow the experiment to be read on other
machines. For Java experiments, all .jar files are also copied into
the experiment.
Note that -A copy does not copy any sources or object files (.o's);
it is your responsibility to ensure that those files are accessible
from the machine where the experiment is being examined.
The default setting for -A is on, except for datarace detection and
deadlock experiments, where the default setting is copy.
-j option
Control Java profiling when the target is a JVM machine. The
allowed values of option are:
Value Meaning
on Record profiling data for the JVM machine, and recog‐
nize methods compiled by the Java HotSpot[TM] virtual
machine, and also record Java call stacks.
off Do not record Java profiling data.
<path> Record profiling data for the JVM, and use the JVM as
installed in <path>.
See the section "JAVA PROFILING", below.
You must use -j on to obtain profiling data if the target is a JVM
machine. The -j on option is not needed if the target is a class or
jar file. If you are using a 64-bit JVM machine, you must specify
its path explicitly as the target; do not use the -d64 option for a
32-bit JVM machine. If the -j on option is specified, but the target
is not a JVM machine, an invalid argument might be passed to the
target, and no data would be recorded. The collect command validates
the version of the JVM machine specified for Java profiling.
-J java_arg
Specify additional arguments to be passed to the JVM used for
profiling. If -J is specified, but Java profiling is not spec‐
ified, an error is generated, and no experiment is run. The
java_arg must be surrounded by quotes if it contains more than
one argument. It consists of a set of tokens, separated by
either a blank or a tab; each token is passed as a separate
argument to the JVM. Note that most arguments to the JVM must
begin with a "-" character.
-l signal
Record a sample point whenever the given signal is delivered to
the process.
-y signal[,r]
Control recording of data with signal. Whenever the given sig‐
nal is delivered to the process, switch between paused (no data
is recorded) and resumed (data is recorded) states. Start in
the resumed state if the optional ,r flag is given; otherwise,
start in the paused state. This option does not affect the
recording of sample points.
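For example, assuming the signal is given by name and the target is a hypothetical executable a.out, the following starts the run with recording paused and toggles recording each time SIGUSR1 is delivered:
collect -y SIGUSR1 a.out
Adding the optional ,r flag (-y SIGUSR1,r) would start the run with recording enabled instead.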
Output Controls
-o experiment_name
Use experiment_name as the name of the experiment to be
recorded. The experiment_name must end in the string .er; if
not, print an error message and do not run the experiment.
If -o is not specified, give the experiment a name of the form
stem.n.er, where stem is a string, and n is a number. If a group
name has been specified with -g, set stem to the group name
without the .erg suffix. If no group name has been specified,
set stem to the string "test".
If invoked from one of the commands used to run MPI jobs, for
example, mpirun, but without -M MPI-versions, and -o is not
specified, take the value of n used in the name from the envi‐
ronment variable used to define the MPI rank of that process.
Otherwise, set n to one greater than the highest integer cur‐
rently in use. (See MPI PROFILING, below.)
If the name is not specified in the form stem.n.er, and the
given name is in use, print an error message and do not run the
experiment. If the name is of the form stem.n.er and the name
supplied is in use, record the experiment under a name corre‐
sponding to one greater than the highest value of n that is cur‐
rently in use. Print a warning if the name is changed.
-d directory_name
Place the experiment in directory directory_name. If no direc‐
tory is given, place the experiment in the current working
directory. If a group is specified (see -g, below), the group
file is also written to the directory named by -d.
For the lightest-weight data collection, it is best to record
data to a local file, with -d used to specify a directory in
which to put the data. However, for MPI experiments on a clus‐
ter, the founder experiment must be available at the same path
to all processes to have all data recorded into the founder
experiment.
Experiments written to long-latency file systems are especially
problematic, and might progress very slowly, especially if Sam‐
ple data is collected (-S on, the default). If you must record
over a long-latency connection, disable Sample data.
-g group_name
Add the experiment to the experiment group group_name. The
group_name string must end in the string .erg; if not, report an
error and do not run the experiment.
The first line of a group file must contain the string
#analyzer experiment group
and each subsequent line is the name of an experiment.
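For example, to record an experiment named run.1.er in the directory /tmp/experiments and add it to the group mytests.erg, for a hypothetical target a.out (the experiment, directory, and group names here are illustrative):
collect -o run.1.er -d /tmp/experiments -g mytests.erg a.out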
-O file
Append all output from collect itself to the named file, but do
not redirect the output from the spawned target. If file is set
to /dev/null, suppress all output from collect, including any
error messages.
-t duration
Collect data for the specified duration. duration can be a sin‐
gle number, followed by either m, specifying minutes, or s,
specifying seconds (default), or two such numbers separated by a
- sign. If one number is given, data is collected from the
start of the run until the given time; if two numbers are given,
data is collected from the first time to the second. If the
second time is zero, data is collected until the end of the run.
If two non-zero numbers are given, the first must be less than
the second.
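For example, to collect data for a hypothetical target a.out only between 30 and 120 seconds after the start of the run:
collect -t 30-120 a.out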
Other Arguments
-P <pid>
Write a script for dbx to attach to the process with the given
PID and collect data from it, then invoke dbx with that
script. Only profiling data, not tracing data, can be specified,
and timed runs (-t) are not supported.
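For example, a sketch of attaching to a running process with a hypothetical PID of 12345 and recording clock-based profiling data from it:
collect -P 12345 -p on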
-C comment
Put the comment into the notes file for the experiment. Up to
ten -C arguments can be supplied.
-n Dry run: do not run the target, but print all the details of the
experiment that would be run. Turn on -v.
-R Display the text version of the performance tools README in the
terminal window. If the README is not found, print a warning.
Do not examine further arguments and do no further processing.
-V Print the current version. Do not examine further arguments and
do no further processing.
-v Print the current version and further detailed information about
the experiment being run.
-x Leave the target process stopped on the exit from the exec sys‐
tem call, in order to allow a debugger to attach to it. The
collect command prints a message with the process PID.
To attach a debugger to the target once it is stopped by
collect, you must follow the procedure below.
- Obtain the PID of the process from the message printed by the
collect -x command
- Start the debugger
- Configure the debugger to ignore SIGPROF and, if you chose to
collect hardware counter data, SIGEMT on Solaris or SIGIO on
Linux
- Attach to the process using the PID.
As the process runs under the control of the debugger, the Col‐
lector records an experiment.
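A sketch of the procedure, assuming a hypothetical target a.out, a printed PID of 12345, and dbx as the debugger on Solaris (the exact dbx commands for attaching and ignoring signals may differ; see dbx(1)):
collect -x -p on a.out
dbx - 12345
(dbx) ignore PROF EMT
(dbx) cont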
FOLLOWING DESCENDANT PROCESSES
Data from the initial process spawned by collect, called the founder
process, is always collected. Processes can create descendant pro‐
cesses by calling system library functions, including the variants of
fork, exec, system, etc. If a -F argument is used, the collector can
collect data for descendant processes, and it opens a new experiment
for each descendant process inside the parent experiment. These new
experiments are named with their lineage as follows:
- An underscore is appended to the creator's experiment name.
- A code letter is added: either "f" for a fork, or "x" for an exec, or
"c" for other descendants.
- A number is added after the code letter, which is the index of the
fork or exec. This number is assigned whether or not the
process was started successfully.
- The experiment suffix, ".er" is appended to the lineage.
For example, if the experiment name for the initial process is
"test.1.er", the experiment for the descendant process created by its
third fork is "test.1.er/_f3.er". If that descendant process execs a
new image, the corresponding experiment name is "test.1.er/_f3_x1.er".
If the default, -F on, is used, descendant processes initiated by
calls to fork(2), fork1(2), fork(3F), vfork(2), and exec(2) and its
variants are followed. The call to vfork is replaced internally by a
call to fork1. Descendants created by calls to system(3C), system(3F),
sh(3F), popen(3C), and similar functions, and their associated descen‐
dant processes, are not followed.
If the -F all argument is used, all descendants are followed, including
those from system(3C), system(3F), sh(3F), popen(3C), and similar func‐
tions.
If the -F =<regex> argument is used, all descendants whose name or lin‐
eage match the regular expression are followed. When matching lineage,
the ".er" should be omitted. When matching names, both the command,
and its arguments are part of the expression.
For example, to capture data on the descendant process of the first
exec from the first fork from the first call to system in the founder,
use:
collect -F '=_c1_f1_x1'
To capture data on all the variants of exec, but not fork, use:
collect -F '=.*_x[0-9]/*'
To capture data from a call to system("echo hello")
but not system("goodbye"), use:
collect -F '=echo hello'
The Analyzer and er_print automatically read experiments for descendant
processes when the founder experiment is read, and the experiments for
the descendant processes are selected for data display.
To specifically select the data for display from the command line,
specify the path name explicitly to either er_print or Analyzer. The
specified path must include the founder experiment name, and the
descendant experiment's name inside the founder directory.
For example, to see the data for the third fork of the test.1.er exper‐
iment:
er_print test.1.er/_f3.er
analyzer test.1.er/_f3.er
You can prepare an experiment group file with the explicit names of
descendant experiments of interest.
To examine descendant processes in the Analyzer, load the founder
experiment and choose View > Filter data. The Analyzer displays a list
of experiments with only the founder experiment checked. Uncheck the
founder experiment and check the descendant experiment of interest.
PROFILING SCRIPTS
By default, collect no longer requires that its target be an ELF exe‐
cutable. If collect is invoked on a script, data is collected on the
program launched to execute the script, and on all descendant pro‐
cesses. To collect data only on a specific process, use the -F flag to
specify the name of the executable to follow.
For example, to profile the script foo.sh, but collect data primarily
from the executable bar, use the command:
collect -F =bar foo.sh
Data will be collected on the founder process launched to execute the
script, and all bar processes spawned from the script, but not for
other processes.
JAVA PROFILING
Java profiling consists of collecting a performance experiment on the
JVM machine as it runs your .class or .jar files. If possible, call
stacks are collected in both the Java model and in the machine model.
Data can be shown with view mode set to User, Expert, or Machine. User
mode shows each method by name, with data for interpreted and HotSpot-
compiled methods aggregated together; it also suppresses data for non-
user-Java threads. Expert mode separates HotSpot-compiled methods from
interpreted methods, and does not suppress non-user Java threads.
Machine mode shows data for interpreted Java methods against the JVM
machine as it does the interpreting, while data for methods compiled
with the Java HotSpot virtual machine is reported for named methods.
All threads are shown. In all three modes, data is reported in the
usual way for any non-OpenMP C, C++, or Fortran code called by a Java
target. Such code corresponds to Java native methods. The Analyzer
and the er_print utility can switch between the view mode User, view
mode Expert, and view mode Machine, with User being the default.
Clock-based profiling and hardware counter overflow profiling are sup‐
ported. Synchronization tracing collects data only on the Java monitor
calls, and synchronization calls from native code; it does not collect
data about internal synchronization calls within the JVM.
Heap tracing is not supported for Java, and generates an error if spec‐
ified.
When collect inserts a target name of java into the argument list, it
examines environment variables for a path to the java target, in the
order JDK_HOME, and then JAVA_PATH. For the first of these environment
variables that is set, the resultant target is verified as an ELF exe‐
cutable. If it is not, collect fails with an error indicating which
environment variable was used, and the full path name that was tried.
If neither of those environment variables is set, the collect command
uses the version set by your PATH. If there is no java in your PATH, a
system default of /usr/java/bin/java is tried.
Java Profiling requires Java[TM] 2 SDK (JDK) 5, Update 19 or later, or
Java[TM] 2 SDK (JDK) 6, Update 18 or later.
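For example, to profile a hypothetical application packaged as app.jar, either of the following forms could be used; the explicit -j on and JVM path are needed only when the JVM itself is the target:
collect app.jar
collect -j on /usr/java/bin/java -jar app.jar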
JAVA PROFILING WITH A DLOPEN'd LIBJVM.SO
Some applications are not pure Java, but are C or C++ applications that
invoke dlopen to load libjvm.so, and then start the JVM by calling into
it. To profile such applications, set the environment variable
SP_COLLECTOR_USE_JAVA_OPTIONS, and add -j on to the collect command
line. Do not set LD_LIBRARY_PATH for this scenario.
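A minimal sketch for such an application, here called myapp (a hypothetical name), using C-shell syntax and an arbitrary value for the environment variable:
setenv SP_COLLECTOR_USE_JAVA_OPTIONS 1
collect -j on myapp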
SHARED OBJECT HANDLING
Normally, the collect command causes data to be collected for all
shared objects in the address space of the target, whether on the ini‐
tial library list, or explicitly dlopen'd. However, there are some
circumstances under which some shared objects are not profiled.
One such scenario is when the target program is invoked with lazy-load‐
ing. In such cases, the library is not loaded at startup time, and is
not loaded by explicitly calling dlopen, so the shared object name is
not included in the experiment, and all PCs from it are mapped to the
<Unknown> function. The workaround is to set LD_BIND_NOW, to force the
library to be loaded at startup time.
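For example, a hypothetical lazy-loading target a.out could be profiled with the workaround applied as follows (C-shell syntax; any non-empty value for LD_BIND_NOW suffices):
setenv LD_BIND_NOW 1
collect -p on a.out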
Another such scenario is when the executable is built with the
-B direct linking option. In that case the object is dynamically loaded
by a call specifically to the dynamic linker entry point of dlopen, and
the libcollector interposition is bypassed. The shared object name is
not included in the experiment, and all PCs from it are mapped to the
<Unknown> function. The workaround is to not use -B direct.
OPENMP PROFILING
Data collection for OpenMP programs collects data that can be displayed
in any of the three view modes, just as for Java programs. In User
mode, slave threads are shown as if they were really cloned from the
master thread, and have call stacks matching those from the master
thread. Frames in the call stack coming from the OpenMP runtime code
(libmtsk.so) are suppressed. In Expert user mode, the master and slave
threads are shown differently, and the explicit functions generated by
the compiler are visible, and the frames from the OpenMP runtime code
(libmtsk.so) are suppressed. For Machine mode, the actual native
stacks are shown.
In User mode, various artificial functions are introduced as the leaf
function of a call stack whenever the runtime library is in one of sev‐
eral states. These functions are <OMP-overhead>, <OMP-idle>, <OMP-
reduction>, <OMP-implicit_barrier>, <OMP-explicit_barrier>, <OMP-
lock_wait>, <OMP-critical_section_wait>, and <OMP-ordered_sec‐
tion_wait>.
Two additional clock-profiling metrics are added to the data for clock-
profiling experiments:
OpenMP Work
OpenMP Wait
OpenMP Work is counted when the OpenMP runtime thinks the code is doing
work. It includes time when the process is consuming User-CPU time,
but it also can include time when the process is consuming System-CPU
time, waiting for page faults, waiting for the CPU, etc. Hence, OpenMP
Work can exceed User-CPU time. OpenMP Wait is accumulated when the
OpenMP runtime thinks the process is waiting. OpenMP Wait can include
User-CPU time for busy-waits (spin-waits), but it also includes Other-
Wait time for sleep-waits.
The inclusive metrics are visible by default; the exclusive are not.
Together, the sum of those two metrics equals the Total LWP Time met‐
ric. These metrics are added for all clock- and hardware counter pro‐
filing experiments.
Collecting information for every fork in the execution of the program
can be very expensive. You can suppress that cost by setting the envi‐
ronment variable SP_COLLECTOR_NO_OMP. If you set SP_COLLECTOR_NO_OMP,
the program will have substantially less dilation, but you will not see
the data from slave threads propagate up to their callers, and eventually to
main(), as you would when the variable is not set.
A new collector for OpenMP 3.0 is enabled by default in this release.
It can profile programs that use explicit tasking. Programs built with
earlier compilers can be profiled with the new collector only if a
patched version of libmtsk.so is available. If it is not installed,
you can switch data collection to use the old collector by setting the
environment variable SP_COLLECTOR_OLDOMP.
Note that the OpenMP profiling functionality is only available for
applications compiled with the Oracle Solaris Studio compilers, since
it depends on the Oracle Solaris Studio compiler runtime. GNU-compiled
code will only see machine-level call stacks.
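For example, a sketch of building and profiling a hypothetical OpenMP program omp.c with the Oracle Solaris Studio C compiler (compiler options other than -xopenmp depend on your build):
cc -xopenmp -O -g -o omp omp.c
collect -p on omp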
DATASPACE AND MEMORYSPACE PROFILING
A dataspace profile is a data collection in which memory-related
events, such as cache misses, are reported against the data object ref‐
erences that cause the events rather than just the instructions where
the memory-related events occur. Dataspace profiling is not available
on systems running the Linux OS, nor on x86 based systems running the
Solaris OS.
A memoryspace profile is similar to a dataspace profile, but events are
not reported against data objects in the program, but rather against
components of the memory subsystem, such as cache-lines or pages.
To allow dataspace profiling, the target can be written in C, C++ or
Fortran, and must be compiled for SPARC architecture, with the
-xhwcprof -xdebugformat=dwarf -g flags, as described above. Further‐
more, the data collected must be hardware counter profiles and the
optional + must be prepended to the counter name. If the optional + is
prepended to one memory-related counter, but not all, the counters
without the + report dataspace data against the <Unknown> data object,
with subtype (Dataspace data not requested during data collection).
On machines with precise interrupts (no backtracking required), memo‐
ryspace profiling does not require the -xhwcprof -xdebugformat=dwarf -g
flags for compilation. Dataspace profiling, even on such machines,
does require the flags.
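For example, a sketch of preparing and profiling a hypothetical C program a.c for dataspace data on a SPARC system, using the dcrm counter shown in the sample counter list in the "Hardware Counter Overflow Profiling" section below:
cc -xhwcprof -xdebugformat=dwarf -g -O -o a.out a.c
collect -h +dcrm,on a.out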
With the data collected, the er_print utility allows three additional
commands: data_objects, data_single, and data_layout, as well as vari‐
ous commands relating to Memory Objects. See the er_print(1) man page
for more information.
In addition, the Analyzer includes two tabs related to dataspace pro‐
filing, labeled DataObjects and DataLayout, as well as a set of tabs
relating to Memory Objects. See the analyzer(1) man page for more
information.
Clock-based dataspace profiling should only be used on machines that do
not support hardware counter profiling with memory-based counters. It
requires the same compilation flags as for hardware counter profiling.
Data should be interpreted with care, as explained above.
MPI PROFILING
The collect command can be used for MPI profiling to manage collection
of the data from the constituent MPI processes, collect MPI trace data,
and organize the data into a single "founder" experiment, with "subex‐
periments" for each MPI process.
The collect command can be used with MPI by simply prefacing the com‐
mand that starts the MPI job and its arguments with the desired collect
command and its arguments (assuming you have inserted the -- argument
to indicate the end of the mpirun arguments). For example, on an SMP
machine,
% mpirun -np 16 -- a.out 3 5
can be replaced by
% collect -M OMPT mpirun -np 16 -- a.out 3 5
This command runs an MPI tracing experiment on each of the 16 MPI pro‐
cesses, collecting them all in an MPI experiment, named by the usual
conventions for naming experiments. It assumes use of the Oracle Mes‐
sage Passing Toolkit (previously known as Sun HPC ClusterTools) version
of MPI.
The initial collect process reformats the mpirun command to specify
running collect with appropriate arguments on each of the individual
MPI processes.
Note that the -- argument immediately before the target name is
required for MPI profiling (although it is optional for mpirun itself),
so that collect can separate the mpirun arguments from the target and
its arguments. If the -- argument is not supplied, collect prints an
error message, and no experiment is run.
Furthermore, a -x PATH argument is added to the mpirun arguments by
collect, so that the remote collect commands can find their targets. If any
environment variables in your environment begin with "VT_" or with
"SP_COLLECTOR_", they are passed to the remote collect with -x flags
for each.
MIMD MPI runs are supported, with a similar requirement that there
must be a "--" argument after each ":" (indicating a new target and
local mpirun arguments for it). If the -- argument
is not supplied, collect prints an error message, and no experiment is
run.
Some versions of Oracle Message Passing Toolkit, or Sun HPC Cluster‐
Tools have functionality for MPI State profiling. When clock-profiling
data is collected on an MPI experiment run with such a version of MPI,
two additional metrics can be shown:
MPI Work
MPI Wait
MPI Work accumulates when the process is inside the MPI runtime doing
work, such as processing requests or messages; MPI Wait accumulates
when the process is inside the MPI runtime, but waiting for an event,
buffer, or message.
In the Analyzer, when MPI trace data is collected, two additional tabs
are shown, MPI Timeline and MPI Chart.
The technique of using mpirun to spawn explicit collect commands on the
MPI processes is no longer supported to collect MPI trace data, and
should not be used. It can still be used for all other types of data.
MPI profiling is based on the open source VampirTrace 5.5.3 release.
It recognizes several VampirTrace environment variables, and a new one,
VT_STACKS, which controls whether or not call stacks are recorded in
the data. For further information on the meaning of these variables,
see the VampirTrace 5.5.3 documentation.
The default value of the environment variable VT_BUFFER_SIZE limits the
internal buffer of the MPI API trace collector to 64 MB, and the
default value of VT_MAX_FLUSHES limits the number of times that the
buffer is flushed to 1. Events that are to be recorded after the limits
have been reached are no longer written into the trace file. The envi‐
ronment variables apply to every process of a parallel application,
meaning that applications with n processes will typically create trace
files n times the size of a serial application.
To remove the limit and get a complete trace of an application, set
VT_MAX_FLUSHES to 0. This setting causes the MPI API trace collector to
flush the buffer to disk whenever the buffer is full. To change the
size of the buffer, use the environment variable VT_BUFFER_SIZE. The
optimal value for this variable depends on the application which is to
be traced. Setting a small value will increase the memory available to
the application but will trigger frequent buffer flushes by the MPI API
trace collector. These buffer flushes can significantly change the
behavior of the application. On the other hand, setting a large value,
like 2G, will minimize buffer flushes by the MPI API trace collector,
but decrease the memory available to the application. If not enough
memory is available to hold the buffer and the application data this
might cause parts of the application to be swapped to disk leading also
to a significant change in the behavior of the application.
Another important variable is VT_VERBOSE, which turns on various error
and status messages, and setting it to 2 or higher is recommended if
problems arise.
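For example, to remove the flush limit and enlarge the trace buffer for a 16-process run of a hypothetical target a.out (C-shell syntax; the buffer size shown is arbitrary):
setenv VT_MAX_FLUSHES 0
setenv VT_BUFFER_SIZE 256M
collect -M OMPT mpirun -np 16 -- a.out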
Normally, MPI trace output data is post-processed when the mpirun tar‐
get exits; a processed data file is written to the experiment, and
information about the post-processing time is written into the experi‐
ment header. MPI post-processing is not done if MPI tracing is explic‐
itly disabled.
In the event of a failure in post-processing, an error is reported, and
no MPI Tabs or MPI tracing metrics will be available.
If the mpirun target does not actually invoke MPI, an experiment will
still be recorded, but no MPI trace data will be produced. The experi‐
ment will report an MPI post-processing error, and no MPI Tabs or MPI
tracing metrics will be available.
If the environment variable VT_UNIFY is set to "0", the post-process‐
ing routines, er_vtunify and er_mpipp, will not be run by collect. They
will be run the first time either er_print or analyzer is invoked on
the experiment.
USING COLLECT WITH PPGSZ
The collect command can be used with ppgsz by running the collect com‐
mand on the ppgsz command, and specifying the -F on flag. The founder
experiment is on the ppgsz executable and is uninteresting. If your
path finds the 32-bit version of ppgsz, and the experiment is being run
on a system that supports 64-bit processes, the first thing ppgsz
does is execute an exec function on its 64-bit version, creat‐
ing _x1.er. That executable forks, creating _x1_f1.er. The descendant
process attempts to execute an exec function on the named target, in
the first directory on your path, then in the second, and so forth,
until one of the exec functions succeeds. If, for example, the third
attempt succeeds, the first two descendant experiments are named
_x1_f1_x1.er and _x1_f1_x2.er, and both are completely empty. The
experiment on the target is the one from the successful exec, the third
one in the example, and is named _x1_f1_x3.er, stored under the founder
experiment. It can be processed directly by invoking the Analyzer or
the er_print utility on test.1.er/_x1_f1_x3.er.
If the 64-bit ppgsz is the initial process run, or if the 32-bit ppgsz
is invoked on a 32-bit kernel, the fork descendant that executes exec
on the real target has its data in _f1.er, and the real target's exper‐
iment is in _f1_x3.er, assuming the same path properties as in the
example above.
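A minimal sketch, assuming a hypothetical target a.out and ppgsz options requesting 4 Mbyte heap pages (see ppgsz(1) for the actual option syntax):
collect -F on ppgsz -o heap:4M a.out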
See the section "FOLLOWING DESCENDANT PROCESSES", above. For more
information on hardware counters, see the "Hardware Counter Overflow
Profiling" section below.
USING COLLECT ON SETUID/SETGID TARGETS
The collect command operates by inserting a shared library, lib‐
collector.so, into the target's address space (LD_PRELOAD), and
by using a second shared library, collaudit.so, to record
shared-object use with the runtime linker's audit interface
(LD_AUDIT). Those two shared libraries write the files that
constitute the experiment.
Several problems might arise if collect is invoked on executables that
call setuid or setgid, or that create descendant processes that call
setuid or setgid. If the user running the experiment is not root, col‐
lection fails because the shared libraries are not installed in a
trusted directory. The workaround is to run the experiments as root,
or use crle(1) to grant permission. Users should, of course, take
great care when circumventing security barriers, and do so at their own
risk.
In addition, the umask for the user running the collect command must be
set to allow write permission for that user, and for any users or
groups that are set by the setuid/setgid attributes of a program being
exec'd and for any user or group to which that program sets itself. If
the mask is not set properly, some files might not be written to the
experiment, and processing of the experiment might not be possible. If
the log file can be written, an error is shown when the user attempts
to process the experiment.
Other problems can arise if the target itself makes any of the system
calls to set UID or GID, or if it changes its umask and then forks or
runs exec on some other process, or if crle was used to configure how the
runtime linker searches for shared objects.
If an experiment is started as root on a target that changes its effec‐
tive GID, the er_archive process that is automatically run when the
experiment terminates fails, because it needs a shared library that is
not marked as trusted. In that case, you can run er_archive (or
er_print or Analyzer) explicitly by hand, on the machine on which the
experiment was recorded, immediately following the termination of the
experiment.
DATA COLLECTED
Three types of data are collected: profiling data, tracing data and
sampling data. The data packets recorded in profiling and tracing
include the callstack of each LWP, the LWP, thread, and CPU IDs, and
some event-specific data. The data packets recorded in sampling contain
global data such as execution statistics, but no program-specific or
event-specific data. All data packets include a timestamp.
Clock-based Profiling
The event-specific data recorded in clock-based profiling is an
array of counts for each accounting microstate. The microstate
array is incremented by the system at a prescribed frequency, and
is recorded by the Collector when a profiling signal is processed.
Clock-based profiling can run at a range of frequencies which must
be multiples of the clock resolution used for the profiling timer.
If you try to do high-resolution profiling on a machine with an
operating system that does not support it, the command prints a
warning message and uses the highest resolution supported. Simi‐
larly, a custom setting that is not a multiple of the resolution
supported by the system is rounded down to the nearest non-zero
multiple of that resolution, and a warning message is printed.
Clock-based profiling data is converted into the following met‐
rics:
User CPU Time
Wall Time
Total LWP Time
System CPU Time
Wait CPU Time
User Lock Time
Text Page Fault Time
Data Page Fault Time
Other Wait Time
For experiments on multithreaded applications, all of the times,
other than Wall Time, are summed across all LWPs in the process;
Wall Time is the time spent in all states for LWP 1 only. Total
LWP Time adds up to the real elapsed time, multiplied by the aver‐
age number of LWPs in the process.
If clock-based profiling is performed on an OpenMP program, two
additional metrics:
OpenMP Work
OpenMP Wait
are provided. On Solaris, OpenMP Work accumulates when work is
being done in parallel. OpenMP Wait accumulates when the OpenMP
runtime is waiting for synchronization, and accumulates whether
the wait is using CPU time or sleeping, or when work is being done
in parallel, but the thread is not scheduled on a CPU.
On Linux, OpenMP Work and OpenMP Wait are accumulated only when
the process is active in either user or system mode. Unless you
have specified that OpenMP should do a busy wait, OpenMP Wait on
Linux will not be useful.
If clock-based profiling is performed on an MPI program, run under
Oracle Message Passing Toolkit or Sun HPC ClusterTools release 8.1
or later, two additional metrics:
MPI Work
MPI Wait
are provided. On Solaris, MPI Work accumulates when the MPI run‐
time is active. MPI Wait accumulates when the MPI runtime is
waiting for the send or receive of a message, or when the MPI run‐
time is active, but the thread is not running on a CPU.
On Linux, MPI Work and MPI Wait are accumulated only when the
process is active in either user or system mode. Unless you have
specified that MPI should do a busy wait, MPI Wait on Linux will
not be useful. If clock-based dataspace profiling is specified,
an additional metric:
Max. Mem Stalls
is provided.
Hardware Counter Overflow Profiling
Hardware counter overflow profiling records the number of events
counted by the hardware counter at the time the overflow signal
was processed. This type of profiling is now available on systems
running the Linux OS, provided that they have the Perfctr patch
installed.
Hardware counter overflow profiling can be done on systems that
support overflow profiling and that include the hardware counter
shared library, libcpc.so(3). You must use a version of the
Solaris OS no earlier than the Solaris 10 OS. On SPARC-based com‐
puters, you must use a version of the hardware no earlier than the
UltraSPARC III hardware. On computers that do not support over‐
flow profiling, an attempt to select hardware counter overflow
profiling generates an error.
The counters available depend on the specific processor chip and
operating system. Running the command collect -h with no other
arguments prints out a usage message that contains the names of
the counters. The counters that are aliased to common names are
displayed first in the list, followed by a list of the raw hard‐
ware counters. If neither the performance counter subsystem nor
collect knows the names for the counters on a specific chip, the
tables are empty. In most cases, however, the counters can be
specified numerically. The lines of output are formatted similar
to the following:
Aliased HW counters available for profiling:
cycles[/{0|1}],9999991 ('CPU Cycles', alias for Cycle_cnt; CPU-cycles)
insts[/{0|1}],9999991 ('Instructions Executed', alias for Instr_cnt; events)
dcrm[/1],100003 ('D$ Read Misses', alias for DC_rd_miss; load events)
...
Raw HW counters available for profiling:
Cycle_cnt[/{0|1}],1000003 (CPU-cycles)
Instr_cnt[/{0|1}],1000003 (events)
DC_rd[/0],1000003 (load events)
SI_snoop[/0],1000003 (not-program-related events)
...
In the first line of aliased counter output, the first field,
"cycles", gives the counter name that can be used in the -h
counter... argument. It is followed by a specification of which
registers can be used for that counter. The next field,
"9999991", is the default overflow value for that counter. The
next field in parentheses, "CPU Cycles", is the metric name, fol‐
lowed by the raw hardware counter name. The last field, "CPU-
cycles", specifies the type of units being counted. There can be
up to two words for the type of information. The second or only
word of the type information can be either "CPU-cycles" or
"events". If the counter can be used to provide a time-based met‐
ric, the value is CPU-cycles; otherwise it is events.
The second output line of the aliased counter output above has
"events" instead of "CPU-cycles" at the end of the line, indicat‐
ing that it counts events, and cannot be converted to a time.
The third output line above has two words of type information,
"load events", at the end of the line. The first word of type
information can have the value of "load", "store", "load-store",
or "not-program-related". The first three of these type values
indicate that the counter is memory-related and the counter name
can be preceded by the "+" sign when used in the collect-h com‐
mand. The "+" sign indicates the request for data collection to
attempt to find the precise instruction and virtual address that
caused the event on the counter that overflowed.
On some chips, the counter interrupts are precise, and no back‐
tracking is needed. Such counters are indicated by the word
"(precise)" following the event type.
The "not-program-related" value indicates that the counter cap‐
tures events initiated by some other program, such as CPU-to-CPU
cache snoops. Using the counter for profiling generates a warning
and profiling does not record a call stack. It does, however, show
the time being spent in an artificial function called "collec‐
tor_not_program_related". Thread IDs and LWP IDs are recorded, but
are meaningless.
Each line in the raw hardware counter list includes the internal
counter name as used by cputrack(1), the register number(s) on
which that counter can be used, the default overflow value, and
the counter units, which is either CPU-cycles or events.
EXAMPLES:
Example 1: Using the aliased counter information listed in the
above sample output, the following command:
collect -h cycles/0,hi,+dcrm,9999
enables the CPU Cycle profiling on register 0. The "hi" value
enables a sample rate that is approximately 10 times faster than
the default rate of 9999991. The "dcrm" value enables the D$ Read
Miss profiling on register 1 and the preceding "+" enables Datas‐
pace profiling for the dcrm. The "9999" value sets the sampling to
be done every 9999 read misses, instead of the default value of
every 100003 read misses.
Example 2:
Running the collect -h command with no other arguments on an AMD
Opteron machine would produce a raw hardware counter output simi‐
lar to the following:
FP_dispatched_fpu_ops[/{0|1|2|3}],1000003 (events)
FP_cycles_no_fpu_ops_retired[/{0|1|2|3}],1000003 (CPU-cycles)
...
Using the above raw hardware counter output, the following com‐
mand:
collect -h FP_dispatched_fpu_ops~umask=0x3/2,10007
enables the Floating Point Add and Multiply operations to be
tracked at the rate of 1 capture every 10007 events. (For more
details on valid attribute values, refer to the processor documen‐
tation). The "/2" value specifies the data is to be captured using
the register 2 of the hardware.
Synchronization Delay Tracing
Synchronization delay tracing records all calls to the various
thread synchronization routines where the real-time delay in the
call exceeds a specified threshold. The data packet contains time‐
stamps for entry and exit to the synchronization routines, the
thread ID, and the LWP ID at the time the request is initiated.
(Synchronization requests from a thread can be initiated on one
LWP, but complete on another.)
Synchronization delay tracing data is converted into the following
metrics:
Synchronization Delay Events
Synchronization Wait Time
Heap Tracing
Heap tracing records all calls to malloc, free, realloc, memalign,
and valloc with the size of the block requested, its address, and
for realloc, the previous address.
Heap tracing data is converted into the following metrics:
Leaks
Bytes Leaked
Allocations
Bytes Allocated
Leaks are defined as allocations that are not freed. If a zero-
length block is allocated, it counts as an allocation with zero
bytes allocated. If a zero-length block is not freed, it counts as
a leak with zero bytes leaked.
For applications written in the Java[TM] programming language,
leaks are defined as allocations that have not been garbage-col‐
lected. Heap profiling for such applications is obsolescent and
will not be supported in future releases.
Heap tracing experiments can be very large, and might be slow to
process.
MPI Tracing
MPI tracing records calls to the MPI library for functions that
can take a significant amount of time to complete. MPI tracing is
implemented using the Open Source Vampir Trace code.
MPI tracing data is converted into the following metrics:
MPI Time
MPI Sends
MPI Bytes Sent
MPI Receives
MPI Bytes Received
Other MPI Events
MPI Time is the total LWP time spent in the MPI function. If MPI
state times are also collected, MPI Work Time plus MPI Wait Time
for all MPI functions other than MPI_Init and MPI_Finalize should
approximately equal MPI Time. On Linux, MPI Wait and MPI
Work are based on user+system CPU time, while MPI Time is based on
real time, so the numbers will not match.
The MPI Bytes Received metric counts the actual number of bytes
received in all messages. MPI Bytes Sent counts the actual number
of bytes sent in all messages. MPI Sends counts the number of
messages sent, and MPI Receives counts the number of messages
received. MPI_Sendrecv counts as both a send and a receive. MPI
Other Events counts the events in the trace that are neither sends
nor receives.
Count Data
Count data is recorded by instrumenting the executable, and count‐
ing the number of times each instruction was executed. It also
counts the number of times the first instruction in a function is
executed, and reports that count as the function execution count.
Count data is converted into the following metrics:
Bit Func Count
Bit Inst Exec
Bit Inst Annul
Data-race Detection Data
Data-race detection data consists of pairs of race-access events
that constitute a race. The events are combined into a race, and
races for which the call stacks for the two accesses are identical
are merged into a race group.
Data-race detection data is converted into the following metric:
Race Accesses
Deadlock Detection Data
Deadlock detection data consists of pairs of threads with con‐
flicting locks.
Deadlock detection data is converted into the following metric:
Deadlocks
Sampling and Global Data
Sampling refers to the process of generating markers along the
time line of execution. At each sample point, execution statistics
are recorded. All of the data recorded at sample points is global
to the program, and does not map to function-level metrics.
Samples are always taken at the start of the process, and at its
termination. By default or if a non-zero -S argument is specified,
samples are taken periodically at the specified interval. In
addition, samples can be taken by using the libcollector(3) API.
The data recorded at each sample point consists of microstate
accounting information from the kernel, along with various other
statistics maintained within the kernel.
RESTRICTIONS
The Collector can support up to 16K user threads. Data from additional
threads is discarded, and a collector error is generated. To support more
threads, set the environment variable SP_COLLECTOR_NUMTHREADS to a
larger number.
By default, the Collector collects stacks that are 256 frames deep. To
support deeper stacks, set the environment variable SP_COLLECTOR_STACK‐
BUFSZ to a larger number.
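For example, to raise both limits before recording an experiment on a hypothetical target a.out (C-shell syntax; the values shown are arbitrary):
setenv SP_COLLECTOR_NUMTHREADS 32768
setenv SP_COLLECTOR_STACKBUFSZ 1024
collect -p on a.out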
The Collector interposes on some signal-handling routines to ensure
that its use of SIGPROF signals for clock-based profiling and SIGEMT
(Solaris) or SIGIO (Linux) for hardware counter overflow profiling is
not disrupted by the target program. The Collector library re-installs
its own signal handler if the target program installs a signal handler.
The Collector's signal handler sets a flag that ensures that system
calls are not interrupted to deliver signals. This setting could change
the behavior of the target program.
The Collector interposes on setitimer(2) to ensure that the profiling
timer is not available to the target program if clock-based profiling
is enabled.
The Collector interposes on functions in the hardware counter library,
libcpc.so, so that an application cannot use hardware counters while
the Collector is collecting performance data. The interposed functions
return a value of -1.
Dataspace profiling is not available on systems running the Linux OS,
nor on x86 based systems running the Solaris OS.
For this release, the data from collecting periodic samples is not
reliable on systems running the Linux OS.
For this release, wide data discrepancies are observed when profiling
multithreaded applications on systems running the RedHat Enterprise
Linux OS.
Hardware counter overflow profiling cannot be run on a system where
cpustat is running, because cpustat takes control of the counters, and
does not let a user process use them.
Java Profiling requires Java[TM] 2 SDK (JDK) 5, Update 19 or later, or
Java[TM] 2 SDK (JDK) 6, Update 18 or later.
Data is not collected on descendant processes that are created to use
the setuid attribute, nor on any descendant processes created with an
exec function run on an executable that is not dynamically linked.
Furthermore, subsequent descendant processes might produce corrupted or
unreadable experiments. The workaround is to ensure that all processes
spawned are dynamically-linked and do not have the setuid attribute.
Applications that call vfork(2) have these calls replaced by a call to
fork1(2).
SEE ALSO
analyzer(1), collector(1), dbx(1), er_archive(1), er_cp(1),
er_export(1), er_mv(1), er_print(1), er_rm(1), tha(1), libcollector(3),
and the Performance Analyzer manual.
September 2011 collect(1)