Perf

Overview

Notes on perf, a performance analysis tool.

perf stat

perf stat: obtain event counts

Options

$ perf stat -h

 usage: perf stat [<options>] [<command>]

    -T, --transaction     hardware transaction statistics
    -e, --event <event>   event selector. use 'perf list' to list available events
        --filter <filter>
                          event filter
    -i, --no-inherit      child tasks do not inherit counters
    -p, --pid <pid>       stat events on existing process id
    -t, --tid <tid>       stat events on existing thread id
    -a, --all-cpus        system-wide collection from all CPUs
    -g, --group           put the counters into a counter group
    -c, --scale           scale/normalize counters
    -v, --verbose         be more verbose (show counter open errors, etc)
    -r, --repeat <n>      repeat command and print average + stddev (max: 100, forever: 0)
    -n, --null            null run - dont start any counters
    -d, --detailed        detailed run - start a lot of events
    -S, --sync            call sync() before starting a run
    -B, --big-num         print large numbers with thousands' separators
    -C, --cpu <cpu>       list of cpus to monitor in system-wide
    -A, --no-aggr         disable CPU count aggregation
    -x, --field-separator <separator>
                          print counts with custom separator
    -G, --cgroup <name>   monitor event in cgroup name only
    -o, --output <file>   output file name
        --append          append to the output file
        --log-fd <n>      log output to fd, instead of stderr
        --pre <command>   command to run prior to the measured command
        --post <command>  command to run after to the measured command
    -I, --interval-print <n>
                          print counts at regular interval in ms (>= 100)
        --per-socket      aggregate counts per processor socket
        --per-core        aggregate counts per physical processor core
    -D, --delay <n>       ms to wait before starting measurement after program start
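
As a rough illustration of how several of these options combine, the sketch below repeats a run, restricts counting to two user-space events, and writes the report to a file. The function name `stat_user_events`, the output file `stat.txt`, and the `PERF_DRY_RUN` variable are hypothetical names used only for this example. Only `-r/--repeat`, `-e`, and `-o` come from the option list above.

```shell
#!/bin/sh
# stat_user_events RUNS COMMAND [ARGS...]
# Sketch: run `perf stat` RUNS times on COMMAND, counting user-space cycles
# and instructions, and write the summary to stat.txt (file name is arbitrary).
# PERF_DRY_RUN is a hypothetical switch for this example: when set to 1 the
# command line is printed instead of executed.
stat_user_events() {
    runs=$1; shift
    set -- perf stat --repeat "$runs" -e cycles:u -e instructions:u -o stat.txt "$@"
    if [ "${PERF_DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Calling `PERF_DRY_RUN=1 stat_user_events 5 ./main` prints the assembled command line without running it, which is a convenient way to check the options before measuring.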

Example

Example 1

<source lang=bash>
$ perf stat ./main

Performance counter stats for './main':

       586.620001 task-clock (msec)         #    1.000 CPUs utilized          
               53 context-switches          #    0.090 K/sec                  
                1 cpu-migrations            #    0.002 K/sec                  
              921 page-faults               #    0.002 M/sec                  
    1,817,562,234 cycles                    #    3.098 GHz                    
  <not supported> stalled-cycles-frontend 
  <not supported> stalled-cycles-backend  
      944,063,599 instructions              #    0.52  insns per cycle        
      244,429,638 branches                  #  416.675 M/sec                  
          494,951 branch-misses             #    0.20% of all branches

      0.586831274 seconds time elapsed

</source>

Example 2

<source lang=bash>
$ perf stat --repeat 10 -e cycles:u -e instructions:u -e cache-references:u \
    -e cache-misses:u -e stalled-cycles-frontend:u -e stalled-cycles-backend:u \
    -e ref-cycles:u -e branch-instructions:u -e branch-misses:u ./main

Performance counter stats for './main' (10 runs):

    1,494,202,725 cycles:u                   ( +-  1.83% ) [71.14%]
      935,216,574 instructions:u            #    0.63  insns per cycle          ( +-  0.03% ) [85.62%]
       17,506,493 cache-references:u                                            ( +-  0.29% ) [85.69%]
       14,663,441 cache-misses:u            #   83.760 % of all cache refs      ( +-  0.10% ) [71.43%]
  <not supported> stalled-cycles-frontend:u
  <not supported> stalled-cycles-backend:u
    1,275,469,977 ref-cycles:u                                                  ( +-  0.65% ) [85.75%]
      242,376,389 branch-instructions:u                                         ( +-  0.06% ) [86.18%]
          486,334 branch-misses:u           #    0.20% of all branches          ( +-  0.07% ) [85.86%]

      0.609760790 seconds time elapsed                                          ( +-  0.70% )

</source>
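
The derived values that perf prints after the `#` sign are simple ratios of the raw counts. The sketch below recomputes them from the Example 2 numbers. The helper `ratio` is a hypothetical name introduced here, not part of perf.

```shell
#!/bin/sh
# ratio NUM DEN [SCALE] [DECIMALS]
# Print NUM * SCALE / DEN with a fixed number of decimals
# (SCALE defaults to 1, DECIMALS to 2). Hypothetical helper, not part of perf.
ratio() {
    awk -v n="$1" -v d="$2" -v s="${3:-1}" -v p="${4:-2}" 'BEGIN {
        fmt = "%." p "f\n"      # build a format string such as "%.2f\n"
        printf fmt, n * s / d
    }'
}

# Counts taken from the Example 2 output above.
ratio 935216574 1494202725       # instructions:u per cycle
ratio 14663441 17506493 100 3    # cache-misses:u as % of cache-references:u
ratio 486334 242376389 100 2     # branch-misses:u as % of branch-instructions:u
```

The three calls print 0.63, 83.760, and 0.20, matching the "insns per cycle", "% of all cache refs", and "% of all branches" columns reported by perf.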

perf record

perf record: record events for later reporting

Options

$ perf record -h

 usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    -e, --event <event>   event selector. use 'perf list' to list available events
        --filter <filter>
                          event filter
    -p, --pid <pid>       record events on existing process id
    -t, --tid <tid>       record events on existing thread id
    -r, --realtime <n>    collect data with this RT SCHED_FIFO priority
    -D, --no-delay        collect data without buffering
    -R, --raw-samples     collect raw sample records from all opened counters
    -a, --all-cpus        system-wide collection from all CPUs
    -C, --cpu <cpu>       list of cpus to monitor
    -c, --count <n>       event period to sample
    -o, --output <file>   output file name
    -i, --no-inherit      child tasks do not inherit counters
    -F, --freq <n>        profile at this frequency
    -m, --mmap-pages <pages>
                          number of mmap data pages
        --group           put the counters into a counter group
    -g                    enables call-graph recording
        --call-graph <mode[,dump_size]>
                          setup and enables call-graph (stack chain/backtrace) recording: fp dwarf
    -v, --verbose         be more verbose (show counter open errors, etc)
    -q, --quiet           don't print any message
    -s, --stat            per thread counts
    -d, --data            Sample addresses
    -T, --timestamp       Sample timestamps
    -P, --period          Sample period
    -n, --no-samples      don't sample
    -N, --no-buildid-cache
                          do not update the buildid cache
    -B, --no-buildid      do not collect buildids in perf.data
    -G, --cgroup <name>   monitor event in cgroup name only
    -u, --uid <user>      user to profile
    -b, --branch-any      sample any taken branches
    -j, --branch-filter <branch filter mask>
                          branch stack filter modes
    -W, --weight          sample by weight (on special events only)
        --transaction     sample transaction flags (special events only)
        --force-per-cpu   force the use of per-cpu mmaps

perf report

perf report: break down events by process, function, etc.
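
A typical workflow pairs `perf record` with `perf report`: record samples into a file, then read that file back. The sketch below is one possible arrangement, not a prescribed one. The function name `record_and_report`, the `PERF_DRY_RUN` variable, and the 99 Hz sampling frequency are assumptions made for this example. `-F`, `-g`, and `-o` appear in the `perf record` option list above, while `-i` (input file) and `--stdio` (plain-text output) are `perf report` options that this page does not list.

```shell
#!/bin/sh
# record_and_report DATAFILE COMMAND [ARGS...]
# Sketch: sample COMMAND with call graphs at 99 Hz into DATAFILE, then print a
# text report from it. PERF_DRY_RUN is a hypothetical switch for this example:
# when set to 1 the two command lines are printed instead of executed.
record_and_report() {
    out=$1; shift
    if [ "${PERF_DRY_RUN:-0}" = 1 ]; then
        echo "perf record -F 99 -g -o $out -- $*"
        echo "perf report -i $out --stdio"
    else
        # Only produce the report if recording succeeded.
        perf record -F 99 -g -o "$out" -- "$@" &&
            perf report -i "$out" --stdio
    fi
}
```

For example, `record_and_report perf.data ./main` would profile `./main` and print the breakdown to the terminal, assuming perf is installed and the user is permitted to sample.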

perf annotate

perf annotate: annotate assembly or source code with event counts

perf top

perf top: see live event count

perf bench

perf bench: run different kernel microbenchmarks
