Perf

From 탱이의 잡동사니

Overview

Notes on perf, the Linux performance analysis tool.

perf stat

perf stat: obtain event counts

Options

$ perf stat -h

 usage: perf stat [<options>] [<command>]

    -T, --transaction     hardware transaction statistics
    -e, --event <event>   event selector. use 'perf list' to list available events
        --filter <filter>
                          event filter
    -i, --no-inherit      child tasks do not inherit counters
    -p, --pid <pid>       stat events on existing process id
    -t, --tid <tid>       stat events on existing thread id
    -a, --all-cpus        system-wide collection from all CPUs
    -g, --group           put the counters into a counter group
    -c, --scale           scale/normalize counters
    -v, --verbose         be more verbose (show counter open errors, etc)
    -r, --repeat <n>      repeat command and print average + stddev (max: 100, forever: 0)
    -n, --null            null run - dont start any counters
    -d, --detailed        detailed run - start a lot of events
    -S, --sync            call sync() before starting a run
    -B, --big-num         print large numbers with thousands' separators
    -C, --cpu <cpu>       list of cpus to monitor in system-wide
    -A, --no-aggr         disable CPU count aggregation
    -x, --field-separator <separator>
                          print counts with custom separator
    -G, --cgroup <name>   monitor event in cgroup name only
    -o, --output <file>   output file name
        --append          append to the output file
        --log-fd <n>      log output to fd, instead of stderr
        --pre <command>   command to run prior to the measured command
        --post <command>  command to run after to the measured command
    -I, --interval-print <n>
                          print counts at regular interval in ms (>= 100)
        --per-socket      aggregate counts per processor socket
        --per-core        aggregate counts per physical processor core
    -D, --delay <n>       ms to wait before starting measurement after program start
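For scripting, -x switches perf stat to separator-delimited output and -o writes it to a file. The sketch below uses awk to pull one event's raw count out of such a file. It assumes the field layout of recent perf releases, where each data line begins with value,unit,event; older releases may order the fields differently, so check your version's output first. The helper name stat_value is made up for this example.

<source lang=bash>
# stat_value EVENT < FILE   (hypothetical helper)
# Print the raw count of EVENT from `perf stat -x,` output.
# Assumes data lines look like "value,unit,event,..." as in recent perf versions.
stat_value() {
    awk -F, -v ev="$1" '
        /^#/ || NF < 3 { next }            # skip "# started on ..." and blank lines
        $3 == ev && $1 ~ /^[0-9.]+$/ {     # skip "<not supported>" / "<not counted>"
            print $1
            found = 1
        }
        END { exit (found ? 0 : 1) }       # non-zero status if no count was found
    '
}

# Example:
#   perf stat -x, -o stat.csv -e cycles:u -e instructions:u -- ./main
#   stat_value instructions:u < stat.csv
</source>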

Examples

<source lang=bash>
$ perf stat --repeat 10 -e cycles:u -e instructions:u -e cache-references:u \
    -e cache-misses:u -e stalled-cycles-frontend:u -e stalled-cycles-backend:u \
    -e ref-cycles:u -e branch-instructions:u -e branch-misses:u ./main

Performance counter stats for './main' (10 runs):
    1,494,202,725 cycles:u                   ( +-  1.83% ) [71.14%]
      935,216,574 instructions:u            #    0.63  insns per cycle          ( +-  0.03% ) [85.62%]
       17,506,493 cache-references:u                                            ( +-  0.29% ) [85.69%]
       14,663,441 cache-misses:u            #   83.760 % of all cache refs      ( +-  0.10% ) [71.43%]
  <not supported> stalled-cycles-frontend:u
  <not supported> stalled-cycles-backend:u
    1,275,469,977 ref-cycles:u                                                  ( +-  0.65% ) [85.75%]
      242,376,389 branch-instructions:u                                         ( +-  0.06% ) [86.18%]
          486,334 branch-misses:u           #    0.20% of all branches          ( +-  0.07% ) [85.86%]
      0.609760790 seconds time elapsed                                          ( +-  0.70% )

</source>

<source lang=bash>
$ perf stat ./main

Performance counter stats for './main':
       586.620001 task-clock (msec)         #    1.000 CPUs utilized          
               53 context-switches          #    0.090 K/sec                  
                1 cpu-migrations            #    0.002 K/sec                  
              921 page-faults               #    0.002 M/sec                  
    1,817,562,234 cycles                    #    3.098 GHz                    
  <not supported> stalled-cycles-frontend 
  <not supported> stalled-cycles-backend  
      944,063,599 instructions              #    0.52  insns per cycle        
      244,429,638 branches                  #  416.675 M/sec                  
          494,951 branch-misses             #    0.20% of all branches        
      0.586831274 seconds time elapsed

</source>
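The values after # in these outputs are plain ratios of the raw counts: insns per cycle is instructions divided by cycles, the miss percentages divide misses by references or branches, GHz is cycles divided by task-clock, and CPUs utilized is task-clock divided by elapsed time. In perf versions that print them, bracketed values such as [71.14%] show the share of the run during which that counter was actually scheduled; when more events are requested than the CPU has hardware counters, perf time-multiplexes them and scales each count by time_enabled/time_running. The ( +- x% ) column is the run-to-run variation across the --repeat runs. The sketch below recomputes several annotations from the numbers above; the helper names ratio and percent are illustrative.

<source lang=bash>
# Recompute the "#" annotations of the perf stat examples above from raw counts.
# Inputs are plain integers (thousands separators removed).
ratio() {   # ratio NUM DEN DIGITS -> NUM/DEN rounded to DIGITS decimals
    awk -v n="$1" -v d="$2" -v p="$3" 'BEGIN { fmt = "%." p "f\n"; printf fmt, n / d }'
}
percent() { # percent NUM DEN DIGITS -> 100*NUM/DEN rounded to DIGITS decimals
    awk -v n="$1" -v d="$2" -v p="$3" 'BEGIN { fmt = "%." p "f\n"; printf fmt, 100 * n / d }'
}

# First example (user-space events, mean of 10 runs)
ratio   935216574  1494202725 2   # insns per cycle        -> 0.63
percent 14663441   17506493   3   # % of all cache refs    -> 83.760
percent 486334     242376389  2   # % of all branches      -> 0.20

# Second example: task-clock 586.620001 ms = 586620001 ns,
# elapsed 0.586831274 s = 586831274 ns
ratio   1817562234 586620001  3   # cycles per ns = GHz    -> 3.098
ratio   586620001  586831274  3   # CPUs utilized          -> 1.000
</source>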

perf record

perf record: record events for later reporting

Options

$ perf record -h

 usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    -e, --event <event>   event selector. use 'perf list' to list available events
        --filter <filter>
                          event filter
    -p, --pid <pid>       record events on existing process id
    -t, --tid <tid>       record events on existing thread id
    -r, --realtime <n>    collect data with this RT SCHED_FIFO priority
    -D, --no-delay        collect data without buffering
    -R, --raw-samples     collect raw sample records from all opened counters
    -a, --all-cpus        system-wide collection from all CPUs
    -C, --cpu <cpu>       list of cpus to monitor
    -c, --count <n>       event period to sample
    -o, --output <file>   output file name
    -i, --no-inherit      child tasks do not inherit counters
    -F, --freq <n>        profile at this frequency
    -m, --mmap-pages <pages>
                          number of mmap data pages
        --group           put the counters into a counter group
    -g                    enables call-graph recording
        --call-graph <mode[,dump_size]>
                          setup and enables call-graph (stack chain/backtrace) recording: fp dwarf
    -v, --verbose         be more verbose (show counter open errors, etc)
    -q, --quiet           don't print any message
    -s, --stat            per thread counts
    -d, --data            Sample addresses
    -T, --timestamp       Sample timestamps
    -P, --period          Sample period
    -n, --no-samples      don't sample
    -N, --no-buildid-cache
                          do not update the buildid cache
    -B, --no-buildid      do not collect buildids in perf.data
    -G, --cgroup <name>   monitor event in cgroup name only
    -u, --uid <user>      user to profile
    -b, --branch-any      sample any taken branches
    -j, --branch-filter <branch filter mask>
                          branch stack filter modes
    -W, --weight          sample by weight (on special events only)
        --transaction     sample transaction flags (special events only)
        --force-per-cpu   force the use of per-cpu mmaps
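A typical session records a command once and analyses the file afterwards. The sketch below combines options from the list above: -g for call chains, -F for a sampling frequency, -o for the output file, and -- to separate perf's own options from the profiled command. The wrapper name record_profile, the 999 Hz rate and the file names are illustrative choices, not perf defaults. For binaries built without frame pointers, --call-graph dwarf usually gives more complete stacks than plain -g.

<source lang=bash>
# record_profile OUTFILE COMMAND [ARGS...]   (hypothetical helper)
# Sample COMMAND with call chains and store the samples in OUTFILE.
record_profile() {
    local out=$1
    shift
    command -v perf >/dev/null 2>&1 || { echo "perf not found in PATH" >&2; return 1; }
    # -g : record call chains (frame-pointer unwinding unless configured otherwise)
    # -F : sample about 999 times per second (an arbitrary choice)
    # -o : write to $out instead of ./perf.data
    # -- : everything after this is the profiled command and its arguments
    perf record -g -F 999 -o "$out" -- "$@"
}

# Example:
#   record_profile main.data ./main
#   perf report -i main.data
</source>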

perf report

perf report: break down events by process, function, etc.
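perf report reads perf.data (or the file given with -i) and shows where the samples landed. A minimal non-interactive sketch, assuming a data file recorded earlier with perf record: --stdio prints text instead of opening the TUI, and --sort names the grouping keys explicitly. The helper name report_text is hypothetical.

<source lang=bash>
# report_text [DATAFILE]   (hypothetical helper; DATAFILE defaults to perf.data)
# Print a plain-text profile instead of opening the interactive TUI.
report_text() {
    local data=${1:-perf.data}
    command -v perf >/dev/null 2>&1 || { echo "perf not found in PATH" >&2; return 1; }
    # --stdio                : write the report to stdout as text
    # --sort comm,dso,symbol : one row per (command, binary or library, function)
    perf report -i "$data" --stdio --sort comm,dso,symbol
}

# Example:
#   report_text main.data | less
</source>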

perf annotate

perf annotate: annotate assembly or source code with event counts
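perf annotate drills into a single function from a recorded profile and shows which instructions collected the samples. Source lines are interleaved only when the binary carries debug info, and perf versions of this era rely on objdump from binutils for the disassembly. In the sketch below, the helper name annotate_symbol and the file main.c are placeholders.

<source lang=bash>
# annotate_symbol DATAFILE SYMBOL   (hypothetical helper)
annotate_symbol() {
    local data=$1 sym=$2
    command -v perf >/dev/null 2>&1 || { echo "perf not found in PATH" >&2; return 1; }
    # Show SYMBOL's instructions with the percentage of samples that hit each one;
    # source lines appear only when the binary was built with debug info (-g).
    perf annotate -i "$data" --stdio "$sym"
}

# Example (main.c is a placeholder source file):
#   gcc -g -O2 -o main main.c
#   perf record -o main.data ./main
#   annotate_symbol main.data main
</source>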

perf top

perf top: see live event count
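perf top samples continuously and refreshes a live view, much like top(1). Without -p or -t it profiles the whole system, which usually needs root or a permissive kernel.perf_event_paranoid setting. The sketch below narrows it to one process and one event; the helper name top_process and the default event are illustrative.

<source lang=bash>
# top_process PID [EVENT]   (hypothetical helper; EVENT defaults to cycles)
top_process() {
    local pid=$1 event=${2:-cycles}
    command -v perf >/dev/null 2>&1 || { echo "perf not found in PATH" >&2; return 1; }
    # -p : sample only this process (without -p/-t, perf top is system-wide)
    # -e : event that drives the sampling
    # The view refreshes live; press q to quit.
    perf top -p "$pid" -e "$event"
}

# Example:
#   top_process "$(pgrep -n main)" cache-misses
</source>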

perf bench

perf bench: run different kernel microbenchmarks
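perf bench groups its benchmarks by subsystem and is invoked as perf bench SUBSYSTEM SUITE; perf bench all runs every suite. The sketch below runs three suites that ship with perf. The helper name bench_quick and the choice of suites are illustrative.

<source lang=bash>
# bench_quick   (hypothetical helper running a few built-in suites)
bench_quick() {
    command -v perf >/dev/null 2>&1 || { echo "perf not found in PATH" >&2; return 1; }
    perf bench sched messaging || return   # groups of tasks exchanging messages (hackbench-style)
    perf bench sched pipe      || return   # two tasks ping-ponging over a pipe
    perf bench mem memcpy      || return   # memcpy() throughput
}

# Example:
#   bench_quick
#   perf bench all     # every suite; takes considerably longer
</source>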

See also