Perf
Overview
Notes on perf, the Linux performance analysis tool.
perf stat
perf stat: obtain event counts
Options
$ perf stat -h

 usage: perf stat [<options>] [<command>]

    -T, --transaction                 hardware transaction statistics
    -e, --event <event>               event selector. use 'perf list' to list available events
        --filter <filter>             event filter
    -i, --no-inherit                  child tasks do not inherit counters
    -p, --pid <pid>                   stat events on existing process id
    -t, --tid <tid>                   stat events on existing thread id
    -a, --all-cpus                    system-wide collection from all CPUs
    -g, --group                       put the counters into a counter group
    -c, --scale                       scale/normalize counters
    -v, --verbose                     be more verbose (show counter open errors, etc)
    -r, --repeat <n>                  repeat command and print average + stddev (max: 100, forever: 0)
    -n, --null                        null run - dont start any counters
    -d, --detailed                    detailed run - start a lot of events
    -S, --sync                        call sync() before starting a run
    -B, --big-num                     print large numbers with thousands' separators
    -C, --cpu <cpu>                   list of cpus to monitor in system-wide
    -A, --no-aggr                     disable CPU count aggregation
    -x, --field-separator <separator> print counts with custom separator
    -G, --cgroup <name>               monitor event in cgroup name only
    -o, --output <file>               output file name
        --append                      append to the output file
        --log-fd <n>                  log output to fd, instead of stderr
        --pre <command>               command to run prior to the measured command
        --post <command>              command to run after to the measured command
    -I, --interval-print <n>          print counts at regular interval in ms (>= 100)
        --per-socket                  aggregate counts per processor socket
        --per-core                    aggregate counts per physical processor core
    -D, --delay <n>                   ms to wait before starting measurement after program start
Example
Example 1
<source lang=bash>
$ perf stat ./main

 Performance counter stats for './main':

        586.620001 task-clock (msec)         #    1.000 CPUs utilized
                53 context-switches          #    0.090 K/sec
                 1 cpu-migrations            #    0.002 K/sec
               921 page-faults               #    0.002 M/sec
     1,817,562,234 cycles                    #    3.098 GHz
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
       944,063,599 instructions              #    0.52  insns per cycle
       244,429,638 branches                  #  416.675 M/sec
           494,951 branch-misses             #    0.20% of all branches

       0.586831274 seconds time elapsed
</source>
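The derived columns after the `#` in Example 1 are simple ratios of the raw counts. The sketch below recomputes two of them from the numbers shown above; the helper name `ratio` and the variable names are illustrative only and are not part of perf.

```shell
#!/bin/sh
# ratio NUM DEN: print NUM / DEN with two decimals.
# (hypothetical helper, not a perf command)
ratio() {
    LC_ALL=C awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

# Counts copied from Example 1, thousands separators removed.
instructions=944063599
cycles=1817562234
branches=244429638
branch_misses=494951

# Instructions per cycle = instructions / cycles.
ipc=$(ratio "$instructions" "$cycles")

# Branch-miss rate in percent = 100 * branch-misses / branches.
miss_pct=$(ratio "$((branch_misses * 100))" "$branches")

echo "insns per cycle:  $ipc"
echo "branch-miss rate: $miss_pct%"
```

Both values should agree with what perf printed in the `#` column for `instructions` and `branch-misses`.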
Example 2
<source lang=bash>
$ perf stat --repeat 10 -e cycles:u -e instructions:u -e cache-references:u \
    -e cache-misses:u -e stalled-cycles-frontend:u -e stalled-cycles-backend:u \
    -e ref-cycles:u -e branch-instructions:u -e branch-misses:u ./main

 Performance counter stats for './main' (10 runs):

     1,494,202,725 cycles:u                                               ( +-  1.83% ) [71.14%]
       935,216,574 instructions:u            #    0.63  insns per cycle   ( +-  0.03% ) [85.62%]
        17,506,493 cache-references:u                                     ( +-  0.29% ) [85.69%]
        14,663,441 cache-misses:u            #   83.760 % of all cache refs ( +-  0.10% ) [71.43%]
   <not supported> stalled-cycles-frontend:u
   <not supported> stalled-cycles-backend:u
     1,275,469,977 ref-cycles:u                                           ( +-  0.65% ) [85.75%]
       242,376,389 branch-instructions:u                                  ( +-  0.06% ) [86.18%]
           486,334 branch-misses:u           #    0.20% of all branches   ( +-  0.07% ) [85.86%]

       0.609760790 seconds time elapsed                                   ( +-  0.70% )
</source>
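Repeating `-e` for every event, as in Example 2, gets long quickly. `perf stat -e` also accepts a comma-separated event list, so the same command line can be assembled from a plain list of names. This is a minimal sketch; the variable names (`events`, `list`, `cmd`) are illustrative, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Event names from Example 2. The :u modifier appended below limits
# counting to user space.
events="cycles instructions cache-references cache-misses
stalled-cycles-frontend stalled-cycles-backend ref-cycles
branch-instructions branch-misses"

# Join the names into one comma-separated list: cycles:u,instructions:u,...
list=""
for e in $events; do
    list="${list:+$list,}${e}:u"
done

# Print the resulting command instead of executing it.
cmd="perf stat --repeat 10 -e $list ./main"
echo "$cmd"
```

Keeping the event names in one variable makes it easy to drop events that the CPU reports as `<not supported>`.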
perf record
perf record: record events for later reporting
Options
$ perf record -h

 usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    -e, --event <event>               event selector. use 'perf list' to list available events
        --filter <filter>             event filter
    -p, --pid <pid>                   record events on existing process id
    -t, --tid <tid>                   record events on existing thread id
    -r, --realtime <n>                collect data with this RT SCHED_FIFO priority
    -D, --no-delay                    collect data without buffering
    -R, --raw-samples                 collect raw sample records from all opened counters
    -a, --all-cpus                    system-wide collection from all CPUs
    -C, --cpu <cpu>                   list of cpus to monitor
    -c, --count <n>                   event period to sample
    -o, --output <file>               output file name
    -i, --no-inherit                  child tasks do not inherit counters
    -F, --freq <n>                    profile at this frequency
    -m, --mmap-pages <pages>          number of mmap data pages
        --group                       put the counters into a counter group
    -g                                enables call-graph recording
        --call-graph <mode[,dump_size]>
                                      setup and enables call-graph (stack chain/backtrace) recording: fp dwarf
    -v, --verbose                     be more verbose (show counter open errors, etc)
    -q, --quiet                       don't print any message
    -s, --stat                        per thread counts
    -d, --data                        Sample addresses
    -T, --timestamp                   Sample timestamps
    -P, --period                      Sample period
    -n, --no-samples                  don't sample
    -N, --no-buildid-cache            do not update the buildid cache
    -B, --no-buildid                  do not collect buildids in perf.data
    -G, --cgroup <name>               monitor event in cgroup name only
    -u, --uid <user>                  user to profile
    -b, --branch-any                  sample any taken branches
    -j, --branch-filter <branch filter mask>
                                      branch stack filter modes
    -W, --weight                      sample by weight (on special events only)
        --transaction                 sample transaction flags (special events only)
        --force-per-cpu               force the use of per-cpu mmaps
perf report
perf report: break down events by process, function, etc.
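`perf record` and `perf report` are normally used as a pair: the first writes samples to a data file, the second reads that file back. The sketch below wraps both steps in one function. The function name `profile` and the `PERF` variable are hypothetical conveniences, and `-i` and `--stdio` are `perf report` options that do not appear in the option listings on this page, so check `perf report -h` on your version before relying on them.

```shell
#!/bin/sh
# PERF can be overridden to point at a specific perf binary.
PERF=${PERF:-perf}

# profile DATAFILE COMMAND [ARGS...]
# Record COMMAND with call graphs into DATAFILE, then print a text report.
# (hypothetical wrapper, not part of perf itself)
profile() {
    data=$1
    shift
    # -g enables call-graph recording; -o names the output file.
    "$PERF" record -g -o "$data" -- "$@" &&
        # -i selects the input file; --stdio prints plain text, not the TUI.
        "$PERF" report -i "$data" --stdio
}

# Usage (not executed here):
#   profile perf.data ./main
```

Writing to an explicitly named data file keeps several profiling runs side by side instead of overwriting the default `perf.data`.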
perf annotate
perf annotate: annotate assembly or source code with event counts
perf top
perf top: see live event count
perf bench
perf bench: run different kernel microbenchmarks
See also
- https://kldp.org/node/155216 - I ran into a hard problem while optimizing (locality-related)
- https://perf.wiki.kernel.org/index.php/Main_Page - perf: Linux profiling with performance counters