Perf
Latest revision as of 12:05, 27 July 2016
Overview
Notes on perf, the Linux performance analysis tool.
perf stat
perf stat: obtain event counts
Options
$ perf stat -h
usage: perf stat [<options>] [<command>]
-T, --transaction hardware transaction statistics
-e, --event <event> event selector. use 'perf list' to list available events
--filter <filter>
event filter
-i, --no-inherit child tasks do not inherit counters
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
-a, --all-cpus system-wide collection from all CPUs
-g, --group put the counters into a counter group
-c, --scale scale/normalize counters
-v, --verbose be more verbose (show counter open errors, etc)
-r, --repeat <n> repeat command and print average + stddev (max: 100, forever: 0)
-n, --null null run - dont start any counters
-d, --detailed detailed run - start a lot of events
-S, --sync call sync() before starting a run
-B, --big-num print large numbers with thousands' separators
-C, --cpu <cpu> list of cpus to monitor in system-wide
-A, --no-aggr disable CPU count aggregation
-x, --field-separator <separator>
print counts with custom separator
-G, --cgroup <name> monitor event in cgroup name only
-o, --output <file> output file name
--append append to the output file
--log-fd <n> log output to fd, instead of stderr
--pre <command> command to run prior to the measured command
--post <command> command to run after to the measured command
-I, --interval-print <n>
print counts at regular interval in ms (>= 100)
--per-socket aggregate counts per processor socket
--per-core aggregate counts per physical processor core
-D, --delay <n> ms to wait before starting measurement after program start
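As a rough sketch of how several of the options above combine, the script below repeats a run five times, counts only user-space cycles and instructions, and writes the report to a file instead of stderr. The names `./main` and `stat.txt` are placeholders assumed for illustration, and the repeat count of 5 is arbitrary. The command is printed first and only executed when `perf` and the target binary are actually present.

```shell
#!/bin/sh
# Hypothetical names: replace ./main and stat.txt with your own.
target=./main
report=stat.txt

# -r 5 : repeat the command 5 times, print average + stddev
# -B   : print large numbers with thousands' separators
# -e   : event selector (the :u suffix restricts counting to user space)
# -o   : write the report to a file rather than stderr
set -- perf stat -r 5 -B -e cycles:u -e instructions:u -o "$report" "$target"
cmd="$*"
echo "would run: $cmd"

# Execute only if perf is installed and the target binary exists.
if command -v perf >/dev/null 2>&1 && [ -x "$target" ]; then
  "$@" && cat "$report"
fi
```

Which events are available depends on the CPU and kernel; `perf list` shows what can be passed to `-e`.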
Example
Example 1
<source lang=bash>
$ perf stat ./main
Performance counter stats for './main':
586.620001 task-clock (msec) # 1.000 CPUs utilized
53 context-switches # 0.090 K/sec
1 cpu-migrations # 0.002 K/sec
921 page-faults # 0.002 M/sec
1,817,562,234 cycles # 3.098 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
944,063,599 instructions # 0.52 insns per cycle
244,429,638 branches # 416.675 M/sec
494,951 branch-misses # 0.20% of all branches
0.586831274 seconds time elapsed
</source>
Example 2
<source lang=bash>
$ perf stat --repeat 10 -e cycles:u -e instructions:u -e cache-references:u\
-e cache-misses:u -e stalled-cycles-frontend:u -e stalled-cycles-backend:u\
-e ref-cycles:u -e branch-instructions:u -e branch-misses:u ./main
Performance counter stats for './main' (10 runs):
1,494,202,725 cycles:u ( +- 1.83% ) [71.14%]
935,216,574 instructions:u # 0.63 insns per cycle ( +- 0.03% ) [85.62%]
17,506,493 cache-references:u ( +- 0.29% ) [85.69%]
14,663,441 cache-misses:u # 83.760 % of all cache refs ( +- 0.10% ) [71.43%]
<not supported> stalled-cycles-frontend:u
<not supported> stalled-cycles-backend:u
1,275,469,977 ref-cycles:u ( +- 0.65% ) [85.75%]
242,376,389 branch-instructions:u ( +- 0.06% ) [86.18%]
486,334 branch-misses:u # 0.20% of all branches ( +- 0.07% ) [85.86%]
0.609760790 seconds time elapsed ( +- 0.70% )
</source>
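The derived figures that perf prints after the `#` sign are simple ratios of the raw counts. The sketch below recomputes three of them from the Example 2 numbers, which can be handy when post-processing saved counts. The variable names are made up for illustration; `awk` is used only because plain `sh` has no floating-point arithmetic.

```shell
#!/bin/sh
# Raw counter values copied from Example 2.
cycles=1494202725
instructions=935216574
cache_refs=17506493
cache_misses=14663441
branches=242376389
branch_misses=486334

# Instructions per cycle = instructions / cycles
ipc=$(awk -v i="$instructions" -v c="$cycles" 'BEGIN { printf "%.2f", i / c }')
# Cache miss rate = cache-misses / cache-references, as a percentage
cache_miss_pct=$(awk -v m="$cache_misses" -v r="$cache_refs" 'BEGIN { printf "%.3f", 100 * m / r }')
# Branch miss rate = branch-misses / branch-instructions, as a percentage
branch_miss_pct=$(awk -v m="$branch_misses" -v b="$branches" 'BEGIN { printf "%.2f", 100 * m / b }')

echo "insns per cycle:  $ipc"
echo "cache miss rate:  $cache_miss_pct %"
echo "branch miss rate: $branch_miss_pct %"
```

The results (0.63, 83.760 % and 0.20 %) agree with the values perf reported in Example 2.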
perf record
perf record: record events for later reporting
Options
$ perf record -h
usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
--filter <filter>
event filter
-p, --pid <pid> record events on existing process id
-t, --tid <tid> record events on existing thread id
-r, --realtime <n> collect data with this RT SCHED_FIFO priority
-D, --no-delay collect data without buffering
-R, --raw-samples collect raw sample records from all opened counters
-a, --all-cpus system-wide collection from all CPUs
-C, --cpu <cpu> list of cpus to monitor
-c, --count <n> event period to sample
-o, --output <file> output file name
-i, --no-inherit child tasks do not inherit counters
-F, --freq <n> profile at this frequency
-m, --mmap-pages <pages>
number of mmap data pages
--group put the counters into a counter group
-g enables call-graph recording
--call-graph <mode[,dump_size]>
setup and enables call-graph (stack chain/backtrace) recording: fp dwarf
-v, --verbose be more verbose (show counter open errors, etc)
-q, --quiet don't print any message
-s, --stat per thread counts
-d, --data Sample addresses
-T, --timestamp Sample timestamps
-P, --period Sample period
-n, --no-samples don't sample
-N, --no-buildid-cache
do not update the buildid cache
-B, --no-buildid do not collect buildids in perf.data
-G, --cgroup <name> monitor event in cgroup name only
-u, --uid <user> user to profile
-b, --branch-any sample any taken branches
-j, --branch-filter <branch filter mask>
branch stack filter modes
-W, --weight sample by weight (on special events only)
--transaction sample transaction flags (special events only)
--force-per-cpu force the use of per-cpu mmaps
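A minimal sketch of a typical recording run built from the options above: sample at a fixed frequency, record call graphs, and name the output file explicitly. `./main` is a placeholder binary and 99 Hz is an arbitrary sampling rate chosen for illustration. As before, the command is printed and only executed when `perf` and the target are available.

```shell
#!/bin/sh
# Hypothetical names: replace with your own binary and data file.
target=./main
data=perf.data

# -F 99 : profile at 99 samples per second
# -g    : enable call-graph recording
# -o    : output file name
# --    : separates perf options from the measured command
set -- perf record -F 99 -g -o "$data" -- "$target"
cmd="$*"
echo "would run: $cmd"

# Execute only if perf is installed and the target binary exists.
if command -v perf >/dev/null 2>&1 && [ -x "$target" ]; then
  "$@"
fi
```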
perf report
perf report: break down events by process, function, etc.
perf annotate
perf annotate: annotate assembly or source code with event counts
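Both perf report and perf annotate read a file written earlier by perf record. The sketch below assumes such a file exists under the placeholder name `perf.data`; the `show` helper is hypothetical and exists only to skip cleanly when `perf` or the file is missing. `-i` selects the input file and `--stdio` prints plain text instead of opening the interactive viewer.

```shell
#!/bin/sh
# show SUBCOMMAND FILE: print the first lines of a perf text report.
# Hypothetical helper for illustration, not part of perf itself.
show() {
  if command -v perf >/dev/null 2>&1 && [ -f "$2" ]; then
    perf "$1" -i "$2" --stdio | head -n 40
  else
    echo "skipped: perf $1 -i $2 --stdio"
  fi
}

show report perf.data     # break down samples by process, function, etc.
show annotate perf.data   # per-instruction counts for the sampled symbols
```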
perf top
perf top: see live event count
perf bench
perf bench: run different kernel microbenchmarks
See also
- https://kldp.org/node/155216 - Ran into a hard problem while optimizing (locality-related)
- https://perf.wiki.kernel.org/index.php/Main_Page - perf: Linux profiling with performance counters