May 22, 2018
Andreas Abel
In this talk, we present the design and implementation of a tool to construct faithful models of the latency, throughput, and port usage of x86 instructions. To this end, we first discuss common notions of instruction throughput and port usage, and introduce a more precise definition of latency that, in contrast to previous definitions, considers dependencies between different pairs of input and output operands. We then develop novel algorithms that infer latency, throughput, and port usage from automatically generated microbenchmarks and hardware performance counters, and that are more accurate and precise than existing work. To facilitate the rapid construction of optimizing compilers and tools for performance prediction, the output of our tool is provided in a machine-readable format. We provide experimental results for processors of all generations of Intel’s Core architecture, i.e., from Nehalem to Coffee Lake, and discuss various cases where the output of our tool differs considerably from prior work.
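
To illustrate why a latency defined per pair of input and output operands can be more informative than a single number, here is a minimal Python sketch. It is not the tool's implementation or output format; the operand names, the table layout, and all cycle counts are hypothetical values chosen only for the example.

```python
from typing import Dict, Tuple

# Latency in cycles for each (input operand, output operand) pair.
# All names and numbers are hypothetical, not measured values.
LatencyTable = Dict[Tuple[str, str], int]

EXAMPLE_INSTRUCTION: LatencyTable = {
    ("src1", "dst"): 1,
    ("src2", "dst"): 3,
    ("src1", "flags"): 1,
    ("src2", "flags"): 3,
}


def single_number_latency(table: LatencyTable) -> int:
    """Collapse the table to one value, as a single-number definition would.

    Taking the maximum is one possible convention; it is an assumption here.
    """
    return max(table.values())


def chain_latency(table: LatencyTable, input_op: str, output_op: str,
                  length: int) -> int:
    """Cycles for a chain of `length` copies of the instruction in which
    each copy's `output_op` feeds the next copy's `input_op`."""
    return table[(input_op, output_op)] * length


if __name__ == "__main__":
    print(single_number_latency(EXAMPLE_INSTRUCTION))
    print(chain_latency(EXAMPLE_INSTRUCTION, "src1", "dst", 10))
    print(chain_latency(EXAMPLE_INSTRUCTION, "src2", "dst", 10))
```

In this made-up table, a dependency chain through `src1` is three times shorter than one through `src2`, a difference that a single latency value per instruction cannot express.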