[devel] Not proprietary, but sovereign / Apple M1 (Was: I: gcc 11.2.1 && binutils 2.37)

Thu Sep 23 12:20:59 MSK 2021

By the way, a description of the Apple M1 processor has come out. It
explains how the chip manages to execute 8 instructions per cycle. I'm
afraid VLIW won't catch up with it.

What most code looks like is that it consists of short chains of
sequentially dependent macroinstructions (say 5 to 7
macroinstructions, 10 to 20 instructions long in total) which store
their result to memory or a register, and that memory or register is
not accessed until many (hundreds) of cycles later. This means that
while each sequentially dependent macroinstruction has to execute one
after the other, you can execute many of the chains in parallel...
That sounds good but you need a variety of machinery to track which
instructions are independent of previous instructions, and to track
the program order of instructions so that as branches are resolved as
correct, you know which of the instructions in program order now
resolve as correct.
(This fact is why so many people’s intuition about the value of
superscalarity is so flawed. Most people hone their assembly
optimization skills on long stretches of sequentially dependent
instructions; but such code is actually unrepresentative of most of
what runs on a CPU.
This fact is also why OoO superscalarity works so well, whereas most
attempts to create static wide machines have been problematic. All the
pieces -- out of order, prediction, and superscalarity -- work
synergistically. In particular most of these chains that are running
in parallel come from different basic blocks [i.e. are separated by some
sort of if() statement that the compiler can’t see past] and so are
impossible to aggregate statically.)
https://drive.google.com/file/d/1WrMYCZMnhsGP4o3H33ioAUKL_bjuJSPt/view
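
To make the quoted argument concrete, here is a toy model of it. It is
only a sketch under loud assumptions: every instruction takes one cycle,
an instruction depends only on the previous one in its own chain, and
branches, memory latency and the tracking machinery are ignored. The
function name cycles_needed and its parameters are made up for this
illustration and do not come from the linked document.

```python
def cycles_needed(chains, issue_width):
    """Count cycles to finish a set of independent dependency chains.

    chains      -- list of chain lengths (instructions per chain)
    issue_width -- how many instructions the core may start per cycle

    Assumption: unit latency, so a chain advances at most one
    instruction per cycle no matter how wide the core is.
    """
    remaining = list(chains)
    cycles = 0
    while any(remaining):
        # Chains that still have work; each offers exactly one ready instruction.
        ready = [i for i, left in enumerate(remaining) if left > 0]
        # Issue as many ready instructions as the width allows.
        for i in ready[:issue_width]:
            remaining[i] -= 1
        cycles += 1
    return cycles


# Eight independent chains of six instructions each (48 instructions).
many_short = [6] * 8
# The same 48 instructions as one long dependent chain.
one_long = [48]

print(cycles_needed(many_short, issue_width=1))  # 1-wide core
print(cycles_needed(many_short, issue_width=8))  # 8-wide core
print(cycles_needed(one_long, issue_width=8))    # width does not help here
```

In this model the 8-wide core finishes the eight short chains in 6
cycles instead of 48, while the single long chain still takes 48 cycles
at any width. That is the point of the quote: width pays off only when
there are many independent chains in flight, and finding them across
branches at run time is what the out-of-order machinery is for.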