%//////////////////////////////////////////////////////////////////////////////
%
% Copyright (c) 2014-2022 Daniel Adler <dadler@uni-goettingen.de>,
% Tassilo Philipp <tphilipp@potion-studios.com>
%
% Permission to use, copy, modify, and distribute this software for any
% purpose with or without fee is hereby granted, provided that the above
% copyright notice and this permission notice appear in all copies.
%
% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
%
%//////////////////////////////////////////////////////////////////////////////
\subsection{ARM64 Calling Conventions}
\paragraph{Overview}
ARMv8 introduced the AArch64 calling convention. ARM64 chips can run in 64-bit or 32-bit mode, but a single process cannot mix the two: interworking is only possible between processes, not within one.\\
A word is defined as 32 bits and a dword as 64 bits. This is for historical reasons (the terminology was carried over from ARM32).\\
For more details, take a look at the Procedure Call Standard for the ARM 64-bit Architecture \cite{AAPCS64}.\\
\paragraph{\product{dyncall} support}
The \product{dyncall} library supports the ARM 64-bit AArch64 PCS ABI, as well as Apple's and Microsoft's conventions, which are derived from it, for both calls and callbacks.
\subsubsection{AAPCS64 Calling Convention}
\paragraph{Registers and register usage}
ARM64 features thirty-one 64-bit general purpose registers, namely {\bf r0-r30},
which are referred to as either {\bf x0-x30} for 64-bit access, or {\bf w0-w30}
for 32-bit access (with upper bits either cleared or sign extended on load).\\
Also, there is {\bf sp/xzr/wzr}, a register with restricted use: it acts as the
stack pointer ({\bf sp}) in instructions dealing with the stack, and as a hardware
zero register ({\bf xzr/wzr}) in all other instructions. There is also {\bf pc}, the
program counter. Additionally, there are thirty-two 128-bit registers {\bf v0-v31},
used as SIMD and floating point registers and referred to as {\bf q0-q31}, {\bf d0-d31}
or {\bf s0-s31} for 128, 64 or 32-bit access (in contrast to AArch32, those do not overlap multiple
narrower registers). The registers are used as follows:\\
\begin{table}[h]
\begin{tabular*}{0.95\textwidth}{3 B}
Name & Brief description\\
\hline
{\bf x0-x7} & parameters, scratch, return value\\
{\bf x8} & indirect result location pointer\\
{\bf x9-x15} & scratch\\
{\bf x16} & permanent in some cases, can have special function (IP0), see doc\\
{\bf x17} & permanent in some cases, can have special function (IP1), see doc\\
{\bf x18} & reserved as platform register, advised not to be used for handwritten, portable asm, see doc \\
{\bf x19-x28} & permanent\\
{\bf x29} & permanent, frame pointer\\
{\bf x30} & permanent, link register\\
{\bf sp} & permanent, stack pointer\\
{\bf pc} & program counter\\
{\bf v0-v7} & scratch, float parameters, return value\\
{\bf v8-v15} & lower 64 bits are permanent, upper 64 bits are scratch\\
{\bf v16-v31} & scratch\\
{\bf xzr} & zero register, always zero\\
\end{tabular*}
\caption{Register usage on arm64}
\end{table}
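As a rough illustration of the table above, the following C sketch maps a general purpose register number to its role and reports whether the table marks it as permanent. The names {\tt GprRole}, {\tt gpr\_role} and {\tt gpr\_is\_permanent} are hypothetical and not part of \product{dyncall}. Treating {\bf x16-x18} as non-permanent is a simplifying assumption, as their status depends on the platform.

```c
#include <assert.h>

/* Hypothetical role names, one per row of the register table. */
typedef enum {
    ROLE_PARAM,      /* x0-x7:   parameters, scratch, return value */
    ROLE_INDIRECT,   /* x8:      indirect result location pointer  */
    ROLE_SCRATCH,    /* x9-x15:  scratch                           */
    ROLE_IP,         /* x16-x17: IP0/IP1, permanent in some cases  */
    ROLE_PLATFORM,   /* x18:     reserved as platform register     */
    ROLE_PERMANENT,  /* x19-x28: permanent                         */
    ROLE_FP,         /* x29:     permanent, frame pointer          */
    ROLE_LR          /* x30:     permanent, link register          */
} GprRole;

/* Maps the register number n of xN (0..30) to its role. */
static GprRole gpr_role(int n)
{
    assert(n >= 0 && n <= 30);
    if (n <= 7)  return ROLE_PARAM;
    if (n == 8)  return ROLE_INDIRECT;
    if (n <= 15) return ROLE_SCRATCH;
    if (n <= 17) return ROLE_IP;
    if (n == 18) return ROLE_PLATFORM;
    if (n <= 28) return ROLE_PERMANENT;
    if (n == 29) return ROLE_FP;
    return ROLE_LR;
}

/* Nonzero if the table lists xN as unconditionally permanent.
   Assumption: x16-x18 are not counted, as their status is platform specific. */
static int gpr_is_permanent(int n)
{
    GprRole r = gpr_role(n);
    return r == ROLE_PERMANENT || r == ROLE_FP || r == ROLE_LR;
}
```

With these definitions, {\tt gpr\_is\_permanent(19)} is nonzero and {\tt gpr\_is\_permanent(9)} is zero, so handwritten code that modifies {\bf x19} has to restore it before returning, while {\bf x9} can be clobbered freely.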
\paragraph{Parameter passing}
\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack
\item first 8 integer arguments are passed using x0-x7
\item first 8 floating point arguments are passed using d0-d7
\item subsequent parameters are pushed onto the stack
\item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs), it has to copy, in its prolog, the first 8 integer
and 8 floating-point registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed integer parameters require saving, though)
\item aggregates (struct, union) with 1 to 4 identical floating-point members (either float or double) are passed field-by-field (8-byte aligned if passed via stack), except if passed as a vararg
\item other aggregates (struct, union) \textgreater\ 16 bytes in size are passed indirectly, as a pointer to a copy (if needed)
\item {\it non-trivial} C++ aggregates (as defined by the language) of any size are passed indirectly via a pointer to a copy of the aggregate
\item all other aggregates (struct, union), after rounding up the size to the nearest multiple of 8, are passed as a sequence of dwords, like integers
\item aggregates are never split across registers and stack, so if not enough registers are available an aggregate is passed via the stack (for aggregates that
would've been passed as floating point values, any still unused float registers will be skipped for any subsequent arg)
\item stack is required throughout to be sixteen-byte aligned
\end{itemize}
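The register assignment rules above can be sketched as a small allocator in C. This is a simplified model under stated assumptions, not \product{dyncall}'s implementation: the types {\tt Arg} and {\tt AllocState} and the functions {\tt dwords} and {\tt place\_arg} are hypothetical, and varargs and non-trivial C++ aggregates are not modelled. Marking the integer registers as exhausted once an aggregate spills to the stack is an assumption based on \cite{AAPCS64}, not on the list above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ARG_INT,       /* integer or pointer */
    ARG_FLOAT,     /* float or double */
    ARG_HFA,       /* aggregate of 1 to 4 identical float/double members */
    ARG_AGGREGATE  /* any other (trivial) struct or union */
} ArgKind;

typedef struct {
    ArgKind kind;
    size_t  size;     /* size in bytes, used for ARG_AGGREGATE */
    int     members;  /* member count, used for ARG_HFA */
} Arg;

typedef enum { LOC_GPR, LOC_FPR, LOC_STACK } Loc;

typedef struct {
    int ngrn;  /* next free integer register, 0..8 (x0-x7) */
    int nsrn;  /* next free float register, 0..8 (v0-v7) */
} AllocState;

/* Dwords occupied by an aggregate after rounding its size up to a multiple of 8. */
static int dwords(size_t size)
{
    return (int)((size + 7) / 8);
}

/* Decides where one argument goes and updates the allocation state.
   For aggregates larger than 16 bytes the result is the location of the
   pointer to the copy, not of the aggregate itself. */
static Loc place_arg(AllocState *st, const Arg *a)
{
    int need = 1;

    if (a->kind == ARG_FLOAT || a->kind == ARG_HFA) {
        if (a->kind == ARG_HFA) {
            assert(a->members >= 1 && a->members <= 4);
            need = a->members;  /* passed field-by-field */
        }
        if (st->nsrn + need <= 8) {
            st->nsrn += need;
            return LOC_FPR;
        }
        st->nsrn = 8;  /* unused float registers are skipped from now on */
        return LOC_STACK;
    }

    if (a->kind == ARG_AGGREGATE && a->size <= 16)
        need = dwords(a->size);  /* sequence of dwords, like integers */
    /* ARG_INT and aggregates > 16 bytes (pointer to a copy) need one dword */

    if (st->ngrn + need <= 8) {
        st->ngrn += need;
        return LOC_GPR;
    }
    st->ngrn = 8;  /* assumption: later integer arguments also go to the stack */
    return LOC_STACK;
}
```

Starting from an {\tt AllocState} of all zeros, two aggregates of three floats each occupy {\bf v0-v5}. A third one no longer fits into the two remaining registers, so it goes to the stack as a whole, and any later floating point argument follows it there.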
\paragraph{Return values}