IMPROVE: Spec BTI, based on feedback

Jean-Claude 2022-08-21 01:07:28 +02:00
parent a46364a2a7
commit 443197381e
Signed by: jeanclaude
GPG Key ID: 8A300F57CBB9F63E
6 changed files with 142 additions and 128 deletions

@ -223,7 +223,7 @@ For Zen 3, \citeauthor{retbleed} could not detect collisions across privilege bo
\subsubsection{Function Return Prediction}
\label{subsubsec:funcReturnPrediction}
This predictor is used for return instructions.
The \techterm{function return predictor}, which may also be referred to as \techterm{return target predictor}, is used for return instructions.
It works by assuming that a function always returns to the place it was called from.
It uses a stack-like cache called \ac{rsb} to support multiple nested function calls.
When encountering a call, the expected return address is pushed to the \ac{rsb}.\footnote{For the x86\_64 \ac{isa}, a call instruction residing at \ac{pc} is expected to return to the instruction following it, e.g., $\ac{pc} + 5$ for a five-byte near call.}
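The push-on-call and pop-on-return behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the names \texttt{rsb\_model}, \texttt{rsb\_on\_call}, and \texttt{rsb\_on\_return} are hypothetical, the capacity of 16 entries is an assumed value, and real hardware may handle overflow and underflow differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed capacity; real sizes are microarchitecture dependent. */
#define RSB_DEPTH 16

/* Hypothetical software model of a return stack buffer. */
struct rsb_model {
    uint64_t entries[RSB_DEPTH];
    unsigned count; /* number of valid entries */
};

/* On a call, push the expected return address.
 * If the model is full, the oldest entry is dropped. */
void rsb_on_call(struct rsb_model *rsb, uint64_t return_addr)
{
    if (rsb->count == RSB_DEPTH) {
        for (unsigned i = 1; i < RSB_DEPTH; i++)
            rsb->entries[i - 1] = rsb->entries[i];
        rsb->count--;
    }
    rsb->entries[rsb->count++] = return_addr;
}

/* On a return, pop the most recent prediction.
 * Returns 1 and stores the prediction if an entry is available.
 * Returns 0 on underflow, i.e., when another predictor would be needed. */
int rsb_on_return(struct rsb_model *rsb, uint64_t *predicted)
{
    assert(predicted != 0);
    if (rsb->count == 0)
        return 0;
    *predicted = rsb->entries[--rsb->count];
    return 1;
}
```

In this model, two nested calls followed by three returns yield two valid predictions and one underflow, which corresponds to the situation in which the predictor can no longer rely on the \ac{rsb}.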

@ -1,24 +1,27 @@
% !TEX root = ../thesis.tex
\section{Overview}
\retbleed{} causes the return instruction predictor to stop using the \ac{rsb} but to fall back to the \ac{btb}.
By poisoning the \ac{btb}, an attacker can steer the speculative destination of a return instruction executed in kernel space.
This leads to micro-architectural traces, which are inferred using a side-channel attack.
\retbleed{} causes the function return predictor to stop using the \ac{rsb} but to fall back to the \ac{btb}.
By poisoning the \ac{btb}, an attacker can steer the speculative control flow after a return instruction executed in kernel space.
The speculative execution leads to microarchitectural traces, which are inferred using a side-channel attack.
\paragraph{Why \aclp{pf}?}
\label{para:whyPf}
The \ac{ibpb}-on-\ac{pf} mitigation relies on \acs{pf}, raised during the attack.
To understand why this mitigation may be incomplete, we need to understand why \acp{pf} are raised in the first place.
The attacker hijacks the kernel branch by injecting a \ac{btb} entry which gets used by the return instruction predictor.
It will use the destination of the injected entry as the most likely destination of the branch.
The source of the malicious branch is selected to collide with the address of the return instruction that the attacker wants to hijack.
The destination is the address of the gadget.
Due to \ac{smap} and \ac{smep}, the kernel cannot execute arbitrary user space code, and therefore, the gadget itself must reside in kernel space.
The destination is the address of the disclosure gadget.
Due to \ac{smap} and \ac{smep}, the kernel cannot execute arbitrary user space code, and therefore, the disclosure gadget itself must reside in kernel space.
This results in a poisonous branch, where the source resides in the attacker's user space, and the destination is in the kernel space.
I.e., the attacker must do \ac{bti} across privilege boundaries.
As a user is not allowed to jump to an arbitrary location in kernel space, a \ac{pf} is raised, and the \ac{pf} handler takes over to inform the user about the privilege boundary contempt.
As a user is not allowed to jump to an arbitrary location in kernel space, a \ac{pf} is raised, and the \ac{pf} handler takes over to inform the user about the privilege boundary breach.
\paragraph{The Plan.}
As stated in our first research question (\ref{para:rs1}), we want to create a primitive which does the described \ac{bti} without causing a \acp{pf}.
As stated in \rqref{rq1}, we want to create a speculation primitive which does the described \ac{bti} without causing a \ac{pf}.
Our idea is to do speculative \ac{bti}, meaning that the poisonous branch of the speculation primitive is itself executed speculatively.
If speculative \ac{bti} works\footnote{To the best of our knowledge, this has never been explicitly shown before.}, the speculatively executed invalid branch, is picked up by the \ac{btb} without any \acs{pf} being raised.
This leads us to the development of a speculative version of the \cpbti{} PoC\footnote{Also referred to as \ac{utk} \ac{bti} PoC in the \retbleed{} paper.}.
We will discuss the creation of the PoCs for both Intel and AMD.
If speculative \ac{bti} works\footnote{To the best of our knowledge, this has never been explicitly shown before.}, the speculatively executed branch is picked up by the \ac{btb} without raising any \acp{pf}.
This leads us to the development of a speculative version of the \cpbti{} \ac{poc}.
We will discuss the creation of the \acp{poc} for both Intel and AMD in the next section.

@ -2,30 +2,29 @@
\section{Implementation}
\label{sec:implementation}
We use the PoCs provided by \retbleed{} as a basis for our PoCs\footnote{We will consider a slightly modified version of the \retbleed{} PoCs where we have done some simplifications.}.
We use the PoCs provided by \retbleed{} as a basis for our PoCs\footnote{We will consider a slightly modified version of the \retbleed{} PoCs where we have done some simplifications.}.~\cite{retbleedRepo}
Before we develop a speculative version of the \cpbti{} PoC, we want to verify that speculative \ac{bti} works in the same privilege domain.
For that, we modify the \retbti{} PoC to create a version of it, where the \ac{bti} is done speculatively.
We will refer to it as Spec \retbti.
For that, we modify the \retbti{} PoC to create a version of it where the \ac{bti} is done speculatively.
We will refer to it as \specretbti.
If we succeed, we proceed by creating a speculative version of \cpbti{} to see if the speculative \ac{bti} works across privilege domains, as it does with non-speculative \ac{bti}.
\retbleed{} causes the return instruction predictor to fall back to the \ac{btb}.
Both, AMD and Intel microarchitectures fall back to using the \ac{btb} for return instruction in specific causes, but the occasion is different.
\retbleed{} causes a \ac{rsb}-to-\ac{btb} fallback of the function return predictor.
AMD and Intel microarchitectures have shown this behavior, but as seen in \autoref{para:exploitRetInstr}, it is caused by different events.
For Intel, this is achieved by underflowing the \ac{rsb}.
This happens if too many return instructions are encountered in a row.
We refer to the mechanism we use to achieve that as \techterm{return cycle}.
An underflow happens if too many return instructions are encountered in a row.
We refer to the mechanism we use to achieve that in our \ac{poc} as a \techterm{return cycle}.
Besides underflowing the \ac{rsb}, the \techterm{return cycle} has the additional purpose of creating a ``normalized'' branch history, meaning that the \ac{bhb} is set into a known and easily reproducible state.
This is important since the \ac{btb} is indexed using the \ac{bhb}.
Without a similar branch history, the \ac{bpu} will not use the injected entry during speculation.
Normalizing the history is important since the \ac{btb} is indexed using the \ac{bhb}.
The \ac{bpu} will not use the injected \ac{btb} entry if the history diverges too much from the one present during the poisoning.
To create the \techterm{return cycle}, a memory address holding a return instruction is repeatedly pushed to the stack.
An initial return instruction starts the recursive cycle.
After all addresses of the return instruction are popped from the stack, it returns to the address pushed, prior to creating the return cycle.
After all addresses of the return instruction are popped from the stack, the control flow returns to the address on the stack that was pushed prior to creating the cycle.
\autoref{lst:createReturnCycle} shows the used method.
Alternatively, a recursive function call can also create the return cycle.
The length of the return cycle is microarchitecture dependent, but $28$ cycles is optimal for Coffee Lake and $29$ for Coffee Lake Refresh.\cite{retbleed}
The length of the return cycle is microarchitecture dependent, but $28$ cycles are optimal for Coffee Lake and $29$ for Coffee Lake Refresh.~\cite{retbleed}
\begin{lstlisting}[style=CStyle,caption={Creation of a \techterm{return cycle}. The address to which we return after the \techterm{return cycle} is pushed to the stack first. Next, the \techterm{return cycle} is created by repeatedly pushing the address \lstinline+RET_PATH+, where a return instruction is stored, to the stack. An initial return instruction starts the cycle.},label={lst:createReturnCycle}]
\begin{lstlisting}[style=CStyle,caption={Creation of a \techterm{return cycle}. The address \lstinline+cycle\_dst+, to which the control flow returns after the \techterm{return cycle}, is pushed to the stack first. Next, the \techterm{return cycle} is created by repeatedly pushing the address \lstinline+RET_PATH+, where a return instruction is stored, to the stack. An initial return instruction starts the cycle.},label={lst:createReturnCycle}]
// Store return instruction to memory location RET_PATH
memcpy((u8*)RET_PATH, "\xc3", 1);
@ -33,7 +32,7 @@ asm(
// Address to which to return after the cycle
"pushq %[cycle_dst]\n\t"
// Crate return cycle of length 30
// Create return cycle of length 30
".rept 30\n\t"
"push $RET_PATH\n\t"
".endr\n\t"
@ -46,7 +45,14 @@ asm(
\end{lstlisting}
%stopzone
AMD CPUs resort back to the \ac{btb} for return instruction in case they collide with a previously encountered indirect branch.
The cycle is also not required for setting up the branch history, as the \ac{btb} does not seem to be indexed using any kind of branch history.\cite{retbleed}
AMD CPUs exhibit the \ac{rsb}-to-\ac{btb} fallback for return instructions if they collide with the address of a previously encountered indirect branch.
Moreover, the return cycle is not required for setting up the branch history, as the \ac{btb} does not seem to be indexed using any kind of branch history.~\cite{retbleed}
Instead, the \ac{btb} is indexed using the start and end addresses of the branch, which can be thought of as a ``basic block''.
\todo{Is this correctly explained? Better explanation}
\paragraph{Preview.}
In the subsequent sections, we first discuss the development of \specretbti, followed by \speccpbti.
For both, we first provide more information on the non-speculative version of the \acp{poc}.
We always first describe the \acp{poc} for Intel, followed by the description for AMD.
As is common with Spectre \acp{poc}, the \acp{poc} are split into two phases: the \techterm{training phase}, where the \ac{bti} is done, and the \techterm{speculation phase}, where a victim branch gets hijacked.
We do not describe the workings and implementation of the covert channel, as we use the flush+reload-based~\cite{flushAndReload} one provided by the \acp{poc} without modification.
Whenever a return cycle is employed, it serves to underflow the \ac{rsb} and normalize the branch history, as discussed in \autoref{sec:implementation}.

@ -2,17 +2,17 @@
\subsection{Speculative \retbti}
\label{subsubsec:specRetBti}
We will discuss the design of the Spec \retbti{} PoC.
This PoC aims to verify that speculative \ac{bti} works in the same privilege domain.
But before working on Spec \retbti{} we will discuss how the plain \retbti{} PoC works.
We will discuss the design of the \specretbti{} \ac{poc}.
This \ac{poc} aims to verify that speculative \ac{bti} works in the same privilege domain.
However, before working on \specretbti, we will discuss how the plain \retbti{} \ac{poc} works.
\paragraph{\retbti{} in detail.}
After the \techterm{return cycle}, which spins on a certain memory location, we get to \verb+BR1+.
After the \techterm{return cycle}, spinning on a particular memory location, we get to \verb+BR1+.
Here, the speculation primitive, a return instruction, is located.
During training, this return brings us to the \techterm{speculation gadget}, stored at \verb+TRAIN+.
However, in the speculation phase, since the \ac{pc} is the same (\verb+BR1+ in both cases) and the history is equivalent, thanks to the return cycle, the indirect branch predictor predicts erroneously that the destination is \verb+TRAIN+ instead of \verb+SPEC+.
This causes the \techterm{speculation gadget} to get execution.
This return brings us to the disclosure gadget, stored at \verb+TRAIN+, during the training phase.
It also does the \ac{bti}.
In the speculation phase, since the \ac{pc} is the same (\verb+BR1+ in both cases) and thanks to the return cycle, the histories are equivalent, the function return predictor, which has fallen back to using the \ac{btb}, predicts erroneously that the destination is \verb+TRAIN+ instead of \verb+SPEC+.
The misprediction causes the disclosure gadget to get executed.
This proceeding is depicted in \autoref{fig:retbti}.
\begin{figure}[ht]
@ -23,44 +23,65 @@ This proceeding is depicted in \autoref{fig:retbti}.
\end{subfigure}
\begin{subfigure}{\textwidth}
\incfig{../figures/ret_bti_spec}
\caption{When the control flow reaches \lstinline+BR1+ in the speculation phase the \acs{pc} and the branch history are equivalent to the ones of the training phase. Therefore, the return instruction predictor will use the \acs{btb} entry leading to \lstinline+TRAIN+, which was injected in the training phase. The speculation is indicated in red.}
\caption{When the control flow reaches \lstinline+BR1+ in the speculation phase, the \acs{pc} and the branch history are equivalent to the ones of the training phase. Therefore, the return instruction predictor will use the injected \acs{btb} entry, steering the speculative control flow to \lstinline+TRAIN+. The speculation is indicated in red.}
\label{fig:retbtiSpec}
\end{subfigure}
\caption{Control flow of the \retbti{} PoC for Intel. The training phase poisoning the \ac{bpu} is depicted in (a). This causes the victim to mispredict into the gadget, as visible in (b). A \techterm{return cycle} is used in both phases to underflow the \ac{rsb} and normalize branch history.}
\caption{Control flow of the \retbti{} \ac{poc} for Intel. The training phase, poisoning the \ac{bpu}, is depicted in (a). The \ac{bti} causes the victim to mispredict into the disclosure gadget, as visible in (b). A return cycle is used in both phases to underflow the \ac{rsb} and normalize branch history.}
\label{fig:retbti}
\end{figure}
As discussed in \autoref{sec:implementation}, no return cycle is needed for AMD as the \ac{rsb} fallback is caused by an address collision, and the branch history is also not needed for indexing the \ac{btb}.
Therefore, the primitive of Intel is adapted as follows.
The return cycles are removed, and for the training phase, we replace the return instruction with an indirect jump from \verb+BR1+ to \verb+TRAIN+.
As discussed in \autoref{sec:implementation}, no return cycle is needed for AMD as an address collision of the return instruction with a previously encountered indirect branch is sufficient for causing the \ac{rsb} fallback.
Since the \ac{btb} is not indexed using the branch history, it is also unnecessary to normalize it.
The primitives of Intel are adapted as follows.
The return cycles are removed for both phases.
For the training phase, the poisonous return instruction is replaced by an indirect jump.
The source and target remain the same.
Namely, it goes from \verb+BR1+ to \verb+TRAIN+.
These changes are already sufficient to poison the \ac{btb} and hijack any return instruction located at \verb+BR2+, which collides with \verb+BR1+.
\ask{Also add control flow schematics for AMD?}
\paragraph{Spec \retbti.}
For the speculative version, we want that the return from \verb+BR1+ to \verb+TRAIN+ is not executed architecturally but only speculatively.
\paragraph{\specretbti.}
For the speculative version, we want the return from \verb+BR1+ to \verb+TRAIN+ to be executed not architecturally but only speculatively.
As in the non-speculative version, the return cycle brings us to \verb+BR1+.
To cause a speculation window during which the primitive can be executed, we exploit the return instruction predictor using SpectreRSB\cite{spectreRsb}, as explained in \autoref{subsubsec:spectreRsb}.
From \verb+BR1+ we call the ``rogue'' function located at \verb+ROGUE+.
That function changes the architectural return address such that it returns to \verb+END+ instead of \verb+BR1+.
To cause a speculation window during which the primitive can be executed, we exploit the return target predictor using SpectreRSB~\cite{spectreRsb}, as explained in \autoref{subsubsec:spectreRsb}.
From \verb+BR1+, we call the ``rogue'' function located at \verb+ROGUE+.
That function changes the architectural return address such that it returns to \verb+END+ instead of \verb+BR1+.
To do that, we increment the \verb+rsp+ by $8$ to let it skip the actual return address and point it to a location that we have pushed to the stack before calling \verb+ROGUE+.
As discussed in \autoref{subsubsec:funcReturnPrediction}, the return instruction predictor will predict that the function returns to \verb+BR1+ and, therefore, speculatively execute the indirect jump, which does the poisoning.
To make the speculation window large enough for the indirect jump to execute fully, we use the \verb+clflush+ instruction to remove the actual return address, appointed by the \verb+rsp+, from all layers of the cache.
\begin{figure}[ht]
\begin{subfigure}{\textwidth}
\incfig[0.75]{../figures/ret_spec_bti_train}
\caption{Depicts the training phase. A rogue function, located at \lstinline+ROGUE+, causes a speculation window. It allows the speculative execution of the poisonous return from \lstinline+BR1+ to \lstinline+TRAIN+.}
\label{fig:specRetbtiTrain}
\end{subfigure}
\begin{subfigure}{\textwidth}
\incfig[0.75]{../figures/ret_spec_bti_spec}
\caption{The rogue function is also executed during the speculation phase, to ensure that histories are equivalent. However, in contrast to the training phase, no speculation window is created. After returning from the rogue function, the return instruction is mispredicted, using the injected entry pointing to \lstinline+TRAIN+.}
\label{fig:specRetbtiSpec}
\end{subfigure}
\caption{Control flow of the \specretbti{} PoC for Intel. During the training phase, depicted in (a), a speculatively executed return poisons the \ac{btb}. This leads to the hijacking of a return instruction, as visible in (b). Speculatively executed branches are indicated in red, while the architectural branches are black.}
\label{fig:specRetbti}
\end{figure}
To use SpectreRSB, the rogue function increments the \verb+%rsp+ by $8$ to let it skip the actual return address and point it to a location that we have pushed to the stack before calling \verb+ROGUE+.
As discussed in \autoref{subsubsec:funcReturnPrediction}, the return target predictor predicts that the function returns to \verb+BR1+ and, therefore, speculatively executes the return instruction.
This return instruction does the \ac{bti}.
To make the speculation window large enough for the return to execute fully, we use the \verb+clflush+ instruction to remove the actual return address, pointed to by the \verb+%rsp+, from all cache layers.
This way, the CPU has to fetch it from memory, incurring a more significant delay.
Three things must be fulfilled to make the indirect jump at \verb+BR1+ speculatively take the branch to \verb+TRAIN+, during the speculation phase.
Three things must be fulfilled to make the indirect jump at \verb+BR1+ speculatively take the branch to \verb+TRAIN+ during the speculation phase.
First and foremost, the branch to \verb+TRAIN+ must have been injected, which we have done in the training phase.
Secondly, the \ac{pc} of the return instruction which we want to hijack must be the same as during training.
Secondly, the \ac{pc} of the return instruction that we want to hijack must be the same as during training.
Lastly, the branch history at the victim and attacker branch must be the same.
To ensure matching histories, during the training phase we must retain the rogue function, however, without it modifying the stack and causing any speculation, and importantly, without it introducing any branches.
We were able to implement that part oft the \techterm{speculation primitive} using the \verb+cmove+ instruction.
It allows us to modify the stack depending on whether we are in the training or speculation phase, indicated by the state of some register.
The rogue function is displayed in \autoref{lst:roguePrimitive} and the control flow of both, training and speculation phase are in \autoref{fig:retbti}.
When returning to \verb+BR1+ from the rogue function, the \ac{pc} and \ac{bhb} are equal to the ones of the training phase, and therefore, the predictor will use the malicious entry to erroneously guide the indirect jump to the gadget at \verb+TRAIN+.
To ensure matching histories, we must retain the rogue function during the training phase.
However, it must not cause any speculation or introduce new branches.
We implemented that part of the speculation primitive using the \verb+cmove+ instruction.
It allows us to modify the stack depending on whether we are in the training or speculation phase, which is indicated by the state of some register.
The rogue function is displayed in \autoref{lst:roguePrimitive}, and the control flow of both the training and the speculation phase is depicted in \autoref{fig:specRetbti}.
\begin{lstlisting}[style=CStyle,caption={Rogue function causing a speculation window by employing SpectreRSB\cite{spectreRsb}. This function must be executed in both phases to maintain a consistent branch history. Depending on the state of \lstinline+rsi+ the \lstinline+cmove+ instruction increments the \lstinline+rsp+ by $8$, skipping the actual return address. The speculation window is enlarged by using the \lstinline+clflush+ instruction.},label={lst:roguePrimitive}]
When returning to \verb+BR1+ from the rogue function, the \ac{pc} and \ac{bhb} are equal to the ones of the training phase.
Therefore, the predictor will use the malicious entry to guide the speculative control flow of the return instruction to the gadget at \verb+TRAIN+.
\begin{lstlisting}[style=CStyle,caption={Rogue function, causing a speculation window by employing SpectreRSB. This function must be executed in both phases to maintain a consistent branch history. Depending on the state of \lstinline+\%rsi+ the \lstinline+cmove+ instruction increments the \lstinline+\%rsp+ by $8$, skipping the actual return address. The speculation window is enlarged by using the \lstinline+clflush+ instruction.},label={lst:roguePrimitive}]
asm(
".align 0x80000\n\t"
"rogue_spec_dst:\n\t"
@ -71,7 +92,7 @@ asm(
"rogue_gadg_dst:\n\t"
// If %rsi = 1: add 8 to rsp => cause speculation
// If %rsi = 0: do othing
// If %rsi = 0: do nothing
"lfence\n\t"
"movq %rsp, %rdx\n\t"
"addq $0x8, %rdx\n\t"
@ -83,25 +104,8 @@ asm(
);
\end{lstlisting}
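The branch-free, conditional stack adjustment that the rogue function performs with \verb+cmove+ can be sketched in plain C. This is an illustration only and not the code of the \ac{poc}: the helper names \texttt{select\_sp} and \texttt{check\_select\_sp} are hypothetical, the stack pointer is modeled as an ordinary integer, and a compiler is free to translate the expression differently, which is why the \ac{poc} itself relies on inline assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns sp + 8 if the lowest bit of skip is 1,
 * and sp unchanged if it is 0. The selection is done with a bit mask
 * instead of an if statement, mirroring the idea of a conditional move. */
uint64_t select_sp(uint64_t sp, uint64_t skip)
{
    /* All ones if skip is 1, all zeros if skip is 0. */
    uint64_t mask = (uint64_t)0 - (skip & 1);
    uint64_t adjusted = sp + 8;
    return (adjusted & mask) | (sp & ~mask);
}

/* Small self-check of both cases. */
void check_select_sp(void)
{
    assert(select_sp(0x7000, 0) == 0x7000);
    assert(select_sp(0x7000, 1) == 0x7008);
}
```

The same sequence of operations runs for both values of \texttt{skip}, so the selection itself does not add a branch to the history. Which of the two values selects the skipping path is an arbitrary choice in this sketch.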
\begin{figure}[ht]
\begin{subfigure}{\textwidth}
\incfig[0.75]{../figures/ret_spec_bti_train}
\caption{Depicts the training phase. A rogue function, located at \lstinline+ROGUE+, causes a speculation window. It allows the speculative execution of the poisonous return from \lstinline+BR1+ to \lstinline+TRAIN+.}
\label{fig:specRetbtiTrain}
\end{subfigure}
\begin{subfigure}{\textwidth}
\incfig[0.75]{../figures/ret_spec_bti_spec}
\caption{To ensure that histories are equivalent, the rogue function is also executed during the speculation phase. However, in contrast to the training phase, no speculation window is created. After returning from the rogue function, the return instruction is mispredicted using the injected entry pointing to \lstinline+TRAIN+.}
\label{fig:specRetbtiSpec}
\end{subfigure}
\caption{Control flow of the Spec \retbti{} PoC for Intel. During the training phase, depicted in (a), a speculatively executed return poisons the \ac{btb}. The speculative \ac{bti} leads to the hijacking of a return instruction, as visible in (b). Speculatively executed branches are indicated in red, while the architectural branches are black.}
\label{fig:specRetbti}
\end{figure}
Adapting \retbti{} to \specretbti{} is more straightforward for AMD than for Intel, as the branch history is irrelevant for indexing into the \ac{btb}.
While the speculation phase of \specretbti{} is equivalent to the one of \retbti, a rogue function is added to the training phase right before the poisonous indirect branch.
This rogue function causes unconditional speculation using SpectreRSB~\cite{spectreRsb}.
Adapting \retbti{} PoC for AMD is more straightforward than for Intel, as the branch history is not relevant for indexing into the \ac{btb}.
While the speculation phase of Spec \retbti{} remains unchanged, we add a rogue function to the training phase.
The function differs from the one for Intel in that it does not conditionally overwrite the \verb+rsb+, but speculation is cased every time.
\ask{Also add control flow schematics for AMD?}
We comment on the results of this PoC later in \autoref{sec:evaluation} and \ref{sec:discussion}.
We comment on the results of these \acp{poc} later in \autoref{sec:evaluation} and \ref{sec:discussion}.

@ -2,81 +2,82 @@
\subsection{Speculative \cpbti}
\label{subsubsec:specCpBti}
The development of the speculative version of \cpbti{} is of main interest to us.
It shows if Spec \ac{bti} works across privilege boundaries and, therefore, demonstrates if \retbleed's primitives can be implemented without raising any \acp{pf}.
A successful implementation of \speccpbti{} implies a positive answer to \rqref{rq1}.
The development of the speculative version of \cpbti{} is of central interest to us.
It shows if Spec \ac{bti} works across privilege boundaries and therefore demonstrates if \retbleed's primitives can be implemented without raising any \acp{pf}.
A successful implementation of Spec \cpbti{} implies a positive answer to our first research question (\ref{para:rs1}).
These PoCs consist of a user space program and a kernel module.
These \acp{poc} consist of a user space program and a kernel module.
The user space program is the attacker who poisons the \ac{btb} across privilege boundaries to hijack a return instruction executed by the kernel.
The kernel module provides the \techterm{speculation primitive} and \techterm{speculation gadget} at predetermined and optimal\footnote{In the way that we can easily find a colliding branch.} locations.
The kernel module provides the speculation primitive and disclosure gadget at predetermined and optimal\footnote{In the way that we can easily find a colliding branch.} locations.
Before discussing the speculative variant, we will look at the plain \cpbti{} PoC to set a foundation on which we can build.
Before discussing the speculative variant, we will look at the plain \cpbti{} \ac{poc} to set a foundation on which we can build.
\paragraph{\cpbti{} in detail.}
Similarly to the \retbti{}, a return cycle is set up to spin on \verb+KBR_SRC'+ and returns to \verb+KBR_DST+ after the final return instruction.
\verb+KBR_DST+ is the location of the speculation gadget, and it is stored in the kernel space.
Since a jump to an arbitrary location in kernel space is prohibited, the \ac{pf} handler takes over, raises a \ac{pf} and informs the user.
A return cycle is set up to spin on \verb+KBR_SRC'+.
The final return leads to \verb+KBR_DST+.
\verb+KBR_DST+ is the location of the disclosure gadget stored in the kernel space.
Since a jump to an arbitrary location in kernel space is prohibited, the \ac{pf} handler takes over, raises a \ac{pf}, and informs the user.
Even if the branch to the gadget was unsuccessful, it has still influenced the \ac{btb} by injecting an entry.
\autoref{fig:cpBtiTrain} shows the described training phase, while \autoref{fig:cpBtiSpec} depicts the speculation phase, which we will discuss next.
To make use of the poisoned \ac{btb}, control is handed over to the kernel, which starts executing the \techterm{return cycle} located at \verb+KBR_SRC+.
Similarly, as with the \retbti{} PoC for AMD, the source of the victim and attacker branch are different (as they lie in different address spaces).
To make the hijacking work, \verb+KBR_SRC'+ is selected so that it collides with \verb+KBR_SRC+.
Therefore, as the \acp{pc} collide and the histories are similar, the \ac{bpu} will use the malicious entry for its prediction, guiding the control flow to \verb+KBR_DST+.
The version for AMD only differs in the way that no return cycle is used.
\autoref{fig:cpBtiTrain} shows the training phase which we have just described.
The speculation phase, which we will discuss next, is depicted in \autoref{fig:cpBtiSpec}.
\begin{figure}[ht]
\begin{subfigure}{\textwidth}
\incfig[0.75]{../figures/cp_bti_train}
\caption{Depicts the training phase. While a jump from userspace to kernel space is forbidden and caught by the \ac{pf} handler, indicated in blue, is still recorded by the \ac{btb}. This way, a branch from \lstinline+KBR_SRC'+ to \lstinline+KBR_DST+ is injected.}
\caption{Depicts the training phase. While a jump from userspace to kernel space is forbidden and caught by the \ac{pf} handler, indicated in blue, it is still recorded by the \ac{btb}. This way, a branch from \lstinline+KBR_SRC'+ to \lstinline+KBR_DST+ is injected.}
\label{fig:cpBtiTrain}
\end{subfigure}
\begin{subfigure}{\textwidth}
\incfig[0.75]{../figures/cp_bti_spec}
\caption{Depicts the speculation phase. The injected branch, whose source \lstinline+KBR_SRC'+ is selected in such a way that it collides with \lstinline+KBR_SRC+, leads to a mispeculation. It works even when injected from a different privilege domain. The speculation is indicated in red.}
\caption{Depicts the speculation phase. The injected branch, whose source \lstinline+KBR_SRC'+ is selected to collide with \lstinline+KBR_SRC+, leads to a misprediction. It works even when injected from a different privilege domain. The speculation is indicated in red.}
\label{fig:cpBtiSpec}
\end{subfigure}
\caption{Control flow of \cpbti{} for Intel. A jump to an arbitrary kernel address from userspace results in a \ac{pf}, as shown in (a). However, it is still taken up by the \ac{btb} and can lead to misprediction across privilege boundaries, as shown in (b). \lstinline+KBR_SRC'+ is selected such that it collides with \lstinline+KBR_SRC+.}
\label{fig:cpBti}
\end{figure}
\paragraph{Spec \cpbti.}
Similarly, as for Spec \retbti{}, we want to modify the \cpbti{} PoC, such that the \ac{bti} is done speculatively.
In contrast to the non-speculative version, the final return from the return cycle does not do the injection itself, but it brings us to \verb+KBR_SRC'+.
\verb+KBR_SRC'+ is selected to collide with \verb+KBR_SRC+ which is the address of the victim branch.
As the branch from \verb+KBR_SRC'+ to \verb+KBR_DST+ should be executed speculative, we create a speculation window.
This is done by the function at \verb+ROGUE+ which employs SpectreRSB\cite{spectreRsb} to exploit the return instruction predictor and cause a speculative return to \verb+KBR_SRC'+.
To make use of the poisoned \ac{btb} entry, control is handed over to the kernel.
It executes the \techterm{return cycle} located at \verb+KBR_SRC+.
As with the \retbti{} \ac{poc} for AMD, the sources of the victim and attacker branches differ, as they lie in different address spaces.
As in that AMD \ac{poc}, \verb+KBR_SRC'+ is selected so that it collides with \verb+KBR_SRC+.
Therefore, as the \acp{pc} collide and the histories are equivalent\footnote{Even if the return cycle spins on colliding addresses, it is hard to say if the histories are actually equivalent or just ``similar enough'' to make the prediction work. We will comment on that matter in \autoref{sec:discussion}.\todo{actually do that}}, the \ac{bpu} will use the malicious entry for its prediction, guiding the speculative control flow to \verb+KBR_DST+.
The version for AMD differs only in that no return cycles are used.
\paragraph{\speccpbti.}
As for \specretbti{}, we want to modify the \cpbti{} \ac{poc} such that the \ac{bti} is done speculatively.
In contrast to the non-speculative version, the final return from the return cycle does not do the injection itself but brings us to \verb+KBR_SRC'+.
\verb+KBR_SRC'+ is selected to collide with \verb+KBR_SRC+, which is the address of the victim branch.
As the branch from \verb+KBR_SRC'+ to \verb+KBR_DST+ should be executed speculatively, we create a speculation window using the rogue function located at \verb+ROGUE+, as we did for \specretbti.
This function causes the speculative control flow to be steered to \verb+KBR_SRC'+.
From here, a return instruction injects the \ac{btb} entry with a target of \verb+KBR_DST+.
This branch is invalid because it crosses privilege boundaries; however, since it is only executed speculatively, \textbf{no \ac{pf} is raised}, but a \ac{btb} entry is still created.
This process is visualized in \autoref{fig:specCpBtiTrain}.
The rogue function is simpler for Spec \cpbti{} than for Spec \retbti{} as the speculation window is created in all cases.
After having seen how the \ac{btb} can be poisoned from user space without causing any \acp{pf}, we will discuss how the injected \ac{btb} entry can impact the control flow of the kernel module.
After switching to the kernel module and executing the return cycle, \verb+KBR_DST+ is reached.
To make the branch history similar to the one of the training phase, we need to mimic the branches introduced by the rogue function.
To achieve this, we introduce a ``dummy'' function at \verb+FAKE_ROGUE+ which does nothing but return back to \verb+KBR_SRC+.
Here, we encounter the victim return.
Since the return instruction predictor has fallen back to the \ac{btb}, the \ac{pc} collides with the source of the injected branch, and the history is similar to the one during training, the injected \ac{btb} entry will be used for the branch prediction.
Therefore, the gadget stored at \verb+KBR_DST+ is executed speculatively.
The speculation phase is visible in \autoref{fig:specCpBtiSpec}.
We have developed a Spec \cpbti{} version for AMD too.
It is equivalent to the version of Intel, with the return cycle and fake rogue function removed.
While this branch is invalid as it crosses privilege boundaries, since executed speculatively, \textbf{no \ac{pf} is raised} while the injection still works.
The rogue function is equivalent to the one for the \specretbti{} \ac{poc} for AMD, where the speculation window is created unconditionally.
This training process is visualized in \autoref{fig:specCpBtiTrain}.
\begin{figure}[ht]
\begin{subfigure}{\textwidth}
\incfig[0.75]{../figures/cp_spec_bti_train}
\caption{Depicts the training phase. The rogue function \lstinline+ROGUE+ causes a speculation window. It allows the speculative execution of the poisonous return from \lstinline+KBR_SRC'+ to \lstinline+KBR_DST+. While this branch crosses privilege boundaries, no \ac{pf} is raised due to the speculative execution.}
\caption{Depicts the training phase. The rogue function at \lstinline+ROGUE+ causes a speculation window. It allows for the speculative execution of the poisonous return from \lstinline+KBR_SRC'+ to \lstinline+KBR_DST+. While this branch crosses privilege boundaries, no \ac{pf} is raised due to being executed speculatively.}
\label{fig:specCpBtiTrain}
\end{subfigure}
\begin{subfigure}{\textwidth}
\incfig{../figures/cp_spec_bti_spec}
\caption{To make the branch history of the speculation phase similar to the one of the training phase, a ``dummy'' function mimics the branches of the rogue function. The injected \ac{btb} entry is used to predict the target of the subsequent return, causing a speculative execution of \lstinline+KBR_DST+.}
\caption{To make the branch history of the speculation phase similar to the one of the training phase, a ``dummy'' function mimics the branches introduced by the rogue function. The injected \ac{btb} entry is used to predict the target of the subsequent return, causing a speculative execution of the code residing at \lstinline+KBR_DST+.}
\label{fig:specCpBtiSpec}
\end{subfigure}
\caption{Control flow of \cpbti{} for Intel. In the training phase, shown in (a), speculative \ac{bti} is employed to hijack a return instruction executed in kernel space, shown in (b). Address \lstinline+KBR_SRC'+ is selected such that it collides with \lstinline+KBR_SRC+. Architectural branches are black, while speculatively executed ones are drawn in red.}
\caption{Control flow of \speccpbti{} for Intel. In the training phase, shown in (a), speculative \ac{bti} is employed to hijack a return instruction executed in kernel space, shown in (b). As the injection is done speculatively, no \ac{pf} is raised. Address \lstinline+KBR_SRC'+ is selected such that it collides with \lstinline+KBR_SRC+. Architectural branches are black, while speculatively executed ones are drawn in red.}
\label{fig:specCpBti}
\end{figure}
After seeing how the \ac{btb} can be poisoned from user space without causing any \acp{pf}, we will discuss how the injected \ac{btb} entry can impact the control flow of the kernel module.
After switching to the kernel module and executing the return cycle, \verb+KBR_DST+ is reached.
To make the branch history similar to the one of the training phase, we need to mimic the branches introduced by the rogue function.
This is done by the ``dummy'' function at \verb+FAKE_ROGUE+, which does nothing but return to \verb+KBR_SRC+.
Here, we encounter the victim return.
Since the return target predictor has fallen back to the \ac{btb}, the \ac{pc} collides with the source of the injected branch, and the history is similar to the one during training, the injected \ac{btb} entry will be used to predict the target of this return.
Therefore, the gadget stored at \verb+KBR_DST+ is executed speculatively.
The speculation phase is shown in \autoref{fig:specCpBtiSpec}.
We have developed a \speccpbti{} version for AMD too.
It is equivalent to the version for Intel, with the return cycles and the fake rogue function removed.
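The conditions under which the injected entry is used for the victim return on Intel can be summarized in a compact form. The notation below is introduced only for illustration and is an assumption, not part of the \acp{poc}: $h$ stands for the (not publicly documented) function that maps a branch source and its branch history to a \ac{btb} entry, and $H_{\mathrm{train}}$ and $H_{\mathrm{victim}}$ stand for the branch histories at the injecting return and at the victim return, respectively.
\[
  h(\mathtt{KBR\_SRC'},\, H_{\mathrm{train}}) = h(\mathtt{KBR\_SRC},\, H_{\mathrm{victim}})
  \quad\Longrightarrow\quad
  \mathrm{predicted\ target\ of\ the\ victim\ return} = \mathtt{KBR\_DST}
\]
This implication is only meant to hold while the return target predictor has fallen back to the \ac{btb}. Under this reading, the ``dummy'' function at \verb+FAKE_ROGUE+ serves to bring $H_{\mathrm{victim}}$ close enough to $H_{\mathrm{train}}$ for the equality to hold.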

@ -1,6 +1,6 @@
\newcommand{\retbleed}{\textsc{Retbleed}}
\newcommand{\retbti}{\textsc{Ret}-\acs{bti}}
\newcommand{\specretbti}{Spec \textsc{Ret}-\acs{bti}}
\newcommand{\specretbti}{Spec~\textsc{Ret}-\acs{bti}}
\newcommand{\cpbti}{\acs{cp}-\acs{bti}}
\newcommand{\speccpbti}{Spec \acs{cp}-\acs{bti}}
\newcommand{\speccpbti}{Spec~\acs{cp}-\acs{bti}}
\newcommand{\techterm}[1]{\textsl{#1}}