Theory of quantum system certification: a tutorial

The precise control of complex quantum systems promises numerous technological applications including digital quantum computing. The complexity of such devices renders the certification of their correct functioning a challenge. To address this challenge, numerous methods have been developed in the last decade. In this tutorial, we explain prominent protocols for certifying the physical layer of quantum devices described by quantum states and processes. Such protocols are particularly important in the development of near-term devices. Specifically, we discuss methods of direct quantum state certification, direct fidelity estimation, shadow fidelity estimation, direct quantum process certification, randomized benchmarking and cross-entropy benchmarking. Moreover, we provide an introduction to powerful mathematical methods, which are widely used in quantum information theory, in order to derive theoretical guarantees for the protocols.

We are witnessing rapid progress in the experimental abilities to manipulate physical systems in their inner quantum properties, such as state superposition and entanglement. Most importantly, we begin to have precise control over complex quantum systems on scales that are out of reach of simulation even on the most powerful existing classical computing devices. Harnessing their computational power promises the development of digital quantum computers that solve important problems much faster than any classical computer. Envisioned applications also include, e.g., the study of complex phases of matter in analogue simulations and cryptographically secure communication [1]. Hence, quantum technology promises highly useful devices with diverse domains of application ranging from fundamental research to commercial businesses.
With the advent of these novel technologies comes the necessity of certifying their correct functioning. The certification of quantum devices is a particularly daunting task in the interesting regime of high complexity, as the most straightforward strategy of predicting the device's outcome on a classical computer and comparing is bound to fail. Indeed, predicting the behaviour of complex quantum devices quickly exhausts the available classical computing power. Ironically, it is the same complexity that makes quantum technology powerful that hinders its certification. This challenging prospect has already motivated extensive efforts in developing certification tools for quantum devices in the last decades.
Intriguingly, many different fields within the quantum sciences have tackled the problem of certification from a variety of different perspectives and have developed a large landscape of different protocols. These protocols operate under very distinct assumptions and resource requirements that are well-motivated by the different perspectives. For example, certifying the correct functioning of a small-scale quantum device used in basic research allows one to invest sizeable effort, might aim at a highly discriminative certificate providing plenty of information, and can rely on a precise model of the physics of the device. A very different example is the certification of a server, correctly performing a quantum computation, by a remote client with standard desktop hardware. Such a protocol should be light-weight on the client side and not rely on a detailed model of the server.
An attempt at a panoramic overview of the many different approaches that all fall within the field of quantum certification was recently conducted in Ref. [2]. Therein, a very general classification framework for quantum certification protocols was proposed that is abstract enough to capture their wide range. Let us start by sketching the general framework. This will subsequently allow us to precisely locate the much narrower scope of the present tutorial.

A. Anatomy of quantum certification protocols
A certification protocol is a set of instructions that outputs either 'accept' or 'reject' concerning the hypothesis that the device is functioning correctly, with a certain level of confidence.
The correct functioning of a device is defined in terms of a measure of quality. The measures of quality range from rigorous worst-case discrimination of 'fundamental' physical objects, which model the device, to performance benchmarks defined in terms of tasks directly on the application layer of the device. Note that in principle a measure of quality can be solely defined in terms of a protocol that reproducibly measures it. On the other hand, measures of quality that directly aim at the deviation of physical objects modelling the function of the device can provide an understanding of the device that is highly attractive in the development of the technology.
In this tutorial, we will encounter a couple of such physically-motivated measures of quality and study their mathematical properties and operational interpretations.
The measures of quality that we will study all map to the real line. The certification protocol then provides an ε-certificate that accepts or rejects the hypothesis that the measure of quality is smaller than a given ε. For this reason, most protocols that we present are estimation protocols for specific measures of quality that can easily be turned into ε-certification protocols by a standard method.
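As a concrete illustration of such a standard method, the sketch below turns a hypothetical estimator of a distance-type measure of quality into an ε-certification test by thresholding a median-boosted estimate. The interface `estimate_quality`, the ε/2 threshold, the repetition count, and the noise model are all illustrative assumptions, not a protocol from this tutorial.

```python
import numpy as np

def certify(estimate_quality, epsilon, n_repetitions=101):
    """Turn an estimation protocol into an epsilon-certification test:
    repeat the estimator, boost confidence via the median, and accept
    iff the boosted estimate stays below the threshold epsilon / 2.
    `estimate_quality` is a hypothetical interface returning one
    estimate of a distance-type measure of quality per call."""
    estimates = [estimate_quality() for _ in range(n_repetitions)]
    return "accept" if np.median(estimates) <= epsilon / 2 else "reject"

# Toy example: a noisy estimator of a true distance of 0.02.
rng = np.random.default_rng(0)
noisy_estimator = lambda: 0.02 + rng.normal(scale=0.01)
print(certify(noisy_estimator, epsilon=0.1))  # accept: 0.02 is well below 0.05
```

The median boosting makes the decision robust against occasional outlier estimates, a standard trick when only weak concentration of the single-shot estimator is guaranteed.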
Theoretically, it is convenient to describe the protocol as involving three distinct objects, Fig. 1 (left): First, the device that is under scrutiny. Ideally, we try to be fairly conservative in the model and assumptions describing the device. Second, the protocol employs a measurement apparatus. The measurement apparatus, also a quantum device, is typically assumed to be much more precisely characterised than the device. Note that the device and measurement apparatus are not necessarily physically distinct devices. Choosing the split might be ambiguous and yield different formulations of the assumptions of the protocol. An extreme example are device-independent certification protocols that regard all quantum parts as a single device that is not subjected to any assumptions. In particular, they do not involve a separately characterised quantum measurement apparatus. The third object is the classical processor, a classical computing device, which takes care of potentially required pre- and post-processing tasks for the device control and the processing of the output data to arrive at a certificate, or even communicates with the device and measurement apparatus in multiple rounds of an interactive protocol.
The landscape of protocols can be roughly organized according to three 'axes': The first axis comprises the set of assumptions that are imposed on the device and measurement apparatus to guarantee the functioning of the protocol.
A second axis summarizes the complexity of the resources that the protocol consumes. Each protocol requires a certain number of different measurement settings, its measurement complexity, each of which requires the implementation of measurements that involve a certain quantum measurement complexity. To arrive at statistical estimates, it might require a total number of repetitions of device invocations, referred to as the sample complexity. Furthermore, as we have already highlighted at the beginning, a particularly important figure of merit for a protocol is that it comes with practically manageable demands in space and time for the classical processing tasks, its classical processing complexity. For our present scope, the mentioned complexity categories are the most important and will be the focus of our discussion. Note, however, that this list is far from complete; for example, interactive protocols might be compared in terms of challenging demands on the timing of the device's control.
The third and final axis is the information gain of the protocol. At first glimpse this might come as a surprise, as a protocol that outputs 'accept' or 'reject' might be regarded as always providing one bit of information. But different measures of quality have different discriminatory power among the hypothesis class that models the device compatibly with the protocol's set of assumptions. For example, let us imagine a device preparing quantum states on demand. We might require the device to produce a quantum state that is ε-close in some distance measure to a specific target state. An alternative specification of the device might require it to always output the same quantum state, but this quantum state should only be within a specified set of quantum states. In this situation, we can roughly say that the information gain restricting the device (within its allowed hypothesis class) is higher in the first specification compared to the latter one.
Concomitant with less information gain, it is conceivable that one can design a protocol for the latter specification with significantly less complexity than for the first one. Analysing the information gain in performing a certification task often allows one to derive lower bounds on the complexity of any protocol for this task. Besides the discriminatory power of the measure of quality, other intermediate steps in the certification protocol can reveal significantly more information about the device than is ultimately reflected in the measure of quality and the final certificate. For example, a potential certification protocol for our device that prepares quantum states might perform a high-precision, complete tomographic reconstruction of the quantum state and subsequently calculate the measure of quality using the tomographic estimate together with its error bounds. Conceptually, this example illustrates that certification is a subtask of the broader task of quantum system characterisation, which encompasses protocols aiming at different types of information about a quantum system, e.g. identification of a quantum system or testing for a specific property. Protocols that perform quantum system identification or property estimation naturally also give rise to certification protocols. Note that in practice, the hidden information gain of a certification protocol can provide valuable information to calibrate and improve the device.
Another related task in quantum system characterisation is the benchmarking of quantum devices. Benchmarking aims at comparing the performance of multiple devices. This can be done by comparing the achievable ε-values of ε-certificates of the respective devices. Benchmarking especially provides pragmatic impetus towards measures of quality that are not directly interpretable on the physical layer. Instead, for the benchmarking of quantum devices it suffices to implicitly define a reproducible performance measure directly in terms of a protocol that estimates it, as long as the measure is expected to be correlated with the performance in practically relevant tasks.
B. Quantum certification for near-term devices - Scalable certification of the physical layer

For this tutorial, we selected protocols that are particularly important for the certification of near-term quantum devices. Current and near-term quantum devices are still expected to be fairly noisy and of intermediate size, so-called noisy intermediate-scale quantum (NISQ) devices [3]. On the one hand, NISQ devices are in a regime of complexity where prominent certification methods that use full tomographic characterisation become practically infeasible. On the other hand, there is still a large technological leap required in order to arrive at truly scalable devices, e.g. implementing fault-tolerant quantum computing. Such a full-fledged quantum device is described using multiple layers of abstraction, from the physical layer over, e.g., physical and logical gate layers, to an application layer, see Figure 1 (right). When a device already comes with multiple layers of abstraction one can also certify its functioning on the higher levels. NISQ devices, however, only allow for a small degree of abstraction above the physical layer. For this reason, near-term quantum devices pose the need for certification techniques that aim at the physical layer but are scalable to the intermediate system sizes of NISQ devices. Such scalable certification methods for the physical layer are the focus of this tutorial.
In the long term, for complex quantum devices high-level certification on the application level, also referred to as verification, will become increasingly important. Ref. [4] provides a review of existing approaches for verifying quantum computations on devices that are close to being able to accurately perform a universal set of operations. Nonetheless, also in the long run, scalable certification of the physical layer remains important for the diagnostics of the components of more complex quantum devices, during development and at run-time.
We model the physical-layer generically in terms of quantum states and processes throughout the tutorial. The model is general enough to capture different types of quantum devices used, e.g. in quantum communication networks and analogue simulators. Nonetheless, in this tutorial we will take the certification of digital quantum computing devices as our main guiding problem. Especially, the last two methods that we discuss, randomized benchmarking (RB) and cross-entropy benchmarking (XEB), are specifically designed for digital quantum computing devices. RB aims at estimating the physical noise that compromises a gate layer. XEB aims at certifying the generation of samples from a probability distribution encoded in a quantum circuit. As such XEB can be regarded as a certification for the application layer of a digital quantum computing device. But the application is deliberately designed very close to the physical layer.
In addition, we have chosen a set of protocols that can be presented and analysed using a common set of mathematical methods. This allows us to combine our presentation of the certification protocol with a detailed introduction into the mathematical formalism that is required in order to prove rigorous performance guarantees for the protocols.
Lastly, we restrict our focus to certification protocols that employ measures of quality that are close to being natural measures of distance on the very fundamental physical description of the devices as quantum states and quantum processes. Also important and equally fundamental, but not captured in this tutorial, is the certification of specific properties such as entanglement or nonclassicality. Certain distinct properties, e.g. sufficiently high entanglement, allow for the certification of specific quantum states and processes even device-independently. This class of so-called self-testing protocols is reviewed in Ref. [5].
One of the most intriguing aspects of the field of quantum certification is definitely the impressive stretch over multiple disciplines that come into play. Quantum certification is equally a field in applied mathematics, theoretical computer science, applied numerical computer science, experimental physics and quantum hardware and software engineering. It comprises proofs of theorems, classical numerical studies of actual implementations, and performing the protocol in an actual quantum experiment including a diligent analysis of 'real-world' data. Each of the disciplines involved comes with its own methods accustomed to the arising challenges. At the same time, looking at certification on different stages from theory to experiment holds valuable lessons that go in both directions. Having said this, we present a practically well-motivated but theoretically formal framework for a set of quantum certification protocols. We do not delve into the exciting world of numerical and experimental implementations of the certification protocols that bring our model assumptions to the harsh scrutiny of 'real-world' physics. Instead, practical considerations and desiderata will constantly serve as our motivation and inform our discussion.

C. Structure and overview of the tutorial
The tutorial is split into two major parts: the first part, Section II, focusing on certification protocols for quantum states, and the second part, Section III, focusing on certification protocols for quantum processes. Furthermore, the tutorial consists of two different types of chapters: chapters that introduce the mathematical preliminaries, and chapters that present and analyse the certification protocols. We try to bring these two types of chapters into a dialogue that goes back and forth between providing the motivation and tools for understanding the mathematical framework and protocols. The chapters on certification protocols conclude with suggestions for further reading on variants and extensions of the protocol and its theoretical analysis.
We would like to highlight that the mathematical methods are core foundations of the broad field of theoretical quantum information and are by no means limited to quantum certification or even quantum characterisation in their applications. Quite on the contrary, we expect the mathematical introductory chapters to serve as a valuable resource for students and researchers working on quantum information in general. At the same time, experts in quantum information mainly interested in the presented certification methods might want to simply skip the mathematical introductory chapters. They can conveniently find the protocol chapters in the table of contents by looking out for chapter titles that are typeset in italic font.
In more detail, the mathematical methods and certification protocols presented here are the following: We start our discussion on quantum states with a brief introduction to the mathematical formalism of quantum mechanics, such as mathematical notions of operators and the modelling of quantum mechanical measurements (Section II A). This allows us to formally introduce quantum state certification as a one-sided statistical test in Section II B. Certification protocols rely on quantum mechanical measurements, which are probabilistic in nature. Therefore, the confidence of the protocols is controlled using so-called tail bounds, introduced in Section II C. As an example of an application of tail bounds, we derive the estimation error and the confidence when estimating expectation values of observables in Section II D. In order to quantify the accuracy of quantum state preparations, we introduce relevant metrics on quantum states in Section II E. A popular metric is given by the (Uhlmann) fidelity. We provide a certification protocol in terms of the fidelity in Section II F. Stabilizer states, Section II G, are an important class of quantum states that can be certified with particularly few Pauli measurements (Section II H). Another approach to certification employs estimation protocols. Estimating the fidelity requires more measurements than the one-sided certification protocol. A tool to reduce the measurement effort is importance sampling, introduced in Section II I. Direct fidelity estimation uses this method to estimate the fidelity w.r.t. pure target states from relatively few state copies, Section II J.
For the remaining part of the tutorial random quantum states and random unitaries play an important role. For this reason, we introduce them in Section II K. Certain random unitary operations allow, in general, for an estimation of the fidelity from fewer state copies than direct fidelity estimation, which we explain in Section II L on shadow fidelity estimation.
We start our discussion of quantum processes with some mathematical preliminaries (Section III A), where we introduce the Choi-Jamiołkowski isomorphism (a.k.a. channel-state duality), process fidelity measures quantifying average-case errors, and a worst-case error measure, the diamond norm. Most certification methods for quantum processes use average-case error measures. The presented quantum state certification methods can be translated to quantum processes using the Choi-Jamiołkowski isomorphism. As an example, Section III B presents the resulting protocol for direct quantum process certification. Such translated protocols typically require high-quality state preparations and measurements to probe the quantum processes. A method tailored to quantum gates that allows one to extract the average gate fidelity without requiring highly accurate state preparations and measurements is randomized benchmarking (Section III C). As our last protocol we discuss cross-entropy benchmarking in Section III D; this method has been used by Google to build trust in their recent experiment demonstrating the potential power of quantum computers in the task of generating certain random samples.

II. QUANTUM STATES
The first part of the tutorial is devoted to protocols that aim at certifying that a quantum state generated by a device is the correct one. We start by quickly reviewing and introducing the mathematical formalism of quantum mechanics. We expect that most of the presented material and the basic mathematical notions used are already known to the reader. Therefore, we will be fairly brief in our presentation and aim at quickly setting up the notation that we will use throughout the tutorial. For the sake of completeness, we provide many details on the mathematical formalism. However, the main ideas behind the protocols and their theoretical guarantees can also be followed with a more superficial understanding of the mathematical preliminaries.

A. Mathematical objects of quantum mechanics
In order to discuss quantum states we set up some mathematical notation. We focus on finite-dimensional quantum mechanics in accordance with our emphasis on digital quantum computing. Hence, we assume all vector spaces to be finite-dimensional. The space of linear operators from a vector space V to a vector space W is denoted by L(V, W) and we set L(V) := L(V, V). A Hilbert space is a vector space with an inner product ⟨·, ·⟩. Let H and K be complex Hilbert spaces throughout the tutorial. We denote the adjoint of an operator X ∈ L(H, K) by X†, i.e. ⟨k, Xh⟩ = ⟨X†k, h⟩ for all h ∈ H and k ∈ K.
As customary in physics, we will use the bra-ket notation (Dirac notation): We denote vectors by ket-vectors |ψ⟩ ∈ H and linear functionals on H by bra-vectors ⟨ψ|, which are elements of the dual space H*. Furthermore, we understand ket-vectors and bra-vectors with the same label as being related by the canonical isomorphism induced by the inner product. In bra-ket notation we frequently drop tensor-product operators to shorten the notation, e.g. |ψ⟩|φ⟩ := |ψ⟩ ⊗ |φ⟩ ∈ K ⊗ H or |ψ⟩⟨φ| := |ψ⟩ ⊗ ⟨φ| ∈ K ⊗ H* ≅ L(H, K) for |ψ⟩ ∈ K and |φ⟩ ∈ H.
To describe the state of a quantum system we require the notion of density operators. The real subspace of self-adjoint operators, X = X†, is denoted by Herm(H) ⊂ L(H) and the convex cone of positive semidefinite operators by Pos(H) := {X ∈ Herm(H) | ⟨ψ| X |ψ⟩ ≥ 0 for all |ψ⟩ ∈ H}. The trace of an operator X ∈ L(H) is Tr[X] := Σ_i ⟨i| X |i⟩, where {|i⟩} ⊂ H is an arbitrary orthonormal basis of H. The vector space L(H) is itself a Hilbert space endowed with the Hilbert-Schmidt (trace) inner product ⟨X, Y⟩ := Tr[X†Y]. The set of density operators is defined as S(H) := {ρ ∈ Pos(H) : Tr[ρ] = 1}.

The outcome of a quantum measurement is modelled by a random variable. Abstractly, a random variable is defined as a measurable function from a probability space to a measurable space X. Here, we will exclusively be concerned with two types of random variables: (i) those that take values in a finite, discrete set X ≅ [n] := {1, . . . , n} (understood as the measurable space with its power set as the σ-algebra) and (ii) those that take values in the reals X = R (with the standard Borel σ-algebra generated by the open sets). In practice, the underlying probability space is often left implicit and one describes a random variable X taking values in X directly by its probability distribution P that assigns a probability to an element of the σ-algebra of X. For example, for a random variable X taking values in R and an interval I ⊂ R, we write P[X ∈ I] for the probability of X assuming a value in I. Abstractly speaking, P is the push-forward of the measure of the probability space to X induced by the random variable X. Thus, P is sufficient to describe X. The underlying probability space is, however, important to define correlations between multiple random variables, which are understood to be defined on the same probability space.
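The operator notions above translate directly into numerical checks. The following sketch (using NumPy purely as an illustrative tool, not code from the tutorial) verifies the defining properties of a density operator and evaluates the Hilbert-Schmidt inner product:

```python
import numpy as np

def hs_inner(X, Y):
    """Hilbert-Schmidt inner product <X, Y> = Tr[X^dagger Y]."""
    return np.trace(X.conj().T @ Y)

def is_density_operator(rho, tol=1e-10):
    """A density operator is self-adjoint, positive semidefinite
    and has unit trace."""
    return (np.allclose(rho, rho.conj().T, atol=tol)
            and np.linalg.eigvalsh(rho).min() > -tol
            and abs(np.trace(rho) - 1) < tol)

maximally_mixed = np.eye(2) / 2  # the maximally mixed qubit state
print(is_density_operator(maximally_mixed))        # True
print(hs_inner(maximally_mixed, maximally_mixed))  # Tr[rho^2] = 0.5
```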
The probability distribution of a discrete random variable X taking values in a finite set X ≅ [n] is characterised by its probability mass function p_X : [n] → [0, 1], k ↦ p_X(k) := P[X = k] := P[X ∈ {k}]. A real random variable X is characterised by its (cumulative) probability distribution P_X : R → [0, 1], x ↦ P_X(x) := P[X < x] := P[X ∈ (−∞, x)], or, in case it is absolutely continuous, by its probability density p_X : R → R_+, x ↦ p_X(x) := dP_X(x)/dx. Note that if a discrete random variable takes values in a discrete subset of R, we can also assign a (non-continuous) cumulative probability distribution to it.
The most general way to define a linear map from density operators S(H) to random variables is by means of a positive operator valued measure (POVM). A POVM is a map from (the σ-algebra of) X to Pos(H). For a discrete random variable taking values in [n], a POVM is uniquely defined by a set of effects {E_k ∈ Pos(H)}_{k=1}^n with Σ_{k=1}^n E_k = 1_H, where 1_H ∈ L(H) denotes the identity operator. Strictly speaking, the POVM is the map on the power set of [n] that extends k ↦ E_k additively. It is convenient and common to refer to the set of effects as the POVM. A POVM M with effects {E_k ∈ Pos(H)}_{k=1}^n induces a map from S(H) to random variables. To this end, we associate to ρ the random variable M_ρ with probability mass function p_{M_ρ}(k) := ⟨ρ, E_k⟩ = Tr[ρ E_k].
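The random variable M_ρ induced by a POVM can be simulated directly. The sketch below assumes, as an example, the computational-basis POVM on a single qubit (effects E_0 = |0⟩⟨0| and E_1 = |1⟩⟨1|) and samples outcomes with p_{M_ρ}(k) = Tr[ρ E_k]:

```python
import numpy as np

rng = np.random.default_rng(1)

# Example POVM: computational-basis measurement on a qubit.
# The effects are positive semidefinite and sum to the identity.
effects = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def measure(rho, effects, shots, rng):
    """Sample realisations of the random variable M_rho with
    probability mass function p(k) = Tr[rho E_k]."""
    probs = np.real([np.trace(rho @ E) for E in effects])
    return rng.choice(len(effects), size=shots, p=probs)

plus = np.full((2, 2), 0.5)  # the state |+><+|
outcomes = measure(plus, effects, shots=10_000, rng=rng)
print(np.mean(outcomes))  # close to 0.5: both outcomes equally likely
```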
These are the ingredients to formalize the static postulates of quantum theory. We will only require dynamics in Section III on quantum process certification.
Postulate (quantum states and measurements):
• Every quantum system is associated with a (separable) complex Hilbert space H.
• The state of a quantum system, its quantum state, is described by a density operator ρ ∈ S(H).
• A measurement with potential outcomes in a finite, discrete set O ≅ [n] is described by a POVM M with effects {E_k ∈ Pos(H)}_{k=1}^n.
• If a quantum system is in the state ρ ∈ S(H) and the measurement M is performed, the observed outcome is a realisation of the random variable M_ρ associated to ρ by M.
The set S(H) is convex. Its extremal points are rank-one operators. A quantum state ρ ∈ S(H) of unit rank is called a pure state. In particular, there exists a state vector |ψ⟩ ∈ H such that ρ = |ψ⟩⟨ψ|. The state vector associated to a pure quantum state is only unique up to a phase factor. A general quantum state is therefore a convex combination of the form Σ_i p_i |ψ_i⟩⟨ψ_i|, where p is a probability vector, i.e., an entry-wise non-negative vector p ∈ R^d, p ≥ 0, that is normalized, i.e., Σ_i p_i = 1. A quantum state that is not pure is called mixed.
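The distinction between pure and mixed states can be probed numerically via the purity Tr[ρ²], which equals 1 exactly for pure states and is strictly smaller for mixed ones. A small sketch (the purity criterion is standard; the example states are ours):

```python
import numpy as np

def purity(rho):
    """Tr[rho^2]: equals 1 for pure states, < 1 for mixed states."""
    return np.real(np.trace(rho @ rho))

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
pure = np.outer(ket0, ket0.conj())                      # |0><0|, rank one
mixed = 0.5 * pure + 0.5 * np.outer(ket1, ket1.conj())  # convex combination

print(purity(pure), purity(mixed))  # 1.0 0.5
```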
Given two quantum systems, their joint system should also be a quantum system. This expectation is captured by the following postulate.
Postulate (composite quantum systems): The Hilbert space of two quantum systems with Hilbert spaces H 1 and H 2 , respectively, is the tensor product H 1 ⊗ H 2 .

This construction induces an embedding of S(H_1) × S(H_2) into S(H_1 ⊗ H_2), (ρ_1, ρ_2) ↦ ρ_1 ⊗ ρ_2.

Dually to that, for any state ρ ∈ S(H_1 ⊗ H_2) there is a reduced state ρ_1 ∈ S(H_1), the state ρ reduced to system 1. The reduced state captures all information of ρ that can be obtained from measuring system 1 alone and can be explicitly obtained by the partial trace over the second subsystem, ρ_1 = Tr_2[ρ] := Σ_i (1_{H_1} ⊗ ⟨i|) ρ (1_{H_1} ⊗ |i⟩), where {|i⟩} is an orthonormal basis of H_2.

By F ∈ L(H ⊗ H) we denote the flip operator (or swap operator) that is defined by linearly extending F |ψ⟩|φ⟩ := |φ⟩|ψ⟩. In terms of an orthonormal basis {|i⟩} of H, we can express |ψ⟩ ∈ H ⊗ H by a coefficient matrix A ∈ C^{dim H × dim H} as |ψ⟩ = Σ_{i,j} A_{ij} |i⟩|j⟩. The coefficient matrix of F |ψ⟩ is given by the matrix transpose Aᵀ of A with entries (Aᵀ)_{ij} = A_{ji}.
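The partial trace has a compact numerical realisation via index reshaping. The following sketch (an illustrative NumPy implementation, not code from the tutorial) reduces the maximally entangled two-qubit state to its maximally mixed marginal:

```python
import numpy as np

def partial_trace_2(rho, d1, d2):
    """Reduce a state on H1 (x) H2 to system 1 by tracing out system 2.
    Reshape into a 4-index tensor and contract the two system-2 indices."""
    return np.trace(rho.reshape(d1, d2, d1, d2), axis1=1, axis2=3)

# Reduced state of the maximally entangled state (|00> + |11>)/sqrt(2).
psi = np.zeros(4)
psi[0] = psi[3] = 1 / np.sqrt(2)
rho = np.outer(psi, psi)
print(partial_trace_2(rho, 2, 2))  # ≈ [[0.5, 0], [0, 0.5]], maximally mixed
```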

Lemma 1 (The swap-trick):
Let F ∈ L(H ⊗ H) be the flip operator (6). For any X, Y ∈ L(H) it holds that Tr[F (X ⊗ Y)] = Tr[XY]. Proof. The identity can be checked by direct computation with basis elements or by using tensor network diagrams. We leave it as an exercise.
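As a complement to the exercise, the identity can be checked numerically. The sketch below builds the flip operator entry-wise from its action F|i⟩|j⟩ = |j⟩|i⟩ and verifies the swap trick Tr[F(X ⊗ Y)] = Tr[XY] on random operators:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3

# The flip operator F on H (x) H: it maps the basis vector |i>|j>,
# stored at index i*d + j, to |j>|i>, stored at index j*d + i.
F = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[j * d + i, i * d + j] = 1.0

X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Y = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

# Swap trick: Tr[F (X (x) Y)] = Tr[X Y].
assert np.allclose(np.trace(F @ np.kron(X, Y)), np.trace(X @ Y))
```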

B. A definition of quantum state certification
In this section, we define what we mean by a certification test for a quantum state. This definition will serve as the blueprint for the specific protocols that we present in the subsequent sections of the chapter. A state certification test solves the task of making sure that a quantum state ρ̃ prepared by a device is a sufficiently good approximation of a target state ρ. Due to the statistical nature of quantum measurements, the protocol for a certification test typically requires multiple copies of the quantum states. For this reason it is appropriate to think of quantum state certification as the certification of a device that repeatedly prepares a target state ρ.

Figure 2: The task of quantum state certification is to detect when a state preparation ρ̃ is not close to a chosen target state ρ, i.e. when dist(ρ, ρ̃) > ε.
In this tutorial we restrict our attention to single-round protocols, where a fixed number n_ρ of copies of a target state is prepared and subsequently measured. Without further assumptions the output of the device is described by an output state ρ̃ ∈ S((C^d)^⊗n_ρ) on which the measurements are performed. Based on the measurement data the classical post-processor then decides to accept or reject the hypothesis that the device prepared the target state within a specified accuracy. This procedure is formalized by the notion of an ε-certification test. An ε-certification test should output "accept" in the majority of attempts if the prepared state is the targeted state. This requirement is referred to as completeness. Additionally, one demands an ε-certification test to likely output "reject" in case the prepared state deviates from the target state beyond a tolerance. The deviation is quantified in terms of a distance measure on S(C^d) taking values in R_+, the non-negative reals, and 'beyond tolerance' means that it exceeds a certain tolerated error threshold ε > 0. We arrive at the following definition of a single-round ε-certification test: Definition 2 (Quantum state ε-certification test): Let ρ ∈ S(C^d) be a quantum state, the target state, ε > 0 and dist : S(C^d) × S(C^d) → R_+ be a distance measure. An ε-certification test for ρ w.r.t. dist consists of a quantum measurement on the device output ρ̃ ∈ S((C^d)^⊗n_ρ) followed by classical post-processing of the measurement data outputting either "accept" or "reject", satisfying the completeness condition (8), P["accept"] ≥ 2/3 if ρ̃ = ρ^⊗n_ρ, and the soundness condition (9), P["reject"] ≥ 2/3 if dist(ρ, ρ̃_i) > ε holds for all reduced states ρ̃_i of ρ̃. Note that more generally one could also define certification tests with respect to measures directly on the composite space S((C^d)^⊗n_ρ).
The terms completeness and soundness are inspired by interactive proof systems. The role of these conditions can be clarified from the perspective of statistical hypothesis testing. In hypothesis testing one has a null hypothesis H_0 (usually the one one hopes to disprove) and an alternative hypothesis H_1, and one needs to figure out which is true based on statistical data. In this setting, there are two types of error: the type I error of rejecting H_0 although it is true, and the type II error of accepting H_0 although it is false. In state certification we choose the null hypothesis H_0 to be 'dist(ρ̃, ρ) > ε' and 'ρ̃ = ρ' to be the alternative hypothesis H_1. Then, for the output of the ε-state certification test, P["reject" | ρ̃ = ρ] is the type II error and P["accept" | dist(ρ̃, ρ) > ε] the type I error. The completeness condition (8) corresponds to requiring that the type II error is bounded by 1/3. Analogously, the soundness condition (9) is the requirement that the type I error is bounded by 1/3. For a test to meet the soundness and completeness conditions, additional assumptions on the prepared state ρ̃ can be required. A common assumption is that the device prepares a sequence of independent states. This means that ρ̃ = ρ̃_1 ⊗ ρ̃_2 ⊗ · · · ⊗ ρ̃_{n_ρ} with ρ̃_i ∈ S(C^d) for all i. In principle, it is also conceivable that a device prepares entangled states to maliciously trick a certifier working under the independence assumption. But in many circumstances minimal control over the device or beliefs about its physically plausible limitations justify the independence assumption.
An even stronger assumption is that the prepared states are independent and identically distributed (iid.). In this case, ρ̃ = ρ̃_1^⊗n_ρ for a single state ρ̃_1 ∈ S(C^d). In experimental practice it can be challenging to fulfil this assumption. For example, drifts in environmental parameters of a device can lead to a systematic deviation between the state copies that defies the iid. assumption. Nonetheless, in our experience, in many instances the iid. assumption is justified by a basic understanding of the functioning of the device and valid to a sufficient degree.
The most important measures of complexity for an ε-certification test are the following:

Definition 3 (sampling complexity):
The sampling complexity of a family of such tests {T_n_ρ} is (the scaling of) n_ρ with d and ε.
The sampling complexity is the scaling of the number of states that the device needs to prepare for the test with the input parameters. In particular, in the context of digital quantum computing, the statement that a "protocol is efficient" is often understood as having sampling complexity in O(polylog(d)), as this translates into a sampling complexity in O(poly(n)) for a system of n qubits. Most guarantees that we prove for protocols in this tutorial will consist in upper bounds on the sampling complexity of a test.
Another important measure for the practical feasibility of a protocol is the measurement complexity, which quantifies how difficult it is to perform the quantum measurements of the protocol. In contrast to the precise definition of the sampling complexity, the measurement complexity should be regarded more as a collection of different ways to formalise the demands of the measurement. For this reason, the discussion of the measurement complexity is of a more qualitative nature.
In the context of state certification, an important aspect of measurement complexity is the number of copies that the POVM needs to act on simultaneously. The special case that encompasses all of the presented protocols is that of sequential measurements, where the measurements are performed only on the n_ρ individual state copies separately. Therefore, the measurement device does not need to be able to store state copies before performing a measurement, significantly lowering its complexity.
Non-adaptive measurements reduce the measurement complexity of sequential measurements even further: here the measurement performed on an individual copy does not depend on the previously obtained measurement results. Furthermore, the complexity of the implementation of the POVM can be quantified, e.g., by measures for the complexity of the circuits required for its implementation in terms of local gates. The qualitative assessment of the measurement complexity as being experimentally feasible or not can vary widely for different devices and platforms.
A certification test is only required to accept the target state. However, in practice, such a test will accept states from some region around the target state with large probability. This property of a certification test is called robustness (against deviations from the target state). One way such robustness can be guaranteed is by estimating the distance between the target state ρ and the prepared state ρ̃, as we will see in Section II J on fidelity estimation, which yields bounds on the distance. In this way, one obtains more information (a distance estimate) than mere certification (just "accept" or "reject").
Clearly, one can also certify through full quantum state tomography. However, the number of single sequential measurements in general required for tomography of a state ρ̃ ∈ S(C^d) scales as Ω(d rank(ρ̃)), and as Ω(d² rank(ρ̃)²) in the case of two-outcome Pauli string measurements [6]. So, for the relevant case of pure n-qubit states this number scales at least as 2^n. This measurement effort becomes infeasible already for relatively moderate n.
We will see that fidelity estimation can work with dramatically fewer measurements than full tomography when the target state has additional structure. In many situations, certification can work with even fewer measurements than fidelity estimation thanks to an improved ε-dependence in the sample complexity.
Our definition of a validation and certification test used the somewhat arbitrary confidence value of 2/3. It is not hard to see that as long as the failure probability is bounded away from 1, the confidence can be amplified by repeating the test multiple times.

Proposition 4 (Confidence amplification):
Let T_n_ρ be an ε-certification test of a quantum state ρ from n_ρ iid. samples with maximum failure probability δ = 1/3. We repeat the certification test N times and obtain a new certification test by performing a majority vote on the outcomes. Then the new test satisfies the completeness and soundness conditions for all σ ∈ S(C^d) with failure probability δ' = e^{−cN}, where c > 0 is an absolute constant. The parameter 1 − δ is also called the confidence of the test.
Proof. This statement can be checked directly from Definition 2.
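As a numerical illustration of Proposition 4, the following sketch (with illustrative parameters of our choosing) simulates the majority vote over N repetitions of a test that errs with probability 1/3 in each run; the failure probability of the amplified test visibly decays with N.

```python
import numpy as np

def amplified_failure_prob(p_fail, N, trials, rng):
    """Empirical failure probability of a majority vote over N iid runs of a
    test, each of which errs (gives the wrong verdict) with prob. p_fail."""
    wrong = rng.random((trials, N)) < p_fail       # True where a run errs
    return float(np.mean(wrong.sum(axis=1) > N / 2))  # majority is wrong

rng = np.random.default_rng(0)
p1 = amplified_failure_prob(1/3, 1, 200_000, rng)      # single run: ~1/3
p31 = amplified_failure_prob(1/3, 31, 200_000, rng)    # 31-fold majority vote
p101 = amplified_failure_prob(1/3, 101, 200_000, rng)  # 101-fold majority vote
print(p1, p31, p101)
```

The decay is exponential in N, consistent with δ' = e^{−cN}.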
Finally, we want to mention that, especially in the computer science community, certification is often also called verification. In particular, from an epistemological point of view, a physical model or hypothesis can never be fully verified. Therefore, we will stick to the term certification for the physical layer, where we actually model a device as being in a quantum state. This allows one to reserve the term verification for certification on a higher level of device abstraction, such as the application layer.

C. Estimation and tail bounds
A main technical tool for deriving the sampling complexity of certification protocols are tail bounds. The measurement outcomes of a quantum mechanical experiment are random variables. Recall that the expected value of a random variable X on a probability space (Ω, Σ, P) is defined as E[X] := ∫_Ω X dP, which gives rise to the well-known expressions E[X] = Σ_{k∈[n]} x_k p_X(x_k) for a discrete, finite random variable X taking values in {x_k}_{k∈[n]}, or E[X] = ∫ x p_X(x) dx for an (absolutely continuous) real random variable X, with p_X the probability mass function or probability density, respectively.
If we want to estimate a measure of quality, such as a distance measure for quantum states, we have to construct an estimator for that measure, which is a function of the measured outcomes. An estimator Ê of a quantity E is itself a random variable (pushing forward the measure on the probability space). An estimator Ê for E is said to be unbiased if E[Ê] = E. Our estimators will typically be families of random variables depending on a number of samples, i.e., the number of quantum states that the protocol consumes. In our notation we will often leave this dependency implicit. We expect that if a protocol calculates an unbiased Ê it reveals E accurately in the limit of infinite samples. Such an estimator is called consistent. To capture the effect of finite statistics, we introduce the notion of an ε-accurate estimator.
Definition 5 (ε-accurate estimator): An estimator Ê of a quantity E is called ε-accurate with confidence 1 − δ if P[|Ê − E| ≤ ε] ≥ 1 − δ. The (scaling of the) number of samples required for a family of estimators to be ε-accurate is its sampling complexity. The sampling complexity of estimators can be derived using tail bounds of random variables.
Tail bounds for random variables are bounds on the probability that a random variable assumes a value that deviates from the expected value, as visualized by the marked area in Figure 3. Indeed, any non-negative random variable X is unlikely to assume values that are much larger than the expected value E[X]: Theorem 6 (Markov's inequality): Let X be a non-negative random variable and t > 0. Then P[X ≥ t] ≤ E[X]/t. Proof. Markov's inequality is as elementary as its proof. Let (Ω, Σ, P) be the probability space of X. For the proof we denote by 1_A the indicator function of a subset A ⊂ Ω, i.e., 1_A(ω) = 1 if ω ∈ A and 1_A(ω) = 0 otherwise. To prove Markov's inequality we set A := {ω : X(ω) ≥ t} and observe that t 1_A(ω) ≤ X(ω) for all ω ∈ Ω. Now taking the expected value of both sides of this inequality yields t P[X ≥ t] ≤ E[X]. Figure 3: The (upper) tail of a random variable X is the probability of X being greater than some threshold t. This probability is given by the corresponding area under the graph of the probability density function (PDF) of X.
As a consequence of Markov's inequality, the variance of a real random variable X can be used to control its tails: Theorem 7 (Chebyshev's inequality): Let X be a random variable with E[X] = 0. Then P[|X| ≥ t] ≤ E[X²]/t² for all t > 0.
Proof. The proof follows by simply applying Markov's inequality to the random variable X 2 .
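The following minimal sketch (an exponential random variable chosen purely for illustration) compares an empirical tail probability with the Markov and Chebyshev bounds; both bounds hold, and Chebyshev is the tighter of the two here.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)   # non-negative, E[X] = 1

t = 5.0
tail = float(np.mean(x >= t))                    # empirical P[X >= t]
markov = float(np.mean(x)) / t                   # Markov bound E[X] / t
# Chebyshev applied to the centred variable X - E[X] bounds the same tail:
cheby = float(np.var(x)) / (t - float(np.mean(x))) ** 2
print(tail, markov, cheby)
```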
Note that the assumption of mean zero is not really a restriction but only helps to state the theorem more concisely. In the case of a random variable Y that does not necessarily have zero mean, Chebyshev's inequality yields a tail bound by applying it to X := Y − E[Y]; see also Figure 3. The same argument can be made for the tail bounds that follow.
A random variable X is called bounded if it takes values in a bounded subset of the reals almost surely. Its empirical mean from n samples is (1/n) Σ_{i=1}^n X_i, where X_i ∼ X are iid. copies of X. In the case of bounded random variables, the empirical mean concentrates much more than a naive application of Markov's or Chebyshev's inequality suggests. More precisely, the following inequality holds (see, e.g., [7, Theorem 7.20]):

Theorem 8 (Hoeffding's inequality):
Let X_1, . . . , X_n be independent bounded random variables with a_i ≤ X_i ≤ b_i almost surely for all i ∈ [n] and denote their sum by S_n := Σ_{i=1}^n X_i.
Then for all t > 0 it holds that P[S_n − E[S_n] ≥ t] ≤ exp(−2t² / Σ_{i=1}^n (b_i − a_i)²) and P[|S_n − E[S_n]| ≥ t] ≤ 2 exp(−2t² / Σ_{i=1}^n (b_i − a_i)²). Proof. We only sketch the proof and recommend fleshing out the details as an exercise. The second statement directly follows from the first one. In order to prove the first one, let s > 0 and apply Markov's inequality to the random variable e^{s(S_n − E[S_n])}. The independence of the X_i allows one to factorize the exponential and use the bounds on the range of the X_i individually. Finally, choosing the optimal s yields the theorem's statement.
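A quick numerical check of the two-sided Hoeffding bound, using uniform random variables on [0, 1] as an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(2)
n, t, trials = 100, 10.0, 100_000

# X_i uniform on [0, 1]: a_i = 0, b_i = 1, so sum_i (b_i - a_i)^2 = n
# and E[S_n] = n / 2.
s = rng.random((trials, n)).sum(axis=1)          # realizations of S_n
empirical = float(np.mean(np.abs(s - n / 2) >= t))
hoeffding = 2 * np.exp(-2 * t**2 / n)            # two-sided Hoeffding bound
print(empirical, hoeffding)
```

The bound is loose for this example but, crucially, decays exponentially in t².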
Note that when one can additionally control the variance of bounded random variables then the Bernstein inequality [7,Corollary 7.31] can give a better concentration, especially for small values of t.
Another related tail bound is Azuma's inequality, which allows for a relaxation on the independence assumption (super-martingales with bounded differences).
The median of means estimator allows for much better tail bounds than the empirical mean in the case of unbounded i.i.d. random variables with finite variance. The intuition is that taking the median of several empirical means is more robust against statistical outliers than taking the overall empirical mean. Let {X_i} be iid. random variables with mean µ and variance σ², and denote by S_k := (1/k) Σ_{i=1}^k X_i the empirical mean from k i.i.d. samples. Take ℓ empirical means S_{k,j}, j ∈ [ℓ], that are (iid.) copies of S_k and set µ̂ := median(S_{k,1}, . . . , S_{k,ℓ}). Then P[|µ̂ − µ| ≥ σ√(4/k)] ≤ e^{−ℓ/8}. In particular, for any δ ∈ (0, 1), ℓ = ⌈8 ln(1/δ)⌉ and m = kℓ samples in total, |µ̂ − µ| ≤ σ√(4/k) holds with probability at least 1 − δ.
This theorem can be proven using Chebyshev's inequality for the empirical means S_{k,j} and Hoeffding's inequality for a binomial distribution to obtain the concentration of the median. We refer to Ref. [8] for further details.
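The robustness intuition can be illustrated numerically. In this hypothetical example we corrupt a Gaussian sample with a few gross outliers: the overall empirical mean is ruined, while the median of means barely moves.

```python
import numpy as np

def median_of_means(x, num_groups, rng):
    """Randomly partition the samples into groups, average each group,
    and return the median of the group means."""
    x = rng.permutation(x)
    groups = np.array_split(x, num_groups)
    return float(np.median([g.mean() for g in groups]))

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, scale=1.0, size=1000)   # true mean mu = 1.0
x[:5] = 1e4                                     # five gross outliers

naive = float(x.mean())                         # ruined by the outliers
robust = median_of_means(x, num_groups=20, rng=rng)
print(naive, robust)
```

The outliers land in at most five of the twenty groups, so the median over the group means is unaffected by them.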
Finally, it is often required to bound the probability that at least one of several events happens. For a series of events A_1, A_2, . . . the union bound (Boole's inequality) guarantees that P[∪_i A_i] ≤ Σ_i P[A_i].

D. Expectation value estimation for observables
To familiarize ourselves with the application of tail bounds in the derivation of sampling complexities in quantum estimation, we turn our attention to a very basic task in quantum mechanics: the estimation of the expectation value of an observable.
We have formulated a general quantum measurement in terms of a POVM. An important special case of a POVM is a projector-valued measure (PVM) where the effects are orthogonal projectors. A measurement described by a PVM is also called a von Neumann / projective measurement.
An observable quantity is modelled by a self-adjoint operator A ∈ Herm(H). A self-adjoint operator has an eigendecomposition A = Σ_{α=1}^n a_α P_α with a_α ∈ R and orthogonal projectors P_α onto the eigenspaces. The set of outcomes associated to the measurement of A is its real eigenvalue spectrum spec(A) = {a_α}_{α∈[n]}, and the measurement is described by the PVM that has the projectors P_α as effects. Thus, associated to an observable A is the map from S(H) to random variables ρ ↦ A_ρ taking values in spec(A) with probability mass function p_{A_ρ}(a_α) = Tr[P_α ρ]. This implies that the expectation value of an observable A ∈ Herm(H) in the state ρ is ⟨A⟩_ρ := E[A_ρ] = Σ_α a_α Tr[P_α ρ] = Tr[Aρ]. Given a quantum system in some state ρ ∈ S(H), we wish to estimate ⟨A⟩_ρ; note that the expectation value itself cannot be observed directly but needs to be estimated from single measurements. One protocol for estimating ⟨A⟩_ρ is to perform the projective measurement of the observable multiple times and use the observed empirical mean as an estimator for ⟨A⟩_ρ. Let A_ρ^(i) be the random variable describing the outcome of the i-th measurement of A in state ρ. The empirical mean estimator from m measurements is Y(m) := (1/m) Σ_{i=1}^m A_ρ^(i). (29) It is easy to see that Y(m) is an unbiased estimator for ⟨A⟩_ρ. So how many copies of ρ does this protocol consume in order to arrive at an ε-accurate estimate of ⟨A⟩_ρ with confidence 1 − δ?
If the measurements are independent and the eigenvalue spectrum of A is bounded, i.e. a_α ∈ [a, b] for all α ∈ [n], then Hoeffding's inequality (23) yields a bound on the sampling complexity.

Proposition 10 (Estimation of observables):
Let ρ ∈ S(H) and A ∈ Herm(H) be an observable with spec(A) ⊂ [a, b]. Choose ε > 0 and δ ∈ (0, 1). The empirical mean estimator (29) of the expectation value ⟨A⟩_ρ from measurements of A on m independent copies of ρ satisfies |Y(m) − ⟨A⟩_ρ| ≤ ε with probability at least 1 − δ for all m ≥ m_0 := (b − a)² ln(2/δ) / (2ε²). (31) Proof. Having m independent state copies implies that the measurement outcomes are independent random variables. We choose X_1, . . . , X_m as independent copies of the random variable A_ρ/m. Then the empirical mean estimator is described by a sum of m independent random variables, and Hoeffding's inequality (23) yields P[|Y(m) − ⟨A⟩_ρ| ≥ ε] ≤ 2 exp(−2mε²/(b − a)²) for any ε > 0. We wish this probability to be small, i.e., we require that 2 exp(−2mε²/(b − a)²) ≤ δ for some δ ∈ (0, 1), and determine the critical value m_0 required for the estimation by solving the inequality for m = m_0, which yields (31).
Proposition 10 guarantees that expectation values of bounded observables can be estimated with a measurement effort that is independent of the Hilbert space dimension. The confidence 1 − δ can be improved exponentially fast by increasing the measurement effort m.
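Proposition 10 can be tried out in simulation. The following sketch (a hypothetical single-qubit state and the observable A = σ_z, both of our choosing) samples measurement outcomes according to the Born rule p(a_α) = Tr[P_α ρ] and compares the empirical mean from m copies, with m chosen according to (31), to the exact value Tr[Aρ].

```python
import numpy as np

rng = np.random.default_rng(4)

# A (hypothetical) single-qubit state rho and the observable A = sigma_z.
rho = np.array([[0.7, 0.2], [0.2, 0.3]])     # valid density matrix
A = np.diag([1.0, -1.0])                     # spectrum in [a, b] = [-1, 1]

eigvals, eigvecs = np.linalg.eigh(A)
probs = np.array([np.real(v.conj() @ rho @ v) for v in eigvecs.T])  # Born rule
probs /= probs.sum()

eps, delta = 0.05, 0.01
m = int(np.ceil((1 - (-1))**2 * np.log(2 / delta) / (2 * eps**2)))  # eq. (31)

outcomes = rng.choice(eigvals, size=m, p=probs)  # m projective measurements
estimate = outcomes.mean()                       # empirical mean Y(m)
exact = np.trace(A @ rho).real                   # <A>_rho = Tr[A rho]
print(m, estimate, exact)
```

Note that m depends only on (b − a), ε and δ, not on the Hilbert space dimension.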
One can define distance measures on S(H) in terms of expectation values of a set of observables. Naturally, the estimation protocol described in this section gives rise to an ε-certification test w.r.t. such measures.

Further reading
Using the union bound, one can easily generalize Proposition 10 to derive the sampling complexity of estimating multiple observables. The total number of state copies of ρ sufficient to estimate M different observables then scales as m_0 ∈ O(log(2M/δ)/ε²). In this setting each observable is estimated from a different measurement setting. In contrast, shadow estimation [9-11] provides a way to estimate multiple observables from a single measurement setting, with a sampling complexity that, for certain types of observables, also scales only logarithmically in the number of observables. We will further discuss shadow estimation techniques in the context of state certification in Section II L.

E. Distance measures for quantum states
Our general definition of an ε-certification test, Definition 2, requires a distance measure on S(H). In this section we introduce some 'natural' measures on quantum states.
To this end, recall that any normal operator X ∈ L(H), i.e., any operator that commutes with its adjoint, [X, X†] := XX† − X†X = 0, can be written in spectral decomposition X = Σ_i x_i P_i, where x_i ∈ C are its eigenvalues and P_i = P_i² ∈ Pos(H) the corresponding spectral projectors. There are several useful norms of an operator X ∈ L(H, K). For any operator X ∈ L(H, K) between two Hilbert spaces H and K, the operator X†X is positive semidefinite, i.e., in Pos(H). In consequence, it has a positive semidefinite square root |X| := √(X†X) ∈ Pos(H). The spectral norm (a.k.a. operator norm) ‖X‖_op ∈ R_+ of X is defined to be the largest eigenvalue of |X|. The trace norm is ‖X‖_1 := Tr[|X|] and the Frobenius norm is ‖X‖_F := √(Tr[X†X]). These norms can be defined in a variety of equivalent ways: The spectral norm coincides with the norm induced by the ℓ2-norm on H via ‖X‖_op = sup_{‖v‖_2 ≤ 1} ‖Xv‖_2, a manifestation of the Rayleigh principle. The Frobenius norm is induced by the Hilbert-Schmidt inner product (1). It can also be expressed in terms of the matrix representation of X as ‖X‖_F = √(Σ_{i,j} |X_ij|²). Finally, all three norms are instances of the Schatten p-norms, which are directly defined as ℓp-norms on the singular value spectrum. The singular value spectrum σ(X) of X is defined as the eigenvalue spectrum of |X|, and the ℓp-norms are given by ‖x‖_p := (Σ_i |x_i|^p)^{1/p}. This gives rise to the unitarily invariant Schatten p-norm ‖X‖_p := ‖σ(X)‖_p, and ‖·‖_op, ‖·‖_1, and ‖·‖_F are the Schatten p-norms with p = ∞, 1, 2, respectively.
The Euclidean inner product is bounded in terms of ℓp-norms by the Hölder inequality, which states that |⟨x, y⟩| ≤ ‖x‖_p ‖y‖_q for all x, y ∈ C^d and pairs p, q ∈ [1, ∞] with 1/p + 1/q = 1. The Hölder inequality generalizes the Cauchy-Schwarz inequality, where p = q = 2. The Schatten p-norms inherit a matrix Hölder inequality from the Hölder inequality: Let X, Y ∈ L(H, K) and p, q as before; then |Tr[X†Y]| ≤ ‖X‖_p ‖Y‖_q. (35) The matrix Hölder inequality directly follows from the von Neumann inequality Tr|AB| ≤ ⟨σ(A), σ(B)⟩, where the singular value spectra σ(A) and σ(B) are each arranged in descending order [12]. Furthermore, the Schatten p-norms inherit the ordering of the ℓp-norms, ‖X‖_∞ ≤ . . . ≤ ‖X‖_2 ≤ . . . ≤ ‖X‖_1 for all X. Norm bounds in the reversed order will in general introduce dimensional factors. For low-rank matrices these bounds can be tightened.

Lemma 11 (Reversed norm bounds):
For all X ∈ L(H, K) with r = rank(X) it holds that ‖X‖_1 ≤ √r ‖X‖_F and ‖X‖_F ≤ √r ‖X‖_op. Proof. Let X ∈ L(H, K) and r = rank(X). We can always write X = XP_r with P_r a rank-r projector onto the orthogonal complement of the kernel of X. Now, by the matrix Hölder inequality (35), ‖X‖_1 = Tr|XP_r| ≤ ‖X‖_F ‖P_r‖_F = √r ‖X‖_F. For the second inequality, bound again using the matrix Hölder inequality ‖X‖_F² = Tr[(X†X)P_r] ≤ ‖X†X‖_op ‖P_r‖_1 = r ‖X‖_op². Taking the square root we conclude that ‖X‖_F ≤ √r ‖X‖_op, from which the second inequality follows.
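The norm ordering and the reversed bounds of Lemma 11 are easy to check numerically from the singular value spectrum; here for a random rank-3 matrix (an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(5)
d, r = 8, 3
X = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))  # rank-r matrix

s = np.linalg.svd(X, compute_uv=False)   # singular value spectrum sigma(X)
op_norm = s.max()                        # Schatten-infinity (spectral) norm
fro_norm = np.sqrt(np.sum(s**2))         # Schatten-2 (Frobenius) norm
tr_norm = s.sum()                        # Schatten-1 (trace) norm
print(op_norm, fro_norm, tr_norm)
```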
A natural metric on quantum states is the trace distance dist_Tr(ρ, σ) := ½‖ρ − σ‖_1. We have already seen that, compared to the other Schatten p-norms, the trace norm is the largest, i.e., the most 'pessimistic' distance measure. Furthermore, the trace norm has an operational interpretation in terms of the distinguishability of quantum states by dichotomic measurements.

Proposition 12 (Operational interpretation of the trace distance):
Let ρ, σ ∈ S(H). It holds that ½‖ρ − σ‖_1 = sup_{0 ≤ P ≤ 1} Tr[P(ρ − σ)]. Furthermore, the supremum is attained for the orthogonal projector P_+ onto the positive part of ρ − σ.
Proof. First we show that the supremum is attained for P_+. The self-adjoint operator difference can be decomposed as ρ − σ = X_+ − X_− into a positive part X_+ ∈ Pos(H) and a negative part X_− ∈ Pos(H) with orthogonal supports. Since Tr[ρ − σ] = 0, we have Tr[X_+] = Tr[X_−]. The last two statements together yield that the trace distance between the two states is ½‖ρ − σ‖_1 = ½(Tr[X_+] + Tr[X_−]) = Tr[X_+] = Tr[P_+(ρ − σ)], where P_+ is the orthogonal projector onto the support of X_+. It can be calculated by means of the singular value decomposition of ρ − σ = UΣV† as P_+ = U_+V_+† with U_+ and V_+ the matrices with the left and right singular vectors, respectively, associated to the positive singular values as their columns.
In order to show that the supremum cannot exceed the trace distance, we consider some operator P with 0 ≤ P ≤ 1. Then, indeed, Tr[P(ρ − σ)] = Tr[PX_+] − Tr[PX_−] ≤ Tr[PX_+] ≤ ‖P‖_op ‖X_+‖_1 ≤ ½‖ρ − σ‖_1, where we have used the matrix Hölder inequality (35) and (40) in the last two steps.
Given two quantum states, the optimal dichotomic POVM measurement {P, 1 − P} to distinguish them is the POVM that maximizes the probability of measuring the outcome associated to P in one state and minimizes the same probability for the other state. Of course, exchanging the roles of P and 1 − P works equivalently. We can think of the achievable difference in probabilities as a measure of the distinguishability of ρ and σ. Proposition 12 shows that the trace distance of two states coincides with the maximal distinguishability by dichotomic POVM measurements. This single-shot distinguishability can be amplified by measuring multiple iid. copies of a quantum state with {P, 1 − P}. We will turn this insight into an ε-certification test for pure states in the next section.
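Proposition 12 can be verified numerically: for two random density matrices, the projector onto the positive part of ρ − σ achieves the trace distance. The helper `random_state` is our own construction for illustration.

```python
import numpy as np

def random_state(d, rng):
    """A random density matrix: G G^dagger normalized to unit trace
    (our own helper, for illustration only)."""
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    m = g @ g.conj().T
    return m / np.trace(m).real

rng = np.random.default_rng(6)
rho, sigma = random_state(4, rng), random_state(4, rng)

diff = rho - sigma
evals, evecs = np.linalg.eigh(diff)
trace_dist = 0.5 * np.abs(evals).sum()        # (1/2) ||rho - sigma||_1

pos = evecs[:, evals > 0]                     # eigenvectors of the positive part
P_plus = pos @ pos.conj().T                   # projector onto the positive part
achieved = np.trace(P_plus @ diff).real       # Tr[P_+ (rho - sigma)]
print(trace_dist, achieved)
```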
Before we do this, let us introduce another important distance measure on quantum states. The (squared) fidelity of two quantum states ρ, σ ∈ S(H) is defined as F(ρ, σ) := ‖√ρ √σ‖_1². (42) Note that F(ρ, σ) = (Tr[√(√σ ρ √σ)])². (43) While no longer directly evident from (43), the fidelity is symmetric, as is apparent from (42).
Some authors define the fidelity as ‖√ρ √σ‖_1, without the square. For this reason, one might want to refer to the expression in (42) explicitly as the squared fidelity to avoid confusion. For brevity we however call F simply the fidelity hereinafter.
The fidelity is, more precisely, not a measure of 'distance' between two quantum states but of 'closeness'. In particular, F(ρ, ρ) = 1, which can be seen to be the maximal value of F(ρ, σ) over all ρ, σ ∈ S(H). Hence, 0 ≤ F(ρ, σ) ≤ 1 on S(H). Often it is convenient to work with the infidelity 1 − F(ρ, σ) as the complementary measure of 'distance'.
When at least one of the states ρ or σ is pure, say ρ = |ψ⟩⟨ψ|, then F(ρ, σ) = ⟨ψ|σ|ψ⟩ = Tr[ρσ], (44) which can easily be proven using (43). Furthermore, for both states being pure, ρ = |ψ⟩⟨ψ| and σ = |φ⟩⟨φ|, we have F(ρ, σ) = |⟨ψ|φ⟩|². Thus, for pure states the fidelity is the overlap of the states and can be related to the angle between the state vectors. We will in fact mostly encounter the case where at least one of the states is pure and mostly work with (44) instead of (42). The fidelity is related to the trace distance as follows.

Proposition 13 (Fuchs-van-de-Graaf inequalities [13, Theorem 1]):
For any states ρ, σ ∈ S(H), 1 − √F(ρ, σ) ≤ ½‖ρ − σ‖_1 ≤ √(1 − F(ρ, σ)). (45) By virtue of the Fuchs-van de Graaf inequalities one can, in many applications, regard the trace distance and fidelity as equivalent measures, since the inequality does not introduce dependencies on the Hilbert space dimension. Note, however, that the square root on the right-hand side can still make a painstaking difference in practice. Aiming at a trace-norm distance of 10⁻³ can in the worst case require ensuring an infidelity of 10⁻⁶. This can be a crucial difference when it comes to the feasibility of certification. Importantly, the square-root scaling is unavoidable for pure states.
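The fidelity (42) and the Fuchs-van de Graaf inequalities can be checked numerically for random mixed states; the square root of a positive semidefinite matrix is computed via its eigendecomposition (helper functions are our own, for illustration).

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def fidelity(rho, sigma):
    """Squared fidelity F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1^2."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(s.sum() ** 2)

def trace_distance(rho, sigma):
    """(1/2) || rho - sigma ||_1 for self-adjoint rho - sigma."""
    return 0.5 * float(np.abs(np.linalg.eigvalsh(rho - sigma)).sum())

def random_state(d, rng):
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    m = g @ g.conj().T
    return m / np.trace(m).real

rng = np.random.default_rng(7)
rho, sigma = random_state(4, rng), random_state(4, rng)
F, T = fidelity(rho, sigma), trace_distance(rho, sigma)
print(F, T)
```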
In Exercise 14 we showed that the upper bound of (45) is tight for pure states. Conversely, one might hope for more mixed states to arrive at an improved scaling closer to the lower-bound of (45). We will review such a bound in the analogous discussion of distance measures of quantum channels, Theorem 53 in Section III A.
In the next section, we present protocols that aim at directly providing an -certification test for certain states. Section II J and II L present two protocols that aim at estimating the fidelity: direct fidelity estimation and shadow fidelity estimation.

F. Direct quantum state certification
In this section we present approaches to certification protocols for quantum states that are direct in the sense that they do not use a protocol designed for another task, such as an estimation protocol, as a subroutine. Our exposition mostly follows the work by Pallister et al. [14]. We start with perhaps the most direct attempt, building on the insight of Proposition 12. This proposition illustrated the interpretation of the trace distance as the maximal distinguishability by a dichotomic POVM and showed that the optimal POVM in this regard is given by the projection onto the positive part of the state difference. This indicates that the best way to distinguish a pure quantum state from all other states is to measure the POVM that has the state itself as an element.
We now turn this insight into an -certification test. It can be most easily formulated in terms of the infidelity 1 − F as the distance measure.
Given a pure target state ρ = |ψ⟩⟨ψ| with a state vector |ψ⟩ ∈ C^d, we consider the POVM {Ω, 1 − Ω} given by Ω = |ψ⟩⟨ψ|. We call the outcome corresponding to Ω "pass" and the one of 1 − Ω "fail". Then, for any ρ̃ ∈ S(C^d), Tr[Ωρ̃] = ⟨ψ|ρ̃|ψ⟩ = F(ρ, ρ̃), i.e., the probability of the POVM returning "pass" is the fidelity of the two states. This gives us a simple protocol that measures the POVM on a single state copy and accepts when the result is "pass" and rejects otherwise. This protocol is complete but not sound in the sense of Definition 2, as the probability of acceptance is fixed to F(ρ, ρ̃), i.e., the probability of a false acceptance is not bounded away from one by a constant. But using more state copies we can boost the probability to detect deviations of the form F(ρ, ρ̃) < 1 − ε with some targeted confidence 1 − δ.
In order to capture a large class of measurement settings, we first formulate the protocol for an arbitrary dichotomic POVM measurement.
Protocol 15 (Direct certification): For state preparations ρ̃_1, . . . , ρ̃_n_ρ ∈ S(C^d): 1: for i = 1, . . . , n_ρ do: 2: measure the POVM {Ω, 1 − Ω} on ρ̃_i 3: if the outcome is "fail" then: 4: output "reject" and end protocol 5: output "accept" As stated, this protocol is adaptive in that it can end early in case of a rejection instance. However, one could easily turn it into a non-adaptive protocol without changing the number of measurements in the performance guarantee below.
For ρ a pure state and Ω = ρ the protocol is a certification protocol w.r.t. the infidelity as more precisely summarized by the following proposition.

Proposition 16 (Performance guarantee I):
Let ρ ∈ S(C^d) be a pure target state and choose ε, δ > 0. Protocol 15 with Ω = ρ is an ε-certification test w.r.t. the infidelity from n_ρ independent samples for n_ρ ≥ ln(1/δ)/ε (48) with confidence at least 1 − δ. Moreover, the protocol accepts the target state ρ with probability one.
Proof. The probability of the measurement outcome "pass" on the i-th copy is Tr[Ωρ̃_i] = F(ρ, ρ̃_i). Hence, the probability that the protocol accepts is Π_{i=1}^{n_ρ} F(ρ, ρ̃_i). Clearly, if ρ̃_i = ρ for all i ∈ [n_ρ] then the protocol accepts almost surely. Now let us consider the case that the fidelity is small, i.e., F(ρ, ρ̃_i) < 1 − ε for all i ∈ [n_ρ]. Then the probability that the protocol wrongfully accepts is bounded as Π_i F(ρ, ρ̃_i) < (1 − ε)^{n_ρ}. Now we wish this probability (the type II error) to be bounded by δ > 0, i.e., (1 − ε)^{n_ρ} ≤ δ, (53) which holds with equality for n_ρ = ln(1/δ)/ln(1/(1 − ε)). We note that for ε ∈ [0, a] ⊂ [0, 1) the bounds ε ≤ ln(1/(1 − ε)) ≤ ε/(1 − a) hold, which can be seen by using the fact that ε ↦ ln(1/(1 − ε)) is smooth, has value 0 at 0, its first derivative is lower bounded by 1, and its second derivative is positive. Hence, for any n_ρ ≥ ln(1/δ)/ε the required bound (53) is satisfied.
As a remark, the minimum number of samples in (54) is given by n_ρ = ⌈ln(1/δ)/ln(1/(1 − ε))⌉, so that (48) captures the leading scaling of (54).
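A small simulation of Protocol 15 with Ω = |ψ⟩⟨ψ| illustrates both error types: since each copy passes with probability F(ρ, ρ̃_i), it suffices to draw Bernoulli outcomes. With n_ρ = ⌈ln(1/δ)/ε⌉ copies, the target state is always accepted, while a state with fidelity 1 − ε is accepted with probability at most δ (parameters are illustrative).

```python
import numpy as np

def protocol_15(pass_prob, n_copies, rng):
    """One run of Protocol 15: each copy passes with probability
    Tr[Omega rho_i] = F(rho, rho_i); accept iff every copy passes."""
    return bool(np.all(rng.random(n_copies) < pass_prob))

rng = np.random.default_rng(8)
eps, delta = 0.1, 0.05
n_rho = int(np.ceil(np.log(1 / delta) / eps))   # sample count ln(1/delta)/eps

trials = 20_000
# Target state itself (fidelity 1): never rejected (zero type I error).
acc_target = np.mean([protocol_15(1.0, n_rho, rng) for _ in range(trials)])
# A bad state with fidelity 1 - eps: accepted with probability <= delta.
acc_bad = np.mean([protocol_15(1 - eps, n_rho, rng) for _ in range(trials)])
print(n_rho, acc_target, acc_bad)
```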
Perhaps surprisingly, the sample complexity (48) of this direct certification protocol does not depend on the physical system size at all. It has zero type I error and one can control the type II error via the parameter δ. However, for many target states it is not practical to directly implement the required POVM. This motivates the following more complicated strategies. Say, we have access to a set of POVM elements M ⊂ {M : 0 ≤ M ≤ 1}. These encode the measurements that are experimentally feasible. As one can only make finitely many measurements, we assume that |M| < ∞. Then for each state copy we pick a POVM element M ∈ M with some probability µ(M) and measure the corresponding dichotomic POVM {M, 1 − M}. We refer to the pair (M, µ) as a probabilistic measurement strategy. Protocol 17: Let ρ ∈ S(C^d) be a pure target state and (M, µ) be a probabilistic measurement strategy. For state preparations ρ̃_1, . . . , ρ̃_n_ρ ∈ S(C^d) the protocol consists of the following steps.
1: for i = 1, . . . , n_ρ do: 2: draw M ∈ M with probability µ(M) 3: measure the POVM {M, 1 − M} on ρ̃_i 4: if the outcome is "fail" then: 5: output "reject" and end protocol. 6: output "accept" Let us assume that the prepared states are iid. copies of a state ρ̃. Then, following Protocol 17 with strategy (M, µ), the probability of measuring "pass" in a given round is Tr[Ωρ̃], where Ω := Σ_{M∈M} µ(M) M is the so-called effective measurement operator. We will see that it plays a similar role as the measurement operator Ω in Protocol 15 when it comes to proving performance guarantees. At the same time, it allows one to capture more sophisticated measurement strategies. However, there is one constraint that allows for a simple analysis of Protocol 17: we require that Tr[Ωρ] = 1, (60) i.e., that the target state ρ passes each round with probability one, so that there is no false rejection of the target state. In particular, this requires that Tr[Mρ] = 1 for all M ∈ M. This constraint still allows for optimal measurement strategies:
The proof of this statement is a consequence of the Chernoff-Stein lemma from information theory, which quantifies the asymptotic distinguishability of two distributions in terms of their relative entropy.
Since the constraint (60) implies that there is no false rejection, the only remaining hypothesis testing error is a false acceptance, which is the event that a state ρ̃ with F(ρ, ρ̃) < 1 − ε is accepted. In a single round, the worst-case probability of measuring "pass" over all states ρ̃ in the rejection region is given as max_{ρ̃ : Tr[ρρ̃] ≤ 1−ε} Tr[Ωρ̃]. (61) In the following lemma we will see that this maximum is determined by the spectral gap of the effective measurement operator Ω, ν(Ω) := λ_1(Ω) − λ_2(Ω), (62) where λ_1(Ω) ≥ λ_2(Ω) ≥ . . . denote the eigenvalues of Ω in descending order. Lemma 19: Let Ω be an effective measurement operator with 0 ≤ Ω ≤ 1 and Tr[ρΩ] = 1 for a pure state ρ. Then max_{ρ̃ : Tr[ρρ̃] ≤ 1−ε} Tr[Ωρ̃] = 1 − εν(Ω). (63) Proof. We note that Tr[ρΩ] = 1 means that a state vector |ψ⟩ with ρ = |ψ⟩⟨ψ| is an eigenvalue-1 eigenvector of Ω. Moreover, let us write Ω in spectral decomposition Ω = Σ_i λ_i P_i with 1 = λ_1 ≥ λ_2 ≥ · · · ≥ λ_d and P_1 = ρ. For the case λ_2 = 1 the choice ρ̃ = P_2 yields a maximum of 1 in the maximization (63). Let us now consider the case λ_2 < 1.
Then for ρ̃ := (1 − ε)ρ + εP_2 we have Tr[Ωρ̃] = (1 − ε) + ελ_2 = 1 − εν(Ω), i.e., the claimed maximum in (63) is attained for some feasible ρ̃.
To show that the claimed maximum is indeed the maximum, we consider a state ρ̃ ∈ S(C^d) with Tr[ρρ̃] ≤ 1 − ε. We write ρ̃ as the convex combination ρ̃ = (1 − ε̃)ρ + ε̃ρ_⊥ with a state ρ_⊥ satisfying Tr[ρρ_⊥] = 0, and observe that ε̃ ≥ ε. Then Tr[Ωρ̃] = (1 − ε̃) + ε̃ Tr[Ωρ_⊥] ≤ (1 − ε̃) + ε̃λ_2 = 1 − ε̃ν(Ω) ≤ 1 − εν(Ω). Given a measurement strategy with effective measurement operator Ω, this lemma provides a closed formula for the false acceptance probability (61). This allows us to state the following guarantee for Protocol 17.
Proposition 20 (Performance guarantee II [14]): Let ρ ∈ S(C^d) be a pure target state and ε, δ > 0. We consider an effective measurement operator 0 ≤ Ω ≤ 1 with Tr[Ωρ] = 1 and with spectral gap ν(Ω) > 0, which is given by (62). Then the certification test from Protocol 17 is an ε-certification test w.r.t. the infidelity from n_ρ independent samples for n_ρ ≥ ln(1/δ)/(εν(Ω)) (68) with confidence at least 1 − δ. Moreover, the protocol accepts the target state ρ with probability one.
Compared to the sample complexity (48) of the naive Protocol 15, the sample complexity (68) has an overhead of a factor of 1/ν(Ω). Proof of Proposition 20. The proof is mostly analogous to the one of Proposition 16.
This proposition tells us that as long as Ω has a constant gap between its largest and second largest eigenvalue, the sample complexity of the certification protocol has the same scaling as in the case where Ω is the target state itself. Which measurement strategies Ω are feasible depends on the physical situation. Given a set M of feasible measurements, we can single out an optimal strategy as follows.

Definition 21 (Minimax optimization):
Let ρ be a pure state and ε > 0. Moreover, let us assume that we have access to a compact set of binary measurements given by the operators M ⊂ {P : 0 ≤ P ≤ 1, Tr[Pρ] = 1}. Then the best strategy Ω for the worst-case state preparation ρ̃ is given by the optimization min_{Ω ∈ conv(M)} max_{ρ̃ : Tr[ρρ̃] ≤ 1−ε} Tr[Ωρ̃]. This quantity is called the minimax value, and a strategy Ω where the minimum is attained is called minimax optimal.
Such minimax optimizations are common in game theory and risk analysis.
For a number of settings with physically motivated measurement restrictions, the minimax strategy, or at least one that is close to it, has been obtained. This is the case, for instance, for stabilizer states, which are ubiquitous in quantum information. In the following we introduce stabilizer states and then derive a minimax optimal certification protocol for them.

G. Stabilizer states
An n-qubit Pauli string is σ_s1 ⊗ · · · ⊗ σ_sn, where s ∈ {0, 1, 2, 3}^n and {σ_i} are the Pauli matrices σ_0 = 1, σ_1 = (0 1; 1 0), σ_2 = (0 −i; i 0), σ_3 = (1 0; 0 −1). The Pauli group P_n ⊂ U(2^n) is the group generated by all n-qubit Pauli strings and i1. An n-qubit state |ψ⟩ is a stabilizer state if there is an abelian subgroup S ⊂ P_n, called the stabilizer (subgroup), that stabilizes |ψ⟩ and only |ψ⟩, i.e., |ψ⟩ is the unique joint eigenvalue-1 eigenstate of all elements of that subgroup. Such subgroups are generated by n elements and contain |S| = 2^n elements in total. Note that they cannot contain the element −1.
An example of such a subgroup is the one of all Pauli strings made of 1's and σ z 's.
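As a concrete illustration, the following Python snippet (a minimal sketch of ours, not part of the tutorial's protocols) constructs this example stabilizer group for n = 2 and verifies that the joint eigenvalue-1 eigenspace of all its elements is one-dimensional, i.e., that it stabilizes a unique state:

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices sigma_0, ..., sigma_3.
paulis = [
    np.eye(2, dtype=complex),                    # sigma_0 = identity
    np.array([[0, 1], [1, 0]], dtype=complex),   # sigma_1 = sigma_x
    np.array([[0, -1j], [1j, 0]]),               # sigma_2 = sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),  # sigma_3 = sigma_z
]

def pauli_string(s):
    """Tensor product sigma_{s_1} x ... x sigma_{s_n} for s in {0,1,2,3}^n."""
    W = np.array([[1.0 + 0j]])
    for i in s:
        W = np.kron(W, paulis[i])
    return W

n = 2
# Example stabilizer group: all Pauli strings made of identities and sigma_z's.
group = [pauli_string(s) for s in itertools.product([0, 3], repeat=n)]
assert len(group) == 2**n  # |S| = 2^n

# The product of the projectors (1 + S)/2 projects onto the joint
# eigenvalue-1 eigenspace; for a stabilizer group it has rank one.
P = np.eye(2**n, dtype=complex)
for S in group:
    P = P @ (np.eye(2**n) + S) / 2
print(np.linalg.matrix_rank(P))  # 1: the unique stabilized state is |0...0>
```

Here the stabilized state is the computational basis state |0…0⟩, consistent with the example above.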
It is not difficult to show that a general n-qubit stabilizer state ρ with stabilizer S is explicitly given as

ρ = |ψ⟩⟨ψ| = 1/2^n Σ_{S ∈ S} S .    (74)

H. Direct certification of stabilizer states

Now we consider the certification of stabilizer target states by using a particularly suitable measurement strategy in the direct certification Protocol 17. The measurement strategy essentially consists in measuring stabilizer observables that are drawn uniformly at random from the stabilizer group of the target state. We accept exactly when the measurement outcome corresponds to the stabilized eigenspace of eigenvalue +1. This strategy is minimax optimal (Definition 21) among all strategies based on measuring Pauli observables.
Theorem 22 (Minimax optimal Pauli measurements for STABs [14]): Let |ψ⟩ be an n-qubit stabilizer state with stabilizer group S ⊂ P_n with elements S = {1 = S_0, S_1, . . . , S_{2^n−1}}. For i ∈ [2^n − 1] denote by P_i := ½(1 + S_i) the projector onto the positive eigenspace of S_i. Then the minimax optimal measurement strategy for having Pauli observables P_n as accessible measurements (see Definition 21) is given by measuring S_i with probability 1/(2^n − 1). The resulting effective measurement operator

Ω = 1/(2^n − 1) Σ_{i=1}^{2^n−1} P_i

satisfies Ω|ψ⟩ = |ψ⟩ and has the second largest eigenvalue (2^{n−1} − 1)/(2^n − 1).

Proof. By Lemma 19, the minimax optimum is obtained by minimizing the second largest eigenvalue of Ω (equivalently, maximizing the gap ν(Ω)) over Ω ∈ conv(S), where conv(S) denotes the convex hull of a set S, i.e., the set of all convex combinations of elements in S. We argue that the minimization over conv(S) can be replaced by a minimization over conv(S') with S' := S \ {1}. To see this, observe that if Ω = (1 − α)Ω' + α1 for α ∈ [0, 1] then ν(Ω) ≤ ν(Ω'). Minimax optimal measurement strategies are hence of the form

Ω = Σ_{i=1}^{2^n−1} µ_i P_i    (78)

for a probability vector µ. We note that Tr[Ω] = 2^{n−1} since Tr[P_i] = 2^{n−1}. Next, since |ψ⟩ is an eigenvalue-1 eigenvector of Ω, we can decompose

Ω = |ψ⟩⟨ψ| + Ω̃ ,

where Ω̃ is supported on the orthogonal complement of |ψ⟩ and, hence, Tr[Ω̃] = 2^{n−1} − 1. The operator Ω̃ with the minimal norm ‖Ω̃‖_op under this trace constraint is of the form Ω̃ = a(1 − |ψ⟩⟨ψ|) for a > 0. Taking the trace of that equality and solving for a yields

a = (2^{n−1} − 1)/(2^n − 1) ,

which is the claimed second largest eigenvalue. In order to finish the proof we show that the resulting Ω is indeed of the form (78), i.e., compatible with a probability vector µ. We write the stabilizer state |ψ⟩⟨ψ| as a combination of the stabilizers (see (74)) and use that S_j = 2P_j − 1 to obtain

Ω = (1 − a)|ψ⟩⟨ψ| + a1 = 1/(2^n − 1) Σ_{j=1}^{2^n−1} P_j ,

which is the measurement strategy from the theorem statement.
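The eigenvalue structure claimed in Theorem 22 can be checked numerically for a small example. The following sketch (in Python; the choice of the Bell state and all variable names are ours) verifies Ω|ψ⟩ = |ψ⟩ and the second largest eigenvalue (2^{n−1} − 1)/(2^n − 1) = 1/3 for n = 2:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

n, d = 2, 4
# Non-trivial stabilizers of the Bell state |phi+> = (|00> + |11>)/sqrt(2):
# X(x)X, Z(x)Z and their product -Y(x)Y.
stabilizers = [np.kron(X, X), np.kron(Z, Z), -np.kron(Y, Y)]

# Effective measurement operator: uniform mixture of the projectors
# P_i = (1 + S_i)/2 onto the +1 eigenspaces.
Omega = sum((np.eye(d) + S) / 2 for S in stabilizers) / (2**n - 1)

psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
assert np.allclose(Omega @ psi, psi)  # Omega |psi> = |psi>

evals = np.sort(np.linalg.eigvalsh(Omega))[::-1]
print(evals)  # largest eigenvalue 1; second largest (2^{n-1}-1)/(2^n-1) = 1/3
```

Indeed, Ω = (1/3)·1 + (2/3)|ψ⟩⟨ψ| here, so the spectrum is {1, 1/3, 1/3, 1/3}.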
Corollary 23 (Sampling complexity [14]): Let us call the outcome corresponding to P_i "pass" and the one corresponding to 1 − P_i "fail". Then Protocol 17 is an ε-certification test of ρ w.r.t. the infidelity from n_ρ̃ independent samples for

n_ρ̃ ≥ (2^n − 1)/2^{n−1} · ln(1/δ)/ε

with confidence 1 − δ. Moreover, ρ is accepted with probability 1.
Proof. According to Proposition 20, a number of

n_ρ̃ ≥ ln(1/δ)/(ν(Ω) ε)

measurements is sufficient, where ν(Ω) = 1 − (2^{n−1} − 1)/(2^n − 1) = 2^{n−1}/(2^n − 1) ≥ 1/2. This results in the sufficient choice n_ρ̃ ≥ 2 ln(1/δ)/ε. So, restricting from all measurements to Pauli measurements results in at most a constant overhead of 2, cmp. Proposition 16.

We note that only very few of the 2^n − 1 non-trivial stabilizers of ρ are actually measured. More precisely, the measurements are the ones of randomly sub-sampled stabilizer observables.

Further reading
A version for the certification of ground states of locally interacting Hamiltonians was developed by Cramer et al. [16] and extended by Hangleiter et al. [17] to ground states enabling universal quantum computation. In this line of research, fidelity witnesses [17][18][19] can be used to measure and estimate lower bounds on the fidelity. The work [16] solves the certification problem by efficiently reconstructing the state, assuming it to be of matrix product form. Similar ideas work for permutationally invariant states [20][21][22].
Popular instances of direct certification protocols following the ideas of Pallister et al. [14] include the following settings:
• Stabilizer states and two-qubit states with single-qubit measurements [14]
• Ground states of locally interacting Hamiltonians [23]
• Bipartite states [24,25], the qubit case in an LOCC setting [26]
• Hypergraph states [23], with improvements in efficiency by Zhu and Hayashi [15]
• Stabilizer states [14,23]
• Adversarial scenarios [27,28]

Kalev et al. [29] have extended arguments from direct fidelity estimation [30] and ground state certification [17] to the certification of stabilizer states. They also use Bernstein's inequality to give a quadratically improved ε-scaling for large ε.
Global von Neumann measurements on multiple iid. copies of the prepared quantum state have been considered [31] (even with mixed target states), which leads to a sample complexity scaling as n_ρ̃ ∈ O(d/ε) for a version of ε-certification of quantum states in S(C^d).
For a very helpful survey on quantum property testing we refer to Ref. [32], where several methods and notions of certification are reviewed.

I. Importance sampling
In the next section, we will study direct fidelity estimation, where the fidelity between a target state and a state preparation is estimated from measurements that are drawn randomly from a certain distribution depending on the target state. The idea is to perform more often those measurements that are particularly relevant to the fidelity estimation. This idea is formalized by a Monte Carlo integration technique called importance sampling.

Monte Carlo integration aims at computing an integral F that is written as an expected value of some function f over a probability distribution with density function p:

F = E_p[f(X)] = ∫ f(x) p(x) dx .    (91)

The general idea is to draw iid. samples X^(1), . . . , X^(m) ∼ p and take the empirical average

F̂ = 1/m Σ_{i=1}^m f(X^(i))    (92)

as an estimator for F. It is not difficult to see that F̂ is unbiased. If Var[f(X)] < ∞ then F̂ can be proven to be consistent, i.e., F̂ converges to F for m → ∞ in an appropriate sense. Moreover,

Var[F̂] = Var[f(X)]/m .    (93)

Thereby, the empirical variance also gives an estimate of the estimation error, and the estimation error can be controlled by increasing the number of samples m.

Now, the estimation (92) relies on the ability to sample from p. When sampling from p is infeasible or inefficient, a popular alternative is importance sampling. The main idea of importance sampling is to rewrite the integrand f p in (91) as

f p = (f p / q) q

for some probability distribution with density function q. Then we can apply the Monte Carlo sampling idea (92) w.r.t. q and draw X^(1), . . . , X^(m) ∼ q to obtain the estimator

F̂_q = 1/m Σ_{i=1}^m f(X^(i)) p(X^(i)) / q(X^(i)) .

It holds that E_q[F̂_q] = F and Var[F̂_q] = Var_q[f p/q]/m. One can show that the minimal variance is achieved by choosing q as

q*(x) := |f(x)| p(x) / Z

with a normalization factor Z such that q* is a probability density. Note that for q = q* the reweighted integrand satisfies f(x) p(x)/q*(x) = Z sign(f(x)). So, if f does not change its sign then a single sample from q* is sufficient for the exact estimation. This might seem miraculous at first sight. But it is important to notice that in order to determine the optimal q* one needs to know the value of the normalization Z, and calculating Z is equivalent to solving the integration problem.
However, finding non-optimal but good choices for q can already speed up the integration, as we will see in the case of direct fidelity estimation.
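The following toy example (a Python sketch with an arbitrarily chosen discrete integrand of ours) illustrates both plain Monte Carlo estimation and the zero-variance behaviour of the optimal importance sampling distribution q* for a sign-constant f:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete toy problem: estimate F = sum_k f(k) p(k) over k = 0..9.
p = np.ones(10) / 10                # sampling distribution p (uniform)
f = np.arange(10, dtype=float)**2   # non-negative integrand
F = np.dot(f, p)                    # exact value, F = 28.5

m = 2000
# Plain Monte Carlo: average f over samples from p.
ks = rng.choice(10, size=m, p=p)
est_plain = f[ks].mean()

# Importance sampling with the optimal q* ~ |f| p: the reweighted
# summands f(k) p(k) / q*(k) are constant (= Z), so the variance vanishes.
q_star = np.abs(f) * p
q_star /= q_star.sum()
ks = rng.choice(10, size=m, p=q_star)
est_is = np.mean(f[ks] * p[ks] / q_star[ks])

print(F, est_plain, est_is)  # est_is equals F up to rounding (zero variance)
```

Since f ≥ 0 here, every importance-sampling summand equals Z = F, so the estimate is exact from a single sample, as discussed above.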

J. Direct fidelity estimation
We assume to be given access to state preparations ρ̃ ∈ S(C^d) of some target state ρ ∈ S(C^d). Direct fidelity estimation (DFE) [30,33] is a protocol to estimate the fidelity Tr[ρρ̃] for the case where ρ is a pure state, i.e., of the form ρ = |ψ⟩⟨ψ|. In order to do so, the target state is expanded into products of Pauli matrices (73) of the form σ_{s_1} ⊗ · · · ⊗ σ_{s_n} with s_i ∈ {0, 1, 2, 3} and d = 2^n being the Hilbert space dimension. For the sake of readability we denote these Pauli products by W_1, . . . , W_{d²} in some order and note that they are an orthogonal basis for the space of Hermitian operators Herm(C^d) w.r.t. the Hilbert-Schmidt inner product (1):

Tr[W_k W_l] = d δ_{k,l} .    (98)

Given any operator σ ∈ Herm(C^d) we define its characteristic function (or quasi-probability distribution)

χ_σ(k) := 1/√d Tr[σ W_k] .    (99)

Thanks to the orthogonality relation (98) we have

σ = 1/√d Σ_{k=1}^{d²} χ_σ(k) W_k

and hence

Tr[ρσ] = Σ_{k=1}^{d²} χ_ρ(k) χ_σ(k)    (101)

for any ρ, σ ∈ Herm(C^d). Now, we use importance sampling (Section II I) to estimate the sum (101) for the case where the target state ρ ∈ S(C^d) is a pure state. For this purpose we rewrite the overlap (101) as

Tr[ρρ̃] = Σ_{k=1}^{d²} χ_ρ(k)² · χ_ρ̃(k)/χ_ρ(k)

and define

q(k) := χ_ρ(k)² .    (103)

We choose q as the probability mass function of the importance sampling distribution on the sampling space [d²]. The purity of ρ can be written as

Tr[ρ²] = Σ_{k=1}^{d²} χ_ρ(k)²

and equals 1 for ρ pure. Thus, q is indeed a normalized probability vector. We define the random variable

X_k := χ_ρ̃(k)/χ_ρ(k)    (105)

for k ∼ q and find that X_k is an unbiased estimator of the fidelity:

E[X_k] = Σ_{k=1}^{d²} q(k) χ_ρ̃(k)/χ_ρ(k) = Σ_{k=1}^{d²} χ_ρ(k) χ_ρ̃(k) = Tr[ρρ̃] ,    (106)

where the last identity is again Eq. (101).
In order to estimate the random variable X_k, we need to know the value of the characteristic function χ_ρ̃(k). By (99), χ_ρ̃(k) can be estimated as the expectation value from repeated measurements of the observable W_k in the prepared state ρ̃. Thus, we end up with an estimation procedure for Tr[ρρ̃] that involves two sources of randomness and correspondingly proceeds in two steps: (i) We classically sample k from [d²] according to the importance sampling distribution (103) defined by the target state ρ. (ii) For the randomly drawn k, we estimate X_k from repeated probabilistic measurements of W_k. Combining the estimates of the X_k we arrive at an estimate for Tr[ρρ̃] via (105).
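The sampling step can be illustrated numerically. The following Python sketch (our own; it uses the exact expectation values χ_ρ̃(k) in place of the finite-sample estimates of step (ii), and a depolarized Bell-state preparation as an example) implements step (i) for a two-qubit target:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Pauli basis W_1, ..., W_{d^2} for n = 2 qubits.
paulis = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]
n = 2
d = 2**n
W = [np.kron(a, b) for a, b in itertools.product(paulis, repeat=n)]

# Characteristic function chi_rho(k) = Tr[rho W_k] / sqrt(d).
chi = lambda rho: np.array([np.trace(rho @ Wk).real for Wk in W]) / np.sqrt(d)

# Pure target state and a noisy (depolarized) preparation.
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())
lam = 0.1
rho_prep = (1 - lam) * rho + lam * np.eye(d) / d

chi_t, chi_p = chi(rho), chi(rho_prep)
q = chi_t**2                        # importance sampling distribution
assert np.isclose(q.sum(), 1)       # purity of the pure target is 1

# Step (i): sample Pauli indices k ~ q and form X_k = chi_prep(k)/chi_t(k).
ks = rng.choice(d**2, size=5000, p=q)
X = chi_p[ks] / chi_t[ks]
print(X.mean(), np.trace(rho @ rho_prep).real)  # both close to 0.925
```

The empirical mean of the X_k reproduces Tr[ρρ̃] = (1 − λ) + λ/d, in line with the unbiasedness (106).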
The following protocol summarizes these steps.

To derive a guarantee for DFE we have to control the error made in the two estimation steps. To this end, we consider the steps in reversed order: We first consider Y := 1/ℓ Σ_{i=1}^ℓ X_{k_i} with iid. samples k_i ∼ q, assuming perfect estimates of the X_{k_i} for the moment. The accuracy of Y as an estimator of Tr[ρρ̃] can be controlled by increasing ℓ. Subsequently, we have to analyse the accuracy of the estimator Ŷ of Y that uses the finitely many measurement outcomes. Altogether we arrive at the following guarantee:

Theorem 25 (Guarantee for DFE [30]): Let ρ ∈ S(C^d) be a pure target state. The expected number of state preparations in Protocol 24 satisfies

E[n_ρ̃] ∈ O( 1/(ε²δ) + (d/ε²) ln(1/δ) ) .    (108)

If the state preparations are iid. given by ρ̃ ∈ S(C^d) then the fidelity estimate Ŷ is a 2ε-accurate unbiased estimator of F(ρ, ρ̃) with confidence 1 − 2δ.
Note that the sample complexity scales linearly in the Hilbert space dimension. In contrast, the number of Pauli measurements required for state tomography scales as Ω(d² rank(ρ)²) [6].
Proof of Theorem 25. We start with bounding the estimation error arising from taking the empirical average in Step (iv) of Protocol 24. We note that X_k defined in (105) is an unbounded random variable in general, as χ_ρ(k) can be arbitrarily small. Hence, we will use Chebyshev's inequality (21) to derive a tail bound for Y. Using the definitions (103) and (105) of q and X and that X is the unbiased estimator (106), the variance of X becomes

Var[X] = Σ_k q(k) (χ_ρ̃(k)/χ_ρ(k))² − Tr[ρρ̃]² = Tr[ρ̃²] − Tr[ρρ̃]² .

Hence, Var[X] ≤ 1. Using the basic insight of Monte Carlo estimation (93), we obtain

P[ |Y − Tr[ρρ̃]| ≥ ε ] ≤ 1/(ℓε²)    (112)

for any ε > 0. Hence, for any δ > 0 and

ℓ ≥ 1/(ε²δ)    (113)

the failure probability is bounded by δ,

P[ |Y − Tr[ρρ̃]| ≥ ε ] ≤ δ .    (114)

Now we bound the statistical error that arises from the estimation of X_{k_i} from the measurements in Step (iii) of Protocol 24. For this purpose we write for each k the eigendecomposition of W_k as

W_k = Σ_α a_{k,α} P_{k,α}

with {P_{k,α}} being the projectors onto the eigenspaces and {a_{k,α}} ⊂ {−1, 1} the eigenvalues of the Pauli string W_k. We note that the expected measurement outcome is

E[a_{k,α}] = Tr[W_k ρ̃] = √d χ_ρ̃(k) .

We denote by a_{k_i,α_j} the measurement outcome for measurement j ∈ [m_i] and consider the following corresponding empirical estimate of X_{k_i} (see (105)):

X̂_{k_i} := 1/(m_i √d χ_ρ(k_i)) Σ_{j=1}^{m_i} a_{k_i,α_j} .

Then we consider the sum

Ŷ := 1/ℓ Σ_{i=1}^ℓ X̂_{k_i} .

As E[Ŷ] = Y, using Hoeffding's inequality (23) on the double sum with t = ε and the bounds |a_{k_i,α_j}| ≤ 1, we find that (w.l.o.g. we assume that there are no i with χ_ρ(k_i) = 0)

P[ |Ŷ − Y| ≥ ε ] ≤ 2 exp( −ℓ²ε² / (2 Σ_{i=1}^ℓ 1/(m_i d χ_ρ(k_i)²)) ) .    (120)

We wish the tail bound

P[ |Ŷ − Y| ≥ ε ] ≤ δ    (121)

to hold. Therefore, we impose the RHS of (120) to be bounded by δ, which is equivalent to

Σ_{i=1}^ℓ 1/(m_i d χ_ρ(k_i)²) ≤ ℓ²ε² / (2 ln(2/δ)) .

The choice of m_i as in (107) guarantees that this bound is always satisfied and, thus, (121) holds. The combination of the tail bounds (114) and (121) with the union bound (28) then proves the confidence statement. In order to calculate the final sample complexity (108), note that m_i is a random variable itself, since k_i and hence χ_ρ(k_i) was randomly chosen. By the definition of the sampling distribution (103), for fixed i we have

E[m_i] ≤ 1 + Σ_k χ_ρ(k)² · 2 ln(2/δ)/(ℓε²d χ_ρ(k)²) = 1 + 2d ln(2/δ)/(ℓε²) ,

where the +1 comes from the ceiling in (107).
Using the bound (113) on ℓ, the expected total number of measurements is

E[ Σ_{i=1}^ℓ m_i ] ≤ ℓ + 2d ln(2/δ)/ε² ,

which yields the scaling stated in (108).

We remark that DFE can be extended to sets of observables that form arbitrary orthonormal bases of Herm(C^d). However, in this case the operator norm used to bound the eigenvalues a_{k,α}, and hence the sampling complexity, can be larger. One can generalize DFE further to frames, which include over-complete bases, see Ref. [34].
The main contribution to the number of measurements in the derivation of the sample complexity above can be traced back to the application of Chebyshev's inequality in (112). This step can, however, be improved for the following class of states: we call a state ρ well-conditioned with parameter α > 0 if |χ_ρ(k)| ≥ α/√d for all k with χ_ρ(k) ≠ 0.
A prominent example of well-conditioned states are stabilizer states (see Section II G). It is easy to show that every stabilizer state ρ on n qubits with stabilizer S, (74), is well-conditioned with parameter α = 1: its characteristic function satisfies

χ_ρ(k) = ±(1/√d) 1_S(±W_k) ,

where 1_S is the indicator function (18) of S, i.e., χ_ρ(k) is non-zero exactly for the (signed) Pauli strings contained in S. For such well-conditioned states the sample complexity can be improved as follows.
Proof. With probability one we have

|X_k| = |χ_ρ̃(k)|/|χ_ρ(k)| ≤ (1/√d)/(α/√d) = 1/α .

The estimator X̂_{k_i} from Step (iii) of Protocol 24 is hence bounded as |X̂_{k_i}| ≤ 1/α with probability 1. The estimator Ŷ is, thus, bounded as |Ŷ| ≤ 1/α almost surely. Hoeffding's inequality (23) with t = ε yields

P[ |Ŷ − Tr[ρρ̃]| ≥ ε ] ≤ 2 exp( −ℓε²α²/2 ) .

Imposing the RHS to be at most δ and solving for ℓ we find that

ℓ ≥ 2 ln(2/δ)/(α²ε²)

is sufficient.

Theorem 27 tells us that for well-conditioned states DFE has a sampling complexity independent of the system size. Ref. [30] investigates the idea of removing "bad events", which are those that violate the well-conditioning condition. Moreover, a two-step estimation procedure as in Theorem 25 is also considered for well-conditioned states.
Finally, we look at how to turn DFE into a certification protocol with respect to the trace distance.
We summarize the result in the following proposition. Fix parameters ε̃, ε, δ > 0 with ε̃ ≤ ε²/2. Let Ŷ be the direct fidelity estimator of the fidelity F(ρ, σ) so that |Ŷ − F(ρ, σ)| ≤ ε̃ with confidence 1 − δ. We consider the protocol that accepts if Ŷ ≥ 1 − ε²/2 and rejects otherwise. As distance we choose the trace distance dist_Tr defined by dist_Tr(ρ, σ) := ½ ‖ρ − σ‖₁.
• This protocol is an ε-certification test w.r.t. the trace distance in the sense of Proposition 4, i.e., the completeness and soundness conditions are satisfied with confidence 1 − δ.
• The resulting sampling complexity of DFE for well-conditioned states scales as 1/ε⁴.
• Let ε' < ε. This protocol can be turned into a robust (ε', ε)-certification test, i.e., into an ε-certification test that is guaranteed to accept all states within an ε'-trace-norm ball around ρ with confidence 1 − δ.
Proof. The proof follows from Definition 2 by direct calculations. We leave filling-in the details as an exercise.

K. Random states and unitaries
Random ensembles of quantum states and unitary matrices find ubiquitous applications in quantum information processing and, in particular, in certification and estimation protocols. Roughly speaking, random unitary operations together with a fixed quantum measurement allow one to quickly gain information about the entire state space. The arguably simplest probability distribution on the unitary group U(d) is given by the Haar measure µ_U(d). In general, for a compact Lie group the Haar measure is the unique left and right invariant probability measure; it generalises the notion of a uniform measure. In applications one is often interested in random variables that are polynomials in the matrix elements of a Haar-random unitary U and its complex conjugate. In this case, all moments of the random variable are also expected values of such polynomials. In this section we will introduce the mathematical theory required to explicitly calculate such moments. To this end, we observe that any polynomial p_k(U, U†) of degree k can be written as the contraction with two matrices A, B ∈ C^{d^k × d^k}:

p_k(U, U†) = Tr[ A U^⊗k B (U†)^⊗k ] .    (132)

This motivates to define the k-th moment operator of a probability measure µ on U(d) as

M^(k)_µ(A) := ∫_{U(d)} U^⊗k A (U†)^⊗k dµ(U) .    (133)

If we have an expression for the k-th moment operator of the Haar measure µ_U(d), we can calculate the expectation value of arbitrary polynomials p_k(U, U†) over U ∼ µ_U(d) by a linear contraction (132). The crucial property that characterises the k-th moment operator of µ_U(d) is the following: Consider a fixed unitary V ∈ U(d). Then a short calculation exploiting the unitary invariance of the Haar measure reveals that

V^⊗k M^(k)_{µ_U(d)}(A) (V†)^⊗k = M^(k)_{µ_U(d)}(A)    (134)

for all A. We find that the range of M^(k)_{µ_U(d)} lies in the commutant of {U^⊗k : U ∈ U(d)}; moreover, M^(k)_{µ_U(d)} turns out to be an orthogonal projection, where orthogonality is understood with respect to the Hilbert-Schmidt inner product (1). As will become motivated shortly, we refer to

∆^k(U) := U^⊗k

as the diagonal representation of U(d).

Lemma 29 (k-th moment operator):
The k-th moment operator M^(k)_{µ_U(d)} is the orthogonal projector onto comm(∆^k(U(d))), the commutant of the k-th order diagonal representation of U(d).
Proof. With (134) we have established that the range of M^(k)_{µ_U(d)} lies in comm(∆^k(U(d))). The converse also holds since for A ∈ comm(∆^k(U(d))) we calculate that M^(k)_{µ_U(d)}(A) = A. The orthogonality requirement follows in very few lines of calculation using linearity and cyclicity of the trace.
Note that the argument of the proof applies more generally and yields the analogous result for arbitrary groups equipped with a Haar measure, e.g. the uniform measure on a finite group.
The commutant of the diagonal representation of the unitary group can be characterised using a powerful result from representation theory: Schur-Weyl duality. To set the stage for explaining the result we start by reviewing some basic definitions and results from representation theory.

Representation theory
Let us start with the most basic definitions. For a proper introduction we refer to Simon's book [35] and to Goodman and Wallach's book [36] for the representation theory of the standard matrix groups.
Let G and H be groups.
• f : G → H is a (group) homomorphism if f (g 1 g 2 ) = f (g 1 )f (g 2 ) for all g 1 , g 2 ∈ G. Note that this condition implies that f (e G ) = e H and f g −1 = f (g) −1 for all g ∈ G.
• A homomorphism R : G → GL(V ) into the invertible operators on a vector space V is called a linear (group) representation. R is a unitary representation if R : G → U(H) is a homomorphism to a unitary group U(H) ⊂ L(H) on some Hilbert space. We will only be concerned with such unitary representations and, hence, often omit the word "unitary".
• A subspace V' ⊂ V is called invariant under a representation R if R(g)V' ⊂ V' for all g ∈ G. R is irreducible (an irrep) if its only invariant subspaces are {0} and V.
• Every finite-dimensional unitary representation R decomposes into a direct sum of irreps R_i,

R ≅ ⊕_i R_i ⊗ 1_{C^{m_i}} ,    (137)

where the space C^{m_i} is called the multiplicity space of R_i. The decomposition (137) is called multiplicity-free if all irreps R_i are inequivalent, i.e., not isomorphic.
Theorem 31 (Schur's lemma): Let R : G → U(H) be an irrep of G on a finite-dimensional Hilbert space H and let A ∈ L(H) satisfy

A R(g) = R(g) A  for all g ∈ G .    (139)

Then A = c1 for some constant c ∈ C.

Proof. The condition (139) is stable under taking adjoints, i.e., A† satisfies it as well. Hence, this condition also holds for Re(A) := ½(A + A†) and Im(A) := 1/(2i) (A − A†), and A is a constant if they both are. Hence, it is sufficient to prove the theorem for A ∈ Herm(H). For Hermitian A, every eigenspace of A is an invariant subspace of R by (139). Since R is an irrep, such an eigenspace must be all of H, i.e., A = c1.

Corollary 32 (Irreps of abelian groups):
If G is abelian then every irrep has dimension 1.
Proof. Let R be an irrep of G on H. Theorem 31 implies that each g ∈ G has representation R(g) = c g 1 for some constant c g . Hence, every subspace of H is invariant under R. Since R is an irrep this is only possible if dim(H) = 1.
There is also a slightly more general version of Schur's lemma: Theorem 33 (Schur's lemma II): Let R : G → U(H) and R̃ : G → U(H̃) be two irreps of G on finite-dimensional Hilbert spaces H and H̃. If A ∈ L(H, H̃) satisfies

A R(g) = R̃(g) A  for all g ∈ G ,    (140)

then either A = 0 or R and R̃ are unitarily equivalent up to a constant factor.
Proof. The condition (140) implies that A†A commutes with R(g) and AA† commutes with R̃(g) for all g ∈ G and, hence, Schur's lemma (Theorem 31) implies that A†A = c1 and AA† = c̃1 for constants c, c̃. Obviously, c = c̃, as the non-zero eigenvalues of both operators have to coincide. Either c = 0, so that A = 0, or W := A/√c is a unitary. In the latter case

W R(g) = R̃(g) W    (144)

for all g ∈ G, i.e., R and R̃ are unitarily equivalent.
A unitary W relating two representations R and R̃ as in (144) is called an intertwining unitary of R and R̃.

Schur-Weyl duality and the commutant of the diagonal action
To calculate the moments of random variables depending on Haar-random unitaries we are interested in understanding the commutant of the diagonal representation of the unitary group. Formally, we define the diagonal representation of U(d) on (C^d)^⊗k as ∆^k_d(U) := U^⊗k. The representation ∆^k_d has a duality relation with another well-known representation on (C^d)^⊗k: the representation π_k of the symmetric group S_k permuting the k tensor components, defined by linearly extending the action

π_k(σ) |ψ_1⟩ ⊗ · · · ⊗ |ψ_k⟩ := |ψ_{σ^{−1}(1)}⟩ ⊗ · · · ⊗ |ψ_{σ^{−1}(k)}⟩ .

We note that π_k(σ) and ∆^k_d(U) commute for any σ ∈ S_k and U ∈ U(d).
Let us consider the following two irreducible representations of the symmetric group which appear in the decomposition, Proposition 30, of π k for any k. We call |Ψ ∈ (C d ) ⊗k symmetric if π k (σ) |Ψ = |Ψ for all σ ∈ S k and anti-symmetric if π k (σ) |Ψ = sign(σ) |Ψ for all σ ∈ S k . The symmetric subspace H sym k and antisymmetric subspace H ∧ k of (C d ) ⊗k are the subspaces consisting of all symmetric and all anti-symmetric vectors, respectively.

Lemma 34 (Symmetric subspace):
• The dimension of the symmetric subspace H^sym_k ⊂ (C^d)^⊗k is

Tr[P^sym_k] = binom(d + k − 1, k) .    (148)

• The orthogonal projectors onto the symmetric and anti-symmetric subspaces are

P^sym_k = 1/k! Σ_{σ∈S_k} π_k(σ)    (149)

and

P^∧_k = 1/k! Σ_{σ∈S_k} sign(σ) π_k(σ) ,

respectively.
Proof. The first statement is a combinatorial one. The trace of the symmetric projector is the number of ways to distribute k indistinguishable particles (bosons) into d boxes (modes), i.e., the dimension of the corresponding bosonic subspace, which is known to be given by the binomial coefficient. The second statement follows, e.g., for P^sym_k by realizing that any symmetric vector is in the range of P^sym_k and that this operator is indeed a projector, i.e., that P^sym_k is self-adjoint and satisfies P^sym_k P^sym_k = P^sym_k.
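For small d and k the projectors and the dimension formula of Lemma 34 can be checked directly; the following Python sketch (helper functions and the choice d = 2, k = 3 are ours) builds the permutation operators explicitly:

```python
import itertools
import math
import numpy as np

def perm_operator(sigma, d, k):
    """Matrix of the permutation sigma acting on the tensor factors of (C^d)^{⊗k}."""
    P = np.zeros((d**k, d**k))
    dims = (d,) * k
    for idx in itertools.product(range(d), repeat=k):
        new = tuple(idx[sigma[j]] for j in range(k))
        P[np.ravel_multi_index(new, dims), np.ravel_multi_index(idx, dims)] = 1
    return P

def perm_sign(sigma):
    """Sign of a permutation via its number of inversions."""
    sign = 1
    for i in range(len(sigma)):
        for j in range(i + 1, len(sigma)):
            if sigma[i] > sigma[j]:
                sign = -sign
    return sign

d, k = 2, 3
perms = list(itertools.permutations(range(k)))
P_sym = sum(perm_operator(s, d, k) for s in perms) / math.factorial(k)
P_anti = sum(perm_sign(s) * perm_operator(s, d, k) for s in perms) / math.factorial(k)

assert np.allclose(P_sym @ P_sym, P_sym)        # P_sym is a projector
print(round(np.trace(P_sym)))   # binom(d+k-1, k) = binom(4, 3) = 4
print(round(np.trace(P_anti)))  # 0: no anti-symmetric vectors for k > d
```

Note that the anti-symmetric subspace vanishes here since k = 3 exceeds d = 2.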
For the case of k = 2 the decomposition into these two subspaces is very familiar. It is easy to see that any matrix can be decomposed into a symmetric and an antisymmetric part, which are orthogonal to each other. This implies that

(C^d)^⊗2 = H^sym_2 ⊕ H^∧_2 .

Note that due to Corollary 32, the symmetric and the antisymmetric subspace carry only one-dimensional irreps of S_2 and are, as representation spaces of S_2, isomorphic to C^{m^sym_2} and C^{m^∧_2}, respectively. Here m^sym_2 and m^∧_2 are the multiplicities of the two distinct one-dimensional irreps of S_2. For k > 2 there is a similar decomposition with more summands, called the Schur-Weyl decomposition. It relies on a duality relation between the commuting representations ∆^k_d and π_k.

Theorem 35 (Schur-Weyl duality): The representations ∆^k_d and π_k span each other's commutant as algebras.
By Schur's lemma, such a duality relation implies that the multiplicity spaces of the irreducible representations of one representation are irreducible representations of the dual representation and vice versa. In other words, (C^d)^⊗k decomposes into multiplicity-free representations of the combined action of U(d) × S_k. In order to state this decomposition, we write λ = (λ_1, λ_2, . . . , λ_{l(λ)}) ⊢ k for a partition of k into l(λ) integers fulfilling λ_1 ≥ λ_2 ≥ · · · ≥ λ_{l(λ)} ≥ 1 and λ_1 + · · · + λ_{l(λ)} = k. Such partitions of integers label the irreducible representations of the symmetric group and the diagonal representation. As a consequence of Schur-Weyl duality one can prove the following.

Theorem 36 (Schur-Weyl decomposition): The action of U(d) × S_k on (C^d)^⊗k given by the commuting representations (147) and (146) is multiplicity-free and (C^d)^⊗k decomposes into irreducible components as

(C^d)^⊗k ≅ ⊕_{λ ⊢ k, l(λ) ≤ d} W_λ ⊗ S_λ ,    (154)

where U(d) acts non-trivially only on W_λ and S_k acts non-trivially only on S_λ. For any k ≥ 2, both H^sym_k and H^∧_k occur as components in the direct sum (154).
The spaces W λ are called Weyl modules and S λ Specht modules. Schur-Weyl duality implies that the Weyl modules are the multiplicity spaces of the irreps of S k and, similarly, the Specht modules are the multiplicity spaces of the irreps of U(d).
Schur-Weyl duality, Theorem 35, and the resulting decomposition, Theorem 36, give a simple characterisation of the commutant of the diagonal action. The relation (151) allows one to derive an expression for the k-th moment operator M^(k)_{µ_U(d)} as the orthogonal projector onto the span of the represented symmetric group. But one has to be careful since {π^d_k(σ)}_{σ∈S_k} is not an orthonormal basis. Note that it only becomes an orthogonal set asymptotically for large d, which can be exploited in some applications, e.g. [37]. A general expression in terms of so-called Weingarten functions [38] was derived by Collins and Sniady [39], see also the supplemental material of Ref. [40] for a convenient expression of their result and a summary of the derivation. For our purposes we only need an expression for M^(k)_{µ_U(d)} in certain special cases, namely for k = 2 and for symmetric endomorphisms as its input.
We begin with the second moment, k = 2.

Proposition 37 (Second moment operator):
For an operator A ∈ L(C^d ⊗ C^d), d ≥ 2, it holds that

M^(2)_{µ_U(d)}(A) = Tr[P^sym_2 A]/Tr[P^sym_2] · P^sym_2 + Tr[P^∧_2 A]/Tr[P^∧_2] · P^∧_2 .    (155)

Proof. From Lemma 29 and Theorem 35 we know that M^(2)_{µ_U(d)}(A) is a linear combination of the identity 1 and the swap operator F from Eq. (6). For S_2 the expansion (149) of the projectors onto the symmetric and anti-symmetric subspace can be inverted, yielding 1 = P^sym_2 + P^∧_2 and F = P^sym_2 − P^∧_2. This establishes the form of (155). Since P^sym_2 and P^∧_2 are mutually orthogonal projectors and M^(2)_{µ_U(d)} is an orthogonal projection (Lemma 29), the coefficients are given by the normalized Hilbert-Schmidt inner products in (155).

Second, we allow for arbitrary k but restrict the input of M^(k)_{µ_U(d)} to endomorphisms that are themselves symmetric, i.e., of product form. We can prove the following lemma.

Lemma 38 (Moment operator on symmetric operators):
For an operator A ∈ L(C^d) it holds that

M^(k)_{µ_U(d)}(A^⊗k) = c P^sym_k

with c = Tr(P^sym_k A^⊗k)/Tr(P^sym_k).
Proof. We fix some A ∈ L(C^d) and denote E := M^(k)_{µ_U(d)}(A^⊗k). By the definition of the moment operator (133),

E = ∫_{U(d)} (U A U†)^⊗k dµ_U(d)(U) ,

and it becomes apparent that E commutes with π^d_k(σ) for any σ ∈ S_k. In other words, E ∈ comm(∆^k_d(U(d))) ∩ comm(π^d_k(S_k)) by Lemma 29. By Schur's lemma (Theorem 31) and the Schur-Weyl decomposition (154), we thus conclude that E acts proportionally to the identity on every component W_λ ⊗ S_λ. Denoting the orthogonal projector onto W_λ ⊗ S_λ as P_λ, the operator E permits the decomposition

E = Σ_{λ ⊢ k, l(λ) ≤ d} c_λ P_λ

with c_λ ∈ C. Since the projectors are onto mutually orthogonal subspaces, the coefficients are given by c_λ = Tr(A^⊗k P_λ)/Tr(P_λ). Finally, we observe that A^⊗k = P^sym_k A^⊗k P^sym_k and, thus, c_λ = 0 for all λ that do not correspond to the symmetric subspace. This leaves us with the lemma's expression for E.
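As a numerical sanity check of Proposition 37, one can average U^⊗2 A (U†)^⊗2 over Haar-random unitaries, sampled via the standard QR decomposition trick, and compare with the claimed projection (a Python sketch of ours; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

def haar_unitary(d):
    """Haar-random unitary via QR of a complex Gaussian matrix (phase-fixed)."""
    Z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

d = 2
F = np.zeros((d * d, d * d))  # swap operator on C^d x C^d
for i in range(d):
    for j in range(d):
        F[i * d + j, j * d + i] = 1
P_sym = (np.eye(d * d) + F) / 2
P_anti = (np.eye(d * d) - F) / 2

A = rng.standard_normal((d * d, d * d)) + 1j * rng.standard_normal((d * d, d * d))

# Empirical second moment operator: average of U^{⊗2} A (U†)^{⊗2}.
m = 20000
E = np.zeros_like(A)
for _ in range(m):
    U = haar_unitary(d)
    U2 = np.kron(U, U)
    E += U2 @ A @ U2.conj().T
E /= m

# Proposition 37: orthogonal projection onto span{P_sym, P_anti}.
pred = (np.trace(P_sym @ A) / np.trace(P_sym)) * P_sym \
     + (np.trace(P_anti @ A) / np.trace(P_anti)) * P_anti
print(np.max(np.abs(E - pred)))  # small; shrinks like 1/sqrt(m)
```

The deviation is purely statistical Monte Carlo noise from the finite number of sampled unitaries.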

Uniformly random state vectors
One can also define a uniform distribution on pure quantum states in multiple equivalent ways. First, one can draw randomly from the complex sphere S(C^d), i.e., the set of normalized vectors in C^d. Indeed, there is a unique uniform probability measure µ_S(C^d) on S(C^d) that is invariant under the canonical action of U(d) on C^d. By definition we see that a column |ψ⟩ = U|0⟩ of a Haar-randomly drawn unitary U ∼ µ_U(d) is distributed according to µ_S(C^d). Finally, we can switch to density matrices by factoring out a global phase. In more detail, the complex projective space CP^{d−1} := S(C^d)/U(1) is the set of state vectors modulo a phase in U(1), which can be identified with the set of pure density matrices CP^{d−1} ⊂ S(C^d). It also has a uniform unitarily invariant probability distribution: a uniformly random pure state |ψ⟩⟨ψ| can be obtained by drawing |ψ⟩ ∼ µ_S(C^d).
We can calculate the moments of polynomials that depend on states drawn uniformly from µ_S(C^d) using the moment operator M^(k)_{µ_U(d)}. To this end, note that any polynomial p_k(|ψ⟩, ⟨ψ|) of degree k in the components of |ψ⟩ and of ⟨ψ| can be written as a contraction of |ψ⟩⟨ψ|^⊗k with some operator in L((C^d)^⊗k). For this reason the following lemma summarizes everything we need.

Lemma 39 (Moment operator of random states):
E_{|ψ⟩ ∼ µ_S(C^d)}[ |ψ⟩⟨ψ|^⊗k ] = P^sym_k / Tr[P^sym_k] ,

where P^sym_k is the projector (149) onto the symmetric subspace.
Proof. By Lemma 38 applied to A = |ψ⟩⟨ψ|, the expectation is c P^sym_k with c = Tr[P^sym_k |ψ⟩⟨ψ|^⊗k]/Tr[P^sym_k]. Since P^sym_k acts trivially on |ψ⟩^⊗k and |ψ⟩ is normalized, the numerator evaluates to one. The denominator is the dimension of P^sym_k given by (148).
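Lemma 39 is easily checked numerically by drawing normalized complex Gaussian vectors, which are distributed according to µ_S(C^d) (a Python sketch of ours for d = 2 and k = 2):

```python
import numpy as np

rng = np.random.default_rng(3)

d, k, m = 2, 2, 20000

def random_state(d):
    """Uniform state vector: a normalized complex Gaussian vector."""
    v = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    return v / np.linalg.norm(v)

# Empirical k-th moment E[ |psi><psi|^{⊗k} ].
E = np.zeros((d**k, d**k), dtype=complex)
for _ in range(m):
    psi = random_state(d)
    psik = np.array([1.0 + 0j])
    for _ in range(k):
        psik = np.kron(psik, psi)
    E += np.outer(psik, psik.conj())
E /= m

# Lemma 39: the exact moment is P_sym / Tr[P_sym].
F = np.zeros((d * d, d * d))  # swap operator (k = 2)
for i in range(d):
    for j in range(d):
        F[i * d + j, j * d + i] = 1
P_sym = (np.eye(d * d) + F) / 2
print(np.max(np.abs(E - P_sym / np.trace(P_sym))))  # small statistical deviation
```

The empirical average converges to P^sym_2 / 3, as the lemma predicts for d = 2, k = 2.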

Unitary, spherical and complex-projective k-designs
With our excursion to representation theory we have derived expressions to calculate the moments of random variables in uniformly random states and unitaries. In many applications the very same results can also be used for certain other interesting probability distributions. To this end, note that if we only want to control the first t moments of a random variable that is a polynomial in a random state or unitary, then our calculation will only involve the moment operators M^(k)_{µ_U(d)} for k ≤ t. In many applications it is sufficient to control the expectation value and the variance of low-degree polynomials. In these cases, any probability distribution that reproduces the first couple of moments of the uniform distribution can be used without changing the mathematical expressions. This idea is formalized by the definition of k-designs.

Definition 40 (Unitary k-design):
A distribution µ on the unitary group U(d) is a unitary k-design if its k-th moment operator (133) coincides with the one of the Haar measure,

M^(k)_µ = M^(k)_{µ_U(d)} .

Furthermore, a subset {U_1, . . . , U_{n_G}} ⊂ U(d) is called a unitary k-design if its uniform distribution is one.
Note that by definition, any unitary k-design is also a unitary (k − 1)-design for k ≥ 2.
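A simple example that can be checked directly is the uniform distribution over the d² n-qubit Pauli strings, which forms a unitary 1-design (though not a 2-design). The following Python sketch (our own) verifies the defining property for k = 1, where the Haar moment operator is the projection A ↦ Tr[A]/d · 1 onto the commutant span{1}:

```python
import itertools
import numpy as np

paulis = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]

n = 2
d = 2**n
Ws = []
for s in itertools.product(range(4), repeat=n):
    W = np.array([[1.0 + 0j]])
    for i in s:
        W = np.kron(W, paulis[i])
    Ws.append(W)

rng = np.random.default_rng(5)
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

# First moment operator of the uniform distribution over the Pauli strings
# (global phases drop out under conjugation).
M1 = sum(W @ A @ W.conj().T for W in Ws) / d**2

# Haar value: M^(1)(A) = Tr[A]/d * 1.
print(np.max(np.abs(M1 - np.trace(A) / d * np.eye(d))))  # -> ~0 (exact identity)
```

This is the well-known Pauli twirl, which depolarizes any input operator to a multiple of the identity.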
A famous example of a unitary design in the context of quantum computing is the Clifford group.

The Clifford group
The n-qubit Clifford group Cl_n ⊂ U(2^n) is the normalizer of the Pauli group P_n (see Section II G),

Cl_n := {U ∈ U(2^n) : U P_n U† = P_n} .

It is generated by the Hadamard gate H and the phase gate S acting locally on any qubit, together with the CNOT gate acting on any pair of qubits. Together with the T = √S gate the Clifford group yields a universal gate set (see, e.g., [41, Section 4.5.3]). The Clifford group is a unitary 3-design but not a unitary 4-design [42][43][44]. Being a subgroup of the unitary group, the commutant of the diagonal action of the Clifford group for k > 3 is, thus, a strictly larger space than the span of the permutation group. A classification of the 'missing generators' of the commutant was done by Gross et al. [45].
Analogously to unitary designs, we can define spherical k-designs. For a distribution µ on the complex sphere S(C^d) we define the k-th moment operator as

M^(k)_µ := E_{|ψ⟩ ∼ µ}[ |ψ⟩⟨ψ|^⊗k ] .

Definition 41 (Complex spherical/projective k-design): A distribution µ on S(C^d) is a complex spherical k-design if its k-th moment operator coincides with the one of the uniform measure µ_S(C^d). Furthermore, a subset {|ψ_1⟩, . . . , |ψ_m⟩} ⊂ S(C^d) is called a spherical k-design if its uniform distribution is one. The corresponding distribution of |ψ⟩⟨ψ| is called a complex projective k-design.
Analogously to the relation of the uniform measures on U(d) and S(C^d), a rather obvious but important example of spherical k-designs is given by orbits of unitary k-designs. If µ is a unitary k-design on U(d) and |ψ⟩ ∈ C^d is normalized, then the induced distribution µ̃ of U|ψ⟩ with U ∼ µ is a complex spherical k-design.
One can use this relation to see that the Clifford group being a unitary 3-design implies the analogous statement for stabilizer states.

Stabilizer states are 3-designs
The set of all stabilizer states (Section II G) is known to be a 2-design [48,49], actually even a 3-design but not a 4-design [42,43,50].
Other examples for spherical designs that play important roles in quantum system characterization are mutually unbiased bases and symmetric, informationally complete POVMs.

Mutually unbiased bases (MUBs)
MUBs are sets of bases with minimal overlaps. More explicitly, two orthonormal bases {|e_1⟩, . . . , |e_d⟩} and {|f_1⟩, . . . , |f_d⟩} of C^d are called mutually unbiased if |⟨e_i|f_j⟩|² = 1/d for all i, j. The authors of Ref. [53] showed that maximal sets of MUBs are complex spherical 2-designs.
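For d = 2 the eigenbases of σ_x, σ_y and σ_z form a maximal set of three MUBs, and the 2-design property can be verified directly (a Python sketch of ours):

```python
import numpy as np

# The three mutually unbiased bases of C^2: eigenbases of sigma_z, sigma_x, sigma_y.
s = 1 / np.sqrt(2)
mub_states = [
    np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex),  # z basis
    np.array([s, s], dtype=complex), np.array([s, -s], dtype=complex), # x basis
    np.array([s, 1j * s]), np.array([s, -1j * s]),                     # y basis
]

# Second moment of the uniform distribution over the 6 states.
E = sum(np.kron(np.outer(v, v.conj()), np.outer(v, v.conj()))
        for v in mub_states) / 6

# Haar value from Lemma 39: P_sym / Tr[P_sym] for d = 2, k = 2.
F = np.zeros((4, 4))  # swap operator
for i in range(2):
    for j in range(2):
        F[2 * i + j, 2 * j + i] = 1
P_sym = (np.eye(4) + F) / 2
print(np.max(np.abs(E - P_sym / 3)))  # -> 0 up to rounding: a 2-design
```

The equality holds exactly (not only statistically), since the six states form a finite design.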

Symmetric informationally complete POVMs

A symmetric, informationally complete (SIC) POVM is given by a set of d² normalized vectors |ψ_1⟩, . . . , |ψ_{d²}⟩ ∈ S(C^d) with |⟨ψ_i|ψ_j⟩|² = 1/(d + 1) for all i ≠ j; the POVM elements are the operators (1/d)|ψ_i⟩⟨ψ_i|.
"Symmetric" refers to the inner products being all equal. Renes et al. [54] have shown that SIC POVMs are indeed 2-designs and have explicitly constructed them for small dimensions.

L. Shadow fidelity estimation
Another recently proposed approach to fidelity estimation makes use of so-called classical shadows [10,11]. The principal idea of shadow estimation is to calculate the least-squares estimator of a quantum state from recorded classical measurement outcomes, with measurement settings drawn from a certain measurement frame. As we will see in this section, a measurement frame that allows for a quite explicit analysis is given by a complex projective 3-design.
From the state's least-squares estimator one can construct estimators of multiple target functions of the state, which can be linear functions or even higher-order polynomials. The sampling complexity of the derived estimators can be captured by a so-called shadow norm that is defined in terms of the measurement frame. The classical post-processing complexity is determined by the complexity of constructing the state estimator and evaluating the target functions. Operationally, the analysed POVM measurement is assumed to be implementable by random unitaries from a suitable ensemble and a consecutive basis measurement. While shadow estimation is a rather broad and flexible framework, we focus on the estimation of fidelities with pure target states using unitaries that form a unitary 3-design, e.g., multi-qubit Clifford gates or suitable subgroups thereof. Besides being an instructive example for shadow fidelity estimation, the 3-design setting can be equipped with a performance guarantee that features a sampling complexity in O(ε^{−2}) that does not scale with the Hilbert space dimension. This system-size-independent scaling is not achievable in general for other measurement frames.
The complete shadow fidelity estimation (SFE) protocol (Protocol 42) is the following: (i) draw a random unitary U_i from the measure µ, (ii) apply U_i to the state preparation ρ̃ and measure in the computational basis to obtain an outcome b_i, (iii) compute from the outcome the random variable f̂_i of (166); output the median of means estimator (25) of the f̂_i. We have presented the protocol as iterations over combined experimental and classical pre- and post-processing steps. Note, however, that one can complete the three stages separately: First, one can classically generate the complete sequence of nρ random unitaries. Then, one can subsequently perform the quantum experiment, i.e., all repetitions of step (ii). Importantly, at this stage not even the knowledge of the target state ρ is required. Storage of the experimental outcomes, nρ bit strings of length n, requires nρ n = nρ log₂(d) bits. These bit strings, together with a prescription of the random sequence of unitaries, are then taken as the input of the post-processing algorithm that calculates the median of means estimator. The complexity of the classical post-processing depends on the complexity of calculating the overlap in (166). For an arbitrary target state ρ the effort of performing this task can scale exponentially in the number of qubits. In contrast, for stabilizer states and Clifford group unitaries the Gottesman-Knill theorem, see e.g. [41], allows for an efficient computation of the expression.
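The protocol can be sketched in a few lines. The simulation below uses Haar-random unitaries (which form an exact unitary 3-design) and assumes the single-shot estimator of (166) to be of the standard classical-shadow form f̂_i = (d + 1)⟨b_i|U_i ρ U_i†|b_i⟩ − 1; the helper names are illustrative:

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary via QR decomposition of a Gaussian matrix."""
    A = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(A)
    return Q * (np.diag(R) / np.abs(np.diag(R)))  # fix phases for the Haar measure

def sfe(rho_target, rho_prep, n, k, rng):
    """Median of means over k groups of the single-shot estimators
    f_i = (d+1) <b_i|U_i rho U_i^dag|b_i> - 1 (assumed form of Eq. (166))."""
    d = rho_target.shape[0]
    fs = np.empty(n)
    for i in range(n):
        U = haar_unitary(d, rng)
        p = np.clip(np.real(np.diag(U @ rho_prep @ U.conj().T)), 0, None)
        b = rng.choice(d, p=p / p.sum())              # Born-rule outcome
        fs[i] = (d + 1) * np.real(U[b] @ rho_target @ U[b].conj()) - 1
    return np.median(fs.reshape(k, -1).mean(axis=1))  # median of k group means

rng = np.random.default_rng(0)
d = 4
psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())             # pure target state
rho_prep = 0.9 * rho + 0.1 * np.eye(d) / d  # noisy preparation
F_true = np.real(np.trace(rho @ rho_prep))
F_hat = sfe(rho, rho_prep, n=8000, k=8, rng=rng)
assert abs(F_hat - F_true) < 0.15
```

Note that only the measurement outcomes and the list of unitaries enter the post-processing, reflecting the separation of stages described above.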
Shadow fidelity estimation comes with the following guarantee:

Theorem 43 (Guarantee for SFE):
Consider Protocol 42 with µ being a unitary 3-design and ρ a pure target state. Choose δ ∈ (0, 1), ε > 0 and a number nρ ≥ 160 ε^{-2} ln(1/δ) such that it is a multiple of k = ⌈8 ln(1/δ)⌉. Then, the median of means estimator of the protocol is an ε-accurate unbiased estimator of F(ρ, ρ̃) with confidence 1 − δ for nρ i.i.d. state preparations; the median is taken over k means, each of which is an empirical mean of l = nρ/k realizations of f̂_i.
Theorem 43 shows that SFE requires a number of state copies that, for arbitrary pure target states, does not depend on the Hilbert space dimension.

With the DFE protocol of Section II J we have already encountered another fidelity estimation protocol. In contrast to SFE, recall that DFE features a sampling complexity independent of the Hilbert space dimension only for the class of well-conditioned states, cf. Theorem 27. Keep in mind, however, that in order to additionally ensure an efficient classical post-processing, SFE also requires further structure, such as that provided by stabilizer states. Finally, note that SFE and DFE, as presented here, make use of different types of measurement data. While SFE uses basis measurements randomly selected from a large set of bases, DFE uses the expectation values of observables. Correspondingly, they differ in their requirements for experimental implementation.

The proof of the performance guarantee, Theorem 43, proceeds in three steps: First, we have to establish that the SFE estimator actually estimates the fidelity for pure target states. To derive the sampling complexity of the estimator, a natural attempt would be to employ Hoeffding's inequality. Unfortunately, the random variables f̂_i defined in (166) only have bounds scaling as O(d). This becomes exponentially large in the number of qubits and does not yield the desired scaling. The main insight underlying the efficiency of shadow fidelity estimation is that, due to the structure of the unitary 3-design, the variance of f̂_i is still bounded in O(1). Thus, as a second step we derive the bound for the variance. Finally, by combining both results we arrive at the sampling complexity using the tail bound for the median of means estimator introduced in Theorem 9. Using the median of means estimator allows us to derive a sampling complexity in O(ln(1/δ)) in the confidence. Note that simply using a mean estimator in the SFE protocol can also be equipped with a guarantee, with sampling complexity in O(1/δ) by Chebyshev's inequality, Theorem 7.
A mean estimator might in practical parameter regimes even be more precise than the median of means estimator.

Lemma 44 (Unbiasedness of SFE estimator):
Consider Protocol 42 with µ being a unitary 2-design and ρ a pure target state. Let f̂_i be the random variable (166) w.r.t. a state preparation ρ̃. Then

  E[f̂_i] = F(ρ, ρ̃),    (168)

where the expectation value is taken over both U ∼ µ and the subsequent random measurement outcome.
Proof. For convenience we suppress writing the index i. Born's rule for the probability of the measurement outcomes gives us

  p(b | U) = ⟨b| U ρ̃ U† |b⟩.    (169)

Thus, the expectation value over U and the measurement reads

  E[f̂] = E_{U∼µ} Σ_b ⟨b| U ρ̃ U† |b⟩ [ (d+1) ⟨b| U ρ U† |b⟩ − 1 ].

The second term can be directly evaluated using the fact that we sum over a basis,

  E_{U∼µ} Σ_b ⟨b| U ρ̃ U† |b⟩ = Tr[ρ̃] = 1.    (171)

The first term can be calculated using the 3-design property of µ. More precisely, at this point we only need µ to be a 2-design. Recall that if U ∼ µ is a unitary k-design then for any state |τ⟩ its orbit |φ⟩ = U|τ⟩ with the induced measure µ̃ is a state k-design. Thus, using (7) and Lemma 39 we calculate that

  E_{U∼µ} Σ_b ⟨b| U ρ̃ U† |b⟩ ⟨b| U ρ U† |b⟩ = Σ_b Tr[ (ρ̃ ⊗ ρ) E_U (U†|b⟩⟨b|U)^{⊗2} ] = (2/(d+1)) Tr[ (ρ̃ ⊗ ρ) P_sym^2 ] = (1 + Tr[ρ̃ ρ]) / (d + 1).

Combining both terms again and using that Tr[ρ] = 1, we find that

  E[f̂] = (d+1) (1 + Tr[ρ̃ ρ]) / (d+1) − 1 = Tr[ρ̃ ρ].

Assuming that ρ is a pure state establishes the statement (168).
Next we bound the variance.

Lemma 45 (Variance bound for SFE):
Consider Protocol 42 with µ being a unitary 3-design and ρ a pure target state. Let f̂_i be the random variable (166) w.r.t. a state preparation ρ̃. Then

  Var[f̂_i] ≤ 1 + 2 F(ρ, ρ̃) − F(ρ, ρ̃)² ≤ 2,

where the variance is taken over both U ∼ µ and the subsequent random measurement outcome.
Proof. We again suppress the index i. The variance is

  Var[f̂] = E[f̂²] − (E[f̂])².

Using Born's rule (169), Eq. (171) and that U|b⟩ is distributed as a complex spherical 3-design µ̃, the second moment can be written as

  E[f̂²] = (d+1)² E_U Σ_b ⟨b|Uρ̃U†|b⟩ ⟨b|UρU†|b⟩² − 2(d+1) E_U Σ_b ⟨b|Uρ̃U†|b⟩ ⟨b|UρU†|b⟩ + 1.    (175)

The first term in this expression can be calculated using the 3-design property of µ̃ and Lemma 39,

  E_U Σ_b ⟨b|Uρ̃U†|b⟩ ⟨b|UρU†|b⟩² = (3!/((d+1)(d+2))) Tr[ (ρ̃ ⊗ ρ ⊗ ρ) P_sym^3 ].    (176)

We recall that the projector P_sym^3 onto the symmetric subspace is given by 1/3! times the sum of all six permutation operators of S_3. Those are the identity, 3 transpositions and the cyclic and anti-cyclic permutation. Writing out this sum and tracking the resulting contractions (which can be most conveniently done using tensor network diagrams) yields

  3! Tr[ (ρ̃ ⊗ ρ ⊗ ρ) P_sym^3 ] = 2 + 4 Tr[ρ̃ ρ],    (178)

where we have used the normalization of the states and that ρ is pure in the last identity. Combining (175), (176) and using the expression (168) from the previous lemma and (178) we find the upper bound

  Var[f̂] ≤ (2 + 4 Tr[ρ̃ρ]) − 1 − 2 Tr[ρ̃ρ] − Tr[ρ̃ρ]² = 1 + 2 F(ρ, ρ̃) − F(ρ, ρ̃)².

We now have the ingredients to simply invoke Theorem 9 for the median of means estimator as the final step.
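Both lemmas can be checked numerically. The sketch below again assumes the estimator form f̂ = (d + 1)⟨b|UρU†|b⟩ − 1, averages the conditional moments exactly over the measurement outcomes via Born's rule, and samples only the Haar-random unitary (an exact 2- and 3-design):

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary via QR decomposition of a Gaussian matrix."""
    A = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(A)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

rng = np.random.default_rng(1)
d = 2
psi = np.array([1, 1j]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())             # pure target state
rho_prep = 0.8 * rho + 0.2 * np.eye(d) / d  # noisy preparation
F = np.real(np.trace(rho @ rho_prep))

m1 = m2 = 0.0
N = 20000
for _ in range(N):
    U = haar_unitary(d, rng)
    p = np.real(np.diag(U @ rho_prep @ U.conj().T))  # Born probabilities
    q = np.real(np.diag(U @ rho @ U.conj().T))       # target overlaps
    f = (d + 1) * q - 1                              # estimator per outcome b
    m1 += p @ f / N
    m2 += p @ f ** 2 / N

assert abs(m1 - F) < 0.02   # Lemma 44: E[f] = F(rho, rho_prep)
assert m2 - m1 ** 2 < 2.0   # Lemma 45: the variance is bounded by a constant
```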

Further reading
Shadow fidelity estimation builds on the idea of extracting an incomplete description of a quantum state in order to subsequently estimate its properties. For such an incomplete description that correctly predicts the expectation values of a set of observables, Aaronson coined the term 'shadow' in Ref. [9]. The broader framework for shadow estimation developed by Huang et al. [10,11] allows one to derive the sampling complexity of different measurement frames and is also not restricted to estimating fidelities. See also Paini and Kalev [55] for a parallel work analysing the sampling complexity of estimating expectation values of observables from measurement frames that are generated using a group. Finally, we note that the linear cross-entropy benchmarking protocol [56] presented in Section III D, similarly to SFE, exploits a unitary 3-design as the measurement frame to achieve a sampling complexity scaling independently of the system size, as explicitly worked out by Helsen et al. [57].

III. QUANTUM PROCESSES
In the first part of this tutorial we have presented different approaches to certify quantum states. For the second part we now turn our attention to the certification of quantum processes, i.e. maps on quantum states.
As quantum technologies typically involve processing quantum states, the task of their certification is omnipresent. For example in quantum computing, processes of interest might be individual quantum gates, entire algorithms or a noise process that accounts for the deviation from the ideal functioning of a device.
Many of the methods developed for quantum states can be employed to derive analogous results for quantum processes. In principle, we can always arrive at a certificate for a quantum process by certifying its output states on a suitably large set of input states. Similarly, maximally entangling the input of a quantum process with ancillary quantum systems allows one to operationally prepare a quantum state representing the quantum process via the so-called Choi-Jamiołkowski isomorphism.
After reviewing the mathematical formalism for describing quantum processes and discussing several measures of quality, we will briefly discuss examples of translating methods for direct state certification to quantum processes.
As we will see, these approaches come with potentially severe drawbacks concerning the feasibility of the measurements. The characterization of a quantum process always involves the preparation of input states and measurements on the output of the process. In this task, so-called state preparation and measurement (SPAM) errors can be a serious obstacle for a reliable characterization. This has motivated the development of quantum characterization and verification methods that are robust against such SPAM errors to quite some extent. One way to achieve this robustness are self-consistent approaches that aim at simultaneously characterising the quantum processes, the state preparation and the measurement [58][59][60]. These methods, however, require extensive effort in terms of the number of measurement settings, sampling complexity and classical post-processing, and deliver far more information than required for certification.
An important class of certification methods in the context of digital quantum computing are randomized benchmarking protocols [49,61,62]. Randomized benchmarking protocols extract performance measures for quantum gates by implementing random gate sequences of different lengths and measuring the error that accumulates with the sequence length. By studying the error dependence on the sequence length, randomized benchmarking protocols are robust against SPAM errors. We will present two prototypical types of randomized benchmarking protocols, targeting performance measures of a gate set and of individual gates, together with the theoretical analysis in the simplest setup in Section III C.
Finally, in Section III D we turn our attention to a method that is used in order to certify the correct implementation of a quantum circuit in the context of demonstrating so-called quantum supremacy: cross-entropy benchmarking [56].

A. Quantum processes and measures of quality
A quantum process should model possible operations taking quantum states to quantum states. Mathematically, a quantum process is thus a linear map taking density operators to density operators with suitable properties. Therefore, we start with introducing some notation related to linear maps between operator spaces.
In the following, we denote by L(H, K) the vector space of linear maps taking operators on H to operators on K and set L(H) := L(H, H); occasionally, the same notation is used for linear maps from H to K, as for the Kraus operators below, and the meaning will be clear from the context.
Lastly, a map Φ ∈ L(H, K) is called unital if Φ(1 H ) = 1 K . Note that Φ is trace-preserving if and only if its adjoint (w.r.t. the Hilbert-Schmidt inner product) Φ † is unital.
So, essentially, quantum channels are maps that take density matrices to density matrices even when applied to a part of a larger system. Usual unitary dynamics is of the following form.

Example 46 (Unitary channels):
We use calligraphic letters to denote the adjoint representation U ∈ L(H) of a unitary U ∈ U(H), given by

  U(A) := U A U†.

These maps are quantum channels and are called unitary (quantum) channels.
Unitary channels are invertible and the inverses are again unitary channels.

The Choi-Jamiołkowski isomorphism
The Choi-Jamiołkowski isomorphism [64,65] provides a duality between CP maps and bipartite positive semidefinite operators and allows one to identify channels with certain states. It has many applications in quantum information theory and related fields. In particular, it gives a practical criterion to check whether a given map is a quantum channel. Furthermore, it will allow us to derive certification methods for quantum processes from the already presented methods for quantum states.
For any vector spaces V and W, recall that there is the canonical isomorphism

  L(V, W) ≅ W ⊗ V*,    (185)

where V* := L(V, C) is the dual space of V. Furthermore, if V is equipped with an inner product ⟨·|·⟩, we have the canonical isomorphism V → V*, v ↦ (w ↦ ⟨v|w⟩), identifying a Hilbert space with its dual. The Choi-Jamiołkowski isomorphism C : L(L(H), L(K)) → L(K ⊗ H) is one of these isomorphisms of vector spaces, given by the following sequence of simple identifications:

  L(L(H), L(K)) = L(K) ⊗ L(H)* =_hc L(K) ⊗ L(H) = K ⊗ K* ⊗ H ⊗ H* ≅ K ⊗ H ⊗ K* ⊗ H* = L(K ⊗ H),

where the natural isomorphism (185) is denoted by "=", the isomorphism of changing the order of the vector spaces by "≅", and the identification marked by "hc" makes use of the Hilbert space isomorphism H ≅ H*. More explicitly, the Choi-Jamiołkowski isomorphism can be written in the following way. Let (|i⟩)_{i∈[dim(H)]} be a basis of H and

  |1⟩⟩ := Σ_i |i⟩ ⊗ |i⟩

the unnormalized maximally entangled state. The Choi matrix of X ∈ L(H, K) is given as

  C(X) := (X ⊗ Id)(|1⟩⟩⟨⟨1|) = Σ_{i,j} X(|i⟩⟨j|) ⊗ |i⟩⟨j|.    (189)

This characterization of C(X) implies, for all X ∈ L(H, K), A ∈ L(H) and B ∈ L(K),

  Tr[C(X)(B ⊗ A^⊤)] = Tr[X(A) B],    (190)

as can be seen by direct calculations with basis elements or tensor network diagrams. Now we can connect the Choi-Jamiołkowski isomorphism to the properties of quantum channels.

Theorem 47 (CPT conditions):
For any map X ∈ L(H, K) the following equivalences hold:

(i) X is trace-preserving if and only if
Tr K [C(X )] = 1.
(ii) X is Hermiticity-preserving if and only if C(X ) is Hermitian.
(iii) X is completely positive if and only if C(X ) is positive semidefinite.
For completeness, we remark that another important consequence of the complete positivity of a map is the existence of so-called Kraus operators. This gives another item that could be added to Theorem 47: X is a CP map if and only if there are (Kraus) operators K_1, ..., K_r ∈ L(H, K) with r = rank(C(X)) so that

  X(A) = Σ_{i=1}^r K_i A K_i†    (191)

for all A ∈ L(H). Moreover, X is a CPT map if and only if (191) holds with Σ_{i=1}^r K_i† K_i = 1. In the context of quantum information theory, another normalization convention for the Choi-Jamiołkowski isomorphism is useful. For X ∈ L(H, K) we set

  J(X) := C(X) / dim(H)    (192)

with Choi matrix (189). The theorem tells us that X is a quantum channel if and only if J(X) is a density matrix with the reduction to H (obtained by tracing over K) being the maximally mixed state. The so-called Choi state of a channel X is

  J(X) = (X ⊗ Id)(|ω⟩⟨ω|),    (193)

where

  |ω⟩ := |1⟩⟩ / √dim(H)    (194)

is a maximally entangled state, i.e., has the strongest bipartite quantum correlations possible in a precise sense.
In particular, the Choi state can be prepared by applying the channel to one half of this maximally entangled state. Note that not every bipartite state corresponds to a channel. Indeed, the Choi-Jamiołkowski isomorphism is an isomorphism of convex cones, C : CP(H, K) → Pos(K ⊗ H), but CPT(H, K) is mapped to a proper subset of S(K ⊗ H). The reason is that the trace-preservation constraint of channels corresponds to dim(H)² many equalities, whereas the trace constraint of states is just one equality.
An important quantum channel and frequent model for noise processes appearing in quantum technologies is the depolarizing channel. The (quantum) depolarizing channel D_p : L(C^d) → L(C^d) with parameter p ∈ [0, 1] is the linear map defined by

  D_p(A) := p A + (1 − p) Tr[A] 1/d.    (195)
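The Choi construction, the trace identity (190) and the conditions of Theorem 47 can be verified numerically for the depolarizing channel; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 2

def depolarize(A, p=0.75):
    """Depolarizing channel D_p(A) = p A + (1 - p) Tr[A] 1/d."""
    return p * A + (1 - p) * np.trace(A) * np.eye(d) / d

def choi(channel):
    """Choi matrix C(X) = sum_{ij} X(|i><j|) (x) |i><j|."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex); E[i, j] = 1
            C += np.kron(channel(E), E)
    return C

C = choi(depolarize)

# Trace identity Tr[C(X)(B (x) A^T)] = Tr[X(A) B] for random A, B.
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
assert np.isclose(np.trace(C @ np.kron(B, A.T)), np.trace(depolarize(A) @ B))

# Theorem 47 for the CPT map D_p:
TrK = np.einsum('khkl->hl', C.reshape(d, d, d, d))  # partial trace over K
assert np.allclose(TrK, np.eye(d))                  # (i) trace-preserving
assert np.allclose(C, C.conj().T)                   # (ii) Hermiticity-preserving
assert np.linalg.eigvalsh(C).min() >= -1e-12        # (iii) completely positive
```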

Inner products of superoperators and fidelity measures
The vector space of linear maps L(H, K) is also equipped with a canonical inner product (the Hilbert-Schmidt inner product for superoperators) given by

  ⟨X, Y⟩ := Tr[X† Y]

for any X, Y ∈ L(H, K), where the trace can be calculated using an orthonormal basis {E_0, E_1, ..., E_{d²−1}} of L(H) as

  Tr[X† Y] = Σ_{i=0}^{d²−1} ⟨X(E_i), Y(E_i)⟩.

The Hilbert-Schmidt inner product on L(H, K) coincides with the inner product of the corresponding Choi matrices, i.e., for any X, Y ∈ L(H, K),

  ⟨X, Y⟩ = ⟨C(X), C(Y)⟩.    (198)

We now consider the case where Y is a quantum channel and X a unitary quantum channel. Then, as we have seen above, J(Y) and J(X) are quantum states (density matrices). Moreover, J(X) is a pure state. In this case, the above Hilbert-Schmidt inner product with the proper normalization,

  F_e(X, Y) := ⟨J(X), J(Y)⟩,    (199)

is the fidelity measure induced by the state fidelity (44) via the Choi-Jamiołkowski isomorphism (189); it is referred to as the entanglement (gate) fidelity.
In the context of digital quantum computing, another very prominent fidelity measure for quantum processes is the so-called average gate fidelity. The average gate fidelity (AGF) between maps X, Y ∈ L(H, K) is defined as

  F_avg(X, Y) := ∫ ⟨X(|ψ⟩⟨ψ|), Y(|ψ⟩⟨ψ|)⟩ dψ,

where the integral is taken according to the uniform Haar-invariant probability measure on state vectors of Section II K 3. Note that the inner product here is the Hilbert-Schmidt inner product of L(K), not L(H, K). From the definition we see that the average gate fidelity F_avg(X, Y) is a measure of closeness of X and Y that compares the action of X and Y on pure input states on average. Intuitively, if X and Y only deviate in their action on a low-dimensional subspace of H, they can still have an average gate fidelity close to one.
For any X, Y ∈ L(H, K),

  F_avg(X, Y) = F_avg(Id, X† ∘ Y).    (201)

This motivates the definition F_avg(X) := F_avg(id, X) for X ∈ L(H).
The average gate fidelity is intricately related to the Hilbert-Schmidt inner product on L(H, K) [67,68] (see also [69]).
Proposition 48 (Inner product and F_avg):
For any X, Y ∈ L(H, K) and d = dim(H),

  F_avg(X, Y) = (⟨X, Y⟩ + Tr[(X† ∘ Y)(1)]) / (d(d + 1)).    (202)

This proposition implies that the average gate fidelity is essentially an inner product, i.e., a conjugate symmetric, non-degenerate form that is linear in its second argument. For Hermiticity-preserving X and Y the average gate fidelity is real, F_avg(X, Y) ∈ R. Thus, on Hermiticity-preserving maps it is symmetric,

  F_avg(X, Y) = F_avg(Y, X).

Therefore, we can associate a distance measure with the average gate fidelity: the average error rate or average infidelity, which is also real-valued for Hermiticity-preserving maps. We set r(X) := 1 − F_avg(X).
Proof of Proposition 48. By virtue of (201), which also holds for the inner products appearing in (202), it suffices to prove the statement for X = id. Using (190) and denoting the transposition map by T : L(H) → L(H), A ↦ A^⊤, we can rewrite the average gate fidelity as

  F_avg(Id, Y) = ∫ Tr[ C(Y) ( |ψ⟩⟨ψ| ⊗ T(|ψ⟩⟨ψ|) ) ] dψ.

Due to linearity, we can recast this expression with the moment operator K^(2) of the measure µ_{S(C^d)} of random states and use the expression we derived in Lemma 39. Then,

  F_avg(Id, Y) = (2/(d(d+1))) Tr[ (Id ⊗ T)(P_sym^2) C(Y) ] = (1/(d(d+1))) ( Tr[C(Y)] + Tr[ |1⟩⟩⟨⟨1| C(Y) ] ),    (206)

where the last step follows from P_sym^2 = ½(1 + F) with the swap operator F from (6) and (Id ⊗ T)(F) = |1⟩⟩⟨⟨1|. Using (190), this time the other way around, we see that the first summand of (206) is

  Tr[C(Y)] = Tr[Y(1)],

and the second summand is Tr[|1⟩⟩⟨⟨1| C(Y)] = ⟨C(Id), C(Y)⟩ = ⟨Id, Y⟩ by (198). Plugging these two expressions into (206) and solving for ⟨Id, Y⟩ yields the assertion of the proposition.
If X† ∘ Y is trace-preserving, (202) simplifies to

  F_avg(X, Y) = (⟨X, Y⟩ + d) / (d(d + 1)),    (208)

or, equivalently, ⟨X, Y⟩ = d(d + 1) F_avg(X, Y) − d. We conclude that for trace-preserving and unital quantum channels the average gate fidelity and the Hilbert-Schmidt inner product are affinely related with a proportionality constant in O(d^{-2}). This is the same scaling as appearing for the entanglement fidelity in (199). More precisely, still assuming that X† ∘ Y is trace-preserving and one of X and Y is a unitary channel, we find the affine relation

  F_avg(X, Y) = (d F_e(X, Y) + 1) / (d + 1)    (209)

between the two fidelities. For two unitary channels U, V ∈ CPT(H) with U, V ∈ U(d) we can further simplify (208) to

  F_avg(U, V) = (|Tr[U† V]|² + d) / (d(d + 1)).

For V = 1 this equality reflects that the average gate fidelity measures how close U is to 1 on average, where the average is taken over its spectrum. Furthermore, the identity (202) also connects the average gate fidelity to the Frobenius norm. This, in turn, shows that the Frobenius norm is an average-case error measure as well.
Lastly, besides the entanglement fidelity, the Hilbert-Schmidt inner product and the average gate fidelity, there is another affinely related measure of quality that is particularly convenient to work with in the analysis of randomized benchmarking: the effective depolarizing parameter. Here, we will define the effective depolarizing parameter only for trace-preserving maps via its linear relation to the fidelity. If X is not trace-preserving one can more generally define it by first explicitly projecting onto unital maps. Let X ∈ L(H, K) be trace-preserving; its effective depolarizing parameter is

  p_eff(X) := (d F_avg(X) − 1) / (d − 1).    (211)

To justify its name, let us have a look at the depolarizing channel D_p, which was defined in (195) as the convex combination of D_1 = Id and D_0. The average gate fidelity of these extremal channels can be quickly calculated to be F_avg(Id) = 1 and F_avg(D_0) = 1/d, so that F_avg(D_p) = p + (1 − p)/d. Plugging this into the definition of the effective depolarizing parameter (211) yields

  p_eff(D_p) = p.

Another affinely related measure that is often used in this context is the χ_{0,0}-entry of the so-called χ-process matrix, see e.g. Ref. [70] for further details.
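These relations are easy to check numerically. The sketch below verifies p_eff(D_p) = p and, by Monte Carlo over Haar-random pure states, the closed form F_avg(Id, V) = (|Tr V|² + d)/(d(d + 1)) for a unitary channel, which follows from the formulas above:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4

# 1) Depolarizing channel: F_avg(D_p) = p + (1 - p)/d, so p_eff(D_p) = p.
p = 0.7
F_dep = p + (1 - p) / d
assert np.isclose((d * F_dep - 1) / (d - 1), p)

# 2) Unitary channel: Monte Carlo over Haar-random pure states of
#    F_avg(Id, V) = int |<psi|V|psi>|^2 dpsi versus (|Tr V|^2 + d)/(d(d+1)).
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
V, _ = np.linalg.qr(A)
N, acc = 20000, 0.0
for _ in range(N):
    v = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    v /= np.linalg.norm(v)
    acc += abs(v.conj() @ V @ v) ** 2 / N
F_formula = (abs(np.trace(V)) ** 2 + d) / (d * (d + 1))
assert abs(acc - F_formula) < 0.02
```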

The diamond norm
The distance measures on quantum channels we have encountered so far can be regarded as average error measures. A more pessimistic, worst-case error measure is induced by the trace norm on operators: the so-called diamond norm. It measures the operational distinguishability of quantum channels. Hence, it plays an important role in the certification of quantum processes. Indeed, error-correction thresholds also require worst-case guarantees without additional assumptions on the error model, see e.g. the discussion in Refs. [69,71]. At the same time, certification schemes that directly deliver certificates in diamond norm are very resource intensive and typically practically infeasible. For this reason, the connection of the diamond norm to the already introduced average error measures will be the focus of this section.
We start by defining the (1 → 1)-norm on L(H, K) as the operator norm induced by the trace norm,

  ‖X‖_{1→1} := sup_{A ≠ 0} ‖X(A)‖_1 / ‖A‖_1.

Note that if X is Hermiticity-preserving then the supremum is attained for a Hermitian operator, since in that case ½(X(A) + X(A)†) = X(½(A + A†)). Moreover, since the trace norm is a convex function, we have for any X ∈ L(H, K)

  ‖X‖_{1→1} = sup { ‖X(|ψ⟩⟨φ|)‖_1 : |ψ⟩, |φ⟩ ∈ H, ‖ψ‖ = ‖φ‖ = 1 }.    (214)

This means that the supremum is attained for rank-1 operators. For a Hermiticity-preserving X ∈ L(H), the supremum is furthermore attained at |ψ⟩ = |φ⟩. Density operators are normalized in trace norm. This implies that channels are normalized in (1 → 1)-norm, i.e., ‖X‖_{1→1} = 1 for all X ∈ CPT(H, K). In order to distinguish quantum channels one can use ancillary systems. For this reason, we would like to define a norm that has good stability properties for maps acting on composite Hilbert spaces. This motivates the definition of the diamond norm as a so-called completely bounded (CB) completion of the (1 → 1)-norm. We define the diamond norm for X ∈ L(H) by

  ‖X‖_⋄ := ‖X ⊗ Id_{L(H)}‖_{1→1}.

Note that the diamond norm inherits the above-mentioned properties from the (1 → 1)-norm and preserves them even when a map is only applied to part of a composite Hilbert space. More precisely, the diamond norm has the following properties.

Theorem 49 (Complete boundedness and (sub)multiplicativity):
For any X ∈ L(H, K),

  ‖X‖_⋄ = sup_{H'} ‖X ⊗ Id_{L(H')}‖_{1→1},

where the supremum is taken over all finite-dimensional Hilbert spaces H'. Moreover, for all X ∈ L(H, K), Y ∈ L(H', K') and Z ∈ L(H'', H),

  ‖X ⊗ Y‖_⋄ = ‖X‖_⋄ ‖Y‖_⋄  and  ‖X ∘ Z‖_⋄ ≤ ‖X‖_⋄ ‖Z‖_⋄.
Proof. For the proof we refer e.g. to [66,Chapter 3.3] or recommend to prove it as an exercise.
Theorem 49 tells us that the diamond norm precisely captures the maximal distinguishability of quantum channels X, Y ∈ CPT(H, K) in the following sense. One can prepare copies of a state ρ ∈ S(H ⊗ H') and apply either X or Y to the part on H to obtain states on K ⊗ H'. Then Proposition 12 tells us that ½ ‖((X − Y) ⊗ Id)(ρ)‖_1 is the distinguishability of the output states. Taking the supremum over all (pure) states ρ yields the distinguishability of X and Y, which is given by the diamond distance ½ ‖X − Y‖_⋄. In particular, the theorem tells us that optimal distinguishability can be obtained by choosing H' = H, in a similar sense as it can be detected whether a map is not CP just using H' = H, cf. Theorem 47(iii).
Another way to distinguish quantum processes is to prepare their Choi states and distinguish them, as characterized by Proposition 12 via the trace norm. The following statements provides a relation of the two notions of distinguishability of quantum channels.

Proposition 50 (Diamond norm and trace norm):
For any map X ∈ L(H, K),

  ‖J(X)‖_1 ≤ ‖X‖_⋄ ≤ dim(H) ‖J(X)‖_1,

where J denotes the Choi-Jamiołkowski isomorphism (193).
The upper bound can be improved; for a Hermiticity-preserving map X ∈ L(H, K) an improved bound follows from [72, Corollary 2].
Proof of Proposition 50. We prove the proposition in terms of C(X) = dim(H) J(X). Denoting the Frobenius norm again by ‖·‖_F, it holds that

  ‖X‖_⋄ = sup { ‖(1 ⊗ A) C(X) (1 ⊗ B)†‖_1 : ‖A‖_F = ‖B‖_F = 1 },

as can be seen from (214) and rearranging the contractions. Choosing A = B = 1/√dim(H) (corresponding to the maximally entangled state (194)) establishes the lower bound. The upper bound follows using Hölder's inequality (35) and our normalization convention (192).
It is not difficult to see that the bounds in Proposition 50 are tight, i.e., that there are X, Y ∈ L(H, K) so that ‖J(X)‖_1 = ‖X‖_⋄ and ‖Y‖_⋄ = dim(H) ‖J(Y)‖_1. These results tell us that distinguishing quantum channels via their Choi states is in general not optimal.
It is non-obvious how the diamond norm can actually be computed in practice. Watrous has shown that the diamond norm can be computed efficiently (in the dimension) via a semidefinite program [73]. However, for the highly relevant special case where the map is a difference of two unitary channels the computation is much simpler:

Proposition 51 (Diamond norm distance of unitary channels):
For any U, V ∈ U(d) the diamond norm distance of the corresponding unitary channels is

  ½ ‖U − V‖_⋄ = sqrt( 1 − dist(0, conv{λ_1, ..., λ_d})² ),

where λ_i are the eigenvalues of U† V, dist(·, ·) denotes the Euclidean distance and conv(·) the convex hull, both in the complex plane.
This proposition reflects that the diamond distance is a worst-case quantity, where the worst-case optimization is done over the spectrum of the "unitary difference" U † V . The geometric interpretation of this result is reviewed and visualized in [74].
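For single-qubit unitaries the convex hull of the two eigenvalues is a line segment, so the closed form of Proposition 51 can be evaluated in a few lines; a minimal sketch assuming that formula (function names are illustrative):

```python
import numpy as np

def seg_dist(a, b):
    """Euclidean distance from the origin to the segment [a, b] in the complex plane."""
    d = b - a
    if abs(d) < 1e-15:
        return abs(a)
    t = np.clip(-np.real(np.conj(d) * a) / abs(d) ** 2, 0.0, 1.0)
    return abs(a + t * d)

def diamond_dist_qubit(U, V):
    """(1/2)||U - V||_diamond for single-qubit unitary channels via Proposition 51."""
    lam = np.linalg.eigvals(U.conj().T @ V)
    nu = seg_dist(lam[0], lam[1])  # dist(0, conv{eigenvalues}); a segment for d = 2
    return np.sqrt(max(1 - nu ** 2, 0.0))

# A Z-rotation by angle theta against the identity gives sin(theta / 2).
theta = 1.0
U = np.diag([1.0, np.exp(1j * theta)])
assert np.isclose(diamond_dist_qubit(np.eye(2), U), np.sin(theta / 2))
assert np.isclose(diamond_dist_qubit(U, U), 0.0)
```

The worked example reflects the geometric picture: the distance from the origin to the chord between the two eigenvalues on the unit circle is cos(θ/2), giving a diamond distance of sin(θ/2).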
In order to prove the proposition we will write the matrices U and V as vectors. In general, (column) vectorization is the map |·⟩⟩ : C^{n₁×n₂} → C^{n₁ n₂} that stacks the columns of a matrix A ∈ C^{n₁×n₂} on top of each other. For all matrices A, B and C with fitting dimensions it holds that

  |A B C⟩⟩ = (C^⊤ ⊗ A) |B⟩⟩,    (225)

where X ⊗ Y ≅ (X_{i,j} Y)_{i,j} (defined as a block matrix) denotes the Kronecker product of matrices X and Y.
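The identity (225) is a quick numerical sanity check:

```python
import numpy as np

rng = np.random.default_rng(4)

def vec(A):
    """Column vectorization: stack the columns of A on top of each other."""
    return A.flatten(order='F')

A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((2, 5))
# |ABC>> = (C^T (x) A) |B>>
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
```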
Proof of Proposition 51. Starting with (214) and using the Choi-Jamiołkowski isomorphism (190) and the vectorization rule for matrix products (225), we can write the diamond norm of the channel difference as

  ‖U − V‖_⋄ = sup_{|ψ⟩, ‖ψ‖=1} ‖ (U ⊗ 1)|ψ⟩⟨ψ|(U ⊗ 1)† − (V ⊗ 1)|ψ⟩⟨ψ|(V ⊗ 1)† ‖_1.    (226)

The argument is a difference of two pure states. Using (46), relating the trace-norm difference of two trace-normalized, Hermitian, unit-rank matrices to their overlap, yields

  ½ ‖U − V‖_⋄ = sqrt( 1 − inf_{|ψ⟩, ‖ψ‖=1} |⟨ψ| (U†V ⊗ 1) |ψ⟩|² ).    (227)

The set of values ⟨ψ|(U†V ⊗ 1)|ψ⟩, the numerical range of the normal operator U†V ⊗ 1, is exactly the convex hull of its eigenvalues, which coincides with conv{λ_1, ..., λ_d}. Hence, the infimum is dist(0, conv{λ_i}) and the assertion follows.
Practical certification schemes for quantum processes will typically certify w.r.t. the Hilbert-Schmidt overlap, average gate fidelity or an equivalent quantity. In terms of the infidelity r(X) = 1 − F_avg(X), the diamond norm and the average gate fidelity are in general related by the following inequalities.

Proposition 52 (Infidelity and diamond norm [75, Proposition 9]):
For any X ∈ CPT(C^d) it holds that

  ((d + 1)/d) r(X) ≤ ½ ‖X − Id‖_⋄ ≤ sqrt(d(d + 1)) sqrt(r(X)).

Proof. The proof combines Proposition 50 with the Fuchs-van de Graaf inequality (45). The latter yields

  1 − F(J(Id), J(X)) ≤ ½ ‖J(Id) − J(X)‖_1 ≤ sqrt(1 − F(J(Id), J(X))),    (229)

where we have already dropped a square root on the lower bound.
Since J(Id) = (1/d) |1⟩⟩⟨⟨1| is of unit rank and Hermitian, it holds that F(J(Id), J(X)) = ⟨J(Id), J(X)⟩ = F_e(Id, X). We can cast this in terms of the average gate fidelity via (209),

  1 − F_e(Id, X) = ((d + 1)/d) r(X).    (230)

Plugging (230) into (229) yields

  ((d + 1)/d) r(X) ≤ ½ ‖J(Id) − J(X)‖_1 ≤ sqrt( ((d + 1)/d) r(X) ).

Finally, the proposition's assertion follows from Proposition 50.
Proposition 52 leaves us with an unsatisfactory state of affairs in two regards: First, the upper bound on the diamond norm introduces a dimensional factor O(d). In the context of quantum computing, this leaves us with a potentially large factor scaling exponentially, O(2^n), in the number of qubits n. Second, the upper bound scales with the square root of the infidelity. For unitary quantum channels one can in fact tighten the lower bound to one scaling as sqrt(r(X)) [69]. This lower bound for unitary quantum channels indicates that the square-root scaling is unavoidable in general. Practically, this means that certifying in diamond norm requires a certificate in infidelity that is orders of magnitude smaller. Already for small system sizes this can be a key obstacle for the certification of the worst-case performance of quantum processes.
Fortunately, if a quantum process is highly incoherent, i.e., far away from being unitary, one can derive a linear scaling of the diamond-norm distance in the infidelity. The incoherence can be controlled by the so-called unitarity introduced by Wallman et al. [76]. For X ∈ L(H) the unitarity is defined as

  u(X) := (d/(d − 1)) ∫ ‖X'(|ψ⟩⟨ψ|)‖_F² dψ,

where d = dim(H) and X' ∈ L(H) is defined by

  X'(A) := X(A) − Tr[X(A)] 1/d.

One can straightforwardly check that u(U) = 1 for every unitary channel U. On the other hand, in Refs. [69,76] a lower bound on u in terms of the infidelity r was derived for trace-non-increasing maps: for X ∈ L(H) with Tr[X(1)] ≤ Tr[1] it holds that

  u(X) ≥ (1 − (d/(d − 1)) r(X))².

Kueng et al. [69] established that quantum channels saturating this lower bound exhibit a linear scaling of the diamond norm distance in terms of the infidelity: for unital X ∈ CPT(H) with d = dim(H) that (approximately) saturate the bound, the diamond distance ½‖X − Id‖_⋄ scales linearly in r(X), up to dimensional factors.
We leave it with this qualitative statement and refer to Ref. [69,Proposition 3] for a quantitative statement. See also Ref. [77].
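For the depolarizing channel the unitarity can be evaluated explicitly. The sketch below assumes the normalization u(X) = (d/(d−1)) ∫ ‖X′(|ψ⟩⟨ψ|)‖_F² dψ with X′(A) = X(A) − Tr[X(A)]1/d, one common convention under which u(U) = 1 for unitary channels; with it, D_p has unitarity p² and saturates the lower bound:

```python
import numpy as np

rng = np.random.default_rng(5)
d, p = 4, 0.6

# Monte Carlo unitarity of the depolarizing channel D_p:
# D_p'(psi) = p (psi - 1/d) for pure psi, so ||D_p'(psi)||_F^2 = p^2 (1 - 1/d)
# independently of psi, giving u(D_p) = p^2 exactly.
acc, N = 0.0, 2000
for _ in range(N):
    v = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    v /= np.linalg.norm(v)
    psi = np.outer(v, v.conj())
    out = p * psi + (1 - p) * np.eye(d) / d     # D_p(psi)
    outp = out - np.trace(out) * np.eye(d) / d  # traceless part D_p'(psi)
    acc += (d / (d - 1)) * np.linalg.norm(outp, 'fro') ** 2 / N
assert np.isclose(acc, p ** 2)

# D_p saturates the lower bound u >= (1 - d r/(d-1))^2 from the infidelity:
r = (1 - p) * (d - 1) / d  # infidelity r(D_p)
assert np.isclose((1 - d * r / (d - 1)) ** 2, p ** 2)
```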

B. Direct quantum process certification
We have seen in Section II B that quantum states can be certified with measurement strategies resembling the optimal POVM P_+ for distinguishing quantum states from Proposition 12. By means of the Choi-Jamiołkowski isomorphism, strategies for quantum states can be lifted to quantum processes: Operationally, one prepares the Choi state (189) by applying the process to a state that is maximally entangled with an ancillary system. Then one certifies the Choi state using a protocol for quantum states. The resulting certification protocol certifies with respect to the entanglement gate fidelity (199) of the channels, which coincides with the state fidelity of the Choi states.
Moreover, for certain measurement strategies the protocol can be performed without using entanglement with ancillary systems. These prepare-and-measure versions use an effective measurement strategy Ω of the form [78]

  Ω = E_i [ N_i ⊗ ρ_i^⊤ ],

where the expectation is over a random setting i consisting of an input state ρ_i and a dichotomic POVM {N_i, 1 − N_i}. For this measurement strategy the expectation value in the Choi state is

  E_i Tr[ (N_i ⊗ ρ_i^⊤) J(Ũ) ]

and can be recast, thanks to (190), as

  (1/dim(H)) E_i Tr[ N_i Ũ(ρ_i) ].    (237)

While the dichotomic POVM defined by N_i ⊗ ρ_i^⊤ for each i originally acts on the Choi state J(Ũ), the form of (237) suggests a simpler, straightforward experimental implementation: One prepares the state ρ_i, applies the channel Ũ under scrutiny, and measures the dichotomic POVM given by N_i on the state Ũ(ρ_i). Thus, effective measurement strategies of the form [78] can indeed be implemented by simple prepare-and-measure schemes. For Clifford unitaries this method yields a simple direct certification test. The Choi state of a Clifford unitary channel is a stabilizer state and can hence be verified with the methods of Ref. [14] discussed in Section II H.
The following proposition gives a theoretical guarantee for this protocol. It can be derived as a corollary of the results of Section II H.

Proposition 54 (Direct certification of Clifford operations, [78, Proposition 3]):
Let C be an n-qubit Clifford operation. We consider the state certification of Protocol 17 applied to its Choi state J(C), which is a stabilizer state. This yields an ε-certification test of J(C) w.r.t. the infidelity from nρ independent such state preparations, with nρ scaling as O(ε^{-1} ln(1/δ)), with confidence 1 − δ. Moreover, the target J(C) is accepted with probability 1. This test corresponds to a similar certification test of C w.r.t. the entanglement gate infidelity 1 − F_e and can be implemented as a prepare-and-measure scheme via (237).

Further reading
The three works of Refs. [78-80] all follow the presented certification strategy based on direct state certification. Moreover, they discuss several additional aspects: Liu et al. [78] study non-trace-preserving processes and measurements, Zhu and Zhang [79] analyse the general multi-qudit case and strategies based on projective 2-designs, and Zeng et al. [80] discuss entanglement property detection.
Similar to direct state certification, fidelity estimation protocols can also be lifted to quantum processes. To this end, one applies state fidelity estimation to the output of the process applied to randomly chosen input states. The original DFE proposal by Flammia and Liu [30] already includes the application to quantum channels by sampling from the eigenstates of multi-qubit Pauli operators as the input states. Furthermore, simplifications arising for Clifford gates are discussed. See also the parallel work by da Silva et al. [33]. A strategy to estimate the average gate fidelity by inputting states drawn at random from complex projective 2-designs was studied by Bendersky et al. [81]. Reich et al. [82] determined the minimal number of required input states for the fidelity estimation of quantum processes. See also the related work by Hofmann [83]. Reich et al. also provide a quantitative comparative overview of all of the aforementioned approaches in Ref. [84].

C. Randomized benchmarking
The schemes presented in the previous section fail in the presence of sizeable SPAM errors. In the context of digital quantum computing, this sensitivity to SPAM errors is dramatically reduced by so-called randomized benchmarking (RB) protocols [49,61,62,85,86]. These protocols can extract certain quantitative measures of a quantum process associated to a quantum gate set. The process can be, for example, a certain gate, an error channel or an error map associated to the deviation of a quantum gate set from its ideal implementation. While still concerned with the physical layer of a quantum device, randomized benchmarking protocols already make explicit use of a gate layer, the abstraction at the heart of digital quantum computing.
Randomized benchmarking comprises a large zoo of different protocols. Therefore, we begin with a fairly general description. The principal idea behind achieving the SPAM(-error) robustness is the following: after preparing an input state, one applies the quantum process under scrutiny multiple times in sequences of different lengths before performing a measurement. Thereby, the imprint of the process on the measurement outcomes accumulates with increasing sequence length. At the same time, errors in the state preparation and measurement enter the measured quantities only linearly and independently of the sequence length. In this way, fitting the attained signals for different sequence lengths with functions depending on the length reveals properties of the quantum process disentangled from the SPAM errors.
A proto-typical RB protocol implements this rough idea for a digital quantum computer as follows. Let G ⊂ U(d) be a subgroup of unitary operations and φ : G → L(C d ) be their implementation on a quantum computer. In simple RB protocols φ(g) just models the faulty implementation of G on the actual device. More generally, the targeted implementation of the protocol can also include, e.g., a non-uniform sampling over the group or the implementation of another fixed gate after G. Also in these cases φ is the faulty version of the targeted implementation.
Note that the assumption of the existence of such a map φ already encodes assumptions on the quantum device and its noise process: the map φ might model the compilation into elementary gates, the effects and imperfections of the physical control, and noise. None of these steps is allowed to depend on the gate sequence that the gate is part of, the overall time that elapses during the protocol, or other external variables.
With these ingredients we can state a proto-typical RB protocol.

Protocol 55 (Proto-typical RB):
Let G ⊂ U(d) be a subgroup, ρ ∈ S(C^d) an initial state, and M = {M, 1 − M} ⊂ Pos(C^d) a measurement. Furthermore, let M ⊂ N be a set of sequence lengths. For every sequence length m ∈ M, we repeat the following estimation procedure multiple times: draw a sequence g = (g_1, . . . , g_m) of m group elements chosen i.i.d. uniformly at random and calculate the inverse element g_inv = g_1^{−1} g_2^{−1} · · · g_m^{−1}. For each sequence perform the following experiment:
• Apply the sequence of implementations of g, followed by the implementation of g_inv, to ρ.
• Perform the measurement M.
Multiple repetitions of the experiment yield an estimator p̂_g for the probability

p_g(m) = Tr[M φ(g_inv)φ(g_m) · · · φ(g_1)(ρ)]. (239)

Repeating these steps for different random sequences, we can calculate an estimator p̂(m) for the sequence average

p(m) = E_g p_g(m). (240)

More generally, RB protocols might go beyond Protocol 55 in various ways: e.g., by calculating the inverse of a sequence only up to specific gates, using a different measure than the uniform measure for drawing the group elements of the sequence, or performing a POVM measurement with multiple outcomes or measurements adapted to the sequence. In addition, the post-processing might combine different RB data series in order to obtain simpler decay signatures.
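As a sanity check of this data model, the following sketch (a toy simulation; all names and the noise strength are our choices, not part of the protocol) runs the proto-typical RB experiment for a single qubit with gate-independent depolarizing noise, drawing the gates Haar-randomly from U(2), which is a unitary 2-design. In this idealized setting the survival probability is exactly p(m) = 1/2 + p^{m+1}/2, so a log-linear fit recovers the decay parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2

def haar_unitary(d):
    # Haar-random unitary via QR decomposition of a Ginibre matrix
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

def depolarize(rho, p):
    # gate-independent noise channel Lambda = D_p
    return p * rho + (1 - p) * np.trace(rho).real * np.eye(d) / d

p_true = 0.9
rho = np.array([[1, 0], [0, 0]], dtype=complex)   # ideal preparation |0><0|
M = rho.copy()                                     # ideal measurement effect |0><0|

def survival(m, n_seq=20):
    vals = []
    for _ in range(n_seq):
        us = [haar_unitary(d) for _ in range(m)]
        state = rho
        for u in us:                       # noisy implementation of each gate
            state = depolarize(u @ state @ u.conj().T, p_true)
        g_tot = np.eye(d, dtype=complex)
        for u in us:
            g_tot = u @ g_tot
        g_inv = g_tot.conj().T             # global inversion gate
        state = depolarize(g_inv @ state @ g_inv.conj().T, p_true)
        vals.append(np.trace(M @ state).real)
    return float(np.mean(vals))

ms = np.array([1, 5, 10, 20])
ps = np.array([survival(m) for m in ms])
# for depolarizing noise A = B = 1/2 exactly, so fit log(p(m) - 1/2) linearly in m + 1
slope, _ = np.polyfit(ms + 1, np.log(ps - 0.5), 1)
p_est = np.exp(slope)
assert abs(p_est - p_true) < 1e-6
```

Since depolarizing noise commutes with every unitary, the fit here is exact; for generic noise one would fit all three parameters A, p and B.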
The first step in the theoretical analysis of RB protocols is to establish the fitting model of the RB data p(m). Ideally, p(m) is well-approximated by a single exponential decay. Subsequently, the RB decay parameters can, in certain settings, be connected to the average gate fidelity of a noise process affecting the implementation map, as we will now discuss.
The data model of most RB protocols can be understood as estimating the m-fold self-convolution of the implementation map [87]. More precisely, for φ, ψ : G → L(C^d) we can define a convolution operation as

φ ∗ ψ(g) = E_{h∼G} φ(gh^{−1})ψ(h). (241)
Note that this definition naturally generalises, e.g., the discrete circular convolution of vectors in C^n, which can be seen as an operation on functions (Z_n, +) → C on the finite group. With the convolution (241), we can rewrite the averages of the RB sequences as

p(m) = E_{g_1,...,g_m} Tr[M φ(g_inv)φ(g_m) · · · φ(g_1)(ρ)] = Tr[M φ^{∗(m+1)}(Id)(ρ)], (242)

where the replacements h_1 = g_1 and h_j = g_j h_{j−1} for j ∈ {2, . . . , m} have been made in the second equality, Id denotes the identity element of G and φ^{∗k} denotes the k-fold convolution of φ with itself. In expectation, the RB data p(m) is thus a contraction, defined by M and ρ, of the (m + 1)-fold self-convolution of φ evaluated at the identity element.
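The claimed relation to the circular convolution can be checked directly. The snippet below (illustrative; scalar-valued functions on the group (Z_n, +) stand in for the map-valued φ, ψ) compares the group convolution (241), normalised as an expectation, against an FFT-based circular convolution:

```python
import numpy as np

n = 8
rng = np.random.default_rng(1)
phi = rng.standard_normal(n)
psi = rng.standard_normal(n)

# Group convolution on (Z_n, +): (phi * psi)(g) = E_h phi(g - h) psi(h)
conv_group = np.array([np.mean([phi[(g - h) % n] * psi[h] for h in range(n)])
                       for g in range(n)])

# Circular convolution via FFT, scaled by 1/n to match the uniform expectation
conv_fft = np.real(np.fft.ifft(np.fft.fft(phi) * np.fft.fft(psi))) / n

assert np.allclose(conv_group, conv_fft)
```

For map-valued φ the same structure holds, with the Fourier transform becoming matrix-valued on the irreducible representations of G, as discussed in the further-reading paragraph.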
In the simplest instance of an RB protocol one can directly calculate this expression: namely, when G is a unitary 2-design, the targeted implementation is simply the action of G as quantum gates, and the noise in φ can be modelled by a single gate-independent quantum channel Λ ∈ CPT(C^d). Denoting by G the (adjoint) action of g as the unitary channel A ↦ G(A) = gAg†, we have the noise model

φ(g) = Λ ∘ G. (243)

With this ansatz for φ we can calculate that

p(m) = Tr[M Λ ∘ (tw_µ(Λ))^m (ρ)], (244)

where µ denotes the uniform measure on G. The operator tw_µ defined by

tw_µ(X) = E_{g∼µ} [G† ∘ X ∘ G]

is the so-called (channel) twirling map and appears in different contexts in quantum information. If we write out the twirling map with the individual unitaries, it reads

tw_µ(X)(A) = E_{g∼µ} [g† X(gAg†) g].

It becomes apparent that tw_µ is related to the second moment operator M^(2)_µ, (133), by simple vector space isomorphisms. Recall that for a unitary 2-design µ, Proposition 37 gives us an explicit description of M^(2)_µ. We can simply track the isomorphism to derive the following convenient expression.
Theorem 56 (Twirling of channels [61,68]): Let X ∈ L(L(C^d)) be trace-preserving and µ be a unitary 2-design. Then

tw_µ(X) = D_{p(X)},

where D_p is the depolarizing channel (195) and p(X) is the effective depolarizing parameter defined in (211).
Proof. First we note that any map X ∈ L(L(C^d)) is uniquely determined by (X ⊗ Id)(F), which is a construction similar to the Choi-Jamiołkowski isomorphism. This isomorphism is given by Tr_{2,3}[(X ⊗ Id)(F) ⊗ A] = X(A), but its explicit form is not needed. Hence, we can make the isomorphism between the twirling map tw_µ and the second moment operator M^(2)_µ from (133) explicit by writing

(tw_µ(X) ⊗ Id)(F) = E_{g∼µ} [(g ⊗ g)† (X ⊗ Id)(F) (g ⊗ g)],

where we have used that F commutes with g ⊗ g and that the uniform measure µ is invariant under inversion. For µ a unitary 2-design, the right-hand side takes the value of the corresponding moment operator of the Haar measure. Schur-Weyl duality, Theorem 35, then tells us that (tw_µ(X) ⊗ Id)(F) is a linear combination of 1 and F. Observing that (D_0 ⊗ Id)(F) = 1/d and trivially (D_1 ⊗ Id)(F) = F, we conclude that tw_µ(X) is a linear combination of D_0 and D_1. Furthermore, one quickly checks that if X is trace-preserving so is tw_µ(X). Hence, tw_µ(X) is an affine combination of D_0 and D_1. Thus, tw_µ(X) = D_p holds for some p ∈ C and it remains to determine p. One way forward is a straightforward calculation using the expressions for the coefficients provided by Proposition 38.
A shortcut is to calculate the effective depolarization of both sides. Due to the unitary invariance of µ_{S(C^d)}, it follows from (200) that F_avg(X) = F_avg(tw_µ(X)) and correspondingly for the affinely related effective depolarization parameter, p(X) = p(tw_µ(X)). Combined with p(D_p) = p, (212), this yields the theorem's assertion.
Theorem 56 allows us to explicitly calculate the RB data model from (244). To this end, a short calculation reveals that D_p^m = D_{p^m}. With this we find the RB data model to be

p(m) = A p^m + B,

with SPAM-dependent constants A and B. Thus, fitting a single exponential decay to the estimator p̂(m) allows one to obtain estimates p̂, Â and B̂ for the model parameters p, A and B. In particular, the estimated RB decay parameter p̂ is an estimator for the effective depolarizing parameter p(Λ) of the error channel Λ. Recall that the effective depolarizing parameter is affinely related to the average gate fidelity (200) via (211). From the RB decay parameter, we thus equivalently obtain an estimate for the average gate fidelity of the noise channel Λ as

F̂_avg = p̂ + (1 − p̂)/d.

Note that the resulting estimate of (200) is robust against SPAM errors, which only enter in the SPAM constants A and B.
Deriving rigorous performance guarantees for the RB estimator p̂ is involved: it requires the analysis of confidence regions for the estimator p̂_g(m) of (239), which is a random variable of the quantum measurement statistics, and for p̂(m), which is obtained by the sub-sampling of the sequences g. Furthermore, the errors of these estimators for each m enter the error of the fidelity estimator via the exponential fitting procedure. This step depends on the choice of algorithm and the estimated sequence lengths.
Using the fact that p̂(m) is the mean estimator of a bounded random variable, one can use Hoeffding's inequality, Theorem 8, to derive confidence intervals for an overall sampling complexity that is independent of the number of qubits in the regime of high fidelity. Such bounds, however, are prohibitively large for practical implementations. A refined analysis by Wallman and Flammia [75] derived tighter bounds for short sequences and small numbers of qubits. However, bounds that are practical and scalable in the number of qubits require a careful analysis of the variance of the estimator p̂_g(m) over the choice of the random sequences. For G being the Clifford group, Helsen et al. [88] worked out explicit variance bounds for the estimator p̂_g(m) and derived sampling complexities for p̂(m) that are practical, independent of the number of qubits and scale favourably with the sequence length. To this end, they employed a refined representation-theoretic analysis of the commutant of the 4th-order diagonal action of the Clifford group [44,89] in order to calculate the corresponding moment operator; an endeavour that is complicated by the fact that the Clifford group itself is not a unitary 4-design.
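The baseline Hoeffding argument at the start of this paragraph can be made concrete. The sketch below (function name ours) computes the number of repetitions that the two-sided Hoeffding bound for a [0, 1]-bounded mean prescribes for a given accuracy and confidence; this count is indeed independent of the number of qubits:

```python
import numpy as np

def hoeffding_samples(eps, delta):
    # Two-sided Hoeffding bound for the mean of i.i.d. variables in [0, 1]:
    # Pr[|mean - E| >= eps] <= 2 exp(-2 n eps^2), so n >= ln(2/delta) / (2 eps^2)
    return int(np.ceil(np.log(2 / delta) / (2 * eps**2)))

# e.g. a 0.01-accurate estimate of p(m) with 95% confidence
n = hoeffding_samples(eps=0.01, delta=0.05)
assert n == 18445
```

The 1/eps² scaling is what makes these generic bounds impractical for the very small infidelities one wants to resolve, motivating the variance-based refinements cited above.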
A rigorous analysis of a simplified fitting procedure was derived in Ref. [90]. Therein (again using trivial bounds on the variance) the authors show that a ratio estimator for the infidelity r = 1 − p that employs the estimates of p(m) for two different sequence lengths has a multiplicative error using an efficient number of samples, again in the regime of high fidelity.
All of these performance guarantees indicate that, in principle, RB protocols can be efficiently scalable in the number of qubits. To also ensure an efficient classical pre-processing of the proto-typical RB protocol, it is important to have an efficiently tractable group structure so that the inverse of the gate sequence can be computed.
For the important example of the Clifford group, the Gottesman-Knill theorem, see e.g. [41], allows one to compute the inverse of a sequence g_m · · · g_2 g_1 in polynomial time (w.r.t. the number of qubits). Furthermore, since the Clifford group is a unitary 3-design [42,43], it meets the requirement of Theorem 56. For this reason the presented analysis applies to the Clifford group under the assumption of gate-independent noise. It is natural to ask for additional examples of groups that constitute a unitary 2-design and are covered by the presented analysis without modifications. But it was established that these two requirements are already surprisingly restrictive. A complete classification of so-called 2-groups (2-design groups) is summarized in Ref. [91]. In fact, if one requires a family of 2-groups that can be constructed for an arbitrary number of qubits, one is left with subgroups of the Clifford group or SU(d) itself as the only examples [91][92][93].
We provide more details on how the analysis of the proto-typical RB protocol can be generalised in the further-reading paragraph at the end of the section. Now, we want to discuss another variant of RB that is particularly important as a tool for certifying quantum gates.

Interleaved randomized benchmarking
The proto-typical RB protocol estimates the effective depolarizing parameter or the average gate fidelity of the average error channel of a gate set. In contrast, interleaved RB protocols [94] allow one to extract the effective depolarizing parameter of individual gates from a group with respect to their ideal implementation provided the noise is sufficiently incoherent.
In an interleaved RB protocol one performs in addition to the standard RB protocol a modified version, where the random sequences are interleaved with the specific target gate. The second experiment yields estimates for the effective depolarization parameter of the error channel associated to the group concatenated with the error channel of the individual target gate. Under certain assumptions the effective depolarization parameter of the implementation of the target gate can be estimated from the decay parameters of both RB protocols.

Protocol 57 (Interleaved RB):
For G ⊂ U(d) and a target gate g_T ∈ G:
1. follow Protocol 55,
2. follow Protocol 55 but modify the sequences to be g = (g_1, g_T, g_2, g_T, g_3, . . . , g_T, g_m), where g_T is the target gate and g_i ∈ G for i ∈ [m] are drawn uniformly at random. The inverse g_inv is also calculated w.r.t. the modified sequence g.
The output of the protocol are the decay parameters of both experiments.
For the analysis we will again consider a 'mostly' gate-independent noise model and assume that G is a unitary 2-design. In the noise model we assume that the same noise channel Λ ∈ CPT(H) follows the ideal implementation of all gates but the target gate, i.e.,

φ(g) = Λ ∘ G for all g ∈ G \ {g_T}. (255)

The first step of the protocol is the unmodified RB protocol. If we neglect that φ deviates from the form (255) on g_T, we can apply the analysis of the previous section for gate-independent noise and conclude that the protocol outputs an estimator for the effective depolarizing constant p(Λ). E.g., for a large group it is plausible to neglect the contribution of the noise associated to the gate g_T to the group average.
It remains to analyse the second protocol. In analogy to (241), we can rewrite the sequence average of the interleaved protocol by substituting g_i with g_i g_{i−1}^{−1} g_T^{−1} for all i > 1. Inserting the noise model (255) yields the same expression as (244) with Λ replaced by G_T† φ(g_T)Λ. Hence, applying the same arguments as in the analysis of the standard RB protocol for unitary 2-designs yields a single-exponential fitting model with decay parameter estimating the effective depolarizing parameter p(G_T† φ(g_T)Λ). The second part of the interleaved RB protocol thus yields an estimate of the effective depolarizing parameter, or equivalently via (253) of the fidelity, of the error map G_T† φ(g_T) of the target gate G_T concatenated with the error channel Λ.
From p(Λ) and p(G_T† φ(g_T)Λ) it is indeed possible to infer p(G_T† φ(g_T)). In meaningful practical regimes, however, this requires additional control of the unitarity of Λ [70]: for sequences of unitary channels, the infidelity of their composition can scale quadratically in the sequence length in leading order. In contrast, highly non-unitary channels feature a close to linear scaling in the sequence length. Thus, using the unitarity one can derive bounds for fidelity measures of composite channels that exploit the linear scaling. We simply state the required bound for interleaved RB without proof.

Theorem 58 (Composite channel bound [70]): For any two quantum channels X, Y, the deviation |p(XY) − p(X)p(Y)| is bounded by the expression (257), which scales with √(u(Y) − p(Y)²) and thus vanishes for exactly depolarizing Y.

With an estimate û(Λ) for the unitarity, Theorem 58 allows one to estimate the effective depolarizing constant and thus the average gate fidelity of the target gate by the ratio estimator

p̂(G_T† φ(g_T)) = p̂(G_T† φ(g_T)Λ) / p̂(Λ),

up to a systematic error that is given by evaluating the right-hand side of (257). The systematic error is small in the regime where u(Λ) ≈ p(Λ)², which is the case if Λ is decoherent. The unitarity of Λ can be estimated using variants of the RB protocol itself, developed in Refs. [76,95]. Alternatively, one can simply assume that the error is sufficiently incoherent, i.e. that |1 − p(Λ)²/u(Λ)| ≤ ε. Conditioned on this external belief, one obtains the simpler estimator above with a systematic error that is controlled in ε. Thereby, interleaved RB can be used to arrive at average-performance certificates of individual quantum gates.
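The post-processing of interleaved RB reduces to simple arithmetic on the two decay parameters. The toy numbers below are hypothetical, and the snippet assumes the idealised case where both error maps are exactly depolarizing, so that the effective depolarizing parameters multiply exactly under composition and the ratio estimator is exact:

```python
import numpy as np

d = 2
p_ref = 0.98    # decay of the standard RB experiment: estimates p(Lambda)
p_gate = 0.95   # (hypothetical) depolarizing parameter of the target-gate error map

# For exactly depolarizing error maps the interleaved decay factorizes:
p_int = p_gate * p_ref

# ratio estimator for the target-gate parameter, and its average gate fidelity
p_est = p_int / p_ref
F_avg = p_est + (1 - p_est) / d

assert np.isclose(p_est, p_gate)
assert np.isclose(F_avg, 0.975)
```

For coherent (unitary) noise components the factorization only holds up to the systematic error bounded by Theorem 58, which is why the unitarity estimate matters in practice.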
We have already seen that for interleaved RB controlling the unitarity is helpful in deriving tighter error bounds. In addition, estimating the unitarity in principle also allows to derive better worst-case performance bounds from the average gate fidelities that come out of RB experiments using Theorem 53.

Further reading
Randomized benchmarking was originally developed in a series of works focusing on the unitary group and Clifford gates [49,61,62,85,86].
The early analyses used the gate-independent noise model (243), which we also assumed here. In many applications this is however a questionable assumption. After first perturbative approaches to derive the RB signal model under gate-dependent noise by Magesan et al. [86,96] and Proctor et al. [97], Wallman rigorously derived the fitting model for unitary 2-designs in Ref. [98].
Using the elegant description of the RB data as the m-fold convolution of the implementation map, recently proposed by Merkel et al. [87], one can abstractly understand the result as follows: as for the standard discrete circular convolution, the convolution of maps on a group can be turned into a (matrix) multiplication using a Fourier transform. This abstract Fourier transform for functions on the group is defined as a function on the irreducible representations of the group. In the case of RB, this function is matrix-valued and we observe matrix powers of the Fourier transforms for every irreducible representation, superimposed by a linear map. For every irreducible representation and sufficiently large m, the matrix powers are proportional to the m-th power of the largest eigenvalue of the matrix-valued Fourier transform. Contributions from other eigenvalues are suppressed. In this sense, RB is akin to the power method of numerical linear algebra, but in Fourier space [99]. A rigorous analysis requires perturbatively bounding the contribution of the sub-leading eigenvalues. For unitary 2-groups, the adjoint representation decomposes into two irreducible representations: the trace representation and the unital part of the quantum channel. For close to trace-preserving maps, the trace representation only contributes a very slow decay, i.e. a constant contribution to the fitting model, and the RB decay parameter is the dominant eigenvalue of the unital representation. Wallman [98] derived norm bounds for the contribution of sub-leading eigenvalues and showed that their contribution is exponentially suppressed with the sequence length. Furthermore, he showed that there is a gauge choice of the gate set such that the decay parameter can be connected to the average gate fidelity of the average error channel over the gate set. For qubits, this gauge was demonstrated to yield a physical gate set by Carignan-Dugas et al. [100].
The physicality of this gauge is, however, in general not guaranteed, and a counterexample is given by Helsen et al. [57]. As discussed by Proctor et al. [101], this complicates the interpretation of the RB decay rates as related to average fidelities that have a clear physical interpretation.
While the Clifford gates are definitely a prominent use case in the benchmarking of digital quantum computers, more flexible RB protocols require analysing groups that are not a unitary 2-design.
Randomized benchmarking protocols for other groups were developed in Refs. [102][103][104][105][106][107][108][109]. These protocols, for example, allow one to include the T-gate in the gate set [103] or to characterise leakage between qubit registers by using tensor copies of the Clifford group [102]. As the adjoint representation of other groups typically decomposes into multiple irreducible representations, RB data is expected to feature multiple decays in general. For a description of a flexible post-processing scheme for general RB-type data and performance guarantees, see Ref. [57].
In order to isolate the different decays, multiple variants of RB were developed. These either rely on directly preparing a state that has high overlap with only one irreducible representation or on cleverly combining data from different RB experiments to achieve the same effect. Many of these techniques can be understood as variants of the character benchmarking protocol developed by Helsen et al. [109]. Character benchmarking inverts the RB sequence not to the identity but to randomly drawn gates from the group. In the classical post-processing, data sequences of different end-gates are linearly combined with weights according to the character formulas. Thereby, the data is projected onto the irreducible representation of the respective character and can subsequently be fitted by a single decay.
Interleaved RB was proposed in Refs. [94,110] and demonstrated in practice. Already standard RB provides a trivial bound for individual gates of the group by simply attributing the average error to a single gate. In the original proposal of interleaved RB, the analysis does not allow for rigorous certificates that go significantly beyond this trivial bound for few qubits [70]. A general bound by Kimmel et al. [111], was considerably refined using the unitarity by Carignan-Dugas et al. [70]. Thereby it was established that if the error channel is sufficiently incoherent interleaved RB yields rigorous certificates for individual gates with reasonable error bars. There exist multiple variants of the interleaved RB scheme [112][113][114][115]. Another class of interleaved RB was introduced in Ref. [116]. Here, the average gate fidelity of individual gates is inferred from measurements of random sequences of gates that are drawn from the symmetry group of the gate. The individual gates are not part of the group itself and are also not included in the inversion of the sequence.
Another practically very interesting variation of RB arises when one draws the gates not from the uniform distribution but from another distribution over the group [85,97,107,117]. For example, drawing the sequences randomly from the generating gates of the group allows one to perform RB with much shorter sequence lengths [97].
Other quantities that can be measured by variants of the RB protocols are the unitarity [76,95], measures for the losses, leakage, addressability and cross-talk [102,118,119]. Furthermore, RB of operations on the logical level of an error correcting quantum architecture was proposed in Ref. [120].
Combining different relative average gate fidelities obtained by interleaved RB schemes can be used to acquire tomographic information about the error channel, providing actionable advice to an experimentalist beyond mere benchmarking and certification [111]. Using SPAM-robust data, these tomography schemes are in addition resource-optimal for unitary gates [121] and Clifford gates [40]. For Pauli channels, tomographic information can be efficiently obtained by performing a character RB protocol on multiple qubits simultaneously [122][123][124][125].
A general framework with a few theorems that establishes the RB fitting model of essentially all known RB schemes under gate-dependent noise is developed in Ref. [57]. The central assumption employed therein to control contributions from sub-dominant eigenvalues of the Fourier transform is a closeness condition to a reference representation in diamond norm, averaged over all group elements. Moreover, a unifying review of RB is provided.

D. Cross-entropy benchmarking
The final protocol we discuss in this tutorial is cross-entropy benchmarking (XEB) [56]. XEB gained importance recently: it was used to experimentally collect evidence that a quantum computer can perform a task that essentially no existing classical computer can solve in a reasonable amount of time [126].
In Ref. [126] XEB is performed in two distinct variants: one variant aims at extracting fidelity measures averaged over random sequences of individual gates. This protocol can be regarded as a special case of the character randomized benchmarking protocol [57,109] that we have touched upon in Section III C. The second variant aims at certifying the correct sampling from the measurement output distribution of a single specific circuit. This second variant of XEB is the focus of this section. It can be seen as an instance of a certification protocol on the application layer of a digital quantum computer. In consequence, it is commonly also referred to as a verification protocol for sampling tasks. But the application, sampling from a distribution encoded in a quantum circuit, is deliberately chosen very close to the physical layer.
XEB was proposed as a protocol in the context of demonstrating quantum supremacy. Experimentally demonstrating that a quantum computer can outperform current classical computers in some task is regarded as one of the major milestones in developing quantum computing technologies. The accuracy of the quantum operations and the number of qubits of today's devices do not allow one to run instances of interesting quantum algorithms that solve NP-hard problems, such as Shor's algorithm for integer factorization, at least not for problem instances that come even close to being troublesome for a classical computer [3]. This motivated the proposal of demonstrating quantum supremacy in the task of generating samples from a probability distribution that is specified as the measurement distribution of a quantum circuit. This is a task that a quantum computer solves very naturally, even though it might not be of any practical use [56,127]. At the same time, one can prove that certain random ensembles of quantum circuits yield probability distributions that cannot be efficiently sampled from on a classical computer [128].
Besides establishing evidence for the hardness of solving the sampling task on a classical computer, a crucial ingredient in demonstrating quantum supremacy is a certification protocol that guarantees that one has implemented the correct distribution.
The approach taken in Ref. [126] is to build trust in the correct functioning of the device for circuits that are still amenable to calculating a couple of outcome probabilities on a classical super-computer. To this end, the XEB protocol was used. The measures that XEB tries to estimate are the cross-entropy difference and its variant the cross-entropy fidelity.
In the context of certifying a sampling task it is natural to directly consider measures of quality that compare two probability densities describing the measurement outcomes. While the measures we have studied in this tutorial so far were concerned with the physical layer, measures directly comparing two probability distributions can be regarded as measures on the application layer.
For a quantum circuit U acting on n qubits, we denote its measurement probability mass function in a basis { |x⟩ }_{x∈[d]} after preparing a fixed initial state |ψ⟩ by

p_U(x) = |⟨x| U |ψ⟩|².

A well-known statistical measure [129] to relate two probability mass functions q, p : [d] → [0, 1] is the cross-entropy

H_X(q, p) = − Σ_{x∈[d]} q(x) ln(p(x)). (261)

For p = q we find that H_X(q, q) = − Σ_x q(x) ln(q(x)) =: H(q) is the standard Shannon entropy. One can show that H(q) is the minimal value of the cross-entropy H_X(q, p), a relation known as Gibbs' inequality [130].
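A minimal numerical illustration of the cross-entropy and Gibbs' inequality (function names and the random test distributions are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def cross_entropy(q, p):
    # H_X(q, p) = -sum_x q(x) ln p(x), cf. (261)
    return float(-np.sum(q * np.log(p)))

d = 16
q = rng.random(d); q /= q.sum()
p = rng.random(d); p /= p.sum()

# Gibbs' inequality: H_X(q, p) >= H_X(q, q) = H(q), with equality iff p = q
assert cross_entropy(q, p) >= cross_entropy(q, q)
assert np.isclose(cross_entropy(q, q), float(-np.sum(q * np.log(q))))
```

The same helper will serve to evaluate the Porter-Thomas predictions (264) and (265) below.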
In the context of quantum supremacy demonstrations one expects the target probability distribution that one aims to implement to be of Porter-Thomas shape. We say that a probability mass function p : [d] → [0, 1] is of Porter-Thomas shape if the tail distribution of p(x), regarded as a random variable for x drawn uniformly at random from [d], is well-approximated by an exponential decay function,

Pr_{x∼p_uni}[p(x) > t] ≈ e^{−dt}, (262)

where p_uni denotes the uniform distribution. Note that while the left-hand side of (262) is discontinuous, the right-hand side allows us to approximately think of the distribution of p(x) as being described by the continuous probability density p_PT(p) = d e^{−dp} of the Porter-Thomas distribution [131]. We will use this description in our theoretical analysis multiple times. The motivation to study distributions of Porter-Thomas shape stems from considering Haar random unitaries in place of the quantum circuit U and is further illuminated in the following box.

Densities of Porter-Thomas shape
For U ∈ U(d) drawn from the Haar measure µ_{U(d)}, one can show that the squared absolute values of its matrix entries have the probability density p_e(p) = (d − 1)(1 − p)^{d−2}. In the limit d ≫ 1, p_e(p) is described by the Porter-Thomas distribution [131]

p_PT(p) = d exp(−dp). (263)

For fixed U and again in the limit of large d, one can hence argue that the probability mass function p_U is of Porter-Thomas shape [56]. Assuming that p_U is of Porter-Thomas shape, Boixo et al. [56] showed that a straightforward calculation reveals

H_X(p_U, p_U) = H(p_U) = ln(d) + γ − 1, (264)
H_X(p_uni, p_U) = ln(d) + γ, (265)

where γ is the Euler-Mascheroni constant and p_uni(x) = 1/d is the uniform probability mass function.
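The predictions (264) and (265) can be checked numerically. The sketch below (our own illustration) samples a Haar-random state vector, equivalently a column of a Haar-random unitary, and compares the empirical entropies against ln(d) + γ − 1 and ln(d) + γ:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4096
gamma = 0.5772156649015329   # Euler-Mascheroni constant

# A column of a Haar-random unitary is a uniformly random unit vector
z = rng.standard_normal(d) + 1j * rng.standard_normal(d)
p = np.abs(z)**2 / np.sum(np.abs(z)**2)   # p_U(x) = |<x|U|psi>|^2

H = float(-np.sum(p * np.log(p)))         # Shannon entropy H(p_U), cf. (264)
H_X_uni = float(-np.mean(np.log(p)))      # cross-entropy H_X(p_uni, p_U), cf. (265)

assert abs(H - (np.log(d) + gamma - 1)) < 0.1
assert abs(H_X_uni - (np.log(d) + gamma)) < 0.1
```

The agreement improves with growing d, consistent with (263) being the large-d limit of the exact density p_e.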
The introduction of the so-called cross-entropy difference as a performance measure in sampling tasks for quantum supremacy brought the cross-entropy into focus:

Cross-entropy difference
Ref. [56] introduced the cross-entropy difference as a performance measure in sampling tasks:

d_XE(q, p) := H_X(p_uni, p) − H_X(q, p),

where p_uni is the uniform distribution. The cross-entropy difference thus measures the excess in cross-entropy that q has with p beyond the uniform distribution.
In the previous box we have argued that for Haar-random unitaries the corresponding measurement densities p_U are generically of Porter-Thomas shape. The motivation of the cross-entropy difference relies heavily on this observation. By definition, we have d_XE(p_uni, p) = 0 for any p. If p is of Porter-Thomas shape, (264) and (265) show that d_XE(p, p) = 1. Note, however, that there still exist probability distributions that score even higher in cross-entropy difference than p itself.
Before discussing the XEB protocol to estimate H_X and F_X, let us illuminate the motivation of F_X in the context of certifying sampling tasks. First, the cross-entropy fidelity F_X(q, p) = d Σ_{x∈[d]} q(x)p(x) − 1 can be regarded as a linear approximation to the cross-entropy difference and as such as a simpler version of it. The constant shift in the definition of F_X is chosen such that F_X(p_uni, p) = 0 for p_uni the uniform density and any probability density p. If p_U is assumed to be of Porter-Thomas shape, one can calculate that F_X(p_U, p_U) = 1. This motivates the expectation that scoring high in cross-entropy fidelity indicates successfully solving the sampling task for typical random circuits U.
Note that if U is drawn at random from a unitary 2-design µ, we can reproduce the Porter-Thomas value of F_X(p_U, p_U) in expectation over U using Lemma 38: we first calculate

E_{U∼µ} Σ_{x∈[d]} p_U(x)² = d · 2/(d(d + 1)) = 2/(d + 1).

Hence, we find that

E_{U∼µ} F_X(p_U, p_U) = 2d/(d + 1) − 1 = (d − 1)/(d + 1) → 1 for d → ∞.

Thus, if U is drawn from a distribution where we have suitable control over higher moments, we can hope to prove concentration around the expectation with high probability for large d. For Haar random unitaries, Lévy's lemma [132] directly yields a corresponding statement.
We will for the moment leave this as a motivation for estimating F_X and H_X and turn to the XEB protocol.

Cross-entropy benchmarking protocol
The crucial structural insight of XEB is that F_X and H_X are both of the form

E_f(q, p) = Σ_{x∈[d]} q(x) f(p(x)) = E_{x∼q} f(p(x)),

with f(p) = f_F(p) = dp − 1 for the cross-entropy fidelity and f(p) = f_H(p) = − ln(p) for the cross-entropy. This observation suggests a simple protocol, akin to importance sampling (Section II I), for empirically estimating both quantities if we have access to samples of one of the distributions.
Protocol 59 (Cross-entropy benchmarking (XEB) [56,126]): Let U be a description of a quantum circuit, |ψ⟩ ∈ C^d be an initial state and B = { |x⟩ }_{x∈[d]} an orthonormal basis of C^d. 1. Run the circuit U on the initial state |ψ⟩ on the quantum device and measure in the basis B; collect the observed outcomes in a sequence O. 2. Calculate on a classical computer for each x ∈ O the value of p_U(x).

Return the estimator
\[
\hat{E}_f = \frac{1}{m} \sum_{i=1}^{m} f\big(p_U(x_i)\big) ,
\]
where f is f F or f H for estimating the cross-entropy fidelity or cross-entropy, respectively.
It is important to keep in mind that Step 2 requires that a classical computer can compute individual outcome probabilities of the circuit. For this reason, XEB cannot be used directly for circuits that are not classically simulable. Instead, one can investigate the performance on restricted subclasses of circuits that are still tractable on a powerful classical computer and extrapolate from these results the performance in the regime where one expects quantum supremacy.
If we assume that the target distribution p U is defined using a Haar-randomly drawn unitary U , we can derive a guarantee for Protocol 59 for the linear cross-entropy using the techniques that we presented in this tutorial. Such a guarantee was derived by Hangleiter [133].
Theorem 60 (Guarantee for XEB with Haar-random circuits [133]): Let U ∈ U(d) be drawn from the Haar measure and let ε, δ > 0. Given a sufficiently large number of samples m, scaling as $O(\epsilon^{-2} \ln^2(d) \ln(1/\delta))$ [133], Protocol 59 returns with confidence 1 − δ an unbiased ε-accurate estimator Ê_f for F_X(p_U, p_U).
The proof of the theorem relies on bounding the range of the random variable p_U(x) and applying Hoeffding's inequality (22). We have already seen that for U drawn from the Haar measure, p_U is asymptotically of Porter-Thomas shape. In particular, large probabilities in p_U are exponentially suppressed. For this reason, we expect that, with high probability over the choice of U, p_U(x) is bounded for all x. The following lemma makes this expectation explicit.
Lemma 61 (p_U is bounded w.h.p.): Let U ∈ U(d) be a Haar-random unitary and {|x⟩}_{x∈[d]} an orthonormal basis of C^d. Then, the measurement probability mass function p_U : [d] → [0, 1], p_U(x) = |⟨x|U|0⟩|², fulfils p_U(x) ≤ b for all x with probability at least 1 − d e^{−db/e}.
One way to prove the lemma is via the Porter-Thomas density (263). We will follow a more self-contained strategy by calculating the moments of p_U. The bound on the moments can then be translated into an exponential tail bound using the following consequence of Markov's inequality.
Theorem 62 (Sub-exponential tail bound, e.g. [7, Proposition 7.11]): Let X be a random variable satisfying
\[
\mathbb{E}\left[|X|^k\right]^{1/k} \le c_1 k
\]
for all k ≥ 2. Then, for all t ≥ 2,
\[
\Pr\left[|X| \ge e\, c_1 t\right] \le e^{-t} .
\]
Proof. Applying Markov's inequality (17) and the theorem's assumption gives for k ≥ 2
\[
\Pr\left[|X| \ge e\, c_1 t\right] \le \frac{\mathbb{E}\left[|X|^k\right]}{(e\, c_1 t)^k} \le \left(\frac{k}{e t}\right)^{k} .
\]
Now choosing k = t yields the claim.
Proof of Lemma 61. We start by calculating the moments of p_U(x) as a random variable depending on U ∼ µ_{U(d)}. First note that by definition p_U(x) = |⟨x|U|0⟩|² = |⟨x|ψ⟩|² with |ψ⟩ drawn uniformly from S(C^d). Using the moment operator K^{(k)}_{µ_{S(C^d)}} for |ψ⟩ ∼ µ_{S(C^d)}, Lemma 39 and (149), we find that for all x ∈ [d]
\[
\mathbb{E}_U\left[p_U(x)^k\right] = \langle x|^{\otimes k}\, K^{(k)}_{\mu_{S(\mathbb{C}^d)}}\, |x\rangle^{\otimes k} = \binom{d+k-1}{k}^{-1} .
\]
Due to the inequality $\binom{n}{k} \ge (n/k)^k$, it holds for k ≥ 1 that
\[
\binom{d+k-1}{k} \ge \left(\frac{d+k-1}{k}\right)^{k} \ge \left(\frac{d}{k}\right)^{k}
\]
and, thus,
\[
\mathbb{E}_U\left[p_U(x)^k\right]^{1/k} \le \frac{k}{d} .
\]
By Theorem 62 with $c_1 = 1/d$, this moment bound translates into the tail bound
\[
\Pr\left[p_U(x) \ge b\right] \le e^{-db/e}
\]
for b ≥ 2e/d. Finally, using the union bound we conclude that
\[
\Pr\left[\exists\, x \in [d] : p_U(x) > b\right] \le d\, e^{-db/e} ,
\]
which completes the proof.
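The tail bound of Lemma 61 can be probed numerically (a sketch assuming NumPy; the threshold b = 0.4 and dimension d = 32 are arbitrary illustrative choices): draw Haar-random states and compare the empirical frequency of max_x p_U(x) > b against the bound d·e^{−db/e}.

```python
import numpy as np

rng = np.random.default_rng(2)
d, trials, b = 32, 2000, 0.4

exceed = 0
for _ in range(trials):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    p = np.abs(v / np.linalg.norm(v)) ** 2  # Haar-random measurement distribution
    exceed += p.max() > b

empirical = exceed / trials
bound = d * np.exp(-d * b / np.e)  # Lemma 61: about 0.29 for these parameters
print(empirical, bound)
```

For these parameters the true exceedance probability is far below the bound, so the empirical frequency is essentially zero; the bound is loose but sufficient for the Hoeffding argument.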
Proof of Theorem 60. Let d = 2^n. The estimator Ê_f is the sum of m i.i.d. random variables f(p_U(x_i))/m. By the form (270), it is clear that Ê_f is an unbiased estimator for E_f. Moreover, by Lemma 61, with high probability over the choice of U, each random variable f_F(p_U(x_i)) = d p_U(x_i) − 1 has a range of size O(ln(d)), so that Hoeffding's inequality (22) yields the claimed accuracy and confidence for the stated number of samples m.
Following the same strategy, one can also derive a sampling complexity in $O(\epsilon^{-2} \ln^2(d) \ln(1/\delta))$ for estimating the cross-entropy H_X(p_U, p_U) by Protocol 59 [133]. Since the cross-entropy involves the logarithm via f_H(p_U(x)) = −ln(p_U(x)), the upper bound on the range of p_U of Lemma 61 is no longer sufficient to ensure boundedness of the random variables that enter the estimator. In addition, one needs a lower bound on the range of p_U. This cannot be obtained from our bounds on the moments. Instead, one has to explicitly calculate the tail of the Porter-Thomas distribution (263).
From an estimate of the cross-entropy one can calculate an estimate of the cross-entropy difference by shifting it by H_X(p_uni, p_U). If the ideal circuit is sufficiently close to a Haar-random unitary, one can analytically calculate H_X(p_uni, p_U). Alternatively, taking the average of the values calculated in Step 2 provides a numerical estimate for H_X(p_uni, p_U).
Ultimately, theoretical results for the hardness of sampling tasks require closeness of the probability mass functions in total-variation (TV) distance, i.e., in the TV norm
\[
\| q - p \|_{TV} = \frac{1}{2} \sum_{x \in [d]} | q(x) - p(x) | .
\]
Without additional assumptions, it is not possible to derive a TV norm bound from the cross-entropy. A counterexample is discussed in Ref. [128]. Therein, Bouland et al. also hint at a possible bail-out. An insightful presentation of the argument is also given in Ref. [133]. Very close to the desired bound is Pinsker's inequality [129],
\[
\| q - p \|_{TV} \le \sqrt{\frac{D_{KL}(q, p)}{2}} , \tag{281}
\]
which bounds the TV norm in terms of the Kullback-Leibler divergence D_KL(q, p) := H_X(q, p) − H(q). The Kullback-Leibler divergence D_KL(q, p_U) is unfortunately not of the form (270) and cannot be directly estimated by an XEB protocol. In addition to the estimate of the cross-entropy, D_KL(q, p_U) requires an estimate of the entropy of the implemented mass function q. If we assume that the noise in our implementation only increases the entropy such that H(q) ≥ H(p), we can avoid this obstacle and swap H(q) for H(p), the entropy of the ideal probability mass function. Thus, instead of D_KL(q, p) we consider D_XE(q, p) := H_X(q, p) − H(p). If H(q) ≥ H(p), then D_KL(q, p) ≤ D_XE(q, p) and a TV norm bound is given in terms of D_XE(q, p) via Pinsker's inequality. Similar to the cross-entropy difference (266), D_XE(q, p) can be estimated by measuring H_X(q, p) with Protocol 59 and either estimating the shift H(p) analytically or numerically from the computed values p_U(x_i) of Step 2. If the ideal probability mass function is of Porter-Thomas shape, then one can calculate that D_XE(q, p) = 1 − d_XE(q, p), and the above discussion can be translated to the cross-entropy difference.
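The chain of inequalities ‖q − p‖_TV ≤ √(D_KL(q, p)/2) ≤ √(D_XE(q, p)/2), the second step being valid whenever H(q) ≥ H(p), can be checked numerically. In this sketch, p is Porter-Thomas-shaped and q mixes p with the uniform distribution, a toy noise model that, by concavity of the entropy, can only increase it:

```python
import numpy as np

rng = np.random.default_rng(3)
d, lam = 1024, 0.3

v = rng.normal(size=d) + 1j * rng.normal(size=d)
p = np.abs(v / np.linalg.norm(v)) ** 2  # ideal (Porter-Thomas-shaped) distribution
q = (1 - lam) * p + lam / d             # noisy distribution with H(q) >= H(p)

tv = 0.5 * np.sum(np.abs(q - p))              # total-variation norm
H = lambda r: -np.sum(r * np.log(r))          # Shannon entropy
H_X = lambda r, s: -np.sum(r * np.log(s))     # cross-entropy
D_KL = H_X(q, p) - H(q)                       # Kullback-Leibler divergence
D_XE = H_X(q, p) - H(p)                       # relaxed divergence from the text

print(tv, np.sqrt(D_KL / 2), np.sqrt(D_XE / 2))
```

Both inequalities hold for any such pair of distributions; the numerical check merely illustrates how much is lost in each relaxation for this particular noise model.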

Further reading
The idea of demonstrating quantum supremacy in the task of sampling from certain probability distributions that naturally arise in quantum systems goes back to the proposal of boson sampling in linear optics [127,134]. Besides random circuit sampling [56], multiple supremacy proposals exist, e.g., for other restricted classes of quantum computations [127] or for processes arising in quantum simulation [135,136]. A series of additional theoretical works collects evidence for the robust hardness of the resulting approximate sampling tasks, e.g. [128,137-139].
It was realized early on that the verification of quantum supremacy is a daunting task [140,141]. One might hope that it is possible to perform a non-interactive black-box verification, i.e., to certify the sampling task solely from the samples themselves. Unfortunately, the same features of a probability distribution that guarantee the classical hardness of the sampling task prohibit its efficient verification from samples on a classical computer [142]. Optimal but non-efficient strategies for general verification problems were studied in Ref. [143].
We focused on cross-entropy estimation for the verification of quantum supremacy [56]. Another measure of the form (270) is employed in the heavy outcome generation (HOG) test, which uses a Heaviside step function as f [144]. A refined notion of the heavy outcome generation test is the binned outcome generation (BOG) test proposed in Ref. [128]. Naturally, approaches for quantum state and process certification can also be used to verify a sampling task under a varying set of assumptions. Finally, it is an on-going endeavour to develop classical strategies for spoofing verification protocols for quantum supremacy, with successes reported, e.g., in Refs. [145,146], as well as to collect evidence for the hardness of classical spoofing [147].
An extensive, recent overview of verification and certification methods in the context of quantum supremacy can be found in Ref. [133].