\makeatletter
\@ifundefined{HCode}
{
\documentclass[CRPHYS,Unicode,screen,biblatex]{cedram}
\addbibresource{crphys20241068.bib}
\newenvironment{noXML}{}{}
\newenvironment{Table}{\begin{table}}{\end{table}}
\def\thead{\noalign{\relax}\hline}
\def\endthead{\noalign{\relax}\hline}
\def\tabnote#1{\vskip4pt\parbox{.8\linewidth}{#1}}
\def\tsup#1{$^{{#1}}$}
\def\tsub#1{$_{{#1}}$}
\def\sfrac#1#2{{#1}/{#2}}
\def\sofrac#1#2{({#1}/{#2})}
\usepackage{amssymb}
\RequirePackage{etoolbox}
\usepackage{amsmath}
\csdef{Seqnsplit}{\\}
\def\tbody{\noalign{\relax}\hline}
\def\jobid{crphys20241068}
\def\rmpi{\uppi}
\def\ubreak{\break}
%\graphicspath{{/tmp/\jobid_figs/web/}}
\graphicspath{{./figures/}}
\newcounter{runlevel}
\let\MakeYrStrItalic\relax
\def\refinput#1{}
\def\0{\phantom{0}}
\def\sfrac#1#2{{#1}/{#2}}
\def\stfrac#1#2{({#1})/{#2}}
\def\stofrac#1#2{(({#1})/{#2})}
\def\back#1{}
\DOI{10.5802/crphys.229}
\datereceived{2024-12-19}
\dateaccepted{2024-12-20}
\ItHasTeXPublished
\def\ndash{\text{--}}
\def\mathbi#1{\text{\textit{\textbf{#1}}}}
}
{\documentclass[crphys]{article}
\def\CDRdoi{10.5802/crphys.229}
\let\newline\break
\def\sfrac#1#2{{#1}/{#2}}
\def\sofrac#1#2{({#1}/{#2})}
\let\ubreak\relax
\makeatletter
\def\CDRsupplementaryTwotypes#1#2{}
\def\selectlanguage#1{} 
\def\href#1#2{\url[#1]{#2}}
}
\makeatother

\usepackage{upgreek}

\begin{document}

\begin{noXML}

\CDRsetmeta{articletype}{research-article}

\title{Classical and quantum algorithms for many-body problems}

\alttitle{Algorithmes classiques et quantiques pour le probl\`eme \`a N corps}

\author{\firstname{Thomas} \lastname{Ayral}\CDRorcid{0000-0003-0960-4065}}
\address{Eviden Quantum Laboratory, Les Clayes-sous-Bois, France}
\email{thomas.ayral@eviden.com}

\keywords{\kwd{Many-body physics}
\kwd{Quantum computing}
\kwd{Algorithms}
\kwd{Condensed-matter physics}
\kwd{Numerical methods}}

\altkeywords{\kwd{Probl\`eme \`a N corps}
\kwd{Informatique quantique}
\kwd{Algorithmes}
\kwd{Physique de la mati\`ere condens\'ee}
\kwd{M\'ethodes num\'eriques}}

\begin{abstract}
The many-body problem is central to many fields, such as
condensed-matter physics and chemistry, but also to combinatorial
optimization, which is nothing but a classical many-body problem.  This
manuscript, written as part of an Habilitation \`a Diriger des
Recherches, presents the various algorithmic approaches, both classical
and quantum, to solving this problem. We begin by reviewing the main
existing classical and quantum methods, focusing on their successes as
well as their current limitations. In particular, we present the
state-of-the-art in quantum methods, distinguishing between perfect and
noisy processors. We then present recent work on combining classical
and quantum algorithms to overcome the limitations inherent to both
paradigms. In particular, we begin by showing how tensor networks,
often used as reference tools to gauge the interest of quantum methods,
can also be used to initialize a quantum computation, in addition to
simulating it realistically. We then turn to the special case of
fermionic problems. After describing a method based on natural orbitals
for shortening, and thus making more reliable, quantum circuits to
prepare fermionic states, we present a method based on slave spins for
using a platform of Rydberg atoms to simulate lattice models of
fermions. Finally, we show how these same Rydberg platforms can be used
to solve combinatorial problems, and how decoherence influences the
quality of the results obtained. This leads to the definition of a new
utility metric for quantum processors, the Q-score.
\end{abstract}

\begin{altabstract}
Le probl\`eme \`a N corps est un probl\`eme central pour nombre de
domaines comme la physique de la mati\`ere condens\'ee ou la chimie, mais
aussi celui de l'optimisation combinatoire, qui n'est autre qu'un
probl\`eme \`a N corps classique.  Ce manuscrit, r\'edig\'e dans le cadre
d'une Habilitation \`a Diriger des Recherches, pr\'esente les
diff\'erentes approches algorithmiques, qu'elles soient classiques ou
quantiques, pour r\'esoudre ce probl\`eme. Nous commen\`a\c{c}ons par y passer
en revue les principales m\'ethodes classiques et quantiques existantes,
avec un accent mis sur leurs succ\`es ainsi que leurs limitations
actuelles. En particulier, nousc pr\'esentons un \'etat de l'art des
m\'ethodes quantiques, en distinguant processeurs parfaits et
processeurs bruit\'es. Ensuite, nous pr\'esentons des travaux r\'ecents
permettant de combiner algorithmes classiques et quantiques pour
surmonter les limitations inh\'erentes aux deux paradigmes. En
particulier, nous commen\`a\c{c}ons par montrer comment les r\'eseaux de
tenseurs, souvent utilis\'es comme outils de r\'ef\'erence pour jauger de
l'int\'er\^et des m\'ethodes quantiques, peuvent aussi \^etre utilis\'es
pour initialiser un calcul quantique, en plus de le simuler de fa\`a\c{c}on
r\'ealiste. Nous passons ensuite au cas particulier des probl\`emes
fermioniques. Apr\`es avoir d\'ecrit une m\'ethode \`a base d'orbitales
naturelles permettant de raccourcir, et donc de fiabiliser, des
circuits quantiques pour pr\'eparer des \'etats fermioniques, nous
exposons une m\'ethode \`a base de spins esclaves permettant d'utiliser
une plateforme d'atomes de Rydberg pour simuler des mod\`eles de
fermions sur r\'eseau. Nous montrons enfin comment ces m\^emes
plateformes de Rydberg peuvent \^etre utilis\'ees pour r\'esoudre des
probl\`emes combinatoires, et comment la d\'ecoherence influence la
qualit\'e des r\'esultats obtenus. Ceci nous am\`ene \`a la d\'efinition
d'une nouvelle m\'etrique d'utilit\'e des processeurs quantiques, le
Q-score.
\end{altabstract}

\maketitle

{\vspace*{-1pc}}

\end{noXML}

\let\rmmu\upmu

\section{Introduction}\label{sec1}

Quantum many-body problems---for instance, systems of interacting
electrons in solids---are among the most difficult problems to solve on
classical computers. This is directly due to the strongly interacting
nature of the involved particles. These interactions in turn cause a
failure of mean-field approaches, which essentially try to capture
many-body physics within an effective single-body description---in
fact, a classical description as it does not involve a key quantum
property, entanglement. Such mean-field approaches must be replaced, or
rather augmented, with ``strongly-correlated'' methods that capture the
many-body nature of the phenomena at play, this time with entanglement
at the center of the stage. This comes at a cost: whether exact
diagonalization methods (within a carefully identified low-energy or
active subspace), Monte-Carlo methods, or tensor-network methods---all
these methods come with an exponential cost as a function of some
parameter. This has not prevented huge algorithmic progress in the
recent decades, with an increasing understanding of more and more
exotic phases of matter. And yet some regimes still elude our
understanding. Doped, low-temperature regimes of models like the
Hubbard model, which are believed to capture the physics of
high-temperature superconductors, are but one example. Another one is
the dynamics of fermionic or spin models, whose study is nowadays
limited to short times and/or small systems.


These limitations spurred physicists to envision, in the early 1980s, a
new type of processors, called quantum processors. They experienced a
boom in the 2010s thanks to the advent of experimental processors with
tens to hundreds of controlled particles or even quantum bits. Since
early on, quantum processors have been assigned to two different tasks.
One is solving abstract math problems like factoring numbers, with
exponential speedup guarantees in theory. The second one, which will be
the main focus of this work, consists in using quantum processors as
artificial quantum many-body systems that mimic Nature's quantum
many-body systems. In this field, too, exponential speedup guarantees
were established for simulating the dynamics of quantum systems
compared to classical computers. 

In practice, however, quantum computers are fragile systems. A
phenomenon called decoherence introduces steep limitations to the
quality of the outputs of quantum computations. These limitations
accumulate exponentially with the complexity (size, duration) of the
task at hand. A careful balance thus has to be stricken between the
hoped-for acceleration compared to classical algorithms, and the
penalty coming from decoherence---all this while taking into account
the rules of quantum computing, which differ from those of classical
computing.

In this work, we explore various ways to strike this balance, with the
following overarching question as our horizon: \emph{how to best
combine classical and quantum algorithms to compute the properties of
many-body systems?}

To answer this question, we will first identify the strengths and
weaknesses of the main classical algorithms for the many-body problem
(Section~\ref{sec:Many-body-problems:-a}). Our main focus will be
the Hubbard model, a prototypical model for strongly-correlated electron
systems, although we will also extend the discussion to quantum chemistry
problems as well as combinatorial optimization problems, which can
be regarded as classical many-body problems. We will carry out a similar
program for quantum algorithms for the many-body problem, with a focus
on algorithms geared at current processors, namely variational quantum
algorithms (Section~\ref{sec:Quantum-computers:-Promises}). 

After these introductory sections, we will explain how classical and
quantum algorithms can be combined to optimize our usage of quantum
processors (Section~\ref{sec:Paths-towards-useful}). We will first
show that tensor network techniques are not only powerful diagnosis
tools for quantum advantage and promising predictive tools to simulate
noisy quantum computers, but that they can also be used in combination
with quantum algorithms to reach accurate results in possibly hard
parameter regimes. We will then present two hybrid methods for handling
fermionic models with more robustness to decoherence, as well as quantitative
criteria for the success of two important fermionic quantum algorithms.
Finally, we will tackle a subclass of \emph{classical} many-body problems
encountered in the field of combinatorial optimization, and how they
can be tackled with quantum algorithms. In particular, we investigate
the effect of decoherence on the quality of the computed output.

\section{Many-body problems: a few classical approaches, their
strengths and their
\mbox{limitations}}\label{sec:Many-body-problems:-a}

Many-body or strongly-correlated problems are encountered in many
fields of physics including materials science, quantum chemistry and
nuclear physics. Their commonality is the failure of conventional
mean-field theories to describe some of their phases. This failure
requires the development of advanced classical algorithms, all of
which, as we shall see, have to deal with an exponential difficulty
in some guise. 

We will first focus on algorithms that directly tackle the problem
at hand (Section~\ref{subsec:The-Hubbard-model}). The goal is to
shed light on the successes of these algorithms despite this exponential
wall, and identify potential bottlenecks. We will mainly focus on
condensed-matter many-body problems, but will also introduce a few
notions of quantum chemistry, which is an important candidate application
for quantum processors.

We will then introduce classical problem reduction techniques that have
been developed to reduce the number of relevant degrees of freedom,
leading to a smaller, yet still many-body subproblem
(Section~\ref{subsec:Problem-reduction:-embedding}). Such a reduced
model will in general be better suited to a quantum processing task.

\subsection{The Hubbard model and exponential walls}\label{subsec:The-Hubbard-model}

Many-body problems come in many flavors. In materials science and
quantum chemistry, the physics of the electronic degrees of freedom is
well described (after neglecting the ionic motion, an approximation
called the Born--Oppenheimer approximation) by the so-called
electronic-structure Hamiltonian,
{\begin{equation}
H=\sum_{\alpha,\beta=1}^{N_{o}}\sum_{\sigma=\uparrow,
\downarrow}h_{\alpha\beta}c_{\alpha,\sigma}^{\dagger}
c_{\beta,\sigma}+\frac{1}{2}\sum_{\sigma,\sigma'}
\sum_{\alpha\beta\gamma\delta}v_{\alpha\beta\gamma\delta}
c_{\alpha,\sigma}^{\dagger}c_{\beta,\sigma}
c_{\gamma,\sigma'}^{\dagger}c_{\delta,\sigma'},\label{eq:str_el}
\end{equation}}\unskip
which describes the kinetic and potential energy (first term) and
Coulomb interaction (second term) of the electronic system. The kinetic
and interaction tensors $h$ and $v$ are determined by the geometry of
the problem and by the choice of single-particle basis set $\{
\phi_{\alpha}(\boldsymbol{r})\} _{\alpha=1\dots N_{o}}$
(sometimes called atomic orbitals), created/annihilated by the creation
and annihilation operators $c_{\alpha,\sigma}^{\dagger}$ and
$c_{\alpha,\sigma}$ ($\sigma=\uparrow,\downarrow$ denotes the spin of
electrons). Provided the basis set size $N_{o}$ is large enough, the
ground state energy of $H$ will be the exact ground state energy of the
solid. In practice, one must truncate the basis set to a small $N_{o}$.
(The size $N_{o}$ that allows keeping a reasonable ``accuracy'', namely
a small enough error with respect to the exact ground state energy in
the $N_{o}\rightarrow\infty$ limit, depends on the choice of basis).

In condensed-matter physics, and in particular strongly-correlated
materials that typically encompass partially filled $d$ and $f$ shells,
which correspond to localized atomic orbitals, one usually picks a
localized basis set. Localized means that
$\phi_{\alpha}(\boldsymbol{r})$ is centered around an atomic position
$\boldsymbol{R}_{i}$, e.g.\ $\alpha\equiv(\boldsymbol{R}_{i},a)$ with $a$
an atomic orbital like 1s, 2s, 2p\,$\ldots$~Thus, $h_{\alpha\beta}$ can
typically be interpreted as a tunneling matrix element between atomic
sites. It is thus usually limited to neighboring sites in the material.
As for $v_{\alpha\beta\gamma\delta}$, it also has a ``local'' structure
that endows~$H$~with a special structure: it is not \textit{any} large
($2^{N_{o}}\times2^{N_{o}}$) Hermitian matrix. In condensed-matter
systems, this locality is made even more obvious by high-energy
electrons, which tend to screen the interaction between low-energy
electrons. This screening phenomenon often suggests a simplification of
Equation~(\ref{eq:str_el}): one can keep only local interaction matrix
elements, which yields the so-called Hubbard model~\cite{Hubbard1963}.
In its single-band version, it reads\looseness=-1
{\begin{equation}
H=\sum_{i,j=1}^{n}\sum_{\sigma}t_{ij}c_{i,\sigma}^{\dagger}
c_{j,\sigma}+U\sum_{i=1}^{n}n_{i\uparrow}n_{i\downarrow}=
H_{\mathrm{kin}}+H_{\mathrm{int}},\label{eq:Hubbard}
\end{equation}}\unskip
where the creation operators $c_{i,\sigma}^{\dagger}$ create electrons
in orbitals $\{ \phi_{i}(\boldsymbol{r})\} _{i=1\dots n}$
localized around site $\boldsymbol{R}_{i}$ of the lattice. The Hubbard
interaction $U$ is the value of the on-site interaction after screening.
The Hubbard model (and other related ``low-energy'' models) can
be derived, more or less approximately, from the electronic structure
Hamiltonian. In particular, $U$ can be computed in an ab-initio fashion
under some assumptions using advanved techniques like the constrained
random phase approximation (cRPA,~\cite{Aryasetiawan2004}). This
``downfolding'' procedure can also be carried out using tensor network
techniques (which we will discuss in further detail below). For instance,
\cite{Jiang2023} derives a single-band Hubbard model for the cuprate
materials (that are known to exhibit high-temperature superconductivity
since the 1980s~\cite{Bednorz1986}) starting from a three-band
model.

Although simple-looking, this minimal model is difficult to solve in
regimes where both the kinetic energy (described by the tensor
$t_{ij}$) and the interaction $U$ are of comparable magnitude. In such
regimes, the influence of the kinetic term, which favors delocalized
states through tunnelling, competes with that of the interaction term,
which favors localized states that minimize Coulomb repulsion, and form
an exotic phase of matter called a ``Mott insulator''~\cite{Mott1937}.

\subsubsection{The failure of mean-field methods}

Standard mean-field theories like Hartree--Fock (HF) theory or more
advanced ones (like density functional theory (DFT,~\cite{Kohn1999}),
which is not strictly speaking mean-field but reduces the problem
to a single-particle Kohn--Sham problem~\cite{Kohn1965}) take interactions
into account only in an averaged fashion. This simplication allows
for ``efficient'' classical algorithms. Here, and throughout this
report, ``efficient'' will refer to algorithms whose run time and
memory complexity is polynomial in the problem size $n$. For the
case of HF or DFT, the complexity is $O(n^{3})$, dominated by the
diagonalization of a $n\times n$ (or $N_{o}\times N_{o}$ if dealing
with the electronic structure problem Equation~(\ref{eq:str_el})) linear
system of equations. Yet, these methods fail to describe the Mott
insulating phase: they can predict the opening of antiferromagnetic
gaps but not of paramagnetic Mott gaps.

One can (at least partially) account for this failure by looking at
a simple two-site Hubbard problem (\emph{aka} the Hubbard molecule)
with two electrons. In this case, in the limit of large on-site interactions
$U$ (or zero hopping), the ground state is of the form
{\begin{equation}
|\Psi_{0}{\mathop{\rangle}}=\tfrac{1}{\sqrt{2}}\left(|\uparrow,{\downarrow}{\mathop{\rangle}}+|
{\downarrow},\mathop{\uparrow}{\mathop{\rangle}}\right),\label{eq:hub_molecule_gs}
\end{equation}}\unskip
(with
$|\uparrow,{\downarrow}{\mathop{\rangle}}=c_{L\uparrow}^{\dagger}c_{R\downarrow}^{\dagger}|
\boldsymbol{0}{\mathop{\rangle}}$,
meaning one electron on the left site with spin up, one electron on the
right site with spin down), because the on-site Coulomb interaction
penalizes states with double occupancies, like
$|0,\uparrow{\downarrow}{\mathop{\rangle}}$ or $|\uparrow\downarrow,0{\mathop{\rangle}}$.


As we are about to see, mean-field theory will lead to a qualitatively
different answer. Indeed, (unrestricted) Hartree--Fock theory amounts
to replacing $H$ by
{\begin{equation}
H_{\mathrm{m.f}}=\sum_{i,j=1}^{n}\sum_{\sigma}\tilde{t}_{ij}^{\sigma}
c_{i,\sigma}^{\dagger}c_{j,\sigma},\label{eq:H_meanfield}
\end{equation}}\unskip
with a modified hopping matrix
{\begin{equation}
\tilde{t}_{ij}^{\sigma}=t_{ij}+U\langle n_{i\bar{\sigma}}
\rangle\delta_{ij},\label{eq:HF_hopping}
\end{equation}}\unskip
that depends on the densities $\langle n_{i\sigma}\rangle$. A quadratic
Hamiltonian, $H_{\mathrm{m.f}}$ will necessarily lead to a ground state
in the form of a single Slater determinant, namely a state that can be
written as $\prod_{\alpha\sigma}
(\tilde{c}_{\alpha\sigma}^{\dagger})^{n_{\alpha\sigma}}|
\boldsymbol{0}{\mathop{\rangle}}$,
where the creation operators $\tilde{c}_{\alpha\sigma}^{\dagger}$ are
defined as linear combinations of the original operators
$\tilde{c}_{\alpha\sigma}^{\dagger}=\sum_{i}V_{\alpha
i}c_{i\sigma}^{\dagger}$ ($V$ is unitary to preserve the
anticommutation relations of fermionic operators).

It turns out that the two (up and down) electrons will populate the
``bonding'' orbital
$\tilde{c}_{B\sigma}^{\dagger}=(c_{L\sigma}^{\dagger}+
c_{R\sigma}^{\dagger})/\sqrt{2}$,
leading to a state of the form
{\begin{equation}
|\Psi_{\mathrm{HF}}{\mathop{\rangle}}=\tilde{c}_{B,\uparrow}^{\dagger}
\tilde{c}_{B,\downarrow}^{\dagger}|\boldsymbol{0}{\mathop{\rangle}}=
\tfrac{1}{2}\left(|\uparrow,{\downarrow}{\mathop{\rangle}}+|\downarrow,
\uparrow{\mathop{\rangle}}+|0,\uparrow{\downarrow}{\mathop{\rangle}}+|
\uparrow\downarrow,0{\mathop{\rangle}}\right).\label{eq:hub_molecule_hf}
\end{equation}}\unskip

This state has contributions from configurations with double
occupancies (last two terms) that are not present in the true ground
state Equation~ (\ref{eq:hub_molecule_gs}). This is due to the fact
that HF (and DFT) underestimates the influence of local interactions by
handling them in an approximate, mean-field way. From a technical point
of view, it is also worth noting that the ground state
(\ref{eq:hub_molecule_gs}) cannot be written as a Slater determinant:
it is said to be (in a quantum chemistry context) multi-reference, as
opposed to HF states, which are ``single-reference'' states (see
\cite{Evangelista2018} for an interesting discussion of multi-reference
vs single-reference character). In yet another context, one could see
that the state represented by Equation~(\ref{eq:hub_molecule_gs}) is an
entangled state, as opposed to (\ref{eq:hub_molecule_hf}), which can be
factorized. 

Describing multi-reference states (aka strongly-correlated, or states
with ``static'' correlations) requires more advanced methods. 

\subsubsection{Direct diagonalization}\label{subsec:Direct-diagonalization}

A direct diagonalization (also called exact diagonalization, or full
configuration interaction) of the $4^{n}\times4^{n}$ matrix
representation of $H$ in the Fock basis (the basis made up of all
states of the form $|n_{1\uparrow},n_{1\downarrow},
\dots,n_{n\uparrow},n_{n\downarrow}{\mathop{\rangle}}=\prod_{i=1}^{n}
\prod_{\sigma=\uparrow,\downarrow}
(c_{i\sigma}^{\dagger})^{n_{i\sigma}}|\boldsymbol{0}{\mathop{\rangle}}$)
is limited to very small lattice sizes $n$ due to the exponential
(${\propto}(4^{n})^{3}$) cost of diagonalizing this matrix.

More advanced diagonalization approaches are routinely used, like the
Lanczos method~\cite{Koch2011}. It essentially amounts to a smart way
of orthonormalizing the so-called Krylov basis
$\{H|\Phi_{0}{\mathop{\rangle}},H^{2}|\Phi_{0}{\mathop{\rangle}},\dots,H^{k}|\Phi_{0}{\mathop{\rangle}}\}$
(with $|\Phi_{0}{\mathop{\rangle}}$ a suitable starting state, 
e.g.\ the HF state),
which spans a (Krylov) subspace that is a good approximation of the
low-energy eigenspace of the full $H$ (the Krylov basis is inspired
from the power method, where powers of $H$, $H^{k}|\Phi_{0}{\mathop{\rangle}}$,
are used to amplify a given eigenvector, here the lowest). If $H$ is a
sparse matrix (which is the case in Equation~(\ref{eq:Hubbard}), as
there are typically $O(n)$ nonzero terms per line in the
$4^{n}\times4^{n}$\ $H$ matrix), the cost of constructing the $k\times
k$ matrix of $H$ in this basis is $O(4^{n})$ instead of $O(16^{n})$:
one can thus push the limit a bit further (and have some control on the
accuracy by increasing the size $k$ of the Krylov subspace). The use of
symmetries also allows to extend exact diagonalization methods. For
instance,~\cite{Wietek2018} reach systems of 50 spins. Despite these
improvements, one always faces an exponential bottleneck in the number
$n$ of sites.

\subsubsection{Quantum Monte-Carlo
algorithms}\label{subsec:Quantum-Monte-Carlo-algorithms}

Another way to tackle the properties of the Hubbard model is to resort
to so-called quantum Monte-Carlo methods. These methods come in several
flavors, but boil down to rewriting e.g.\ expectation values of observables
in the generic form
{\begin{equation}
\langle O\rangle=\mathrm{Tr}(\rho\hat{O})=
\frac{\sum_{x\in\mathcal{C}}w(x)f(x)}{\sum_{x\in\mathcal{C}}
w(x)}\label{eq:O_sampling}
\end{equation}}\unskip
with $\mathcal{C}$ a very large (exponential or worse) configuration
space, $w(x)$ the weight of a configuration $x$ and $f(x)$ the value of
the configuration. (For instance, the right-hand side 
of~(\ref{eq:O_sampling}) can obtained for a thermal state
$\rho={\mathrm{e}}^{-H/(k_{\mathrm{B}}T)}/Z$ by Taylor-expanding the exponential
in powers of $H_{\mathrm{int}}$). From~(\ref{eq:O_sampling}), one can
define a probability distribution $p(x)=|w(x)|/\sum_{x}|w(x)|$ and
approximate the corresponding expression by sampling:
{\begin{equation}
\langle O\rangle=\frac{\sum_{x\in\mathcal{C}}p(x)
\mathrm{sign}(w(x))f(x)}{\sum_{x\in\mathcal{C}}p(x)
\mathrm{sign}(w(x))}\approx\frac{\langle\mathrm{sign}\cdot 
f\rangle_{\mathrm{MC}}}{\langle\mathrm{sign}
\rangle_{\mathrm{MC}}}.\label{eq:MC_expression}
\end{equation}}\unskip
Here, $\langle
g\rangle_{\mathrm{MC}}=\sofrac{1}{\mathcal{N}}\sum_{i=1}^{\mathcal{N}}g(x_{i})$,
where the $\mathcal{N}$ configurations $x_{i}$ are sampled according to
the $p(x)$ distribution (either through direct sampling or Markov chain
sampling, see~\cite{Krauth1996}). The finite number of samples
$\mathcal{N}$ used in Monte-Carlo leads to a statistical error 
{\begin{equation}
\Delta g\approx\sqrt{\frac{\mathrm{Var}(g)}{\mathcal{N}}}\label{eq:MC_error}
\end{equation}}\unskip
(for uncorrelated samples), which can be made systematically small by
increasing the number $\mathcal{N}$ of samples. However, when the
denominator $\langle\mathrm{sign}\rangle_{\mathrm{MC}}$ in
(\ref{eq:MC_expression}) vanishes, the number of samples must increase
dramatically to ensure a constant error $\Delta O$: this is the sign
problem phenomenon that generically plagues fermionic computations. For
instance, for a thermal state, one obtains $\Delta O/\langle
O\rangle=O({\mathrm{e}}^{N_{\mathrm{el}}/(k_{\mathrm{B}}T)}/\sqrt{\mathcal{N}})$,
with $N_{\mathrm{el}}$ the number of electrons~\cite{Troyer2005}:
keeping a constant error requires a number of samples that is
exponential in the number of electrons and inverse temperature $T$. 

In practice, there are ways to avoid or overcome, at least in some
regimes, this exponential sign problem. Let us give three illustrative
examples.

\paragraph*{Diagrammatic Monte-Carlo}

One example is diagrammatic Monte-Carlo (DiagMC,~\cite{Prokofev2019}),
which consists in computing expectation values
$\mathrm{Tr}(\rho\hat{O})$ of observables $\hat{O}$ (with
e.g.\ thermal states $\rho={\mathrm{e}}^{-H/(k_{\mathrm{B}}T)}/Z$) by expanding the
exponential of $H$ (e.g.\ (\ref{eq:Hubbard})) in powers of the
interaction term $H_{\mathrm{int}}$ of $H$. The corresponding series
$\langle
O\rangle=\mathrm{Tr}(\rho\hat{O})=\sum_{m=0}^{\infty}O_{m}$
contains contributions $O_{m}$ at the $m$th order in the interaction
$U$. These contributions can in turn be written in the form
$O_{m}=\int_{X}\sum_{T\in\mathcal{S}_{m}}D(T,X)$, with $T$ a Feynman
diagram of order $m$ ($\mathcal{S}_{m}$ is the group of all topologies
at order $m$) and $X$ internal variables (e.g.\ space and imaginary
time). DiagMC uses configurations $x\equiv(X,T)$ and weights
$w(x)=D(T,X)$, while the more recent ``connected determinant DiagMC''
(CDet,~\cite{Rossi2017}) uses configurations $x=X$ with weights
$w(x)=\sum_{T\in\mathcal{S}_{m}}D(T,X)$.

Let us focus specifically on the more advanced CDet (the conclusions
are similar, although slightly less powerful, for DiagMC). The
computation of the contributions at $m$th order turn out to have a
$O(3^{m})$ complexity. While this method thus appears to have
exponential complexity in the expansion order (the run time,
$t={\mathrm{e}}^{\alpha m}$, with $\alpha=\log(3)$, is exponential for a given
order), the error decreases exponentially with the expansion order $m$:
once one is beyond the convergence radius of the series,
$m\sim\log(1/\epsilon)$ is enough to achieve a truncation error
$\epsilon=|\langle O\rangle-\langle O\rangle_{m}|$, with
$\langle O\rangle_{m}$ the truncated series. Therefore, the overall run
time for a fixed error scales polynomially: we have
$t=1/\epsilon^{\alpha}$, a polynomially scaling run time as a function
of $1/\epsilon$~\cite{Rossi2017a}. (Of course, to compute beyond the
convergence radius $U>U_{\mathrm{cr}}$ of the series, one recovers an
exponential complexity: this approach is perturbative in nature).

\paragraph*{Auxiliary-field quantum Monte-Carlo}

An alternative to DiagMC or CDet---both unbiased methods---is auxiliary
field quantum Monte-Carlo (AFQMC,~\cite{Pavarini2013}). In this method
(like in diffusion Monte-Carlo (DMC) or projector 
techniques~\cite{Grimm1971}), an imaginary-time evolution ${\mathrm{e}}^{-\tau H}$
starting from a initial wavefunction $|\Phi_{0}{\mathop{\rangle}}$ is performed:
the time-evolved state 
{\[
|\Psi(\tau){\mathop{\rangle}}={\mathrm{e}}^{-\tau H}|\Phi_{0}{\mathop{\rangle}}/
\Vert {\mathrm{e}}^{-\tau H}|\Phi_{0}{\mathop{\rangle}}\Vert 
\]}\unskip
converges to the ground state in the large imaginary time limit. This
formal algorithm requires the costly (thus impossible in practice)
exponentiation of the exponentially large matrix $H$. In practice, this
evolution is performed by splitting ${\mathrm{e}}^{-\tau H}$ in $N_{t}$
``Trotter'' slices
{\begin{equation}
{\mathrm{e}}^{-H\tau}=\prod_{k=1}^{N_{t}}\left({\mathrm{e}}^{-H_{\mathrm{kin}}
\Delta\tau}\prod_{i=1}^{n}{\mathrm{e}}^{-U\Delta\tau n_{i\uparrow}
n_{i\downarrow}}\right)+O\left(\frac{\tau^{2}}{N_{t}}\right),
\label{eq:suzuki_trotter_imaginary}
\end{equation}}\unskip
 and by expanding each Trotter slice using a Hubbard--Stratonovich
decoupling of the interaction term, 
{\begin{equation}
{\mathrm{e}}^{-U\Delta\tau n_{i\uparrow}n_{i\downarrow}}=
\sum_{s_{i}=\pm1}{\mathrm{e}}^{-f(U,\Delta\tau,s_{i})H_{\mathrm{f}}},\label{eq:HS}
\end{equation}}\unskip
with $H_{\mathrm{f}}$ a quadratic Hamiltonian and $\Delta\tau=\tau/N_{t}$.
The energy of the resulting wavefunction can thus be written in the
requisite form (\ref{eq:O_sampling}):
{\[
E(\tau)=\frac{\langle\Phi_{0}|H{\mathrm{e}}^{-2\tau H}|\Phi_{0}\rangle}
{\langle\Phi_{0}|{\mathrm{e}}^{-2\tau H}|\Phi_{0}\rangle}=
\frac{\sum_{x}\langle\Phi_{0}|H|\Phi(x)\rangle}
{\sum_{x}\langle\Phi_{0}|\Phi(x)\rangle},
\]}\unskip
with
$x=\boldsymbol{s}_{1},\dots,\boldsymbol{s}_{n},
\boldsymbol{s}_{i}=(s_{i}^{1},\dots,s_{i}^{N_{t}})$,
and the wavefunction
{\[
|\Phi(x){\mathop{\rangle}}=\prod_{k=1}^{N_{t}}\left({\mathrm{e}}^{-H_{\mathrm{kin}}
\Delta t}\prod_{i=1}^{n}{\mathrm{e}}^{-f(U,\Delta\tau,s_{i}^{k})
H_{\mathrm{f}}}\right)|\Phi_{0}{\mathop{\rangle}}.
\]}\unskip
The exponential sums appearing in the numerator and denominator are
sampled using a Markov chain Monte-Carlo algorithm. For a given
configuration $x$, $|\Phi(x){\mathop{\rangle}}$ is a Slater determinant provided
the starting point $|\Phi_{0}{\mathop{\rangle}}$ is also one, since applying a
quadratic evolution (via Hamiltonians $H_{\mathrm{kin}}$ and
$H_{\mathrm{f}}$) to a Slater determinant leads to a Slater
determinant. Being a Slater determinant, it can be stored and computed
efficiently (on a classical computer). Sampling this multidimensional
sum a priori allows to perform the evolution to the ground state and
thus compute ground state expectation values.

In practice, however, the sign of $\langle\Phi_{0}|\Phi(x){\mathop{\rangle}}$
changes, which leads to a sign problem. To avoid it, importance
sampling strategies are usually implemented in the form of a ``trial
wavefunction'' that is used to avoid sign changes (this variant is
dubbed constrained-path Monte-Carlo, CPMC,~\cite{Zhang1997}). This
removes the sign problem but introduces a bias in the method (contrary
e.g.\ to the diagrammatic Monte-Carlo methods introduced above, which are
unbiased). This bias is all the larger as the trial wavefunction is far
away from the true ground state. Constructing a good enough trial
wavefunction is thus essential\,$\ldots$~but brings additional classical cost
that needs to be taken into account in the computational burden of the
method.

Note that there also exists DMC-like methods that do not require a
trial wavefunction and therefore do not suffer from a potential bias:
this is the case, for instance, of the FCIQMC method~\cite{Booth2009}.
There, the limiting factor is the number of MC ``walkers'' (which are
also used in CPMC to represent the wavefunction in a stochastic
manner), which is supposed to saturate when the full configuration
interaction (FCI) limit is reached: the number of walkers at saturation
can be very large. 

\paragraph*{Variational Monte-Carlo}

A third and last widely used example is variational Monte-Carlo (VMC).
It is based on a parametric representation
{\begin{equation}
|\Psi(\boldsymbol{\theta}){\mathop{\rangle}}=\sum_{x=0}^{2^{n}-1}
\psi_{\boldsymbol{\theta}}(x)|x{\mathop{\rangle}}\label{eq:vmc}
\end{equation}}\unskip
of the wavefunction, where $\boldsymbol{\theta}$ is a list of parameters
that completely characterize the state, and $\{ |x{\mathop{\rangle}}\} $
is the Fock basis. The Rayleigh--Ritz variational principle guarantees
that
{\[
E(\boldsymbol{\theta})=\frac{\langle\Psi(\boldsymbol{\theta})|H|\Psi
(\boldsymbol{\theta})\rangle}{\langle\Psi(\boldsymbol{\theta})|
\Psi(\boldsymbol{\theta})\rangle}\geq E_{0}
\]}\unskip
(with $E_{0}$ the exact ground state energy of $H$). The goal of VMC is
to find the set of parameters $\boldsymbol{\theta}^{\star}$ that
minimizes $E(\boldsymbol{\theta})$. For this, one needs to compute
$E(\boldsymbol{\theta})$. This is achieved by inserting
$I=\sum_{x}|x{\mathop{\rangle}}\langle x|$ in $E(\boldsymbol{\theta})$, which
yields: 
{\[
E(\boldsymbol{\theta})=\sum_{x}\frac{\left|\langle
\Psi(\boldsymbol{\theta})|x\rangle\right|^{2}}
{\langle\Psi(\boldsymbol{\theta})|\Psi(\boldsymbol{\theta})
\rangle}\frac{\langle x|H|\Psi(\boldsymbol{\theta})\rangle}
{\langle x|\Psi(\boldsymbol{\theta})\rangle}=\sum_{x}
p_{\boldsymbol{\theta}}(x)E_{\mathrm{loc}}(x)
\]}\unskip
with the probability distribution
$p_{\boldsymbol{\theta}}(x)=|\psi_{\boldsymbol{\theta}}(x)|^{2}/
\langle\Psi(\boldsymbol{\theta})|\Psi(\boldsymbol{\theta})\rangle$
and the local energy $E_{\mathrm{loc}}(x)=\sfrac{\langle
x|H|\psi_{\boldsymbol{\theta}}\rangle}{\psi_{\boldsymbol{\theta}}(x)}$.
This exponential sum is sampled as
{\[
\sum_{x}p_{\boldsymbol{\theta}}(x)E_{\mathrm{loc}}
(x)\approx\frac{1}{\mathcal{N}}\sum_{\substack{i=1}
}^{\mathcal{N}}E_{\mathrm{loc}}(x_{i})
\]}\unskip
with samples $x_{i}$ drawn from $p_{\mathbf{\boldsymbol{\theta}}}(x)$.
VMC relies on an efficient computation of $E_{\mathrm{loc}}(x)$ for a
given $x$, and the possibility to sample from
$p_{\boldsymbol{\theta}}(x)$. The method has gained particular traction
in the recent years with the emergence of deep neural networks (dubbed
quantum neural networks,~\cite{Carleo2016}) to represent
$\psi_{\boldsymbol{\theta}}(x)$. Some allow direct sampling, some
require the use of Markov chain Monte-Carlo.

As always in MC methods, the finite number $\mathcal{N}$ of samples
induces statistical noise (see Equation~(\ref{eq:MC_error})). One nice
feature of VMC, though, is the behavior of the variance when
$\psi_{\boldsymbol{\theta}}(x)$ is close to the sought-after ground
eigenstate: then, $E_{\mathrm{loc}}(x)=\sfrac{\langle
x|H|\psi_{\boldsymbol{\theta}}\rangle}{\psi_{\boldsymbol{\theta}}(x)}\approx E_{0}$,
namely the energy is independent of $x$, therefore the variance
vanishes. This means that as one gets closer to the solution, one needs
fewer and fewer samples to reach the same statistical accuracy. 

The main limitation of VMC is the representational power of
$\psi_{\boldsymbol{\theta}}(x)$ and the optimization of its parameters
to minimize its energy. We refer to reader to~\cite{Mazzola2023} 
and~\cite{Jiang2024a} for more in-depth discussions of Monte-Carlo
algorithms (and their relation with quantum algorithms).

The pros and cons of the three MC methods we just presented may be
summarized as follows:
\begin{itemize}
\item DiagMC and CDet are perturbative in nature: they will perform well
in weak correlation regimes, but fail in stronger correlation regimes;
\item AFQMC is part of the family of projector MC methods: it will perform
well if a good trial state is provided. Otherwise, it will be plagued
by a large bias (due to a wrong trial state) or a large variance (due
to the sign problem);
\item VMC is unbiased (to the extent that the variational ansatz is expressive
enough) and not perturbative, but it relies on the optimization of
a nonconvex function, a problem that is generically hard to solve
(in fact, it is ``NP hard'', a terminology we will discuss later,
in Section~\ref{subsec:complexity-theory}).
\end{itemize}

\subsubsection{Tensor network methods}\label{subsec:Tensor-network-methods}

A major alternative to Monte-Carlo methods is provided by so-called
tensor network methods. They consist in representing the state of the
system as a graph whose vertices are tensors, and whose edges are
chosen to reflect e.g.\ the lattice topology (but not necessarily). The
combined memory footprint of these tensors (of the order of
$n\chi^{d}$, with $d$ the number of neighbors of each vertex, and
$\chi$ the dimension of the tensor legs) is meant to be much smaller
than the exponential footprint of storing the wavefunction
$|\Psi{\mathop{\rangle}}$ ($4^{n}$) or thermal state $\rho$ ($4^{n}\times4^{n}$)
of the Hubbard model. This exponential in the Hilbert space size is
generically traded for an exponential in the entanglement of the state
to be represented, as we shall see below.

The simplest example of such tensor networks are matrix product states
(MPS,~\cite{Schollwoeck2011}, Figure~\ref{fig:tensor-networks}(a)),
whose graph is simply a line graph, namely each tensor has two
``neighbors'' on its left and on its right (except for the first and
last tensors). The main parameter of a MPS (and, for that matter, of
other tensor networks) is the so-called bond dimension $\chi$. $\chi=1$
corresponds to a product (factorized) state, i.e.\ without entanglement.
If the so-called von Neumann entanglement entropy of the state, which
measures the level of entanglement of the state, is called $S$, then
one must choose 
{\begin{equation}
\chi\geq2^{S}\label{eq:bond_dim_vs_entropy}
\end{equation}}\unskip
to represent this state with a MPS. Thus, a MPS with a small bond
dimension, and thus a small storage cost (${\propto}n\chi^{2}$), can
potentially represent states with low entanglement. Algorithms to
manipulate these states then generally scale as
$O(\mathrm{poly(}\chi))$ (usually $O(\chi^{3})$, which corresponds to
the cost of compressing the state through a singular value
decomposition (SVD), or more advanced algorithms). This property is
very useful for instance for studying ground states of Hamiltonians
with local terms, which typically display a small entanglement entropy
($S$ is even independent of system size for gapped systems in 1D,
\cite{Hastings2007}).

\begin{figure}
\includegraphics{fig01}
\caption{\label{fig:tensor-networks}Two examples of tensor networks. (a) Matrix product state
(MPS): linear graph. (b) Projected entangled pair state (PEPS): graph
with a $\sqrt{n}\times\sqrt{n}$ square lattice
topology.}
\end{figure}

Most importantly, tensor networks illustrate the fact that entanglement
is a key quantity to consider when investigating the ``hardness'' of a
computational problem for a classical algorithm. Currently, MPSs are
widely used for studying 1D systems at equilibrium. Difficulties arise
for higher dimensions or for out-of-equilibrium problems.

For higher dimensions, indeed, even when an ``area law'' is fulfilled,
the entropy nevertheless still increases with dimension of the system:
in 2D, the bond dimension is exponential in the linear dimension $L$ of
the $L\times L$ lattice. Tensor networks based on higher dimensional
graphs than line (1D) graphs have been developed to try and overcome
this challenge. For instance, projected entangled pair states (PEPS,
\cite{Verstraete2004a}, Figure~\ref{fig:tensor-networks}(b)) are based
on a 2D lattice graph. However, contrary to MPS, observable expectation
values cannot be computed efficiently (polynomially in $\chi$) for
PEPS. In fact, ``contracting'' (i.e.\ summing over all internal indices
of) a generic tensor network can be shown to scale (both in CPU and
memory) as ${\mathrm{e}}^{T(G)}$, where $T(G)$ is the so-called treewidth of the
TN's graph $G$, namely the minimum width over the tree decompositions
of $G$ (\hspace{1sp}\cite{Markov2005}; graphs that are trees have treewidth 1, a
$M\times N$ square lattice has treewidth $\min(M,N)$). The recently
introduced isometric tensor networks (IsoTNS,~\cite{Zaletel2020}) solve
this issue, but with a quite prohibitive cost of $O(\chi^{7})$.
Recently, approximate contraction methods of PEPS have been introduced
that use a belief propagation algorithm~\cite{Alkabetz2021}. This type
of approximate contraction gives accurate results for PEPS (or more
generally tensor networks) defined on sparse graphs~\cite{Sahu2022},
namely graphs with a low connectivity (and is exact for one-dimensional
graphs, i.e.\ MPS).

As for out-of-equilibrium physics, the challenge for MPS also comes
from large entropies: the entropy tends to increase with time (linearly
after a sudden change or quench in the Hamiltonian parameters,
\cite{Calabrese2005}), causing the requisite bond dimension to increase
accordingly, following Equation~(\ref{eq:bond_dim_vs_entropy}). This
difficulty is compounded by going to two or more dimensions for the
reasons we just explained.

\subsubsection{Towards a quantitative handshake of classical methods
for the two-dimensional Hubbard model?}

The Hubbard model in two dimensions has been tackled through a variety
of methods including those we described above. Despite the difficulties
associated with the strongly-correlated character of the model,
agreement between various methods has been reached in some parameter
regimes~\cite{Leblanc2015}.

One important frontier is that of the doped, low-temperature phase of
the model with strong interactions ($U/t\approx8$). Recent comparative
DMRG (MPS) and AFQMC studies of the two-dimensional Hubbard model with
nearest-neighbor $t$ and next-nearest-neighbor $t'$ hopping on a square
lattice have led to consistent pictures of the phenomenology of high-Tc
cuprates in the zero-temperature regime~\cite{Xu2023}. In particular,
the coexistence of partially filled stripe order with superconductivity
in the ground states found by both methods on the hole-doped side has
been observed in the high bond dimension regime (for MPS) and
large-size limit (for AFQMC), suggesting that the Hubbard model with
$t'$ contains the main ingredients of high-Tc superconductivity.
However, these studies were limited to zero temperatures (ground
states), so that no direct computation of the superconducting
temperature was possible. No dynamical properties were studied either.

\subsubsection{A word on quantum chemistry methods}\label{subsec:qchem}

We have so far reviewed a few major methods used to tackle the Hubbard
model, a model that captures the physics of certain solids. For
chemical compounds, the simplification from the electronic structure
Hamiltonian (Equation~(\ref{eq:str_el})) to the Hubbard model is not
possible. In particular, the structure of the Coulomb interaction is in
general more complicated. This makes Monte-Carlo methods based on a
expansion in powers of the interaction term (like DiagMC above) much
more costly, if not irrelevant.

We review below a few widespread algorithms designed for quantum
chemistry problems, with the goal of identifying computational
bottlenecks that could be overcome with quantum computers.

\paragraph*{Configuration interaction (CI,~\cite{Lowdin1955})}
CI is a variational method like VMC, where the variational state is a linear
combination of a few states: 
{\begin{equation}
|\Psi(\boldsymbol{\theta}){\mathop{\rangle}}=\theta_{0}|\Phi_{\mathrm{HF}}
{\mathop{\rangle}}+\sum_{a,i}\theta_{i}^{a}c_{a}^{\dagger}c_{i}
|\Phi_{\mathrm{HF}}{\mathop{\rangle}}+\sum_{ab,ij}\theta_{ij}^{ab}c_{a}^{\dagger}c_{b}^{\dagger}
c_{i}c_{j}|\Phi_{\mathrm{HF}}{\mathop{\rangle}}+\cdots\label{eq:ci_ansatz}
\end{equation}}\unskip
where $|\Phi_{\mathrm{HF}}{\mathop{\rangle}}$ is the Hartree--Fock state,
$i,j,\dots$ refer to occupied molecular orbitals and $a,b\dots$ refer
to unoccupied ones (we henceforth include the spin $\sigma$ index in
the orbital index). One usually truncates the expression to a finite
set of excitations, for instance single (S) excitations
$c_{a}^{\dagger}c_{i}|\Phi_{\mathrm{HF}}{\mathop{\rangle}}$ and double (D)
excitations
$c_{a}^{\dagger}c_{b}^{\dagger}c_{i}c_{j}|\Phi_{\mathrm{HF}}{\mathop{\rangle}}$.
The method is efficiently tractable on classical computers: the
computation of the variational energy requires the evaluation of terms
like
$\langle\Phi_{\mathrm{HF}}|c_{j'}^{\dagger}c_{i'}^{\dagger}
c_{b'}c_{a'}Hc_{a}^{\dagger}c_{b}^{\dagger}c_{i}c_{j}|
\Phi_{\mathrm{HF}}\rangle$,
which is a polynomial computation thanks to
$|\Phi_{\mathrm{HF}}{\mathop{\rangle}}$ being a single Slater determinant
(allowing the use of Wick's theorem). However, to make it accurate in
strongly-correlated cases, one needs a very large number of excitations
(which in practice is too large compared to the required accuracy),
unless one uses several Slater determinants in the CI expansion
(instead of only the HF state). Even in this ``multi-reference'' case,
(truncated) CI will suffer from a lack of size extensivity, namely the
energy of two uncoupled fragments will not be the sum of the energies
of each fragment. (The untruncated, and thus exponentially costly,
version of CI, called full configuration interaction (FCI), takes into
account all Slater determinants: it is of course exact and as such
extensive, but untractable beyond small sizes because exponential in
$N_{o}$.)

\paragraph*{Coupled cluster (CC,~\cite{Coester1960})}

CC is a method that allows to use the same number of parameters as CI,
but includes an infinite number of fluctuations around the HF state.
This is achieved by putting the parameters into an exponential
{\begin{equation}
|\Psi(\boldsymbol{\theta}){\mathop{\rangle}}={\mathrm{e}}^{\sum_{a,i}
\theta_{i}^{a}c_{a}^{\dagger}c_{i}+\sum_{ab,ij}
\theta_{ij}^{ab}c_{a}^{\dagger}c_{b}^{\dagger}c_{i}
c_{j}+\cdots}|\Phi_{\mathrm{HF}}{\mathop{\rangle}}\equiv 
{\mathrm{e}}^{T(\boldsymbol{\theta})}|\Phi_{\mathrm{HF}}{\mathop{\rangle}}.\label{eq:CC_wf}
\end{equation}}\unskip
Usually the cluster operator $T(\boldsymbol{\theta})$ is truncated
to single (S) and double (D) excitations, in which case we still have
$O(N_{o}^{4})$ parameters, but by expanding the exponential we see
that an infinite number of fluctuations (made up of ``clusters''
of excitations of finite order) is taken into account. A major advantage
of the CC method is that is it size extensive.

Questions arise when implementing the method in practice. In principle,
one would like to keep the variational nature of CI (this is formally
advantageous as one knows that the obtained energy is an upper bound
to the true energy). This entails minimizing
{\begin{equation}
E(\boldsymbol{\theta})=\frac{\langle\Phi_{\mathrm{HF}}|
{\mathrm{e}}^{T^{\dagger}}H{\mathrm{e}}^{T}|\Phi_{\mathrm{HF}}\rangle}
{\langle\Phi_{\mathrm{HF}}|{\mathrm{e}}^{T^{\dagger}}{\mathrm{e}}^{T}|
\Phi_{\mathrm{HF}}\rangle}.\label{eq:variational_CC}
\end{equation}}\unskip

However, computing $E(\boldsymbol{\theta})$ is not efficient: the
number of terms that arise from expanding the exponentials is factorial
in the number of orbitals $N_{o}$ (when truncating $T$ so a finite
excitation order,~\cite{Robinson2011,Harsha2018}), and truncating to a
smaller arbitrary number of terms would suppress size
extensivity~\cite{Schaeffer1977}. This is why, in most implementations,
one does not use this variational formulation of coupled cluster
(dubbed VCC).

Instead, one uses the fact that at convergence, one should have
$H{\mathrm{e}}^{T}|\Phi_{\mathrm{HF}}{\mathop{\rangle}}=E_{0}{\mathrm{e}}^{T}|\Phi_{\mathrm{HF}}{\mathop{\rangle}}$.
Multiplying by ${\mathrm{e}}^{-T}$, this yields
${\mathrm{e}}^{-T}H{\mathrm{e}}^{T}|\Phi_{\mathrm{HF}}{\mathop{\rangle}}=E_{0}|\Phi_{\mathrm{HF}}{\mathop{\rangle}}$.
Defining states $|\mu{\mathop{\rangle}}=\tau|\Phi_{\mathrm{HF}}{\mathop{\rangle}}$ for
$\tau\in \{
c_{a}^{\dagger}c_{i},c_{a}^{\dagger}c_{b}^{\dagger}c_{i}c_{j},\dots\}
$, and noticing that these states are orthogonal to
$|\Phi_{\mathrm{HF}}{\mathop{\rangle}}$, we can project:
{\begin{equation}
\langle\mu|{\mathrm{e}}^{-T}H{\mathrm{e}}^{T}|\Phi_{\mathrm{HF}}\rangle=0\label{eq:tCC_equation}
\end{equation}}\unskip
for all $\mu$. We are thus facing a root finding problem.

It turns out that the Baker--Campbell--Hausdorff expansion of operator
${\mathrm{e}}^{-T}H{\mathrm{e}}^{T}$ (which is not Hermitian)
{\begin{equation}
{\mathrm{e}}^{-T}H{\mathrm{e}}^{T}=H+[H,T]+\tfrac{1}{2!}
[[H,T],T]+\tfrac{1}{3!}
[[[H,T],T],T]+\cdots\label{eq:BCH}
\end{equation}}\unskip
terminates at a finite, polynomial order in $T$ (when $T$ itself is
truncated to a finite excitation order) because the creation
(annihilation) operators act only on unoccupied (occupied) orbitals.
Therefore, CC, solved in this fashion (sometimes called projective CC,
PCC, or truncated CC, TCC), remains a polynomial method (the cost is
e.g.\ $O(N_{o}^{6})$ for single and double excitations, CCSD). 

In its ``SD with perturbative triples'' form (CCSD(T)), CC is considered
to be the golden standard for weakly correlated molecules.

Two main limitations are the fact that it is a single-reference method
(it builds upon a single Slater determinant,
$|\Phi_{\mathrm{HF}}{\mathop{\rangle}}$), and it is not, in its projective form,
variational (essentially because ${\mathrm{e}}^{-T}H{\mathrm{e}}^{T}$ is not hermitian). The
first limitation entails inaccurate results e.g.\ in the dissociation
limit of molecules. Whether the second limitation---non
variationality---is an actual problem is a debated topic (see 
e.g.~\cite{Kutzelnigg1991}), namely it is not sure that guaranteeing that
$E(\boldsymbol{\theta})$ is an upper bound to $E_{0}$ is very important
in practice. Yet, VCC does have the nice property to ensure that a
generalized Hellmann--Feynman theorem holds~\cite{Szalay1995} (the
Hellmann--Feynman theorem states that if $r$ is an external parameter
(e.g.\ the bond distance), then
$\partial_{r}E(\boldsymbol{\theta})=
\langle\Psi(\boldsymbol{\theta})|\partial_{r}H|
\Psi(\boldsymbol{\theta})\rangle$,
making it easy to compute forces in molecules without redoing the
variational computation for several $r$'s). Note that PCC also
satisfies the Hellmann--Feynman theorem~\cite{Bishop2008}.

\paragraph*{Unitary coupled cluster (UCC,~\cite{Schaeffer1977})}

UCC proposes an ansatz of the form:
{\begin{equation}
|\Psi(\boldsymbol{\theta}){\mathop{\rangle}}={\mathrm{e}}^{T(\boldsymbol{\theta})-
T^{\dagger}(\boldsymbol{\theta})}|\Phi_{\mathrm{HF}}
{\mathop{\rangle}}.\label{eq:ucc_ansatz}
\end{equation}}\unskip
This expression, as VCC, has only a polynomial number of parameters,
and preserves size extensivity, and can be implemented in a variational
way. In practice, as in VCC, computing the corresponding energy is
combinatorially difficult in the absence of truncation of the BCH
expansion. The relative qualities of VCC and UCC are still a subject of
discussion~\cite{Harsha2018}. However, one interesting feature of UCC
is the unitary nature of
${\mathrm{e}}^{T(\boldsymbol{\theta})-T^{\dagger}(\boldsymbol{\theta})}$: as we
shall see later, it will make UCC easier to implement on a quantum
processor.

All three methods, when applied to the HF state, are limited to weak
correlations (around the reference). They typically fail in
strongly-correlated regimes. Other methods, referred to as
multireference methods, are necessary to handle the strongly-correlated
regime properly (we can mention the complete active space
self-consistent field method (CASSCF,~\cite{Roos1980,Roos1987}),
configuration interaction using a perturbative selection done
iteratively (CIPSI,~\cite{Huron1973}), multi-reference CC). We refer
the reader to~\cite{Chan2024} for a more comprehensive analysis of
quantum chemical methods.

\subsection{Problem reduction: embedding
approaches}\label{subsec:Problem-reduction:-embedding}

\subsubsection{Dynamical mean-field theory\,$\ldots$}

In the previous section, we summarized the main methods used for the
numerical treatment of strongly-correlated problems, with a focus on
the Hubbard model. All these methods have limitations in some regimes.
One key aspect of the Hubbard model (and of electronic structure
problems in general, see~\cite{Chan2024} for a discussion) that we
overlooked so far is the \emph{locality} of the physics of interacting
electrons (which is also the reason for taking the Hubbard model as a
starting point).

Advanced classical methods were developed starting in the early 1990s
to take advantage of this locality. In particular, the local nature of
the Mott phenomenon at play in the Hubbard model
(Equation~\ref{eq:Hubbard}) is central: at large interaction strengths,
local interaction phenomena cause the charge degree of freedom to
freeze and electrons to localize (this is the essence of Kondo
physics). Based on this physical insight, and on more formal
considerations about the infinite-dimensional limit of the Hubbard
model, a theory called dynamical mean field theory (DMFT,
\cite{Georges1996}) was developed to self-consistently map the Hubbard
model onto a local effective model describing a single or a few
interacting sites (usually called ``impurities'') embedded in a
noninteracting bath mimicking the dynamical mean field surrounding them
(Figure~\ref{fig:embedding}). 

\begin{figure}
\includegraphics{fig02}
\caption{\label{fig:embedding}Embedding methods (a) and active space selection (b).
Adapted from~\cite{Ayral2023}. With kind permission of The European Physical Journal (EPJ).}
\end{figure}

In analogy to plain-vanilla Weiss mean-field theories
\cite{Georges2008}---where a single spin $S$ feels the influence of a
mean field $h$ created by its neighbors, which itself depends on the
mean \mbox{magnetization} $\langle S\rangle$ of the spin---DMFT describes a
single (for simplicity) impurity embedded in a (dynamical) mean field
$\Delta(t)$, called the hybridization function. This mean field itself
depends on the impurity's Green's function, defined as
{\begin{equation}
G^{\mathrm{R}}(t)=-\mathrm{i}\theta(t)
\langle\{ c(t),c^{\dagger}(0)\}\rangle.\label{eq:G_imp}
\end{equation}}\unskip
The dynamical character of the mean field $\Delta(t)$ is crucial
in recovering Mott physics: it describes how surrounding electrons
can hop on and off the impurity site, where the Hubbard interaction
$U$ can cause them to localize. This contrasts with the Hartree--Fock
mean-field theory described in a previous section, where the effect of
other electrons on the effective model is described by a static density
$\langle c_{i\bar{\sigma}}^{\dagger}c_{i\bar{\sigma}}\rangle$
(Equation~(\ref{eq:HF_hopping})). Essentially, DMFT replaces this
static mean field with a dynamical quantity like $\langle
c_{i\sigma}(t)c_{i\sigma}^{\dagger}(0)\rangle$.

Through this mapping, the Hubbard model gets mapped onto a so-called
Anderson impurity model (AIM), defined by the Hamiltonian
{\begin{equation}
H=Un_{\uparrow}n_{\downarrow}-\mu
(n_{\uparrow}+n_{\downarrow})+\sum_{p\sigma}
V_{p\sigma}(c_{\sigma}^{\dagger}a_{p\sigma}+\mathrm{h.c})+
\sum_{p\sigma}\epsilon_{p\sigma}a_{p\sigma}^{\dagger}
a_{p\sigma},\label{eq:AIM}
\end{equation}}\unskip
where the impurity site ($c_{\sigma}^{\dagger}$, $c_{\sigma}$) is
coupled to noninteracting bath sites
$a_{p\sigma}^{\dagger},a_{p\sigma}$ (represented as the green dots in
Figure~\ref{fig:embedding}). The (a priori infinitely many) coupling
parameters $V_{p\sigma}$ and bath levels $\epsilon_{p\sigma}$ are
chosen so that the Fourier transform of $\Delta(t)$ verifies:
{\begin{equation}
\Delta_{\sigma}(\omega)=\sum_{p=1}^{\infty}\frac{V_{p\sigma}}
{\omega-\epsilon_{p\sigma}+\mathrm{i}\eta}.\label{eq:hybridization}
\end{equation}}\unskip

In practice, this Anderson impurity model is simpler to solve than the
Hubbard model, because only the impurity site (or the impurity sites in
``cluster'' DMFT) is interacting. A whole spectrum of ``impurity
solvers'' has been developed to compute the impurity Green's function
required by the self-consistent DMFT equations. Most of these solvers
rely on techniques similar to those used for tackling the Hubbard model
(see Section~\ref{subsec:The-Hubbard-model}). However, because the
interacting problem is of much lower dimension (even zero dimensional
in the single-impurity case) compared to the Hubbard model, the
exponential hurdles cause much less harm, and allow for a very accurate
computation of $G(\omega)$ in many physical regimes. In particular,
they allow to solve DMFT with a few impurities and observe a Mott
transition, a major achievement compared to usual mean-field theories.

There are, however, three main limiting factors: imaginary time, the
number of impurities, and out-of-equilibrium regimes. First, most
advanced quantum Monte-Carlo impurity solvers~\cite{Gull2011}
work on the imaginary-time axis, which requires mathematically ill-defined
analytical continuation techniques to obtain real-time quantities.
Second, increasing the number of impurities (or orbitals, when working
with more realistic models than the Hubbard model) is crucial in regimes
where long-range fluctuations play an important role (as is suspected
in some high-temperature superconductors), and serves as a control
parameter of the method. But larger sizes revive the aforementioned
exponential issues. Third, going out of equilibrium comes with the
same problems as for the Hubbard model, albeit at a smaller scale
to thanks to the reduced dimension of the impurity model~\cite{Aoki2014}. 

\subsubsection{$\ldots$~and its simplifications}

To circumvent these limitations or to tackle larger (usually
multiorbital) problems, simplified methods have been developed at the
price of less accuracy or less information. Similarly to DMFT, these
alternative ``embedding methods'' also consist in self-consistently
mapping strongly-correlated, extended models like the Hubbard model
onto smaller effective models. One such method is
rotationally-invariant slave bosons (RISB,~\cite{Lechermann2007}),
equivalent to the Gutzwiller method~\cite{Bunemann2007}, which can be
regarded as a low-energy simplification of DMFT that gives access to
low-energy properties of the spectrum like the quasiparticle
renormalization weight $Z$ and the static self-energy shift.

In RISB, the impurity model Equation~(\ref{eq:AIM}) contains as many
bath sites as correlated sites (or impurities)~\cite{Lanata2015}, as
opposed to the infinite (or very large) number required in DMFT to
mimic the dynamical features of the hybridization function
$\Delta(\omega)$. The parameters are obtained via self-consistent
equations that are similar to those of DMFT (see e.g.~\cite{Ayral2017a}
for a unified view of DMFT, RISB and the DMET method below). Contrary
to DMFT, RISB does not require the computation of the full
time-dependent impurity Green's function, but only of static
correlators of impurity sites $i,j\dots$
$[D_{\mathrm{imp}}]_{ij}^{\sigma}=\langle
c_{i\sigma}^{\dagger}c_{j\sigma}\rangle$, bath sites
$[D_{\mathrm{bath}}]_{pq}^{\sigma}=\langle
a_{p\sigma}^{\dagger}a_{q\sigma}\rangle$, as well as mixed correlators
$[D_{\mathrm{mixed}}]_{ip}^{\sigma}=\langle
c_{i\sigma}^{\dagger}a_{p\sigma}\rangle$. Together, these correlators
make up the impurity model's one-particle reduced density matrix 
{\begin{equation}
D=\left[\begin{array}{cc}
D_{\mathrm{imp}} & D_{\mathrm{mixed}}\\
D_{\mathrm{mixed}}^{\dagger} & D_{\mathrm{bath}}
\end{array}\right].\label{eq:1RDM}
\end{equation}}\unskip

These static correlators are easier to compute than the full Green's
function (Equation~(\ref{eq:G_imp})), in exchange for containing less
information. RISB has been used, for instance, to explore nonlocal
correlation properties (like momentum-dependent quasiparticle weights)
in the Hubbard model~\cite{Ferrero2009}, or multi-orbital models
inaccessible to DMFT relevant to realistic materials like praseodymium
and plutonium~\cite{Lanata2015}, at a fraction of the cost of DMFT.

Another widespread embedding method, density-matrix embedding theory
(DMET,~\cite{Knizia2012}), proposes a similar embedding as RISB,
without giving access to information such as the quasiparticle 
weight~\cite{Ayral2017a}. It also relies on the mapping of an extended
model onto a simplified effective model that is solved with more accurate
methods than mean-field methods.

The embedding methods presented above all consist in iteratively
defining a suitable effective model. The definition of the model itself
is done thanks to an efficient (i.e.\ polynomial) computation, while the
effective model, which is supposed to contain the relevant
strongly-correlated degrees of freedom, is solved either exactly with
an expensive (exponential) method---with the corresponding size or time
limitations---or a simplified polynomial method---with the
corresponding loss of accuracy.

The spirit of these embedding methods is quite similar to ``active
space'' methods encountered in a quantum chemistry context: there,
the important degrees of freedom, called the ``active space'' of
the molecule, are selected among all the orbital degrees of freedom
of the molecule. Indeed, tackling all the degrees of freedom would
either be impossible for exact methods (like the full configuration
interaction method) due to the exponential wall, or inaccurate when
implemented with approximate polynomial methods like the configuration
interaction (CI) or coupled cluster (CC) method with a finite number
of excitations (like singles and doubles) for strongly-correlated
(also called ``multireference'') compounds (see Section~\ref{subsec:qchem}
above).

In conclusion: with embedding methods and active-space methods, an
efficient classical computation is done to (usually self-consistently)
identify the relevant, hard degrees of freedom of the original extended
model. This construction is very successful in reducing the problem
size and therefore making it more accessible to exact, exponentially
costly methods. Yet, they carry their own limitation as the size of
the effective model is by construction limited by the exponential
wall. For some regimes, the effective-model sizes that are required
to converge the computation exceed the capacities of classical processors.
In an upcoming section, we will explain how quantum processors could
be used as a replacement or complement to some of the classical algorithms
used for tackling these strongly-correlated problems.

Before this, we also introduce many-body problems that are not coming
from physical systems.

\subsection{Beyond quantum: classical many-body
problems}\label{subsec:classical-many-body}

Quantum many-body problems such as the ones that naturally arise in
condensed-matter physics and quantum chemistry are but a subclass of
many-body physics. Beyond the field of nuclear physics (which we did
not elaborate on but shares many problems and methods with solid-state
physics and chemistry, see our recent review,~\cite{Ayral2023a}),
many-body problems are also encountered in classical computational
problems known as combinatorial optimization problems.

One well-known combinatorial optimization problem is the maximum cut
(MaxCut) problem. It is defined on a graph $G=(V,E)$ with a set $V$ of
vertices and a set $E$ of edges. It consists in finding the ``maximum
cut'' of the graph, namely the bipartition of vertices such that the
most edges in $E$ have one vertex in either partition. An example of
graph and its maxcut bipartition are given in
Figure~\ref{fig:Example-of-graph}.

\begin{figure}
\includegraphics{fig03}
\caption{\label{fig:Example-of-graph}Example of graph. The maxcut bipartition is illustrated with
the black and white vertices.}
\end{figure}

If one labels the vertices $i=1,\dots, n$, then a bipartition of the
vertices can be encoded as a bitstring $b=(b_{1},b_{2},\dots, b_{n})$
with $b_{i}\in\{ 0,1\} $. If $b_{i}=0$, the $i$th vertex
belong to the first partition, otherwise it belongs to the second
partition. Among the $2^{n}$ possible bipartitions (bitstrings),
we want to find the one that maximizes
{\begin{equation}
C(b)=-\sum_{i,j\in E}(1-2b_{i})(1-2b_{j}).\label{eq:cost_maxcut}
\end{equation}}\unskip
Indeed, if $b_{i}=b_{j}$, we obtain a negative contribution $-1$
for each edge, otherwise (when the vertices belong to different partitions)
we obtain a positive $+1$ contribution. Maximizing $C(b)$ should
thus yield the solution.

Solving MaxCut on a classical computer is known to be exponentially
hard (it is a NP complete problem, see Section~\ref{subsec:complexity-theory}
below for a more in-depth discussion). The connection to usual \mbox{many-body}
problems can be made by redefining $S_{i}=1-2b_{i}$. The $S_{i}$
variables can take the values $+$1 or $-$1, they are like classical spins.
The cost function, with an additional minus sign, is called the energy
and reads:
{\begin{equation}
E(S)=\sum_{ij\in E}S_{i}S_{j}=\sum_{ij}J_{ij}S_{i}
S_{j}.\label{eq:ising_hamiltonian}
\end{equation}}\unskip

This form of energy defines the (antiferromagnetic) Ising model. Here,
the ``coupling'' is $J_{ij}=1$ if $i,j$ is an edge of $G$ and $0$
otherwise. The Ising model belongs to the class of spin glass models. A
well-known classical method to find the lowest energy configuration
$S^{*}$ of this model is to use a Markov chain Monte-Carlo algorithm
with a Boltzmann distribution $p(S)\propto {\mathrm{e}}^{-E(S)/T}$ for the
configurations, where the temperature $T$ is slowly decreased (or
``annealed'') to $0$. At this point, the system should be in its lowest
configuration. The intuition behind the algorithm is that the nonzero
temperature at the beginning is going to help the algorithm out of the
local energy minima it can start in. This algorithm, called simulated
annealing (SA) or thermal annealing, is illustrated in
Figure~\ref{fig:Sketch-annealing}. Of course, if the potential
landscape is very rugged (with high barriers), the probability for
attaining the global minimum will be very small.

\begin{figure}
\includegraphics{fig04}
\caption{\label{fig:Sketch-annealing}Sketch of the potential landscape and different annealing
methods.}
\end{figure}

One way to enhance this probability would be to create the possibility
of jump through barriers, as sketched in
Figure~\ref{fig:Sketch-annealing}. For this, one needs to supplement
our dynamics with possibilities to ``hop'' from one configuration to
another. To achieve this, we turn our classical cost function into a
quantum one: if we define the operator
{\begin{equation}
H=\sum_{ij}J_{ij}Z_{i}\otimes Z_{j},\label{eq:maxcut_hamiltonian}
\end{equation}}\unskip
with $Z_{i}$ the Pauli-$z$ matrix on the $i$th spin, we can check
that the lowest energy of this operator corresponds to the bitstring
$S$ that minimizes $E$. We can now add a temporary perturbation
to $H$ that couples different eigenstates of $H$ with each other,
say, by defining 
{\begin{equation}
H(t)=\sum_{ij}J_{ij}Z_{i}\otimes Z_{j}+h(t)\sum_{i}X_{i},\label{eq:TFIM-SQA}
\end{equation}}\unskip
(with $X_{i}$ the Pauli-$x$ matrix). The second term (a transverse
field term) accomplishes the wished-for hopping between configurations.
As we shall argue in Section~\ref{subsec:adiabatic-annealing}, a slow
tuning of the coefficient of $h$ from a large value to zero will lead
to an enhanced probability of ending up in the ground state of $H$.
This is the idea behind quantum annealing (QA). One can expect this
algorithm to provide speedups when the potential barriers are tall and
narrow.

While this quantum algorithm can be, and is, executed on the quantum
processors we will introduce in the next section, it can also be
``emulated'' on classical computers. The corresponding
``quantum-inspired'' method is called ``simulated quantum annealing''
(SQA): the quantum time evolution can be mapped to a classical Ising
model with one more dimension, which is solved using a Markov chain
Monte-Carlo algorithm similar to the one used for simulated annealing.
Such an algorithm turns out not to have a sign
problem~\cite{Bravyi2015}, contrary to fermionic problems, but its
convergence to the exact ground state can be very slow. In general,
however, it is faster than the SA algorithm~\cite{Heim2015}. Whether
SQA is faster than a purely quantum algorithm (QA) is still a matter of
debate (see~\cite{Heim2015} for a study in favor of
SQA,~\cite{Denchev2016} for a study in favor of QA).

In the next section, we introduce in more detail the main principles of
quantum algorithms for many-body problems.

\section{Quantum computers: promises for the many-body
problem\,\ldots~and reality}\label{sec:Quantum-computers:-Promises}

In this section, we summarize the reasons why quantum processors have
been suggested as potential solutions to the limitations of the
classical algorithms for solving the strongly-correlated problems we
introduced in the previous section. We also discuss the limitations of
these quantum processors, whether coming from the quantum nature of the
processors or from hardware imperfections.

\subsection{The promises of the perfect quantum
computer}\label{subsec:Promises...-:-the}

Given the challenges of solving many-body problems with classical
processors, Richard Feynman famously advocated the use of quantum
machines to simulate quantum problems~\cite{Feynman1982}. In other
words, he suggested the use of physical devices governed by the laws of
quantum mechanics to mimic natural physical systems governed by the
same laws. 

\subsubsection{Two classes of quantum processors}

Two main strains of quantum processors emerged in the following
decades. Inspired by Feynman's suggestion, physicists built ``quantum
simulators'', namely artificial systems---ultracold atoms in optical
lattices~\cite{Bloch2008}, ions trapped in electromagnetic traps,
Rydberg atoms trapped by optical tweezers~\cite{Browaeys2020},
etc.---whose Hamiltonian was similar to that of physical systems of
interest. For instance, Fermi--Hubbard physics were studied via the
physics of tens of ultracold atoms in optical lattices (see
e.g.~\cite{Jordens2008}), and hundreds of Rydberg atoms have been used
to gain insights into the dynamics of spin models (see
e.g.~\cite{Scholl2020}). This ``quantum simulation'' approach is
currently limited, among others, by the effective temperatures that can
be reached, which are still too high to study phenomena like pseudogap
physics, let alone superconductivity (they typically reach temperatures
close to the antiferromagnetic exchange coupling~\cite{Xu2023a}). In
addition, these processors are limited to simulating a specific form of
Hamiltonian, and are therefore meant to ``solve'' very specific
problems. They are sometimes called ``analog quantum computers'', in
reference to the analog computers that were built in the past to solve
dedicated problems, before the advent of (classical) computers as we
know them.

The second class of computers is inspired by the model of computation
used in classical computers: there, all operations can be boiled down
to logic gates (NOT, NAND, etc) acting on bit registers. This model
was extended to quantum processors by introducing the notion of quantum
bit (qubit) to refer to a spin-1/2 system. A quantum ``computer''
can then be regarded as a collection of interacting qubits. These
qubits can be manipulated by quantum logic gates, which, owing to
Schr\"odinger's equation, are unitary operations acting on (possibly
several) qubits. The so-obtained quantum states are then ``measured''
to extract useful information.

It was soon realized that this mathematical model of quantum
computation (sometimes called gate-based or ``\emph{digital} quantum
computation'') could be used to create algorithms with provable
speedups over classical algorithms. The two most famous examples are
Shor's factoring algorithm~\cite{Shor1994} and Grover's unstructured
search algorithm~\cite{Grover1996}, which respectively achieve an
exponential and a quadratic speedup over their known classical
counterparts. Another interesting feature of reusing a formalism
similar to that of classical computers is that the error correction
methods developed for classical machines were generalized to quantum
computers~\cite{Shor1995,Shor1996}: by using several (``physical'')
qubits to encode the information of a single (``logical'') qubit, it
was proven that, provided the error rate of physical qubits was below a
certain threshold, combining more and more physical qubits together led
to an exponential suppression of errors for the corresponding logical
qubits, thereby guaranteeing an arbitrary level of accuracy. To date,
the required error thresholds and numbers of qubits have not yet been
attained by experimental quantum processors, although first promising
proofs of concepts in this direction---with near-threshold errors, but
still very small qubit counts---are emerging in all major technological
platforms~\cite{Chen2021,Krinner2021,Bluvstein2023,DaSilva2024}.

\subsubsection{Preparing a ground state: adiabatic
annealing}\label{subsec:adiabatic-annealing}

One major challenge of quantum many-body physics is to investigate the
properties of the ground state of a complicated Hamiltonian. For this,
one needs to generate such a state. A central method to generate such a
state is the quantum adiabatic annealing method (see~\cite{Albash2018}
for a review). It consists, in order to prepare the ground state
$|\Psi_{0}{\mathop{\rangle}}$ of a Hamiltonian $H$, in starting from an easier
Hamiltonian $H_{0}$, whose ground state $|\Phi_{0}{\mathop{\rangle}}$ is known and
easy to prepare on the quantum hardware at hand. Then, one performs a
time evolution, starting from $|\Phi_{0}{\mathop{\rangle}}$, under a Hamiltonian
that interpolates between $H_{0}$ and $H$ between the initial time and
the final time $T$. For instance,
{\begin{equation}
H(t)=(1-t/T)H_{0}+t/TH.\label{eq:annealing_Hamiltonian}
\end{equation}}\unskip

If the ``annealing time'' $T$ is long enough ($T\gg
V/\Delta_{\mathrm{min}}^{2}$ with $V$ the time derivative of $H(t)$ in
its instantaneous eigenstate and $\Delta_{\mathrm{min}}$ the minimum
gap in the spectrum of $H(t)$), the adiabatic theorem will guarantee
that $|\Psi(T){\mathop{\rangle}}$ will be very close to $|\Psi_{0}{\mathop{\rangle}}$. The
overall run time of the algorithm is $O(T)$, so a major question for
quantum annealing is the scaling of the minimum gap
$\Delta_{\mathrm{min}}$. It depends on the problem at hand ($H$) as
well as the mixer Hamiltonian ($H_{0}$) and the schedule itself. In
particular, hard problems may feature a minimum gap that decreases
exponentially with the system's size\,$\ldots$\ leading to exponentially
long run times. 

The adiabatic annealing principle is underlying a number of quantum
algorithms and methods. It is routinely used to prepare Rydberg atoms
in the ground state of exotic Hamiltonians~\cite{Lienhard2017}. Quantum
annealers built e.g.\ by the D-wave firm use this principle, in
combination to the thermal annealing method we described above, to
attempt to find ground states of Hamiltonians that correspond to
complex optimization functions, as briefly discussed in 
Section~\ref{subsec:classical-many-body}. 

Adiabatic annealing is also a guiding principle to design quantum
programs on gate-based architecture. Since annealing consists in
performing time evolutions governed by Hamiltonians like $H(t)$
(Equation~(\ref{eq:annealing_Hamiltonian})), tools have been introduced
to implement these time evolutions with gate-based quantum computers,
as we will describe in the next subsection.

\subsubsection{Performing a Hamiltonian
evolution}\label{subsec:Hamiltonian-evolution}

The time evolution of quantum systems is the first inspiration behind
Feynman's idea~\cite{Feynman1982}. While the first---analog---class of
computers can be used to simulate the system they are built to
reproduce, the second---digital---class of computers can also in
principle be used as a substitute to classical computers for simulating
\textit{any} many-body problem. How to do it in practice was first
described by~\cite{Lloyd1996}. The main building block for doing so is
the realization that a quantum time evolution under the Schr\"odinger
equation, which can be boiled down to a unitary evolution under the
evolution operator $U=\mathrm{e}^{-\mathrm{i}Ht}$ (in the time-independent Hamiltonian
case; the time-dependent case can also be dealt with, but we will focus
on the time-independent case for ease of notation), can be expressed as
a quantum circuit, namely a sequence of quantum logic gates.

The goal is to break down the $\mathrm{e}^{-\mathrm{i}Ht}$ unitary
operator to the quantum logic gates of the quantum processor at hand.
Formally, this breakdown is guaranteed to be feasible by the
Solovay--Kitaev theorem. It states that any unitary operator can be
approximated to precision $\epsilon$ by a sequence of gates of size
polylogarithmic in $1/\epsilon$, namely a short sequence. These gates
can be drawn from a universal gateset known as ``Clifford+T''---namely
one-qubit Clifford gates (essentially, single-qubit $\rmpi/2$ rotations)
and the controlled-NOT (CNOT) gate---supplemented with the so-called
$T$ gate (a $\rmpi/4$ rotation along the $Z$ axis).

In practice, the breakdown can be achieved by several methods. The
simplest one~\cite{Lloyd1996} is to use the Trotter--Suzuki formula:
supposing the Hamiltonian of interest has been decomposed as a weighted
sum of Pauli operators, 
{\begin{equation}
H=\sum_{i=1}^{N_{p}}\lambda_{i}P_{i},\label{eq:Pauli_decomposition}
\end{equation}}\unskip
(with $P_{i}=\sigma_{1}^{p_{i}(1)}\otimes\sigma_{2}^{p_{i}(2)}
\otimes\cdots\otimes\sigma_{n}^{p_{i}(n)}$,
$\sigma^{p}\in\{ I,X,Y,Z\} $, $\lambda_{i}\in\mathbb{R}$)
the first-order product formula reads
{\begin{equation}
U=\mathrm{e}^{-\mathrm{i}Ht}=\prod_{k=1}^{N_{t}}
\left(\prod_{i=1}^{N_{p}}\mathrm{e}^{-\mathrm{i}
\frac{\lambda_{i}}{N_{t}}P_{i}t}\right)+
O\left(\frac{t^{2}}{N_{t}}\right),\label{eq:suzuki_trotter}
\end{equation}}\unskip
with $N_{t}$ the number of time-slices (higher-order formulae and
precise bounds involving the commutators of the $P_{i}$'s are discussed
in~\cite{Childs2019}).

Usually, the Pauli operators $P_{i}$ that appear in the decomposition
of physical Hamiltonians are local, namely they act only on a small
subspace of the full Hilbert space. Thus, the operator
$\mathrm{e}^{-\mathrm{i}\sfrac{\lambda_{i}}{N_{t}}P_{i}t}$ that appears
in (\ref{eq:suzuki_trotter}) can typically be decomposed as a short
sequence of one- and two-qubit gates. Thus, the original unitary
evolution operator $U$ has been broken down, up to a controllable error
$\epsilon\propto t^{2}/N_{t}$, as a sequence of one- and two-qubit
gates that form a quantum circuit.

The ideal computational complexity for performing the corresponding
computation is $O(N_{t}N_{p}N_{c})\sim O(t^{2}/\epsilon)$. The scaling
in the error $\epsilon$ is of the form $\mathrm{poly}(1/\epsilon)$,
a scaling regarded as ``tractable''.

The final relevant term is $N_{p}$, the number of Pauli terms, which we
need to assess as a function of the system size. Typically, for spin
Hamiltonians on a lattice, like the Ising (see
Equation~(\ref{eq:ising_hamiltonian})), XY or Heisenberg models,
$N_{p}$ is proportional to the system size $n$. For fermionic models
and in particular for the Hubbard model introduced previously
(Equation~(\ref{eq:Hubbard})), we first need to transform the fermionic
Hamiltonian in the form of Equation~(\ref{eq:Pauli_decomposition}).
There are several such fermion-qubit transformations or ``encodings''.
A straightforward transformation is the Jordan--Wigner transformation,
which maps the creation and annihilation operators onto Pauli spin
operators as follows:
{\begin{equation}
c_{\alpha}^{\dagger}=\prod_{\beta=1}^{\alpha-1}Z_{\beta}
\left(\frac{X_{\alpha}-\mathrm{i}Y_{\alpha}}{2}\right),\quad 
c_{\alpha}=\prod_{\beta=1}^{\alpha-1}Z_{\beta}
\left(\frac{X_{\alpha}+\mathrm{i}Y_{\alpha}}{2}\right).\label{eq:JW}
\end{equation}}\unskip

The operator $\sfrac{X_{\alpha}-\mathrm{i}Y_{\alpha}}{2}=\sigma_{\alpha}^{+}$
flips the $\alpha$th qubit (corresponding to the $\alpha$th
spin-orbital), while the string of $Z$ operators takes care of
fermionic anticommutation relations. Thus e.g.\ a hopping term
$c_{\alpha}^{\dagger}c_{\alpha'}$ maps to 4 Pauli strings of the form
$\sigma_{\alpha}Z_{\alpha+1}\cdots Z_{\alpha'-1}\sigma_{\alpha'}$ (with
$\sigma_{\alpha/\alpha'}\in\{ X,Y\} $). Since the Hubbard
model has $O(n)$ terms, the Jordan--Wigner transformation will lead to a
number of Pauli terms $N_{p}=O(n)$. For the general electronic
structure Hamiltonian (Equation~(\ref{eq:str_el})) used to model
quantum chemical systems, $N_{p}$ is dominated by the Coulomb
interaction term, $N_{p}=O(N_{o}^{4})$ (with $N_{o}$ the number of
orbitals).

Thus, generically, for physical systems written in a second-quantized
form, $N_{p}\sim\mathrm{poly}(n)$ where $n$ is the size of the system.
This gives a total scaling $O(t^{2}/\epsilon\mathrm{poly}(n))$.
Interestingly, replacing the first-order Trotter formula with a
$p$th-order formula will lead to a scaling
$O(t^{(1+p)/p}/\epsilon^{1/p}\mathrm{poly}(n))$, which, in the limit of
$p\rightarrow\infty$, becomes
$O(t^{1+o(1)}/\epsilon^{o(1)}\mathrm{poly}(n))$ (see e.g.~\cite[chapter
5]{Lin2022b}), namely in the absence of Trotter errors we get a linear
run time, as expected.

This scaling, which is behind Feynman's original
idea~\cite{Feynman1982}, is very favorable to quantum computers when
compared to the exponential scaling (in size and/or time and/or other
parameters) of the classical methods described above. This exponential
advantage is the very reason for the strong push of quantum computing
in the last decades. Of course, much remains to be done to improve the
implementation of the unitary at hand. While the understanding of
Trotter formulas and their errors is making progress (see
e.g.~\cite{Childs2019}), alternatives to Trotter formulas have appeared
in recent years (see
e.g.~\cite{Childs2012,Low2017,HaoLow2017,Haah2018}).

Let us stress that while we introduced Hamiltonian simulation as a
means to implement the slow time evolution required in adiabatic
annealing, it of course applies to any time evolution, like a sudden
change (quench) of a parameter of a Hamiltonian, which leads to time
evolutions that are hard to describe on classical computers.

\subsubsection{Computing ground state energies: quantum phase
estimation}\label{subsec:ground-state-energies}

With the adiabatic annealing algorithm and a circuit to implement time
evolution, we can in principle prepare a quantum processor in the
ground state of a given Hamiltonian. We are now facing the problem of
measuring the energy of this state, or other observables like the
many-particle correlators required e.g.~in embedding techniques.

\paragraph*{The energy estimation problem}

Measurements in quantum processors are usually limited to measurements
of Pauli operators $Z_{i}$ on each qubit $i$. With these measurements,
one can easily estimate expectation values of the form $\langle
Z_{i_{1}}\cdots Z_{i_{m}}\rangle$ or, via single-qubit rotations, the
expectation value of any string of Pauli operators. However, this
expectation value estimation---which, at face value, is needed to
compute the energy $\langle\Psi_{0}|H|\Psi_{0}\rangle$ of the state we
prepared---is plagued with the same statistical variance as the
classical Monte-Carlo algorithms we introduced previously (see
Equation~(\ref{eq:MC_error})). Indeed, it consists in approximating an
expectation value with an empirical average on a finite number of
samples. (This will turn out to be a major limitation of the near-term
algorithms we will introduce later).

A key quantum algorithm, called quantum phase estimation
(QPE,~\cite{Kitaev1995}, that is also, incidentally, at the heart of
the aforementioned Shor algorithm), avoids this statistical error by
resorting to a circuit inspired by Mach--Zehnder
interferometry~\cite{Cleve1998}.

QPE consists in extracting the phase $\varphi$ of the eigenstate
$|\psi{\mathop{\rangle}}$ of a unitary operator $U$ such that
$U|\psi{\mathop{\rangle}}={\mathrm{e}}^{\mathrm{i}\varphi}|\psi{\mathop{\rangle}}$, given a circuit
implementation of $U$. Applied to $U=\mathrm{e}^{-\mathrm{i}Ht}$ and an
eigenvector $|\psi{\mathop{\rangle}}$ of $H$, QPE allows to extract the
eigenenergy of this eigenstate. This is achieved with an accuracy
$\epsilon\sim1/2^{m}$, where $m$ is the number of auxiliary (or
ancilla) qubits used in addition to those needed to encode
$|\psi{\mathop{\rangle}}$. This comes at a computational cost of $O(2^{m})$ (QPE
requires the successive application of controlled $U^{2^{k}}$ operators
with $k=0\dots m-1$, hence the $O(2^{m})$ cost). QPE thus achieves a
time complexity of $O(1/\epsilon)$, a quadratic gain over the time
complexities of $O(1/\epsilon^{2})$ that Monte-Carlo algorithms must
cope with (recall Equation~(\ref{eq:MC_error}): a set error $\Delta
g=\epsilon$ leads to a number of samples, and hence a run time,
$\mathcal{N}\sim1/\epsilon^{2}$).

QPE also comes with the useful property of projecting any incoming
vector $|\phi{\mathop{\rangle}}$ to an eigenvector $|\psi{\mathop{\rangle}}$ of $U$ with
probability $P\propto{{\Omega=}}
|\langle\phi|\psi\rangle|^{2}$. This can also be regarded as a
downside: the success probability of the algorithm is proportional to
$\Omega$, the overlap with the sought-after eigenstate. Therefore, a
number $1/\Omega$ of repetitions of QPE is needed to find the phase
with high probability.

Hence, QPE can be used to prepare ground states and find ground state
energies of Hamiltonians such as the Hubbard model: starting from a
guess $|\phi{\mathop{\rangle}}$ for the ground state of $H$ with sufficient
overlap with the true ground state $|\psi_{0}\rangle$, one can in
principle extract the ground state energy $E_{0}$ to great accuracy.
The $U^{2^{k}}$ operators can be implemented with the Trotter--Suzuki
technique we just described for time evolution. 

The preparation of an input state sufficiently close to the ground
state is a key challenge. The simplest way is to find a classical
method (such as the ones we reviewed in
Section~\ref{sec:Many-body-problems:-a}) to prepare an input state and
then load it onto the quantum computer (how to do so depends on the
classical representation of the state, and can be expensive). Another,
more quantum way, is to use QPE in an annealing-like iterative
procedure, dubbed adiabatic state preparation
(ASP,~\cite{Aspuru-Guzik2005}): one applies several QPEs with $H$
interpolating, as in Equation~(\ref{eq:annealing_Hamiltonian}), from an
easy Hamiltonian $H_{0}$ with known ground state $|\Phi_{0}\rangle$
(e.g.\ $H_{\mathrm{m.f}}$, Equation~(\ref{eq:H_meanfield})) to the
Hamiltonian of interest $H$. As per the adiabatic theorem, the number
of required steps for this ASP is of the order of
$O(\mathrm{poly}(1/\Delta_{\mathrm{min}}))$, where
$\Delta_{\mathrm{min}}$ is the minimal gap between the ground state and
the first excited state of the Hamiltonian that interpolates between
$H_{0}$ and $H$~\cite{Albash2018}. The behavior of this gap with system
size is difficult to predict. 

Taking the first approach (using a classical heuristic), the complexity
of QPE can thus be estimated by looking at the overlaps achieved by
known methods in cases where the exact ground state is
known. Reference~\cite{Tubman2018} shows that, as expected, for weakly-correlated
molecules (aka single-reference or dynamically correlated systems), the
Hartree--Fock method does yield states with good overlaps with the true
ground state. For more correlated systems, like the Hubbard and
Anderson models with larger $U/t$ ratios and small fillings, the
overlaps become quite small (\hspace{1sp}\cite{Tubman2018}, see
also~\cite{Lee2022}).


\paragraph*{Computing other observables}

Computing other observables, like the correlators needed for DMET or
RISB (Equation~(\ref{eq:1RDM})), or the Green's function for DMFT
(Equation~ (\ref{eq:G_imp})), boils down (at zero temperature) to
computing quantities of the form
$\langle\psi|U_{\mathrm{meas}}|\psi\rangle$, with $|\psi\rangle$ 
e.g.~the ground state wavefunction (typically prepared with the QPE-ASP
procedure described above), and $U_{\mathrm{meas}}$ a unitary operator.

For instance, after a Jordan--Wigner transformation, a local
$c_{i}(t)c_{i}^{\dagger}(0)$ term appearing in the Green's function
becomes
{\begin{eqnarray*}
U^{\dagger}(t)\frac{X_{i}+\mathrm{i}Y_{i}}{2}U(t)
\frac{X_{i}-\mathrm{i}Y_{i}}{2}
& =&\frac{1}{4}\{ U^{\dagger}(t)
X_{i}U(t)X_{i}-\mathrm{i}U^{\dagger}(t)X_{i}U(t)Y_{i}\\
&&+\,\mathrm{i}U^{\dagger}(t)
Y_{i}U(t)X_{i}+U^{\dagger}(t)Y_{i}U(t)Y_{i}\} ,
\end{eqnarray*}}\unskip
where $U(t)=\mathrm{e}^{-\mathrm{i}Ht}$ denotes the time-evolution
operator. Thus, computing
$\langle\psi|c_{i}(t)c_{i}^{\dagger}(0)|\psi\rangle$ amounts to
computing four expectation values. For instance, the first term is
$\langle\psi|U^{\dagger}(t)X_{i}U(t)X_{i}|\psi\rangle$: it is of the
form $\langle\psi|U_{\mathrm{meas}}|\psi\rangle$ and can be computed
with two circuits (one for the real part, another for the imaginary
part), shown in Figure~\ref{fig:Circuits-for-computing}. In addition to
the state-preparation circuit needed to prepare the initial state
$|\psi{\mathop{\rangle}}$, one must also implement a Trotterized version of the
evolution operator $U(t)$, as well as controlled-Pauli operations. 

\begin{figure}
\includegraphics{fig05}
\caption{\label{fig:Circuits-for-computing}Circuits for computing a Green's function. Top panel: circuit
for computing $\langle\psi|U|\psi\rangle$. Bottom panel: circuit for
computing $\langle\psi|P_{k}(t)P_{l}(t')|\psi\rangle$. Taken from
\cite{Ayral2023}. With kind permission of The European Physical Journal (EPJ).}
\end{figure}

Using these techniques, one can thus, at least in theory, hope to
take advantage of quantum computers to overcome the exponential hurdles
that classical methods all face. This was proposed in the DMFT context
in~\cite{Bauer2016,Kreula2016,Kreula2016a}.

However, in practice, the circuits involved in these tasks are
potentially quite deep: the controls on the time evolutions lead to
numerous entangling gates, and a large number of Trotter
slices---needed to maintain a small Trotter error---leads to a high
number of gates. For instance, the current noise levels make QPE-based
approaches impractical. To obtain the ground state of the Hubbard model
to accuracy $\Delta E/E_{0}\leq0.5\%$ for models up to $512$ sites
($16\times16$ lattice), the required number of $T$ gates is of the
order of $10^{7}$--$10^{8}$ according to the estimates
of~\cite{Kivlichan2020}. Using a surface code (a quantum error
correction code suitable to superconducting architectures,
see~\cite{Fowler2012}), assuming a physical gate error rate of 0.1\%,
this requires about one million physical qubits and about half an hour
of computation.

\subsection{Reality: noisy, intermediate-scale quantum
computers}\label{subsec:...-and-reality:}

\subsubsection{The exponential effect of
decoherence}\label{subsec:The-exponential-effect}

Current quantum processors are far from perfect. They suffer from
various imperfections: initialization errors, gate errors, readout
errors, and idling errors (errors incurred by inactive qubits), to cite
the most prominent sources of errors. Gate errors may come from
miscalibration (operating a Rabi oscillation on a qubit requires
exciting the qubit at its nominal frequency; if this frequency is known
only approximately and/or varies with time, a systematic under- or
overrotation will occur) or unwanted interaction with the environment.

Mathematically, noisy quantum states (dubbed mixed states) are described
by density matrices, usually denoted as $\rho$, as opposed to the
pure states $|\Psi\rangle$ we have used so far. The time evolution
is no longer described by a unitary operator $U$, but by a \emph{quantum
channel}, namely a completely positive, trace-preserving linear map
$\mathcal{E}(\rho)$. A unitary gate $U$ is described by the channel
$\mathcal{U}(\rho)=U\rho U^{\dagger}$, while generic noisy gates
are described by 
{\[
\mathcal{E}(\rho)=\sum_{k}E_{k}\rho E_{k}^{\dagger},
\]}\unskip
with the Kraus operators $E_{k}$ satisfying
$\sum_{k}E_{k}^{\dagger}E_{k}=I$ to preserve the trace. A typical
figure of merit of quantum gate is the average fidelity of the
corresponding quantum channel, defined as
{\begin{equation}
F(\mathcal{E},U)=\int\mathrm{d}\psi\langle\psi
|\mathcal{U}^{-1}\circ\mathcal{E}(|\psi\rangle\langle\psi|)|
\psi\rangle,\label{eq:average_fidelity}
\end{equation}}\unskip
where $\mathrm{d}\psi$ denotes the Haar measure on quantum states
and $\mathcal{U}$ denotes the ideal unitary operation of the gate.
Error rates are defined as $\epsilon=1-F$. (Similar metrics exist
for qubit initialization and readout).

Typically, in today's superconducting processors~\cite[Supplementary
Figure~4]{Morvan2023}, readout errors of the order of 1\%, two-qubit gate
errors of the order of 0.5\%, and one-qubit gate errors of the order of
0.1\% are typically measured using various benchmarking protocols. 

The cumulated effect of errors on the final state or final measurements
is generically exponential in the number of operations. To understand
this, we can consider a simple quantum channel called the (global)
depolarizing channel,
{\begin{equation}
\mathcal{E}_{\mathrm{depol}}(\rho)=(1-p)
\rho+p\frac{I}{2^{n}}.\label{eq:depolarizing}
\end{equation}}\unskip

Here the depolarizing probability $p$ tells one how often the state
$\rho$ becomes ``completely mixed''. Upon applying this channel after
each of the $N_{g}$ gates $U_{k}$ of a circuit, starting from a state
$\rho$, one obtains
{\[
\mathcal{E}(\rho)=(1-p)^{N_{g}}U\rho U^{\dagger}+
\left(1-(1-p)^{N_{g}}\right)\frac{I}{2^{n}},
\]}\unskip
where $U=\prod_{k=1}^{N_{g}}U_{k}$. Thus, for a given input state
$|\psi{\mathop{\rangle}}$, $\langle\psi|\mathcal{U}^{-1}
\circ\mathcal{E}(|\psi\rangle\langle\psi|)|\psi\rangle=
(1-p)^{N_{g}}+(1-(1-p)^{N_{g}})\sfrac{1}{2^{n}}\approx(1-p)^{N_{g}}$
(using $1/2^{n}\ll1$). Thus the process fidelity, for small $p$, can be
expressed as a product of the fidelities $f=1-p$ of the individual
operations:
{\begin{equation}
F(\mathcal{E},U)\approx(1-p)^{N_{g}}.\label{eq:product_law}
\end{equation}}\unskip

Such an exponential decay of fidelity with the number of operators,
proven here exactly for the depolarizing channel, is also observed
in actual experiments, like the random circuit sampling task of 
Google~\cite{Arute2019}. 

This exponential decay of fidelity places a severe limitation of what
can be achieved on near-term computers. We shall see, in
Section~\ref{subsec:MPS}, how this product law for fidelity can be used
to quantitatively delineate the boundary to a potential ``advantage''
of quantum processors. More importantly, this stark decay has also
spurred the development of algorithms requiring less deep circuits
than, for instance, the quantum phase estimation circuit described in
the previous section.

\subsubsection{Variational algorithms for near-term quantum
processors}\label{subsec:Algorithms-for-near-term}


\paragraph*{The variational quantum eigensolver}

The textbook quantum algorithms presented in a previous section
(Section~\ref{subsec:Promises...-:-the}) require a number of gates
$N_{g}$ incompatible with the error levels (e.g.\ $p$ in the depolarizing
channel presented in the previous section,
Equation~(\ref{eq:depolarizing})) available on current quantum
processors. A simple way to reduce this number of gates was proposed
in~\cite{Peruzzo2014}. The idea was to use quantum computers to
prepare, and compute the energy of, variational states
$|\psi(\boldsymbol{\theta}){\mathop{\rangle}}$, and offload the rest of the
computation to a classical processor. Given an expressive enough
variational family, the variational principle
{\begin{equation}
E(\boldsymbol{\theta})=\langle\psi(\boldsymbol{\theta})|H|
\psi(\boldsymbol{\theta})\rangle\geq E_{0},\label{eq:var_principle}
\end{equation}}\unskip
(with $E_{0}$ the ground-state energy of $H$) guarantees that a proper
optimization, by a classical computer, of the parameters
$\boldsymbol{\theta}$, will lead to a good enough approximation of the
ground-state energy $E_{0}$. This is illustrated in
Figure~\ref{fig:Sketch-of-the-VQE}.

\begin{figure}
\includegraphics{fig06}
\caption{{\label{fig:Sketch-of-the-VQE}Sketch of the VQE algorithm. The left part is realized on a
classical processor, while the right part is performed on a quantum
processor.}}
\end{figure}


This approach, dubbed the variational quantum eigensolver (VQE, 
see~\cite{Tilly2021a} for a review, and~\cite{Kokail2018} for an analog
version), is robust to systematic errors like overrotations: if a
rotation is of angle $\tilde{\theta}=\theta+\delta\theta$ instead of
the wished-for $\theta$, the classical optimization will converge to
$\tilde{\theta}^{*}=\theta^{*}-\delta\theta$ without necessitating a
knowledge of the miscalibration error $\delta\theta$. Yet, its main
advantage over previous approaches to ground state estimation is that
the variational family $|\psi(\boldsymbol{\theta}){\mathop{\rangle}}$ can be
chosen at one's convenience. For instance, one may choose a quantum
circuit $U_{\boldsymbol{\theta}}$ that is shallow enough to ensure that
the fidelity of the final state
$\rho(\boldsymbol{\theta})=\mathcal{E}_{\boldsymbol{\theta}}(|0\rangle\langle0|)$,
with $\mathcal{E}_{\boldsymbol{\theta}}$ the noisy counterpart of
$U_{\boldsymbol{\theta}}$, is close to 1. A major challenge therefore
lies in finding short, but expressive enough variational state
preparations. Methods in this direction are presented in 
Sections~\ref{subsec:fermions}, and, to a lesser 
extent,~\ref{subsec:MPS}. 

Let us emphasize that switching to variational algorithms like VQE
comes at a double price, even in the absence of decoherence.

First, the variational formulation of the ground state energy problem
poses the question of the optimizability of the potential landscape
$E(\boldsymbol{\theta})$ (Equation~(\ref{eq:var_principle})). For
uniformly random circuits (which have the largest expressivity), the
phenomenon of concentration of measure---energy expectation values for
a given variational circuit tend to gather around the circuit-average
value as the size of the Hilbert space increases---leads to the
so-called barren plateau problem (\hspace{1sp}\cite{McClean2018},
see~\cite{Larocca2024} for a review): gradients of
$E(\boldsymbol{\theta})$ become exponentially small, even in the
absence of noise (which further compounds this phenomenon). Physically
motivated ans\"atze (like the unitary coupled cluster, UCC, ansatz,
used in chemistry, see Section~\ref{subsec:qchem}) could also suffer
from this problem if they are deep enough~\cite{Mao2023}. 

In fact, the barren plateau problem is directly linked to the
expressiveness of the ansatz: the variance $\langle
E(\boldsymbol{\theta})^{2}\rangle-\langle
E(\boldsymbol{\theta})\rangle^{2}$ (here $\langle \cdots \rangle$
denotes average over the parameter space) of the cost landscape, whose
vanishing signals a barren plateau, is inversely proportional to the
dimension of the so-called dynamical Lie algebra (DLA). If the
parametric circuit is
$U(\boldsymbol{\theta})=\prod_{k}\mathrm{e}^{-\mathrm{i}\theta_{k}G_{k}}$,
the DLA is the vector space generated by nested commutators of the
generators $G_{k}$ of the parametric circuit. (In fact, this statement
holds for simple Lie algebras and more importantly, for circuits deep
enough that they form a ``2-design'', namely an effectively random
circuit, see~\cite{Ragone2023} for the full result and its derivation).
In other words, the larger the variational space being explored by VQE,
the more likely the occurrence of flat landscapes.

A second price of VQE is that the estimation of the variational energy
$\langle\psi(\boldsymbol{\theta})|H|\psi(\boldsymbol{\theta})\rangle$
at~each~step is performed, in practice, by splitting $H$ as a sum of
Pauli terms, $H=\sum_{i=1}^{N_{p}}\lambda_{i}P_{i}$, collecting the
average value
$\langle\psi(\boldsymbol{\theta})|P_{i}|\psi(\boldsymbol{\theta})\rangle$
of each local operator, and then summing the terms (see
Figure~\ref{fig:Sketch-of-the-VQE}). The second stage is essentially
the construction of a classical estimator of
$\langle\psi(\boldsymbol{\theta})|P_{i}|\psi(\boldsymbol{\theta})\rangle$:
one generates $|\psi(\boldsymbol{\theta})\rangle$, does a projective
measurement of $P_{i}$, which yields a $\pm1$ outcome $\rmpi_{s}^{(i)}$,
and repeats the procedure $\mathcal{N}$ times, yielding
{\begin{equation}
\langle\psi(\boldsymbol{\theta})|P_{i}|\psi(\boldsymbol{\theta})\rangle
\approx\frac{1}{\mathcal{N}}\sum_{s=1}^{\mathcal{N}}\rmpi_{s}^{(i)}
=\hat{P}_{i}(\mathcal{N}).\label{eq:VQE_estimator}
\end{equation}}\unskip

The finite-sampling error $\epsilon$ between the left-hand side and the
right-hand side scales as $1/\sqrt{\mathcal{N}}$, so that the
computational time of VQE will generically scale as
$O(\mathcal{N})=O(1/\epsilon^{2})$, as in classical Monte-Carlo
techniques encountered in
Section~\ref{subsec:Quantum-Monte-Carlo-algorithms}. This is a major
drawback compared to QPE, whose run time scales as $1/\epsilon$ (albeit
with many more gates and thus the need for an error-corrected quantum
computer, as explained in Section~\ref{subsec:Promises...-:-the}). VQE
also suffers from a major drawback compared to variational Monte-Carlo
because, contrary to VMC, it does not benefit from the variance
reduction phenomenon: as one gets closer to the sought-after ground
state, the variance of the individual Pauli terms does not vanish (the
eigenstate of $H$ is not an eigenstate of the individual $P_{i}$'s), as
is the case in VMC for the local energy.

It must be pointed out that mathematical statements about barren plateaux
are average statements. They do not preclude the existence of portions
of the parametric space where gradients do not vanish. However, these
``fertile valleys'' are narrow ones~\cite{Arrasmith2022}. One
of the future challenges of VQE is to start the optimization process
close to or in these narrow gorges. Using classical methods to find
good starting points is most probably a promising avenue in this direction.
The recently introduced adaptive ansatz generation strategies, which
construct the ansatz by selecting the operators that maximize the
gradients, are also promising in this respect~\cite{Grimsley2019,Tang2019}.

Of course, even in the absence of barren plateaux and if shot noise is
small enough, decoherence also has to be reckoned with. A major challenge
of VQE is thus to find the shallowest ans\"atze to curb the deleterious
influence of noise.

\paragraph*{The design of variational quantum circuits}

Designing variational quantum circuits $U(\boldsymbol{\theta})$ must
meet two contradictory goals: being expressive enough to be able to
reach the sought-after ground state during optimization, and being
short enough to limit the impact of\break decoherence.

To meet the expressivity requirement, a simple strategy, sometimes
called the hardware-efficient ansatz~\cite{Kandala2017}, is to
define a circuit that is ``entangling enough'' by using the available
two-qubit gates on the hardware (hence the name), and random single-qubit
rotations, so that the state created by the ansatz approximates well
a random quantum state. These states are characterized by a high level
of entanglement, and may therefore have a good chance of getting close
to the ground state of the Hamiltonian. A main drawback of these circuits,
however, is that they are very prone, by design, to the barren plateau
problem.

The complementary strategy is to construct physically-motivated
ans\"atze: they will cover the Hilbert space less extensively but will
be easier to optimize. A well-known ansatz in this family is the
unitary coupled cluster (UCC) ansatz we introduced in
Equation~(\ref{eq:ucc_ansatz}). Since $T-T^{\dagger}$ is antihermitian,
${\mathrm{e}}^{T-T^{\dagger}}$ is unitary and can thus a priori be implemented on
a quantum processor. In practice, this is achieved by finding a Pauli
decomposition $T-T^{\dagger}=\mathrm{i}\sum_{k}\lambda_{k}P_{k}$ and
Trotterizing the corresponding sum of terms as described above
(Section~\ref{subsec:ground-state-energies}). The length of the
so-obtained circuit is polynomial in the number of orbitals, contrary
to (untruncated) classical implementations of the UCC (or VCC) methods,
raising hopes that VQE could bring an advantage compared to classical
methods. 

Among physically-inspired ans\"atze, one can also mention the
Hamiltonian variational ansatz (HVA,~\cite{Wecker2015}), which is
inspired by the annealing method (see
Section~\ref{subsec:ground-state-energies}). Following the adiabatic
theorem, if one starts from the ground state of the Hamiltonian $H$ at
hand (say the Hubbard model) where one has removed the hopping terms
(such a Hamiltonian, $H_{0}$, commutes with $c_{i}^{\dagger}c_{i}$ and
thus admits Fock states, namely factorized states, as eigenstates
$|\Phi_{0}{\mathop{\rangle}}$), and deforms it (see 
Equation~(\ref{eq:annealing_Hamiltonian})) into the Hamiltonian of
interest $H$, one is guaranteed to end up in the ground state of $H$.
The HVA takes inspiration from the Trotterization of the interpolating
Hamiltonian by proposing the following ansatz:
{\begin{equation}
|\Psi(\boldsymbol{\theta}){\mathop{\rangle}}=\prod_{k=1}^{M}
\mathrm{e}^{-\mathrm{i}\theta_{2k}H_{0}}
\mathrm{e}^{-\mathrm{i}\theta_{2k+1}H}|\Phi_{0}{\mathop{\rangle}},\label{eq:HVA}
\end{equation}}\unskip
with $2M$ parameters $\boldsymbol{\theta}$. The hope underlying the
method is that the total duration of the circuit once the parameters
have been optimized is shorter than the (long) duration dictated by the
adiabatic theorem (HVA relies on the hope to perform a ``shortcut to
adiabaticity'').

\subsubsection{Applications to fermionic many-body problems: from
molecules to the Hubbard model to the Anderson model}

After its introduction by~\cite{Peruzzo2014}, VQE was tested on a
number of toy systems that served as proofs of concept, with no aim to
compete with classical methods (see e.g.~\cite{OMalley2016a} for an
application for the $H_{2}$ molecule). In parallel, theoretical papers
estimated the amount of computational resources required to reach a
satisfactory level of accuracy for larger systems beyond toy models. In
particular,~\cite{Wecker2015} gave resource estimates for computing the
ground state energy of small molecules to chemical accuracy, namely to
within 1~mHa of the exact ground state energy (such levels of accuracy
are needed to compute chemical rates: those depend on activation
energies $E_{\mathrm{a}}$ through a factor $\propto
{\mathrm{e}}^{-E_{\mathrm{a}}/k_{\mathrm{B}}T}$ via the Arrhenius or Eyring laws;
only predicting whether a reaction is likely or not to happen requires
that $E_{\mathrm{a}}$, and therefore the ground-state energies involved
in computing $E_{\mathrm{a}}$, be known at least up to a
$O(k_{\mathrm{B}}T)$ error; at room temperature, this corresponds to 
1~mHa). To obtain a resource estimate,~\cite{Wecker2015} used the fact
that the shot noise error can be upper bounded as
{\begin{equation}
\Delta H(\mathcal{N})\leq\frac{\left\Vert H\right\Vert _{1}}
{\sqrt{\mathcal{N}}},\label{eq:shot_noise_error_bound}
\end{equation}}\unskip
with the one-norm defined as $\Vert H\Vert _{1}=\sum_{i}|\lambda_{i}|$.
Such an estimate is based on a nonuniform allocation of shots among the
Pauli terms: each Pauli term $\lambda_{i}P_{i}$ in the decomposition of
$H$ receives a fraction of the total budget proportional to
$|\lambda_{i}|$ (\hspace{1sp}\cite{Rubin2018}; more advanced strategies are
possible, see e.g.~\cite{Arrasmith2020}, but do not alter the upper
bound). At least $\mathcal{N}\approx\Vert H\Vert _{1}^{2}/\Delta H^{2}$
are thus needed to reach a given accuracy $\Delta H$. For the
$\mathrm{H_{2}O}$ molecule, since $\Vert H\Vert _{1}=36$ Ha (in the
STO-6G basis), one obtains a number of
$\mathcal{N}=36^{2}/(10^{-3})^{2}\approx10^{9}$ samples. Assuming a
clock cycle (processor rate) of 1~MHz (an optimistic estimate for
superconducting processors, whose measurement durations are in the 
$\rmmu \mathrm{s}$ range), this corresponds to 1000~s per energy
evaluation, without factoring in gradients (which basically require a
multiplication of the above figure by the number of parameters).
Counting gradients and the number of optimization steps, one easily
reaches months~\cite{Wecker2015}.

Similar estimates for the Hubbard model lead to less dire estimates---of
the order of days---due to a smaller number of terms and translation
invariance~\cite{Wecker2015}. This appears more promising, although
this estimate completely leaves aside optimization issues (the barren
plateau problem) and decoherence.

As argued in the introductory section, using embedding techniques
like DMFT, RISB or DMET bring another level of simplification by shifting
the focus to a simpler effective model, namely the Anderson impurity
model (AIM, Equation~(\ref{eq:AIM})). Motivated by the success of these
techniques and by the more modest effort of solving the AIM compared
to the Hubbard model, a few works proposed to use a quantum processor
to solve the impurity model.

References~\cite{Kreula2016} and~\cite{Bauer2016} introduced the general
methodology of using quantum processors to solve DMFT
equations. Reference~\cite{Kreula2016} numerically simulated the effect of
hardware errors in a trapped-ion implementation of a nonequilibrium
DMFT scheme to describe a quench of the kinetic term of the Hubbard
model, with 3 to 21 qubits (corresponding to 2 and 10 AIM bath sites,
respectively). The Green's function was measured using a Trotter
compilation of the time evolution (as described
above). Reference~\cite{Bauer2016} focussed on equilibrium DMFT, with a proposal
to use quantum phase estimation to measure the Green's function. The
following works, which were aimed at practical implementations,
resorted to a simplified DMFT scheme called two-site
DMFT~\cite{Potthoff2001}, which consists in limiting the bath to only
one bath site. This in turns means that only 4 qubits are
needed. Reference~\cite{Kreula2016a} numerically simulated two-site DMFT with a
(noiseless) superconducting platform implementation of the Trotter
evolution required by the Green's function computation, while the
implementation of the ground state preparation needed to initialize the
Green's function circuit was not discussed. Reference~\cite{Rungger2019}
implemented two-site DMFT with VQE as the algorithm used to compute the
energy of the ground state as well as of the $N+1$ and $N-1$ manifolds
needed to compute the Green's function in a Lehmann representation.
This VQE was implemented on superconducting as well as trapped-ion
hardware. Reference~\cite{Jaderberg2020} performed noisy simulations of two-site
DMFT, with a standard Trotter time evolution to compute the Green's
function, and a black box for the ground state preparation. The minimal
levels of noise needed to get useful results for two-site DMFT were
computed in this same work. Reference~\cite{Yao2020} implemented a Gutzwiller
(RISB) method to study the periodic Anderson model (a variant of the
Hubbard model), with a VQE-based, two-qubit implementation on a
superconducting processor. More recent work~\cite{Selisko2024} carried
out a DFT+DMFT computation for a multiorbital Hubbard model with an AIM
with up to 6 bath sites (and a single impurity), corresponding to 14
qubits. However, the determination of the optimal VQE parameters was
done on a classical computer.

All the experimental runs comprising a full-blown state preparation
were limited to at most 4 qubits, and should thus be considered as
mere proof-of-principle experiments. The main question this state
of affairs raises is how to increase the size of the AIM (whether in
terms of number of impurities, or number of bath sites, or both) while
remaining tractable. We will give some insights into this in the next
section.

\section{Paths towards useful quantum-classical
algorithms}\label{sec:Paths-towards-useful}

In the previous two sections, we have outlined the main methods used on
classical computers (Section~\ref{sec:Many-body-problems:-a}) and on
the emerging quantum processors
(Section~\ref{sec:Quantum-computers:-Promises}) for solving fermionic
quantum many-body problems. We have identified specific advantages and
drawbacks of both computing paradigms, and in particular
\begin{enumerate}[(1)]
\item the difficulty of classical computers to handle equilibrium systems
with long-distance correlations and/or a high degree of entanglement,
limiting the reach of the most advanced methods to solve strongly-correlated
models like the Hubbard model, or even simpler models, self-consistently
``distilled'' by embedding methods, like the Anderson impurity model;
\item the difficulty of classical computers to handle out-of-equilibrium
dynamics due to the fast growth of entanglement;
\item the limitations of current noisy quantum hardware due to decoherence,
which put a low cap to the maximum size of viable quantum circuits;
\item the fundamental limitations of even noiseless quantum circuits, like
shot noise, or the need for input states with high overlaps in quantum
phase estimation.
\end{enumerate}
The goal of this section is to outline recent research results to
overcome some of these difficulties with a common pattern: the combination
of classical and quantum algorithms to solve the many-body problems
at hand.

Three main directions will be elaborated on: Section~\ref{subsec:MPS}
explores how tensor network methods can be used to delineate, and
perhaps reach, quantum advantage. Section~\ref{subsec:fermions}
elaborates on hybrid methods for tackling fermionic many-body problems.
Finally, Section~\ref{subsec:optimization} extends the scope to hard
optimization problems, which can be regarded as classical many-body
problems, and which are natural targets for quantum processors.

\subsection{Tensor networks: classical yardsticks of quantum supremacy
claims\,$\ldots$\ and antidotes to the plague of decoherence?}\label{subsec:MPS}

As discussed in Section~\ref{subsec:MPS} above, tensor networks are
powerful classical tools to represent many-body states. As it were,
tensor networks are also deeply tied to quantum circuits: executing a
quantum circuit on a gate-based quantum processor can be regarded as
contracting a tensor network. Before we explain this statement, let us
emphasize that this deep connection both opposes, and brings closer,
tensor network methods and quantum computing: it opposes them as two
competing methods, whose computational merits and shortcomings can be
compared; it brings them closer as complementary methods, which could
possibly be used in combination with each other depending on the
hardware constraints of the computation. 

Thus, tensor network provide a natural comparison point for the run
time and memory cost of perfect quantum computations, namely the CPU
and memory needed for contracting the corresponding tensor network. As
we shall see in the next section, this comparison point has been used
to support claims of quantum supremacy. However, we will also argue
that for the noisy quantum computers that are available today, the
reference point should be a \emph{lossy} contraction of the tensor
network (Section~\ref{subsec:Debunking-quantum-supremacy}). We will
also see that some tensor networks could potentially be used to
construct quantum circuits that are more robust to noise
(Section~\ref{subsec:Matrix-product-states-trotter}).

\begin{figure}
\includegraphics{fig07}
\caption{\label{fig:Tensor-network-and-circuits}Tensor network representations of quantum circuits and
contraction strategies. (a) General tensor network, with vertices
$\psi_{0}$, $u_{1}$, $u_{2}$, $u_{3}$ and $x_{1}$, $x_{2}$, $x_{3}$,
and edges $a_{1}$, $a_{2}$, $a_{3}$, $b_{1}$, $b_{2}$, $b_{3}$ and
$c_{1}$, $c_{2}$. (b) ``Schr\"odinger'' strategy to contract the tensor
network: the quantum state is stored as a dense $2\times2\times2$ array
$\psi_{b_{1},b_{2},b_{3}}$. (c) Matrix-product-state strategy: the
quantum state is represented as a matrix product state, e.g.\ with three
connected tensors. After the application of a two-qubit gate, a SVD
compression is performed.}
\end{figure}

The connection between a quantum circuit execution and a tensor network
contraction is the following. The expectation values
$\langle\psi|\hat{O}|\psi\rangle$ or outcome probabilities
$|\langle x|\psi\rangle|^{2}$ (where
$x=(b_{1},b_{2},\dots,b_{n})$ represents a bitstring, and $|x{\mathop{\rangle}}$
an eigenstate of the $\hat{Z}\otimes\hat{Z}\cdots\hat{Z}$ operator), of
a state $|\psi{\mathop{\rangle}}=\prod_{k=1}^{N_{g}}U_{k}|0{\mathop{\rangle}}^{\otimes n}$
generated by a quantum circuit with $N_{g}$ gates on $n$ qubits can be
regarded as the outcome of contracting a tensor network. As a reminder,
a tensor network is defined by a graph $G=(V,E)$ with vertices $V$ and
edges $E$, and tensors $\{ T_{k}\} _{k\in V}$ associated to
each vertex. In our situation, the tensors correspond to gates 
$\{U_{k}\} _{k=1\dots N_{g}}$, operators $\hat{O}$ or states
$|\psi{\mathop{\rangle}}$, $|x{\mathop{\rangle}}$ or $|0\rangle^{\otimes n}$, and one edge
$(i,j)$ of the graph corresponds to a summation over the corresponding
indices of tensors $T_{i}$ and $T_{j}$. This is illustrated in
Figure~\ref{fig:Tensor-network-and-circuits}(a) for the computation of
$\langle x|\psi\rangle=\langle x|U|0\rangle^{\otimes n}$ for $n=3$
qubits. To compute the actual values $\langle\psi|\hat{O}|\psi\rangle$
or $|\langle x|\psi\rangle|^{2}$, one needs to perform the
contraction of the tensor network, namely actually perform the
summation over the internal variables. Two steps are necessary for
this: the determination of the contraction order (some orders yield
better computational complexities) and the contraction itself. The
determination of the optimal contraction order (that leads to the
smallest complexity) is a NP-complete problem. As already discussed in
Section~\ref{subsec:Tensor-network-methods}, finding this order
generically scales as $O({\mathrm{e}}^{T(G)})$, where $T(G)$ is the so-called
treewidth of the graph. For graphs corresponding to, e.g.\ $l\times L$
lattices, $T(G)\sim\mathrm{min}(l,L)$. Thus, if a circuit topology
roughly corresponds to a $n\times D$ lattice (where $n$ is the number
of qubits and $D$ the depth of the circuit), the generic cost of
finding the best tensor contraction order for this circuit is
$O({\mathrm{e}}^{\mathrm{min}(n,D)})$. The general space complexity of the
contraction is also of order ${\mathrm{e}}^{T(G)}$. This means that tensor network
contractions are in principle exponentially costly, making a strong
case for (perfect) quantum computers: they ``contract'' this tensor
network in polynomial time. However, real quantum computers come with
their own exponential burden, namely decoherence
(Equation~(\ref{eq:product_law})). The outcome of the comparison
between tensor networks and quantum computers thus depends on the
parameters of the circuit to be executed or simulated and on the
hardware properties. 

\subsubsection{Assessing quantum supremacy claims with grouped matrix
product states}\label{subsec:Debunking-quantum-supremacy}

\paragraph*{Google's supremacy claim with random quantum circuits}

The advent of large-scale quantum processors, with tens of qubits, in
the 2010s, led to efforts to design special experiments geared at
demonstrating the advantage of quantum computers over classical
computers. The first such claim was made in 2019 by Google's team using
transmon processors~\cite{Arute2019}, essentially coupled anharmonic
oscillators described by a Bose--Hubbard model. The task that Google
proposed as a possible low-hanging fruit for demonstrating quantum
advantage of these processors over classical algorithms was that of
sampling bitstrings in random, two-dimensional quantum circuits with 53
qubits. Such circuits are known to lead to a fast growth of
entanglement entropy~\cite{Boixo2017a}. They are therefore the best way
to beat the exponential decay of fidelity presented in
Section~\ref{subsec:The-exponential-effect} while being difficult
targets for tensor contraction techniques (and their exponential
scaling with entanglement).

The main figure of merit of Google's experiment was the so-called
linear cross-entropy benchmarking fidelity,
{\begin{equation}
\mathrm{XEB}=\frac{2^{n}}{\mathcal{N}}\sum_{i=1}^{\mathcal{N}}
P(x_{i})-1,\label{eq:XEB}
\end{equation}}\unskip
where $P(x)=|\langle x|\Psi\rangle|^{2}=|\langle 
x|U|0\rangle|^{2}$ is the probability of measuring bitstring $x$
after the execution of the random quantum circuit described by unitary
operator $U$, and $\{ x_{i}\} _{i=1\dots\mathcal{N}}$ are $\mathcal{N}$
samples measured in the experiment (note that $P(x)$ is exponentially
costly to compute, a fact that led us to propose an alternative
benchmark of quantum computers, see Section~\ref{subsec:Q-score}
below). Under certain assumptions, this number can be proven to equal
the state fidelity $\mathcal{F}=\langle\Psi|\rho|\Psi\rangle$ of the
(noisy) final state $\rho$ obtained in the experiment. 

Google's claim essentially was that the run time needed to reach a
given XEB in the experiment (0.2\% in the 2019 experiment, down from
100\% for a perfect quantum computer) was orders of magnitude smaller
(200~s compared to 10,000 years) than that of performing a
classical computation to sample from the final distribution with a
similar quality. The figure of 0.2\% can be obtained directly (albeit
only roughly) from the product formula Equation~(\ref{eq:product_law})
with the experiment's error rates and number of operations. The figure
of 10,000 years was obtained by resorting to a particular way of
contracting the tensor network corresponding to the random quantum
circuit at hand. We sketch here the rough strategy: instead of
performing the full tensor network contraction, Google considered the
computation of $P(x)$ as the square modulus of the summation over the
internal indices of the tensor network:
{\[
P(x)=\left|\sum_{a_{1},a_{2},a_{3},b_{1},b_{2},
b_{3},c_{1},c_{2}}\left[\psi_{0}\right]_{a_{1}b_{1}c_{1}}
[u_{1}]_{a_{1}a_{2}}[u_{2}]_{a_{2}b_{1}a_{3}b_{2}}
[u_{3}]_{b_{2}c_{1}b_{3}c_{2}}[x_{1}]_{a_{3}}
[x_{2}]_{b_{3}}[x_{3}]_{c_{2}}\right|^{2}
\]}\unskip
(here spelled out in the simple case illustrated in
Figure~\ref{fig:Tensor-network-and-circuits}). They evaluated cost
$\mathcal{C}$ of evaluating one term (one fixed assignment of
$a_{1},a_{2},a_{3},b_{1},b_{2},b_{3},c_{1},c_{2}$) on a classical
computer. To get the total (naive) contraction cost, one then in
principle needs to multiply $\mathcal{C}$ by the number $\mathcal{A}$
of possible assignments or ``paths'' (in analogy to the Feynman path
integral technique,~\cite{Rudiak-Gould2006}). In practice, however, the
finite XEB or fidelity needs to be taken into account; it can be shown
that summing only a fraction $\mathcal{F}$ of the paths leads to a
state of fidelity $\mathcal{F}$ with respect to the perfect state (that
corresponds to all paths). Therefore, the back-of-the-envelope total
computational cost is $\mathcal{F}\cdot\mathcal{A}\cdot\mathcal{C}$.
The 10,000 years are obtained from a similar formula.

Many followup works challenged Google's estimation of the best classical
run time. Most relied on a combination of the general tensor network
contraction strategy outlined above and the Feynman-path approach:
the so-called tensor-slicing method selects only a subset of the internal
variables for a Feynman-like handling, and contracts, using usual
tensor-contraction strategies, the rest of the internal variables.
These methods~\cite{Pan2021,Pan2021a,Gray2021,Huang2021b} led
to greatly reduced estimations, down to a few seconds. However, due
to the aforementioned inherent exponential scaling of tensor network
contraction methods, these methods are hardly scalable, since they
are exponential in the number of qubits or depth of the circuit.

\paragraph*{A tensor network compression strategy}

A different strategy to tackle the problem of matching experimental
fidelities with a classical technique was proposed in~\cite{Zhou2020}.
It consists in performing the tensor network contraction in a
``Schr\"odinger''-style, as illustrated in
Figure~\ref{fig:Tensor-network-and-circuits}(b): the tensors are
contracted from the left to the right, i.e.\ following the time arrow,
like the evolution of a Schr\"odinger equation. However, to avoid the
computational cost of storing the full wave function and of performing
the matrix-vector multiplications corresponding to the contractions,
the wavefunction is stored not as a dense vector of $2^{n}$ complex
amplitudes, but as a matrix product state (see~\cite{Vidal2003} for a
first use of MPS to emulate quantum circuits). Each entangling gate is
followed by a compression step via a singular value decomposition
(SVD), see Figure~\ref{fig:Tensor-network-and-circuits}(c). The
singular values (also known as Schmidt coefficients in this context)
$s_{i}$ are truncated when their index is larger than the
bond-dimension $\chi$ of the MPS.


The fidelity of such a compression is given by the overlap between
the original MPS state $|\psi{\mathop{\rangle}}$ (with Schmidt coefficients
$s_{i}$) and the compressed state $|\tilde{\psi}{\mathop{\rangle}}$ (the truncated
singular values $\tilde{s}_{i}$ are also renormalized to preserve
the unity of the norm):
{\begin{equation}
f=\left|\langle\psi|\tilde{\psi}\rangle\right|^{2}=
\left(\sum_{i}s_{i}\tilde{s}_{i}\right)^{2}=
\sum_{i\leq\chi}s_{i}^{2}.\label{eq:compression_fid}
\end{equation}}\unskip

For random quantum circuits, the product formula
Equation~(\ref{eq:product_law}) can be shown to hold
(\hspace{1sp}\cite{Zhou2020}), so that the final fidelity can be estimated as
{\begin{equation}
\mathcal{F}_{\mathrm{MPS}}=\left|\langle\Psi|\Psi_{\mathrm{MPS}}
\rangle\right|^{2}\approx\prod_{k}f_{k},\label{eq:prod_formula_MPS}
\end{equation}}\unskip
where $f_{k}$ is the fidelity of the $k$th compression step. Such an
algorithm works, but the final fidelity $\mathcal{F}$ obtained for 2D
circuits of the type used by Google's team is far below the aimed-for
0.2\%. This failure comes from the numerous contraction steps and from
the fact that the singular values of the random quantum states
generated by the circuit have a very long tail, which means that
neglecting even high-index singular values causes a sizable error
$1-f=\sum_{i>\chi}s_{i}^{2}$.

This is true even when using so-called ``grouped MPS''~\cite{Zhou2020},
where some of the qubits are grouped into one tensor (instead of one
tensor per qubit for regular MPS), as illustrated in
Figure~\ref{fig:Grouped-MPS-representation}. Grouped MPS interpolate
between a dense representation of $|\Psi{\mathop{\rangle}}$ (of storage cost
$2^{n}$) and a plain-vanilla MPS representation (of storage cost
$\leq2\chi^{2}n$).

\begin{figure}
\includegraphics{fig08}
\caption{\label{fig:Grouped-MPS-representation}``Grouped'' MPS
representation of a 8-qubit quantum state $|\Psi{\mathop{\rangle}}$:
the eight qubits are partitioned into three groups of 3, 2 and 3 qubits
respectively, corresponding to three tensors $M^{(1)}$, $M^{(2)}$ and
$M^{(3)}$. A standard MPS would use $n$ tensors for $n$ qubits. Adapted
from~\cite{Ayral2022}. Copyright (2023) by the American Physical Society.}
\end{figure}

A subsequent work of ours~\cite{Ayral2023a} addresses this issue
by switching from a SVD compression strategy to a strategy inspired
by the density-matrix renormalization group (DMRG) method. Instead
of compressing the (grouped) MPS after each entangling gate via a
SVD, we optimize the MPS only after a certain number of layers of
entangling gates using the DMRG algorithm, namely by finding the MPS
of fixed bond dimension with largest overlap with the state obtained
by applying a number $N_{l}$ of entangling layers on the previous
MPS, as illustrated in Figure~\ref{fig:mps_dmrg_compression}(a) and
(b). To perform this optimization technique, we need to perform the
contraction of a network containing the previous MPS and $N_{l}$
new layers, with a cost of $O({\mathrm{e}}^{N_{l}})$ provided $N_{l}\leq n$
(using the treewidth estimation of the cost): indeed, the optimal
tensor $M^{(\tau)*}$ is given, up to a normalization constant, by
{\[
M^{(\tau)*}\propto F^{(\tau)},
\]}\unskip
with $F^{(\tau)}$ graphically defined in
Figure~\ref{fig:mps_dmrg_compression}(c), namely it is essentially a
$N_l \times n$ grid of tensors to be contracted.

\begin{figure}
\includegraphics{fig09}
\caption{\label{fig:mps_dmrg_compression}Compression step in the DMRG
algorithm. (a) General schematic of the compression step: one adds $K$
($N_l$ in the text) layers of the quantum circuit, then approximates
the resulting state with a MPS. (b) Tensor network representation of
the scalar product to be optimized. (c) Computation of the $F^{(\tau)}$
tensor. From~\cite{Ayral2022}. Copyright (2023) by the American Physical Society.}
\end{figure}

This new optimization strategy leads to larger compression fidelities,
as illustrated in Figure~\ref{fig:Error-per-gate}, which compares the
technique used in~\cite{Zhou2020} (dubbed time-evolving block
decimation, TEBD, in reference to a similar method for time-evolving
matrix product states) with two variants of our technique, the ``open''
and the ``closed'' simulation. The open strategy contracts the circuit
on the initial MPS from left to right, yielding an ``open'' MPS, namely
with arbitrary external indices. The closed strategy aims at computing
$\langle x|\Psi\rangle$ for a fixed bitstring $|x{\mathop{\rangle}}$: one fixes
the external indices to a set bitstring $x$ and contracts the circuit
both from the left and from the right, thus halving the effective depth
of the circuit and thus improving the compression error. The fact that
our method (in its two variants) outperforms the TEBD variant can be
ascribed to the lesser frequency of compression steps and, more
importantly, to the larger optimization freedom entailed by DMRG
compared to simple SVD compression, a well-known feature in
MPS~\cite{Schollwoeck2011}. 

Concretely, for Google's random circuits on 54 qubits (a sketch of
which is shown in Figure~\ref{fig:mps_google_circuits}(left)), we
obtain fidelities above the experimental fidelities for bond dimensions
of $\chi\sim50$, corresponding to a few hours of computations. More
importantly, the scaling of our simulation is not exponential in the circuit
depth $D$ nor in the number of qubits $n=n_{c}\times n_{r}$ (with
$n_{c}$ the number of columns and $n_{r}$ the number of rows of
the 2D grid of qubits). A rough estimate of the run time assuming
a grouping of the qubits by columns is 
{\begin{equation}
O(\chi^{3}2^{n_{c}}n_{r}N_{\mathrm{sweeps}}{\mathrm{e}}^{N_{l}}D/N_{l}).
\label{eq:total_cost}
\end{equation}}\unskip
The exponential in $n_{c}$ (which, in the worst case, is $\sqrt{n}$)
comes from the storage of dense vectors for a given column, while
the exponential in $N_{l}$ comes from the contraction over $N_{l}$
layers. $N_{\mathrm{sweeps}}$ is the number of DMRG sweeps. The $\chi^{3}$
scaling comes from the QR decompositions needed to make the MPS canonical,
a crucial requirement of DMRG. This makes the computation scalable
to quite large numbers of qubits compared to techniques scaling in
$2^{n}$ (instead of $2^{n_{c}}\sim2^{\sqrt{n}}$). We note in passing
that this $2^{\sqrt{n}}$ scaling would in principle disappear when
using 2D tensor networks instead of (grouped) MPS to represent the
state---with, however, the issues associated with 2D tensor networks
such as PEPS, that do not come with a canonical form and therefore
a polynomial contraction strategy (an exception being the recently
developed isometric tensor network states (isoTNS,~\cite{Zaletel2020}),
which however come with a steeper $O(\chi^{7})$ scaling).

\begin{figure}
\includegraphics{fig10}
\caption{\label{fig:mps_google_circuits}Circuits used in the MPS simulations. Left: Google (modified)
supremacy sequence. Right: QAOA circuit for a MaxCut problem.
From~\cite{Ayral2022}. Copyright (2023) by the American Physical Society.}
\end{figure}

\begin{figure}
\includegraphics{fig11}
\caption{\label{fig:Error-per-gate}Error per gate as a function of the bond dimension. Left:
Google supremacy random circuit. Right: QAOA circuit for a MaxCut
problem with 54 qubits. From~\cite{Ayral2022}. Copyright (2023) by the American Physical Society.}
\end{figure}

Another interesting feature of the quality of the approximate state
obtained through this method is that is strongly depends on the circuit
to be simulated: for random quantum circuits, reaching low enough error
rates requires quite large bond dimensions, while circuits used in the
QAOA algorithm~\cite{Farhi2014}, a variational quantum algorithm for
solving combinatorial optimization problems, illustrated in
Figure~\ref{fig:mps_google_circuits}(right), yield very low error
rates even for small bond dimensions, as illustrated in
Figure~\ref{fig:Error-per-gate}(right).

The reason for this quality variability among circuits is linked to
the entanglement generated by the circuit at hand: the fundamental
bottleneck of the method comes from the requisite bond dimension $\chi$.
It is related to the (bipartite) von Neumann entanglement entropy
$S$ of the state to be represented. $S$ is defined as the Shannon
entropy of the distribution of the squared singular values (or Schmidt
coefficients) of state $|\Psi{\mathop{\rangle}}$:
{\begin{equation}
S=-\sum_{i=1}^{\chi}s_{i}^{2}\log_{2}s_{i}^{2}.\label{eq:entanglement_entropy}
\end{equation}}\unskip

Since the entropy is convex, it is maximized by constant singular
values $s_{i}=\mathrm{const}=1/\sqrt{\chi}$ (the value of the constant
is chosen to ensure normalization of the state), leading so $S\leq\log_{2}\chi$.
Thus, a necessary condition for a MPS to be able to exactly represent
a state with an entanglement entropy $S$ is
{\begin{equation}
\chi\geq2^{S}.\label{eq:necessary_condition_bond_dim}
\end{equation}}\unskip

Usually, this statement is colloquially turned into a scaling of
$\chi\sim O(2^{S})$, which we will use in our analysis. It means that
the exponential wall of a MPS representation comes, as already
mentioned in Section~\ref{subsec:Tensor-network-methods}, from the
degree of entanglement of the state.

We can now come back to the difference between random quantum circuits
and QAOA circuits: random quantum circuits are known to generate a
fast (ballistic) increase of entanglement and quickly reach a volume
law entanglement, namely the maximum reachable entropy. Consequently,
any other circuit, like a QAOA circuit, will produce less entangled
states and thus be more easy to simulate with this method. More details
about this MPS simulation of Google's circuit can be found in the
original article~\cite{Ayral2022}.

\paragraph*{Noisy simulations with tensor networks}

In the previous subsection, we described a tensor-network method that
can mimic a quantum computation with a comparable level of fidelity.
However, the origin of the errors in the MPS algorithm we just
outlined, namely compression via the DMRG technique, is very different
from the origin of errors in actual quantum computers, that is,
decoherence.

In fact, tensor networks can also be used to simulate the evolution of
noisy quantum states. In principle, this task looks much
(exponentially) harder than simulating a perfect quantum computer,
since the cost of storing a density matrix $\rho$ on $n$ qubits is
$4^{n}$, compared to $2^{n}$ for a pure state. But since the cost of
tensor network computations is largely driven by the degree of
entanglement of the state to be represented, they should come with a
natural advantage for simulating noisy evolutions: intuitively,
decoherence destroys entanglement, and therefore a smaller bond
dimension (as per Equation~(\ref{eq:necessary_condition_bond_dim}))
should be needed to represent a noisy state.

The most straightforward extension of matrix product states to
represent operators such as the density matrix is the matrix product
operator (MPO) representation, illustrated in
Figure~\ref{fig:Two-representations-of-rho}(a). It allows to define an
entropy for mixed states called the operator-space or
matrix-product-operator entanglement entropy (OEE): it is the Shannon
entropy of the (renormalized) singular values
$s_{i}^{2}/\sum_{j}s_{j}^{2}$ of the MPO. 

\begin{figure}
\includegraphics{fig12}
\caption{\label{fig:Two-representations-of-rho}Two representations of density matrices. (a) Matrix product
operator (MPO). (b)~Matrix product density operator (MPDO). (c) MPS
representation of the purification $|\Psi_{\mathrm{tot}}\rangle$ of
$\rho$.}
\end{figure}

In the presence of depolarizing noise of probability $p$ (see 
Equation~(\ref{eq:depolarizing})), one observes, in 1D random quantum
circuits, that the entanglement entropy increases to a maximum value
of~\cite{Noh2020}:
{\begin{equation}
S_{\mathrm{OEE}}^{\mathrm{max}}\sim\frac{1}{p^{1/\alpha}},\label{eq:mpo_ee_scaling}
\end{equation}}\unskip
with $\alpha\approx2$, before decreasing once the optimal depth (which
is itself shorter and shorter as $p$ increases) is reached. This is
illustrated in the top row of Figure~\ref{fig:Density-matrix-simulations} by the ``MPO'' curve: the
entropy first increases and then decreases (the increase is almost
absent for $p=10$\%, which is a very strong noise level). This trend in
the entropy is reflected in the bond dimension: here, we adjusted the
bond dimension to discard all singular values below a certain threshold
$\epsilon$. Thus, the bond dimension automatically adjusts to the
entropy of the state: we do observe first an increasing bond dimension
(entanglement creation through a unitary evolution dominates), followed
by a decrease (entanglement destruction by noise dominates). This
behavior and the scaling law Equation~(\ref{eq:mpo_ee_scaling}) point
to the importance of hardware improvements to lower $p$: this will in
turn raise the bar for the bond dimensions needed to reproduce quantum
results.

\begin{figure}
\includegraphics{fig13}
\caption{\label{fig:Density-matrix-simulations}Simulations of random 1D
quantum circuits with $8$ qubits and depolarizing noise with MPOs and
two MPDO truncation strategies. {Top row}: evolution of the operator
entanglement entropy as a function of the number of layers. {Bottom
row}: evolution of the (bond-averaged) bond dimension for truncation
thresholds of $\epsilon=0.01$, 0.015 and $0.001$ for MPO, MPDO (LC) and
MPDO (IPD). {Left} {column: }depolarization level of $p=2$\%. {Right
column: }$p=10$\%. Adapted from~\cite{Muller2024}.} 
\end{figure}

A formal issue with a MPO representation of the density matrix is that
is does not guarantee the positive definiteness of the matrix upon
truncation of its singular values. This property can be imposed by
first decomposing $\rho$ as $\rho=AA^{\dagger}$ (with $A$ a
$2^{n}\times r$ matrix, with $r$ the rank of $\rho$) and then using a
MPO representation of $A$. This yields the so-called matrix product
density operator (MPDO) representation, illustrated in
Figure~\ref{fig:Two-representations-of-rho}(b). The internal vertical
bonds can be interpreted as representing environment sites (as opposed
to the external vertical bonds, which are physical system sites).
Indeed, a MPDO representation can also be thought of as the MPS
representation (see Figure~\ref{fig:Two-representations-of-rho}(c)) of
a given purification $\Psi_{\mathrm{tot}}$ of $\rho$, namely a state
$\Psi_{\mathrm{tot}}$ defined in an enlarged Hilbert space with
environmental degrees of freedom and such that
{\begin{equation}
\rho=\mathrm{Tr}_{{\mathrm{env}}}\left[|\Psi_{\mathrm{tot}}
\mathop{\rangle}{\mathop{\langle}}\Psi_{\mathrm{tot}}|
\right],\label{eq:purification}
\end{equation}}\unskip
with $\mathrm{Tr}_{\mathrm{env}}$ the partial trace on the environment. 

Thus, in addition to the usual ``horizontal'' bond dimensions $\chi_{i}$,
which essentially capture the degree of entanglement of the purification
$\Psi_{\mathrm{tot}}$, a MPDO is also parameterized by ``vertical''
bond dimensions $r_{i}$, which depend on the purity of the state
(a pure state should have $r_{i}=1$, namely $A$ is $2^{n}\times1$
in this case). Evolving a MPDO in time is similar to evolving a MPS
or a MPO (see Figure \ref{fig:Tensor-network-and-circuits}(c)),
except that compression steps involve both compressing both the horizontal
and the vertical bonds.

The issue with prior art in MPDO simulations of noisy random
circuits~\cite{Cheng2020} is that such a simple contraction strategy
(where both horizontal and vertical singular values are discarded below
a certain threshold) leads to ever increasing bond horizontal bond
dimensions, as illustrated in
Figure~\ref{fig:Density-matrix-simulations} (lower panels) on the MPDO
``LC'' curve. This is due to the fact that the horizontal bond
dimension carries not only entanglement within the system (as in
regular MPS), but also possibly within the environment: in
Figure~\ref{fig:Two-representations-of-rho}(c), it is clear that two
neighboring tensors share a bond that links a system-environment site
with another. A major flexibility of MPDO, however, is the freedom in
choosing a purification. In other words, one can attempt to reduce the
``horizontal'' entanglement by using the gauge freedom:
$\rho=AA^{\dagger}=(AU)(U^{\dagger}A^{\dagger})$ for any unitary $U$,
which is equivalent to transforming $|\Psi_{\mathrm{tot}}\rangle$ to
$U|\Psi_{\mathrm{tot}}\rangle$. One can optimize $U$ to reduce the
entanglement between environmental sites. This is in essence what we
propose in~\cite{Muller2024}. Of course, one cannot optimize over a
general $2^{n}\times2^{n}$ unitary $U$. Rather, in the spirit of e.g.\ 
two-site DMRG, we optimize over two-site unitaries acting on two
neighboring tensors. This leads to the suppression of the ever
increasing horizontal bond dimension, as shown in 
Figure~\ref{fig:Density-matrix-simulations} (bottom row). The requisite bond
dimensions for MPDOs is now similar to that of MPOs.

\paragraph*{Circuit cutting: a hybrid tensor contraction strategy}

Let us conclude this section by mentioning that tensor network
contraction is also at the heart of work we have conducted to achieve
the practical task of executing a quantum circuit with more qubits than
the number of qubits available on the available quantum processor. In
these works~\cite{Ayral2020,Ayral2021}, we examine a hybrid quantum
classical scheme where the tensor network corresponding to the circuit
is cut into \mbox{subnetworks}. Each subnetwork is then ``contracted'' by
translating it back to a (small) quantum circuit that is executed on a
quantum processor. The individual subnetwork results are combined by
contracting obtained tensors. The exponential classical complexity of
the overall task is then reduced to an exponential in the number of
cuts needed to disect the whole tensor network into subnetworks whose
corresponding circuits can fit on the available processor. Noteworthy,
this technique, also called ``circuit knitting'', seems to be of
importance in roadmaps for large-scale processors~\cite{Mohseni2024}.
One main challenge is to balance the classical and quantum costs by
optimizing the cut locations.

\subsubsection{Matrix product states as an antidote to decoherence:
compressing quantum
circuits}\label{subsec:Matrix-product-states-trotter}

In the previous section, we argued that matrix product states can
be used to replicate finite-fidelity experiments with random quantum
circuits, and to emulate noisy quantum circuits in a realistic fashion.
In this section, we aim at showing that they can also be
used to ``give a leg up'' to noisy quantum computers.

The goal of the method, presented in a recent publication
(\hspace{1sp}\cite{Anselme-Martin2023}), is to study the dynamics of a 1D
transverse-field Ising model (TFIM)
{\begin{equation}
H=\sum_{i=1}^{n}Z_{i}Z_{i+1}+h\sum_{i=1}^{n}X_{i}\label{eq:TFIM}
\end{equation}}\unskip
after a sudden quench of the transverse field from 0 to a finite value
$h$, starting from the antiferromagnetic spin state. The entanglement
entropy $S$ of the state of such a system typically increases linearly
with time~\cite{Calabrese2005}, making it a hard target for MPS
methods, due to their $\chi\sim O(2^{S})$ scaling. While it seems to be
an easy target for quantum computing via a Trotterization of
$\mathrm{e}^{-\mathrm{i}Ht}$, the exponential suppression of the
fidelity presented in the previous sections poses a serious practical
challenge. Indeed, to keep a fixed Trotter error, one needs to increase
the number of Trotter steps with time, meaning a circuit depth also
increasing with time. Therefore, the final fidelity (given by the
product formula (\ref{eq:product_law})) is going to decrease
exponentially with time (in the absence of error correction), thus
starkly degrading the quality of a purely quantum computation.

In~\cite{Anselme-Martin2023}, we propose to use a MPS and MPO-based
time evolution to produce better (shallower) quantum circuits than
those one would have obtained with a simple Trotter evolution.
Two main stages, illustrated in Figure~\ref{fig:sketch-qmps}, have 
to be distinguished. The first stage consists
in performing a standard TEBD time evolution up to the maximal time
$t_{\mathrm{max}}<t_{\mathrm{f}}$ reachable by the classical computer.
This part of the algorithm is constrained by the growth of entropy
with time, which means that a large enough $\chi$ should be chosen
to avoid large compression errors until, at some point, one reaches
the random access memory limits of the classical processor. One then
converts the so-obtained MPS into a quantum circuit, using recent techniques
to perform this conversion~\cite{Lin2021}: we optimize a staircase
circuit of $N_{l}$ layers, with maximum bond dimension $2^{N_{l}}$
to have the largest overlap with the MPS state obtained with TEBD.
Said circuit is much shallower than the circuit that would have been
obtained by simple Trotterization. Intuitively, one could interpret
this by saying that the TEBD procedure selects the circuit that will
be the most efficient in terms of entropy creation (as opposed to
a Trotter evolution, where smaller Trotter errors means more layers,
each of which generates less entanglement than with larger, yet more
discretization-error-prone, steps). 

\begin{figure}
\includegraphics{fig14}
\caption{\label{fig:sketch-qmps}Sketch of the QMPS method. (a) Sketch of the
evolution of the von Neumann entanglement entropy as a function of
time. (b) TEBD evolution. Right: corresponding
staircase layer quantum circuit. Adapted
from~\cite{Anselme-Martin2023}. Copyright (2024) by the American Physical Society.}
\end{figure}


The second stage consists in first representing subsequent Trotter
steps with a MPO, and then turning this MPO into a staircase circuit
that is appended to the circuit obtained in the first stage. 

We first perform noisy simulations of the performance of three
different time evolution strategies: MPS-TEBD alone, quantum Trotter
evolution alone, and our QMPSO hybrid method. The respective state
fidelities are compared in the left and middle panels of
Figure~\ref{fig:results-qmps}: for short times, MPS alone is the best
method as it is essentially perfect (small times mean small enough bond
dimensions), while the quantum computer will suffer from decoherence.
For longer times, which method is best depends on the noise level: for
large noise levels, as expected, MPS yields better fidelities as the
quantum state is degraded by quantum noise. For very small noise
levels, the quantum Trotter evolution is essentially perfect (up to
Trotter errors, which can be made small because longer circuits will
not suffer from very weak decoherence), while the MPS-TEBD evolution is
plagued with large truncation errors (the maximum affordable bond
dimension is smaller than what the entropy increase with time would
dictate). Thus, the quantum Trotter evolution is the best method. In
between, namely in a range between about 0.01\% and 1\% depolarizing
error (supposing the maximum bond dimension on the classical computer
is $\chi=32$, an artificially small budget), our hybrid QMPSO method
wins. This error range contains most NISQ hardware.

\begin{figure}
\includegraphics{fig15}
\vspace*{-5pt}
\caption{\label{fig:results-qmps}{Results of the QMPS method}. {Left}: theoretical
diagram of the respective advantage zones of MPS (purely classical),
QMPSO (hybrid classical ${+}$ quantum) and Trotterization (purely quantum)
as a function of depolarizing error rate $\epsilon$ and (dimensionless)
evolution time $Jt$ for $\chi=8$ (left) and $\chi=32$. {Right}:
Evolution in time of the local magnetization for a $n=10$ Ising chain.
QC Trotter and QMPSO involve experimental runs on ibmq\_kolkata
hardware. Adapted from~\cite{Anselme-Martin2023}. 
Copyright (2023) by the American Physical Society.}
\vspace*{-5pt}
\end{figure}

We test this prediction through experimental runs on a transmon
processor of the IBM company. We compare a MPS-only algorithm (with a
large compression error due to a limited bond dimension), a
quantum-only algorithm (with a large error coming from decoherence and
Trotter discretization), and our hybrid method, for a TFIM simulation
on 10 spins (qubits). We observe, on the right panel of
Figure~\ref{fig:results-qmps}, that the hybrid method indeed
outperforms the other two methods.

Of course, this work calls for future improvements: in practice,
advanced MPS implementations can reach much larger bond dimensions, so
that the window for advantage using our hybrid method versus a purely
quantum Trotter evolution is much smaller, namely it requires very low
quantum error rates. A more realistic setting for a quantum advantage
of quantum-enhanced time evolution (whether hybrid or purely quantum)
is time evolutions on two-dimensional lattices. There, MPS are quickly
limited by their 1D structure, and PEPS face many difficulties, among
which the fact that their exact contraction is exponentially costly, as
opposed to MPS. Therefore, a hybrid method turning PEPS into quantum
circuits~\cite{Schwarz2012,Schwarz2012a} appears very 
promising.\looseness=-1

\subsubsection{Perspective: quantum advantage and tensor networks}

In the sections above, we explained how tensor networks could be used
to assess and challenge the validity of quantum advantage claims made
for a very specific task, that of sampling from the output distribution
of random quantum circuits. These classical simulation efforts pushed
back the boundary of quantum advantage, as evidenced in subsequent
publications by e.g.\ Google's team like~\cite{Morvan2023}, who reassessed
the classical simulation time to 6.18~s down from the original
10,000 years. However, hardware improvements were made both on the
Sycamore processor and on a Chinese superconducting processor called
Zuchongzhi~\cite{Wu2021}. On the former, the new bar is set to
47.2 years. Classical algorithms could yet again emerge to redefine
this boundary, but the actual challenge of quantum processors lies
elsewhere. Indeed, if quantum processors are to be deemed useful,
they need to outperform classical processors for ``useful'' applications,
be they as specialized as quantum many-body physics.

First claims in this direction were made by~\cite{Kim2023}. There, the
problem at hand was a quench of a 127-spin transverse-field Ising model
Equation~(\ref{eq:TFIM}) and the subsequent time evolution. 
Reference~\cite{Kim2023} claimed that their superconducting processor, after the
appropriate error mitigation, could obtain essentially exact
time-evolved observables in regimes where tensor network methods like
MPS or isometric tensor networks failed. However, a few weeks after
this publication, a series of publications appeared that succeeded in
using classical simulation methods to reproduce the results of IBM's
teams:~\cite{Tindall2023} used a PEPS representation of the state with
an approximate belief propagation algorithm,~\cite{Begusic2024} used a
sparse Pauli dynamics algorithm,~\cite{Kechedzhi2024} used a simple
numerical mitigation strategy to extrapolate the experimental results
from a 30-qubit simulation, while~\cite{Torre2023} performed a
single-site dissipative mean-field computation and got reasonable
agreement with the experimental results. The ease with which these
various classical \mbox{methods} reproduced the experimental results can be
interpreted as follows: the low connectivity of the IBM device (most
qubits have 2 neighbors, a few 3 neighbors) make it an easy target for
tensor networks, which are designed to perform well in low dimensions
(like one dimension, cf. MPS), or graphs that are close to trees. This
is the case for IBM's processor, which is locally very close to a tree.
There, approximate PEPS contraction methods work very well. The level
of noise in the device likely also plays an important role: it leads to
smaller correlation lengths, which (i) justify why the large loops on
IBM's device ``look'' like trees (the correlation length is not long
enough that interferences within a loop can happen), (ii) explains why
methods which rely on a small causal cone (like~\cite{Kechedzhi2024})
work well. Also, noise (in particular depolarizing noise) kills ``Pauli
paths'' with high weight, making Pauli sparse dynamics techniques more
efficient~\cite{Begusic2024}.

In the light of these observations, one can conclude that one of the
main challenges for quantum processors today is their ability to
generate states with a large correlation length despite decoherence.

In the next section, we turn to hybrid techniques that could help in
the quest for this goal.

\subsection{Hybrid algorithms for fermionic
problems}\label{subsec:fermions}

In the previous subsection, we saw how tensor-network techniques could
be used either to assess the potential advantage of quantum computers
compared to classical methods, or as a way to build more noise-robust
circuits to study many-spin problems such as the transverse-field Ising
model.

In this section, we review our recent works on the use of quantum
processors to tackle many-electron problems, whether on gate-based
processors (Section~\ref{subsec:Optimizing-the-single-orbital}) or on
analog processors (Section~\ref{subsec:Rydberg-atoms-slave-spin}). Our
guiding principle will be to generate correlated states with the
shortest possible circuits or time evolutions. This goal is all the
more challenging with fermions as they come with complications due to
their antisymmetry properties, which typically, through fermionic
encodings, lead to more complex time evolutions.

\subsubsection{Optimizing the single-orbital
basis}\label{subsec:Optimizing-the-single-orbital}


As already discussed, one main limitation of current processors is the
limited depth of the circuits that can be executed. In this subsection,
we explore a way to reduce the depth required for performing
many-fermion computations with quantum processors.

In near-term quantum algorithms like the variational quantum
eigensolver (VQE, see discussion in
Section~\ref{subsec:Algorithms-for-near-term}), one main challenge is
the choice of ans\"atze that are both expressive enough (so that they
stand a chance of approximating the sought-after ground state) and
short enough (so as to reduce the quality degradation induced by
decoherence). In two recent works~\cite{Besserve2021,Besserve2024}, we
present a method to keep the same expressivity while reducing the
circuit depth. This gain is obtained by changing the orbital basis in
which the problem is represented.

The link between fermionic orbitals and qubits is routinely made by
fermion-to-spin transformations. There is a wide variety of them, which
differ in the ratio of qubit count to orbital count and locality
properties of the qubit Hamiltonian obtained after the transformation.
Here, we will focus on the most straightforward transform, called the
Jordan--Wigner transform, that we already mentioned previously. It maps
fermionic orbital occupations (0 or 1 due to the Pauli principle) to
qubit states ($|0{\mathop{\rangle}}$ or $|1{\mathop{\rangle}}$). The $i$th qubit is thus
directly related to the $i$th orbital (for instance a site orbital
$\phi_{i}(r)$ localized around the $i$th site in the Hubbard model,
Equation~(\ref{eq:Hubbard})). In such a representation, a Fock state
$\prod_{k=1}^{N_{\mathrm{el}}}c_{i_{k}}^{\dagger}|0{\mathop{\rangle}}$ (orbitals
$i_{1},\dots,i_{k}$ are filled) is represented by a computational state
with ones at locations $i_{1},\dots,i_{k}$ and zeros elsewhere. Such
states are very easy to generate: a circuit with $X$ gates on qubits
$i_{1},\dots,i_{k}$ will create them (see
Figure~\ref{fig:orbital_basis}(b)). In other words, in the right basis
(the Fock basis), a Fock state is a trivial computational basis state.

\begin{figure}
\includegraphics{fig16}
\caption{\label{fig:orbital_basis}The importance of the orbital basis. (a,c)  Original
basis. (b,d)~Optimized basis (natural orbitals). (a,b)  Noninteracting case. 
(c,d)~Interacting case. $G_{k}$
denote Givens rotations (Gaussian gates).}
\end{figure}

This statement is not true in any other basis. For instance, our Fock
state,
$|\psi{\mathop{\rangle}}=\prod_{k=1}^{N_{\mathrm{el}}}c_{i_{k}}^{\dagger}|0{\mathop{\rangle}}$,
relative to orbitals $\{ \phi_{i}(r)\} _{i=1\dots n}$,
becomes much more complicated if expressed in another (rotated) basis
$\varphi_{\alpha}(r)=\sum_{i}V_{\alpha i}\phi_{i}(r)$. Using
$c_{i}^{\dagger}=\sum_{\alpha}V_{i\alpha}^{\dagger}\tilde{c}_{\alpha}^{\dagger}$,
we see that
$|\psi{\mathop{\rangle}}=\sum_{\alpha_{1}\dots\alpha_{n}}A_{\alpha_{1}
\dots\alpha_{n}}\prod_{k=1}^{N_{\mathrm{el}}}\tilde{c}_{\alpha_{k}}^{\dagger}|0{\mathop{\rangle}}$
(with $A$ a function of the $V$ coefficients). If the qubits are
mapped, via the Jordan--Wigner transformation, to the $\varphi_{\alpha}$
orbitals instead of $\phi_{i}$ orbitals, the circuit to build
$|\psi{\mathop{\rangle}}$ is thus much more complex than the simple circuit with
only $X$ gates that we outlined above. This remark leads to the notion
that the orbital basis can be optimized to reduce circuit depth (to
construct one and the same state).

This statement easily extends to ``mean-field states'' or Hartree--Fock
states, namely single Slater determinants: they can always be written
in the form
$\prod_{k=1}^{N_{\mathrm{el}}}\tilde{c}_{\alpha_{k}}^{\dagger}|0{\mathop{\rangle}}$,
with $\tilde{c}_{\alpha}^{\dagger}$ creating an electron in the
$\varphi_{\alpha}(r)$ orbital. In this basis, the corresponding circuit
will consist only of single-qubit rotations
(Figure~\ref{fig:orbital_basis}(b)). In other bases, a circuit made up
of so-called Givens rotations~\cite{Kivlichan2018}, which are
entangling gates, will have to be used (see
Figure~\ref{fig:orbital_basis}(a)). This class of gates essentially
implements the orbital rotation $V$ in the many-body Fock space. In
other words, to prepare a Slater determinant, either one does not use
the right orbital basis and uses a potentially long and thus
error-prone circuit with Givens rotations, or one changes the orbital
basis of the qubits so that the corresponding circuit becomes trivial.

For strongly-correlated states (or multireference states in a quantum
chemistry context), no orbital rotation can lead to a single Slater
determinant and hence to a trivial circuit: one expects that they will
require more complex circuits, since they are linear combinations of
(usually a large number of) uncorrelated (product) states, and are thus
highly entangled states. However, one also expects the orbital basis to
play a role in the complexity of the corresponding state preparation
circuit.

For these generic states (as opposed to single Slater determinants),
no clear mathematical statement as to the optimal single-particle
basis has been made, to our best knowledge. However, quantum chemists
have long used a particular basis called the natural-orbital (NO)
basis~\cite{Lowdin1955}. It is defined relatively to a given state
$|\psi{\mathop{\rangle}}$ as the basis that diagonalizes the 1-RDM (which we
already encountered, see Equation~(\ref{eq:1RDM})):
{\[
D_{ij}=\langle\psi|c_{i}^{\dagger}c_{j}|\psi\rangle.
\]}\unskip

This basis is usually loosely claimed to be the basis that minimizes
the number of Slater determinants required to represent a given state
$|\psi{\mathop{\rangle}}$. While this is strictly true for Slater determinants
as discussed in the previous paragraph, it is unclear that it is true
for generic states (in fact it is proven for two-electron states but
probably wrong for more-than-two-electron states).

Despite these caveats, we expect that this basis will provide the
simplest possible circuit implementation for the state preparation. In
a way, rotating to the NO basis should strip the ``unnecessary'' Givens
rotations from the circuit, as illustrated in
Figure~\ref{fig:orbital_basis}(c,d).

In practice, to perform the computation in the NO basis, one needs to
compute the 1-RDM $D$, which requires the knowledge of $|\psi{\mathop{\rangle}}$,
and then diagonalize $D=VnV^{\dagger}$. The basis
$\chi_{\alpha}=\sum_{i}V_{\alpha i}\phi_{i}$ is then the NO basis. Yet,
in chemistry (and in VQE), the state of interest, $|\psi{\mathop{\rangle}}$, is
unknown since it is precisely the ground state that one is looking for.
We thus propose the following iterative procedure (inspired by
e.g.~\cite{Lu2014,Lu2019}) to find an approximate NO basis, 
as illustrated in Figure~\ref{fig:noization-sketch}:
\begin{enumerate}[(1)]
\item Start from the original (e.g.\ site) basis $\phi_{i}$.  
\item Perform a VQE step to obtain an approximate ground state
$|\psi(\theta^{*}){\mathop{\rangle}}$ in this basis
\item Compute the 1-RDM corresponding to this state. Diagonalize it to find
the orbital-rotation matrix $V$. 
\item Transform the qubits (i.e., in practice, the Hamiltonian $H$ on which
the VQE is applied) to the basis $V$ and go back to step $2$ until
convergence.
\end{enumerate}
We applied this method to the Anderson impurity Hamiltonian
(\ref{eq:AIM}) in~\cite{Besserve2021} and the Hubbard model
in~\cite{Besserve2024}, and find that (i) the iterative procedure
(dubbed NOization) usually converges to the true NO basis, and (ii) one
can afford to use shorter VQE ansatz circuits when using this basis
than when using the original basis (which is uneconomical in terms of
the number of Slater determinants to represent the ground state, and
thus the complexity of the corresponding circuit). This shortening of
the circuits in turn leads to a better robustness to noise.

\begin{figure}
\includegraphics{fig17}
\caption{\label{fig:noization-sketch}Sketch of the (adaptive) NOization
procedure.}
\end{figure}


In~\cite{Besserve2024}, we refined the method by (i) studying its
robustness to shot noise and (ii) combining it to the recently proposed
adaptive-VQE schemes~\cite{Grimsley2019,Tang2019}.


The importance of point (i) is linked to the fact that changing the
orbital basis comes at a cost: in general, it increases the number of
Pauli terms in the Pauli representation of the Hamiltonian at hand. For
instance, while the Hubbard model (Equation~(\ref{eq:Hubbard})) has
$O(n)$ Pauli terms after a Jordan--Wigner transformation, after a
generic orbital transformation, it can have up to $O(n^{4})$ terms.
This should play a role when computing the variational energy
$E(\boldsymbol{\theta})$ using a finite number of shots. In particular,
we need to assess whether the statistical precision for a given shot
budget does not suffer too much from a rotation to the (approximate)
natural orbital basis. We observe (Figure~\ref{fig:noization-nshots}(left)) 
that while NOization does reduce the sensitivity to decoherence
noise (one observes a linear bias with increasing noise ratio $r$ for
standard VQE, while with NOization, which can afford a much shallower
circuit, noise has virtually no impact), it also does not increase
sensitivity to shot noise, at least in this example. This can be
ascribed to the quite sharply peaked distribution of the Pauli
coefficients despite their increased number
(Figure~\ref{fig:noization-nshots}(right)). This leads to a one-norm
(shown in the inset) that does not significantly increase upon orbital
rotation, and thus a similar shot noise (since the one-norm upper
bounds shot noise, see Equation~(\ref{eq:shot_noise_error_bound})).

\begin{figure}
\includegraphics{fig18}
\vspace*{-5pt}
\caption{\label{fig:noization-nshots}Interplay of shot noise and natural orbitalization.
Left: relative VQE error (including bias and variance) as a
function of the noise ratio to typical NISQ levels of noise for a
standard VQE run with a LDCA ansatz (solid lines) compared to a
natural-orbitalization strategy with the fSim circuit (dashed lines),
half-filled Hubbard dimer, $U/t=1$. Right: distribution of the
Pauli coefficients $|\lambda_{i}|$ for successive NOization steps at
$r=0.01$. Inset: one-norm as a function of the NOization step.
Reproduced from~\cite{Besserve2024}.}
\vspace*{-5pt}
\end{figure}

Let us now focus on the use of adaptive techniques (ii). The goal here
is to take advantage of the fact that rotating the basis in principle
allows for shorter circuits. However, with standard VQE, this means
changing the circuit ``manually'' after each rotation. Adaptive
techniques should solve this issue by automatically constructing a
circuit that is well suited to the new orbital basis. Adaptive methods
iteratively construct the variational circuit by adding new gates based
on their potential to decrease the variational energy at the next step,
which is measured by the corresponding energy gradient. In particular,
they come with a natural stopping criterion: the ansatz construction
stops when the gradients vanish, i.e.\ when a minimum (hopefully a global
one) of the energy landscape is reached. In~\cite{Besserve2024}, we
observe that the size (number of gates) of the iteratively constructed
ansatz decreases when gradually rotating to the approximate
natural-orbital basis, and that the energy reached by the method gets
closer to the ground state energy at the same time
(Figure~\ref{fig:noization-adaptive}).

\begin{figure}
\includegraphics{fig19}
\vspace*{-5pt}
\caption{\label{fig:noization-adaptive}Results of the NOization procedure with an adaptive ansatz
construction: converged VQE energy as a function of the ADAPT-VQE step
(one operator in the ADAPT-VQE pool corresponds to a few additional
gates in the ansatz). Reproduced
from~\cite{Besserve2024}.}
\vspace*{-5pt}
\end{figure}

This investigation of the gradual transformation of the orbital basis
to the natural orbitals shows the promising potential of the method.
Much remains to be done: the true target application is the Anderson
impurity model, with the goal of successfully preparing a good ground
state and then computing the corresponding 1-RDM (for DMET or RISB) or
Green's function (for DMFT). This orbital rotation could be combined
with a recent proposal by~\cite{Jamet2023} (similar to the QMPS method
of Section~\ref{subsec:Matrix-product-states-trotter}) to use a tensor
network method to compute an approximate AIM ground state and then load
it on the quantum processor to compute the Green's function.

\subsubsection{Rydberg atoms for Mott physics through a slave-spin
mapping}\label{subsec:Rydberg-atoms-slave-spin}

Most quantum algorithms for studying fermionic many-body problems use
(i) fermion-to-qubit encodings like the Jordan--Wigner transformation,
and (ii) are for designed gate-based quantum processors. In this
section, we explore a method that avoids the overheads of the encodings
and that is particularly well-suited (but by no means restricted to)
analog Rydberg processors. 

\paragraph*{Rydberg processors}

These processors are described by the following
Hamiltonian~\cite{Browaeys2020}:
{\begin{equation}
\hat{H}_{\mathrm{Rydberg}}=\sum_{i\ne j}\frac{C_{6}}{|\mathbf{r}_{i}-
\mathbf{r}_{j}|^{6}}\hat{n}_{i}\hat{n}_{j}-\hbar\delta(t)\sum_{i}
\hat{n}_{i}+\frac{\hbar\Omega(t)}{2}\sum_{i}\hat{S}_{i}^{x},\label{eq:Rydberg_hamiltonian}
\end{equation}}\unskip
where $C_{6}/|\mathbf{r}_{i}-\mathbf{r}_{j}|^{6}$ is the van der
Waals interaction between atoms located at positions $\mathbf{r}_{i}$
and $\mathbf{r}_{j}$, $\delta(t)$ and $\Omega(t)$ are the detuning
and Rabi drives, respectively. The atoms are described as an assembly
of two-level systems where level $|0{\mathop{\rangle}}$ is the atom at rest
and $|1{\mathop{\rangle}}$ is a highly-excited Rydberg state of the atom.


``Programming'' such a processor consists in finding ways to manipulate
the quantum state of the machine using this Hamiltonian, with three
main knobs: the positions $\{ \mathbf{r}_{i}\} $ of the atoms, the Rabi
drive $\Omega(t)$ and the detuning drive $\delta(t)$. The Hamiltonian
(\ref{eq:Rydberg_hamiltonian}) realized experimentally in Rydberg
processors---essentially an antiferromagnetic transverse-field Ising
model (TFIM)---represents a difficult many-body problem to simulate for
classical computers, especially in out-of-equilibrium situations. This
difficulty results from to the combination of a large number of
particles (196), long correlation lengths (7 lattice spacings) and true
two-dimensional geometry reported in recent
experiments~\cite{Scholl2020}. These properties make their classical
simulation by the most advanced tensor-network techniques (like MPS or
PEPS) a very tall order (as opposed to the TFIM of~\cite{Kim2023},
which is defined on quasi-1D heavy-hex geometry).

As we will discuss below in Section~\ref{subsec:Rydberg-udmis}, this
Hamiltonian has ``straightforward'' candidate use cases such as the
unit-disk maximum independent set problem. However, for fermionic
many-body problems that are the focus of this section, one must
reconcile two seemingly very different problems: an interacting fermion
problem and an interacting spin problem.  Although there exists
straightforward transformations from fermionic variables to spin
variables (as briefly explained in
Section~\ref{subsec:ground-state-energies}), they all come with an
overhead in terms of circuit depth either because they give up on
locality or because they require additional qubits: On the one hand,
the most widespread transformations like the Jordan--Wigner
transformation, parity or Bravyi--Kitaev transformations (see
e.g.~\cite{Havlicek2017} for a unified view) lead to a more-or-less
drastic loss of locality of the Hamiltonian. For instance, as we
already saw before, a $c_{i}^{\dagger}c_{j}$ term leads to terms like
$X_{i}Z_{i+1}\cdots Z_{j-1}X_{j}$ terms in the corresponding spin
Hamiltonian. This in turn leads to long circuits~to implement e.g.\ time
evolutions with $\mathrm{e}^{-\mathrm{i}X_{i}Z_{i+1}\cdots
Z_{j-1}X_{j}t}$ operators, which require a $O(n)$ depth ($O(\log(n))$
for the Bravyi--Kitaev transformation). On the other hand, other
techniques like the Verstraete--Cirac
transformation~\cite{Verstraete2005} require auxiliary qubits, and thus
ultimately longer circuits since deeper circuits are needed to entangle
more qubits. 

What is more, the analog Rydberg platform we are considering comes with
a controllable but fixed Hamiltonian that differs from the spin
Hamiltonian resulting from the aforementioned fermion-qubit
transformations.

\paragraph*{A slave-particle method to disentangle fermionic and spin
degrees of freedom}

In a recent work~\cite{Michel2023a}, we propose a way to circumvent the
fermion-to-spin conversion overhead by resorting to an existing
technique called the slave-spin
method~\cite{DeMedici2005,Ruegg2010,deMedici2016}, a simplification of
the slave-boson method we briefly mentioned in
Section~\ref{sec:Many-body-problems:-a}. In its $\mathbb{Z}_{2}$
flavor~\cite{Ruegg2010}, it consists in rewriting the original
fermionic creation operators $c_{i\sigma}^{\dagger}$ as
{\begin{equation}
c_{i\sigma}^{\dagger}=S_{i}^{z}f_{i\sigma}^{\dagger},
\label{eq:slave_spin_decoupling}
\end{equation}}\unskip
where $S_{i}^{z}$ is the Pauli-$z$ spin matrix, and $f_{i\sigma}^{\dagger}$
creates a (pseudo) fermion at site $i$. Since the corresponding Hilbert
space is larger than the original one, one imposes contraints to project
states back onto a so-called physical Hilbert space in one-to-one
correspondence to the original. The constraints read
{\begin{equation}
S_{i}^{x}+1=2(n_{i}^{f}-1)^{2}\label{eq:constraints}
\end{equation}}\unskip
at each site (with
$n_{i}^{f}=\sum_{\sigma}f_{i\sigma}^{\dagger}f_{i\sigma}$), so that
only 4 states among the 8 possible local states are allowed, as in the
original model. These four states (left column below) as well as the
corresponding original states (right column) are:
{\begin{eqnarray*}
|0{\mathop{\rangle}}|+{\mathop{\rangle}}_{x} & \leftrightarrow&|0{\mathop{\rangle}}\\
|\uparrow{\mathop{\rangle}}|-{\mathop{\rangle}}_{x} & \leftrightarrow&|\uparrow{\mathop{\rangle}}\\
|{\downarrow}{\mathop{\rangle}}|-{\mathop{\rangle}}_{x} & \leftrightarrow&|{\downarrow}{\mathop{\rangle}}\\
|\uparrow{\downarrow}{\mathop{\rangle}}|+{\mathop{\rangle}}_{x} & \leftrightarrow&|\uparrow{\downarrow}{\mathop{\rangle}}.
\end{eqnarray*}}\unskip

The method thus a priori consists in tackling this larger model and
then projecting it back onto the subspace satisfying
(\ref{eq:constraints}). Approximate strategies to handle the model in
this larger space will be needed (as for the original Hubbard model),
but the usual conjecture is that approximations in the larger space
will lead to better results than a similar level of approximation in
the original model.

Plugging Equation~(\ref{eq:slave_spin_decoupling}) into the original
Hubbard model (Equation~(\ref{eq:Hubbard})) and using the constraint
(\ref{eq:constraints}) leads to
{\begin{equation}
H=\sum_{ij\sigma}t_{ij}S_{i}^{z}S_{j}^{z}f_{i\sigma}^{\dagger}
f_{j\sigma}+\frac{U}{4}\sum_{i}S_{i}^{x}.\label{eq:Hubbard_slave_spin}
\end{equation}}\unskip

We now perform a mean-field approximation to decouple the fermion-spin
coupling terms:
{\[
S_{i}^{z}S_{j}^{z}f_{i\sigma}^{\dagger}f_{j\sigma}
\approx\langle S_{i}^{z}S_{j}^{z}\rangle 
f_{i\sigma}^{\dagger}f_{j\sigma}+S_{i}^{z}S_{j}^{z}\langle 
f_{i\sigma}^{\dagger}f_{j\sigma}\rangle+\mathrm{const}.
\]}\unskip
This yields
{\begin{equation}
H\approx\left\{\sum_{ij\sigma}Q_{ij}f_{i\sigma}^{\dagger}
f_{j\sigma}\right\}+\left\{ \sum_{ij}J_{ij}S_{i}^{z}
S_{j}^{z}+\frac{U}{4}\sum_{i}S_{i}^{x}\right\} ,
\label{eq:slave_spin_hamiltonian}
\end{equation}}\unskip
namely the Hamiltonian turns into the sum of a free-electron
Hamiltonian with renormalized hopping $Q_{ij}=t_{ij}\langle
S_{i}^{z}S_{j}^{z}\rangle$ and a transverse-field Ising model with
Ising coupling $J_{ij}=\sum_{\sigma}t_{ij}\langle
f_{i\sigma}^{\dagger}f_{j\sigma}\rangle$. The ground state of $H$ reads
$|\Psi{\mathop{\rangle}}=|\psi_{\mathrm{f}}{\mathop{\rangle}}\otimes|\psi_{\mathrm{s}}{\mathop{\rangle}}$
where $|\psi_{\mathrm{f}}{\mathop{\rangle}}$ and $|\psi_{\mathrm{s}}{\mathop{\rangle}}$ are
the respective ground states of the fermionic and spin Hamiltonians.
They need to be computed in a self-consistent fashion because the
hopping and Ising interaction matrices depend on the solution through
$\langle S_{i}^{z}S_{j}^{z}\rangle$ and $\langle
f_{i\sigma}^{\dagger}f_{j\sigma}\rangle$. Thus, the slave-spin
decoupling translates (at the cost of the mean-field decoupling of the
fermion and spin variables) the Hubbard model into a free fermionic
model (which carries the fermionic nature of the original problem)
self-consistently coupled to an interacting spin model (which carries
the interacting nature of the original problem), as illustrated in
Figure~\ref{fig:slave-spin-sketch}.

\begin{figure}
\includegraphics{fig20}
\caption{\label{fig:slave-spin-sketch}Sketch of the slave-spin method with a quantum processor.
Reproduced from~\cite{Michel2023a}. Copyright (2024) by the American Physical Society.}
\end{figure}

While former (purely classical)
approaches~\cite{Ruegg2010,Schiro2011a,Sandri2014} introduced a further
single-site (or small cluster,~\cite{Hassan2010}) mean-field
approximation to solve the spin model, we propose to tackle it instead
with a quantum processor: indeed, the spin model
{\begin{equation}
H_{\mathrm{s}}=\sum_{ij}J_{ij}S_{i}^{z}S_{j}^{z}+
\frac{U}{4}\sum_{i}S_{i}^{x}\label{eq:spin_model}
\end{equation}}\unskip
is very similar to the Hamiltonian realized by Rydberg atoms, Equation~
(\ref{eq:Rydberg_hamiltonian}). The main deviation lies (i) in the
finite number of atoms of Equation~(\ref{eq:Rydberg_hamiltonian}), as
opposed to the thermodynamic size of Hubbard's Hamiltonian and hence
$H_{\mathrm{s}}$, and (ii) in the specific form
$C_{6}/|\mathbf{r}_{i}-\mathbf{r}_{j}|^{6}$ of the spin interaction in
the Rydberg platform.


We propose to handle the first issue by resorting to a cluster mean-field
approach to $H_{\mathrm{s}}$ (like in~\cite{Hassan2010}), which
gives rise to
{\begin{equation}
H_{\mathrm{s}}^{\mathcal{C}}=\sum_{i,j\in\mathcal{C}}J_{ij}
S_{i}^{z}S_{j}^{z}+\frac{U}{4}\sum_{i\in\mathcal{C}}
S_{i}^{x}+\sum_{i\in\mathcal{C}}h_{i}S_{i}^{z},\label{eq:spin_model_cluster}
\end{equation}}\unskip
where $h_{i}=2z_{i}\overline{J}\overline{m}$,
$\overline{J}=\sfrac{1}{N_{p}}\sum_{\langle
i,j\rangle\in\mathcal{C}}J_{i,j}$ and
$\overline{m}=\sfrac{1}{N}\sum_{i\in\mathcal{C}}\langle
S_{i}^{z}\rangle$. $\mathcal{C}$ denotes the cluster of sites, whose
size is fixed by the number of available atoms.
$H_{\mathrm{s}}^{\mathcal{C}}$ thus has the same size as
$H_{\mathrm{Rydberg}}$.

The second issue is solved in an approximate fashion by optimizing
the locations $\{ \boldsymbol{r}_{i}\} _{i\in\mathcal{C}}$
of the atoms to have $C_{6}/|\mathbf{r}_{i}-\mathbf{r}_{j}|^{6}$
match $J_{ij}$. For the small sizes we tackled, the optimization
converges well, although there will probably always be some residual
mismatch due to the $1/r^{6}$ tail of the van der Waals interaction
(as opposed to the nearest-neighbor-only character of $J_{ij}$).


\paragraph*{Quasiparticle dependence on interaction at equilibrium}

At equilibrium (and $T=0$), the main computational step consists in
computing the correlation function $\langle S_{i}^{z}S_{j}^{z}\rangle$
(needed in the self-consistent loop) in the ground state of
$H_{\mathrm{s}}^{\mathcal{C}}$ using $H_{\mathrm{Rydberg}}$ as a proxy.
This can be done by resorting to an annealing procedure on the Rydberg
platform, namely we slowly ramp up the value of the interaction
(therefore of $\Omega$) to its final value to prepare the (approximate)
ground state of $H_{\mathrm{s}}^{\mathcal{C}}$. As for any annealing
technique, longer annealing times will lead to a larger success
probability, namely a larger overlap of the state generated by this
ramp with the sought-after ground state. In practice, because the
hardware suffers from decoherence, long anneal times will lead to more
errors. It is thus vital that one check that annealing can succeed in
realistic times given the hardware noise levels. To answer this
question, we model noise in the Rydberg platform using a Lindblad
master equation
{\begin{equation}
\mathrm{i}\hbar\frac{\mathrm{d}}{\mathrm{d}t}\rho=
\left[H_{\mathrm{Rydberg}}(t),\rho\right]
-\frac{\mathrm{i}}{2}\sum_{k}\left\{ L_{k}^{\dagger}
L_{k},\rho\right\} -2L_{k}\rho L_{k}^{\dagger},\label{eq:lindblad_equation}
\end{equation}}\unskip
where the $L_{k}$ are called jump operators ($k$ has a priori no
physical meaning yet, it just labels the operators). In the specific
case of Rydberg platforms, dephasing noise is an important source of
noise. It can be modelled using $L_{k}=\sqrt{\gamma}Z_{k}$ (where $k$
now labels a Rydberg atom, and $\gamma$ is the dephasing strength,
which can be fitted to experiments (see e.g.~\cite{Serret2020})). Here,
we take $\gamma=0.02~\mathrm{MHz}$. We also include false positive and
negative readout errors (with a 3\% rate corresponding to today's
experimental situation).

Performing this time evolution to get the $\langle
S_{i}^{z}S_{j}^{z}\rangle$ correlation function and performing the
whole cycle self-consistently yields the left panel of
Figure~\ref{fig:slave-spin-results} for different mean-field cluster
sizes. We observe that noise does cause a deviation from exact
diagonalization results, but that we can predict a reasonable estimate
for the Mott transition's critical interaction (within the slave-spin
approximation) by computing the $U$-dependence of the quasiparticle
weight $Z$. The latter, in the slave-spin mapping, can be accessed via
the squared magnetization $Z\approx\overline{m}^{2}$ of the spin model.
For $n=12$, the computations are not converged with respect to cluster
sizes, larger sizes are thus required. As for classical techniques,
exact diagonalization can probably reach up to 50 atoms, and MPS
techniques can be pushed to 100 atoms~\cite{Scholl2020}. On the quantum
side, sizes of up to 200 atoms are now feasible~\cite{Scholl2020}.
Whether these atoms can be manipulated in a useful way depends on the
interaction regime and the quality of the hardware: intuitively, the
target is regimes where the true spin--spin correlation length is large,
so that classical methods cannot capture it, and hardware platforms
that are good enough that they can sustain these long correlation
lengths. To precisely delineate this target, a better understanding of
the dependence of the correlation length on hardware noise is needed.
(As for finding regimes with a large correlation length, this is an
easier task as the Mott transition corresponds, through the slave-spin
mapping, to the ferromagnetic transition of the TFIM, where the
correlation length is expected to diverge, see e.g.~\cite{Rader2018}
for a PEPS investigation of this transition).

\begin{figure}
\includegraphics{fig21}
\caption{\label{fig:slave-spin-results}Hubbard physics with Rydberg
platforms. {Left:} evolution of the quasiparticle weight as a function
of the interaction strength for various cluster sizes (6, 8 and 12)
with an exact diagonalization of the spin Hamiltonian (dashed lines)
and a noisy annealing algorithm with a Rydberg Hamiltonian (solid
lines). {Right:} Dynamics of the quasiparticle weight after an
interaction quench with $U_{f}<U_{c}$ (top left) and $U_{f}>U_{c}$ (top
right) on a noiseless and a noisy platform. Fourier transform of $Z(t)$
for various interaction strengths (bottom left) (noiseless case) and
influence of the hopping (noiseless case). Reproduced
from~\cite{Michel2023a}. Copyright (2024) by the American Physical Society.} 
\end{figure}

\paragraph*{Dynamics of the quasiparticle weight after an interaction
quench}

A similar procedure can be applied in the out-of-equilibrium 
case~\cite{Schiro2011a}. Through the slave-spin mapping, an interaction
quench in the Hubbard model translates to a tranverse-field quench in
the spin model. One can thus quench the Rabi field to $\Omega=U/4$ in
$H_{\mathrm{Rydberg}}$ to study the quench-induced dynamics of the
quasiparticle weight $Z(t)$ in the Hubbard model. 

The right panel of Figure~\ref{fig:slave-spin-results} displays the
resulting $Z(t)$ dynamics for a perfect Rydberg platform and in the
presence of imperfections, simulated by the aforementioned Lindblad
equation. One can see that the main phenomenology of such quenches can
be resolved even in the presence of noise: $U$ oscillations with
hopping-enhanced damping in quenches to the Mott phase, $U/2$ and other
oscillation frequencies in quenches to the Fermi-liquid phase.

This first study is limited to the emulation of a 12-atom Rydberg
platform. In the future, experimental runs with more atoms should lead
to the observation of temporal and spatial dynamics of the
quasiparticle weight. Such dynamics are arguably hard to simulate with
classical computers given the number of atoms and correlation lengths
attainable by current platforms~\cite{Scholl2020}, so that the
introduced mapping could lead to the investigation of dynamical
properties of the Hubbard model previously unreachable by classical
means. The method is not limited to analog quantum computers. Yet,
implementing it on an analog quantum computer like Rydberg atoms allows
to avoid the discretization errors inherent to gate-based computers,
like Trotterization (see Section~\ref{subsec:ground-state-energies}).

Of course, several aspects of the work can be improved: away from the
particle-hole symmetric case studied here (by working at half-filling
on a bipartite lattice), the fulfilment of the constraint needs to be
taken into account explicitly. Multi-orbital models
(see~\cite{DeMedici2005}) will also require some adaptations as the
slave-spin model $H_{\mathrm{s}}$ is no longer directly that of the
Rydberg atoms, Equation~(\ref{eq:Rydberg_hamiltonian}).

\subsubsection{Quantitative criteria for fermionic
problems}\label{subsec:Quantitative-criteria-for}

In the previous subsections of this section on fermionic problems, we
have introduced algorithms and tested them on small-size problems. A
natural question to ask is whether these methods scale up to large
sizes. In~\cite{Louvet2023}, we try to answer this question for two
major methods that we already introduced, namely the variational
quantum eigensolver (VQE) and the quantum phase estimation (QPE)
algorithms.


\paragraph*{The scale of noise in VQE\,\ldots}

As already argued in Section~\ref{subsec:Algorithms-for-near-term}, VQE
intrinsically suffers from optimization issues and statistical shot
noise. Here, we will neglect these issues and instead focus on the
influence of decoherence. We will thus suppose we have found parameters
$\boldsymbol{\theta}^{\star}$ such that
$|\Psi(\boldsymbol{\theta}^{\star}){\mathop{\rangle}}=|\Psi_{0}{\mathop{\rangle}}$, where
$|\Psi_{0}{\mathop{\rangle}}$ is the ground state of the Hamiltonian at hand. For
a parametric circuit with $N_{g}$ gates, noise generically turns our
perfect state $|\Psi_{0}{\mathop{\rangle}}$ into a mixed state $\rho$ with a
fidelity
{\[
F=\langle\Psi_{0}|\rho|\Psi_{0}\rangle\ll1
\]}\unskip
 with the ground state. We can write, generically,
{\[
\rho=F|\Psi_{0}{\mathop{\rangle}}{\mathop{\langle}}\Psi_{0}|+(1-F)\rho_{\mathrm{noise}},
\]}\unskip
where $\rho_{\mathrm{noise}}$ is the (unknown) state generated by the
noise. Thus, the energy we can compute is (in the absence of shot
noise) $E=FE_{0}+(1-F)E_{\mathrm{noise}}$ with
$E_{\mathrm{noise}}=\mathrm{Tr}(\rho_{\mathrm{noise}}H)$. In other
words, noise introduces a bias
{\begin{equation}
\Delta E=E-E_{0}=(1-F)(E_{\mathrm{noise}}-E_{0}).\label{eq:bias}
\end{equation}}\unskip

We can relate the final fidelity to the individual gate error rates
$\epsilon$ using the product formula we already encountered
(Equation~(\ref{eq:product_law})), which we rewrite, for our current
purpose, as $F={\mathrm{e}}^{-\epsilon N_{g}}$. We now need to assume that
$\epsilon N_{g}\ll1$(otherwise, our final state is essentially noise
and we have no chance of getting anywhere close to the solution), so
that $1-F\approx\epsilon N_{g}$. 

The success of VQE is quantified by the targetted accuracy on the
energy. If we target $\Delta E\leq\eta$, it means that our hardware
error rate should satisfy
{\begin{equation}
\epsilon\leq\frac{\eta}{\left(E_{\mathrm{noise}}-
E_{0}\right)N_{g}}.\label{eq:vqe_criterion}
\end{equation}}\unskip

This gives us a quantitative estimate of the hardware quality
$\epsilon$ needed to reach a given accuracy $\eta$ on the energy. The
dependence on the size of the system (number of electrons or orbitals
$n$) comes from $E_{\mathrm{noise}}$, $E_{0}$ and $N_{g}$. The ground
state energy $E_{0}$ is an extensive quantity, $E_{0}\propto n$. The
number of gates in the ansatz should probably increase with $n$. We now
argue that the most severe effect comes from the $n$-dependence of
$E_{\mathrm{noise}}$.

To this aim, we need to focus on a specific case. We assume the noise
to be a global depolarizing channel (Equation~(\ref{eq:depolarizing})).
This choice may look arbitrary, but any long enough circuit (that
it forms a so-called 2-design) will produce global depolarizing noise
, so that this choice is quite representative of many current ansatz
circuits. For this noise, $\rho_{\mathrm{noise}}=I/2^{n}$, and\break therefore
{\[
E_{\mathrm{noise}}=\mathrm{Tr}(H)/2^{n}=\mathrm{Tr}(\rho_{\infty}H),
\]}\unskip
where $\rho_{\infty}$ is the thermal state
$\rho_{T}={\mathrm{e}}^{-H/k_{\mathrm{B}}T}/Z$ at infinite temperature. In other
words, global depolarizing noise populates very high energy states.

Let us now turn to the scaling of $E_{\mathrm{noise}}$. In chemical
systems, in the absence of screening of the charge of electrons, the
Coulomb energy scales as $n^{2}$ (each electron interacts with all
other electrons). In the ground state, screening reduces this scaling
to $\propto n$. This means that noise brings in excited states whose
scaling with size is no longer $\propto n$ but $\propto n^{2}$, a
dramatic change of scaling. To make this statement more quantitative,
we computed $(E_{\infty}-E_{\mathrm{HF}})/n$ (with $E_{\mathrm{HF}}$ a
replacement for exact energy but, given the energy scales,
$E_{\mathrm{HF}}-E_{0}$ is negligible) as a function of $n$ for the
first few elements of the periodic table. The results, shown in
Figure~\ref{fig:Quantitative-criteria-for}(a), do display a behavior
$E_{\infty}-E_{\mathrm{HF}}\propto n^{2}$. In addition, the scale of
$E_{\infty}-E_{\mathrm{HF}}$ itself is very large, namely tens of
Hartrees, to be compared to the chemical accuracy,
$\eta=1~\mathrm{mHa}$. Assuming 100 gates (a very conservative number)
and $E_{\mathrm{noise}}-E_{0}=10~\mathrm{Ha}$ (also a conservative
estimate), this gives
{\[
\epsilon\leq10^{-6}\ndash 10^{-4}\%,
\]}\unskip
at least three orders of magnitude from current noise rates
(superconducting processors achieve $\epsilon\approx0.1\%$ for
two-qubit gates~\cite{Morvan2023}).

\begin{figure}
\includegraphics{fig22}
\caption{\label{fig:Quantitative-criteria-for}Quantitative criteria for VQE and QPE. (a)
$(E_{\infty}-E_{\mathrm{HF}})/N$ as a function the number of electrons
$N$ for various molecules and basis sets. (b) Sketch of $E(\tau)$ as a
function of imaginary time $\tau$ and definition of $\kappa$ (area
under the curve) and $I_{\Omega}$ (triangle defined by the slope at
$\tau=0$). (c) Overlap index $I_{\Omega}$ as a function of the energy
error for various classical computations for various models. Adapted
from~\cite{Louvet2023}.}
\end{figure}

\paragraph*{A criterion to estimate overlaps for QPE}

We now turn to a central question pertaining to the quantum phase
estimation algorithm (QPE), which is often hailed as the go-to replacement
of VQE when fault-tolerant quantum computers will be available: how
large is the overlap $\Omega$ (Equation~(\ref{eq:overlap})) of the state
that is input to QPE? Since QPE's complexity scales as $O(1/\Omega)$,
knowing how $\Omega$ scales with system size is a crucial issue.
While, for weakly correlated molecules, it seems that large enough
overlaps can be obtained using classical methods, for Hubbard-like
models in a regime of large interactions, the overlaps obtained by
classical methods seem to be much smaller~\cite{Tubman2018}, calling
for a quantitative examination of this question~\cite{Lee2022}.

Here, we provide a way of computing this overlap with the only knowledge
of the energy and variance of the input state, as well as some estimate
of the exact ground state energy. The former two are typical outputs
of many classical methods (like variational Monte-Carlo methods, see
Section~\ref{subsec:Quantum-Monte-Carlo-algorithms}) and can also
be estimated in VQE, if VQE is used to prepare inputs to QPE. The
latter can usually be estimated by extrapolating e.g.\ coupled cluster
computations with increasing excitation order (from CCS to CCSD to
CCSDT\,$\ldots$).

Let $|\Phi_{0}{\mathop{\rangle}}$ denote a variational state obtained e.g.\ by
VQE (or by a classical variational method), and $|\Psi_{0}{\mathop{\rangle}}$
denote the exact ground state wavefunction. We want to estimate the
overlap $\Omega$. Reference~\cite{Mora2018} established the following identity:
{\begin{equation}
\Omega={\mathrm{e}}^{-\int_{0}^{\infty}\mathrm{d}\tau\left[E(\tau)-E_{0}\right]},
\label{eq:overlap}
\end{equation}}\unskip

with $E(\tau)$ the energy of imaginary-time evolved state
$|\Psi(\tau){\mathop{\rangle}}\propto {\mathrm{e}}^{-\tau H/2}|\Phi_{0}{\mathop{\rangle}}$, namely:
{\[
E(\tau)=\frac{\langle\Psi(\tau)|H|\Psi(\tau)\rangle}
{\langle\Psi(\tau)|\Psi(\tau)\rangle}=
\frac{\langle\Phi_{0}|H{\mathrm{e}}^{-\tau H}|\Phi_{0}\rangle}
{\langle\Phi_{0}|{\mathrm{e}}^{-\tau H}|\Phi_{0}\rangle}.
\]}\unskip

The area under the $E(\tau)-E_{0}$ curve is hard to compute, but
one can approximate it with the area of a triangle, as illustrated
in Figure~\ref{fig:Quantitative-criteria-for}(b):
{\[
\int_{0}^{\infty}\mathrm{d}\tau\left[E(\tau)-E_{0}\right]
\approx\frac{1}{2}(E_{V}-E_{0})\frac{E_{V}-
E_{0}}{\left|\partial_{\tau}E(\tau=0)\right|}.
\]}\unskip

Let us compute $\partial_{\tau}E(\tau)=(-\langle\Phi_{0}|H^{2}{\mathrm{e}}^{-\tau
H}|\Phi_{0}\rangle\langle\Phi_{0}|{\mathrm{e}}^{-\tau
H}|\Phi_{0}\rangle+\langle\Phi_{0}|H{\mathrm{e}}^{-\tau
H}|\Phi_{0}\rangle\langle\Phi_{0}|H{\mathrm{e}}^{-\tau
H}|\Phi_{0}\rangle)/${\ubreak}$\langle\Phi_{0}|{\mathrm{e}}^{-\tau H}|\Phi_{0}\rangle^{2}$,
and so
{\begin{equation*}
\partial_{\tau}E(\tau=0)  =\frac{-\langle\Phi_{0}|H^{2}|
\Phi_{0}\rangle\langle\Phi_{0}|\Phi_{0}\rangle+
\langle\Phi_{0}|H|\Phi_{0}\rangle^{2}}{\langle
\Phi_{0}|\Phi_{0}\rangle^{2}}=-\langle H^{2}\rangle+\langle H\rangle^{2}
\end{equation*}}\unskip

Therefore,
{\begin{equation}
\Omega\approx\exp\left(-\frac{\left(E_{V}-E_{0}
\right)^{2}}{2\sigma_{V}^{2}}\right),\label{eq:overlap_estimate}
\end{equation}}\unskip
namely we can compute an approximation to the overlap using the energy
$E_{V}$ and variance $\sigma_{V}^{2}$ of the input state, as well as
the exact energy $E_{0}$ (or an estimate thereof), as claimed. 

With this formula, we estimated the overlap of classical states
obtained by a large sample of state-of-the-art methods applied on a
variety of models~\cite{Wu2023}. The results are shown in 
Figure~\ref{fig:Quantitative-criteria-for}(c), where we plot the
overlap index
{\begin{equation}
I_{\Omega}=\frac{\left(E_{V}-E_{0}\right)^{2}}
{2\sigma_{V}^{2}}.\label{eq:overlap_index}
\end{equation}}\unskip

We see that $I_{\Omega}\propto|E-E_{0}|$. Besides, we know
that $E$, $E_{0}$ and $\sigma_{V}^{2}$ are extensive, therefore
$I_{\Omega}\propto N$, and therefore $\Omega={\mathrm{e}}^{- I_\Omega}$ decreases
exponentially with system size for large variety of models and methods.
This is not a surprising result: it is the infamous orthogonality
catastrophe of condensed-matter physics. This catastrophe a priori
renders QPE exponentially costly in the system size. Modifications of
the algorithm, or other algorithmic advances, are thus probably needed.

\subsection{Beyond many-fermion problems: solving hard optimization
problems with quantum processors }\label{subsec:optimization}

We have focused so far on the solution of fermionic quantum many-body
problems with quantum processors. In this section, we shed light on a
seemingly less straightforward application of quantum processors, that
is to solve \emph{classical} optimization problems. More precisely, we
will focus on combinatorial optimization problems. As we have seen in
Section~\ref{subsec:classical-many-body}, they are nothing but
classical many-body problems since they can be mapped to classical
interacting spin problems. (We refer the reader to~\cite{Abbas2023} for
a review on general optimization problems and quantum computing).

\subsubsection{A reminder on complexity theory\,$\ldots$~and what
accelerations quantum computers can (or cannot)
bring}\label{subsec:complexity-theory}

Before turning to concrete examples, let us first elaborate---although
in a quite informal way---on the formal guarantees (or lack thereof)
that quantum computers bring in terms of computational acceleration.

It is customary to classify computational problems---in fact, decision
problems, corresponding to yes/no questions---using the notion of
complexity classes. They---informally speaking---indicate the scaling
of the time to solution of a given problem as a function of the problem
size. On classical computers (classical deterministic Turing machines),
the $\mathrm{P}$ class corresponds to the problems that can be solved
in polynomial time. $\mathrm{NP}$ (for ``nondeterministic polynomial''
time) denotes problems that can be solved in polynomial time on
\emph{nondeterministic} Turing machines (which in practice captures
problems that cannot be solved polynomially on deterministic
machines\,$\ldots$~at least this is a very likely fact, known as the
$\mathrm{P\neq NP}$ conjecture in computer science). $\mathrm{NP}$-hard
problems are the problems that are at least as hard to solve as the
problems in the $\mathrm{NP}$ class. Finally, the
$\mathrm{NP}$-complete class is the intersection of $\mathrm{NP}$ and
$\mathrm{NP}$-hard problems. They are very likely (that is, under the
$\mathrm{P\neq NP}$ conjecture) to have an exponential run time. These
complexity classes are illustrated in
Figure~\ref{fig:Complexity-classes.}.

\begin{figure}
\includegraphics{fig23}
\caption{\label{fig:Complexity-classes.}Complexity classes.}
\vspace*{6pt}
\end{figure}

The same classification can be sketched for quantum computers. The
polynomial quantum counterpart of $\mathrm{P}$ is called $\mathrm{BQP}$
(for bounded-error quantum polynomial time), while that of $\mathrm{NP}$
is $\mathrm{QMA}$ (for quantum Merlin--Arthur).

Let us take a look at a few of the famous problems that have been
considered as promising for quantum computers:
\begin{itemize}
\item factoring is a NP problem (but not NP-hard), which is why it has a
subexponential time classical algorithm (being subexponential does
not prevent it from being hard to solve in practice, which guarantees
the safety of encryption methods based on the RSA algorithm). Shor's
algorithm is polynomial, which means that factoring is in BQP. It
achieves an exponential speedup compared to the best known classical
algorithm.
\item combinatorial optimization problems like the traveling salesperson
problem, the maximum independent set (MIS) problem (see 
Section~\ref{subsec:Rydberg-udmis}) or the maximum cut problem 
(see Section~\ref{subsec:Q-score}) are NP-complete problems. It is conjectured
that the NP-complete and the BQP class are disjoint, and therefore
that there exist no polynomial quantum algorithm for these problems.
\item the problem of finding the ground state energy of a $k$-local
Hamiltonian (namely with Pauli terms, as 
in~(\ref{eq:Pauli_decomposition}), acting on at most $k$ qubits) is
NP-hard~\cite{Barahona1982} and QMA-complete~\cite{Kitaev2002}, that
is, informally, exponentially hard both for classical and quantum
computers. One consequence of QMA-complete being conjectured to be
disjoint from BQP is that there will \emph{not} be a polynomial quantum
algorithm to solve it. 
\item time-dynamics simulation, the original idea of
Feynman~\cite{Feynman1982}, is an example where quantum computers are
efficient (this problem is in BQP, see~\cite{Lloyd1996}
and~\cite{Aharonov2003} for two polynomial implementations for
$k$-local and sparse Hamiltonians, respectively) while classical
computers are not. 
\end{itemize}
Should one shy away from NP-complete or QMA-complete problems on the
account that they will necessarily lead to exponential algorithms?
A reasonable answer is no.

A first reason is that even if both classical computers and quantum
computers scale exponentially for a given problem, (i) in concrete
examples, the specific regime (symmetries, temperature, filling etc)
of the problem at hand may make it easier to solve than the complexity
theoretic prediction, and it could well be that a quantum algorithm
outperforms a classical one in such regime (although it could also
go in the other direction, namely the classical algorithm better exploits
the specificity of the problem at hand), and (ii) even in a true exponential
regime, the scaling of the exponent could be different and therefore
lead to a speedup (the definition of speedup itself is worth 
discussing~\cite{Chan2024}: one could either define it as the ratio of the
complexities, or express $C_{\mathrm{classical}}=f(C_{\mathrm{quantum}})$
and call the speedup polynomial or exponential depending on the scaling
of the ratio or on the polynomial or exponential scaling of $f$.
The first choice implies that two exponentially scaling algorithms
could still scale exponentially with respect to each other; the second
implies a polynomial scaling between two such algorithms).

A second more practical reason is that most application fields are
not looking for the exact solution of a problem but for approximate
solutions. The gain in \emph{approximability} of a problem when switching
from classical to quantum is all that matters in many cases. Indeed,
in practice, most hard problems are solved only in an approximate
fashion: obtaining an approximate solution in a reasonable time is
considered to be a satisfactory goal for exponential problems. Strikingly,
not all NP-complete problems are equal when it comes to their ``approximability'',
namely the possibility to find approximate solutions in ``reasonable''
time. A classification in terms of approximability is therefore a
more useful notion than the complexity classes we sketched above.
There, the main quantity of interest is the ``approximation ratio'',
defined as
{\begin{equation}
\alpha(S)=\frac{C(S)}{C(S^{\star})},\label{eq:approximation_ratio}
\end{equation}}\unskip
where $C(S)$ is the cost of a solution (the quantity one wants to
maximize, see Equation~(\ref{eq:cost_maxcut}) for MaxCut or Equation~(\ref{eq:UDMIS_cost_function})
below for the unit-disk MIS, UDMIS) and $S^{\star}$ the optimal solution
(which, for a NP-complete problem, can only be obtained in an exponential
run time). The closest $\alpha$ is to 1, the better the solution.
The approximability classification works in terms of how close to
1 the approximation ratio is: 
\begin{enumerate}[(1)]
\item the easiest-to-approximate problems are those that, for any
$\epsilon>0$, have an approximation algorithm with $\alpha>1-\epsilon$
that runs in time $\mathrm{poly}(1/\epsilon,n)$. Such algorithms are
called FPTAS (for fully polynomial time approximation scheme). This is
the case for the ``backpack'' problem, a NP-complete problem that
consists in putting as many valuable objects in a backpack as possible
but with a cap on the maximum weight. Belonging to FPTAS essentially
means that the problem is a very easy problem in practice.
\item then, there exists algorithms such that for any $\epsilon>0$, the
approximate solution with $\alpha>1-\epsilon$ is obtained in time
$\mathrm{poly}(n)$ (but this run time could scale as
${\mathrm{e}}^{1/\epsilon}$!). UDMIS is among those problems, called PTAS (for
polynomial time approximation scheme).
\item the class of problems called APX has algorithms that achieve
$\alpha>f(n)$ in time $1/\mathrm{poly}(n)$:
\begin{enumerate}[(a)]
\item MaxCut and bounded-degree MIS (MIS on graphs with a bounded degree)
are such that $f(n)=\mathrm{const}=C$: there are algorithms that
find an approximate solution with a guarantee that its approximation
ratio will be at most $C$.
\item the general MIS problem has $f(n)=1/\mathrm{poly}(n)$ (where
$\mathrm{poly(}n)$ denotes a polynomial in the size $n$) of the
problem. 
\end{enumerate}
\end{enumerate}
This classification suggests that the approximation ratio of a given
algorithm (whether classical or quantum) be placed at the center of
the stage when considering a new method. In particular, how $\alpha$
scales with system size is a key property to be monitored, with the
hope that the scaling offered by quantum computers could be better
than that of classical computers. Finally, it also helps identify
those problems that a worth tackling with quantum computers: problems
that are easy to approximate using a classical algorithm, like knapsack,
are probably not a promising target. Conversely, problems that are
hard to approximate by classical algorithms probably offer more space
for quantum advantage (although nothing guarantees that there will
be one).

\subsubsection{Rydberg processors: a spin-1/2 machine dedicated to
unit-disk problems: the effect of decoherence}\label{subsec:Rydberg-udmis}

The Hamiltonian (\ref{eq:Rydberg_hamiltonian}) of Rydberg atoms turns
out to be similar to the Hamiltonian translation of a well-known
combinatorial optimization problem called the unit-disk maximum
independent set (UDMIS) problem, as first pointed out
by~\cite{Pichler2018}. This problem consists in finding, given a graph
$G=(V,E)$ (with vertices $V$ and edges $E$), the largest set of
vertices that do not share an edge (a set called the maximum
independent set). This is illustrated in Figure~\ref{fig:UDMIS}(a). The
``unit-disk'' qualifier restricts the class of graphs to ``unit-disk
graphs'', which correspond to planar graphs such that edges are drawn
only between vertices that are closer than a certain distance (one by
convention, hence the name ``unit'').

\begin{figure}
\includegraphics{fig24}
\caption{\label{fig:UDMIS}Solving the UDMIS problem with a Rydberg
platform. (a) Sketch of the UDMIS problem: dots are vertices $V$ while
lines are edges $E$ of the graph $G=(V,E)$ at hand. The red dots stand
for the sought-after solution, the MIS. The circles are unit disks that
determine the existence of edges. (b) Approximation ratio as a function
of graph size for various dephasing noise levels $\gamma$ (green curve:
results obtained with uniform random sampling algorithm). (c) Absolute
value of the spin--spin correlation as a function of distance (y-log
scale) for the three noise levels of panel (b), and exponential fit.
(d) Break-even diagram: black dots represent the approximation ratio
and computational time achieved by a state-of-the-art classical
heuristic algorithm to solve UDMIS. Reproduced
from~\cite{Serret2020}. Copyright (2020, 2024) by the American Physical Society.}
\end{figure}

A candidate solution to the problem is characterized by a bitstring
$S=(b_{1},\dots, b_{n})$ (where $n$ is the number of vertices), with
$b_{i}=1$ if the $i$th vertex belongs to the set and $b_{i}=0$
otherwise. Maximizing the size of the set corresponds to maximizing
$\sum_{i=1}^{n}b_{i}$, while excluding vertices that share an edge
corresponds to imposing $b_{i}b_{j}=0$ if $i,j\in E$. This constrained
optimization problem can be formulated in an unconstrained way by
using a Lagrange multiplier $U$ to uphold the constraint, yielding
the cost function
{\begin{equation}
C(S)=\sum_{i=1}^{n}b_{i}-U\sum_{ij\in E}b_{i} b_{j}.\label{eq:UDMIS_cost_function}
\end{equation}}\unskip

This classical cost function can be ``quantized'' by turning the
binary variables $b_{i}$ into operators $\hat{n}_{i}$. This yields
the Hamiltonian
{\begin{equation}
H=U\sum_{ij\in E}\hat{n}_{i}\hat{n}_{j}-\sum_{i=1}^{n}\hat{n}_{i},\label{eq:UDMIS_hamiltonian}
\end{equation}}\unskip
where we also added an overall minus sign to turn the problem into a
minimization rather than a maximization problem. Hamiltonian
(\ref{eq:UDMIS_hamiltonian}) is very similar to
(\ref{eq:Rydberg_hamiltonian}), provided one approximates
$C_{6}/|\mathbf{r}_{i}-\mathbf{r}_{j}|^{6}>0$ for
$|\mathbf{r}_{i}-\mathbf{r}_{j}|<r_{R}$ and $=0$ for larger
distances. Thus, finding the solution to the UDMIS problem can be
translated to finding the ground state of the Rydberg Hamiltonian (with
$\Omega=0$). This ground state is a priori not easy to find as the rest
state of the processor is the $|00,\dots,0{\mathop{\rangle}}$ state (all atoms at
rest). To reach the ground state with high probability, one can resort
to a quantum annealing procedure (see
Section~\ref{subsec:adiabatic-annealing}), where one slowly turns the
instantaneous Hamiltonian $H_{\mathrm{Rydberg}}$ from a Hamiltonian
$H_{0}$ whose ground state is $|00,\dots,0{\mathop{\rangle}}$ to a final
Hamiltonian as close as possible to $H$
(Equation~(\ref{eq:UDMIS_hamiltonian})). The adiabatic theorem
guarantees that, provided this time evolution is long enough compared
to the inverse instantaneous squared gap, and neglecting the difference
between the final Rydberg Hamiltonian and the target $H$, the final
state will be the ground state of $H$.

In~\cite{Serret2020}, we analyzed the performance of a simple quantum
annealing procedure in the presence of decoherence. We quantitatively
verified the intuitive tradeoff between long annealing times, which
increase the success probability of the annealing procedure, and short
times, which limit the influence of noise. This competition leads
to an optimal, noise-dependent annealing time. Choosing this time
as our annealing time, we studied approximation ratio (Equation~(\ref{eq:approximation_ratio}))
of our quantum algorithm, with $S^{\star}$ denoting the exact solution
of the UD-MIS problem.

We investigated the dependence of the approximation ratio on the size
of the graph for different noise strengths (with a Lindblad noise model
similar to the one we presented in Section~\ref{subsec:Rydberg-atoms-slave-spin}).
As expected, more noise leads to a smaller approximation ratio (see
Figure~\ref{fig:UDMIS}(b)). The approximation ratio slightly decreases
with increasing size, with a plateau that comes at smaller and smaller
sizes as the noise level grows. We use the value of this plateau to
extrapolate the results to large graphs.

We compared these approximations ratios to those of a state-of-the-art
classical approximation algorithm for UD-MIS. In this classical
algorithm, the graph at hand is divided into subgraphs of small size
$d$, whose MIS is determined exactly. These subsolutions are combined
to build an approximate solution. The size $d$ is tuned to reach a
given approximation ratio (the larger $d$, the better quality) within a
given time budget (the run time is exponential in $d$ since UD-MIS is a
NP-complete, hence exponential, problem). This leads to
Figure~\ref{fig:UDMIS}(d), where the black dots are the approximation
ratios and graph sizes that can be reached within a 2-second time
budget. It allows to draw a ``break-even'' diagram, where the region of
quantum advantage is drawn in orange. The quantum approximation ratios
corresponding to two noise levels ($\gamma=3$ corresponding
to~\cite{Lienhard2017} and $\gamma=0.3$ close to 2024 levels of noise)
are represented by the red and blue line, assuming the decoherence
levels are kept constant with the number of atoms. We predict that a
number of more than 8000 atoms will be needed to beat the classical
algorithm (to reach the right-hand ``advantage area''), or much better
noise levels (to reach the top ``advantage area''). Increasing the
number of atoms is an undergoing effort in experimental labs and
companies. One major element to factor in, though, is the repetition
rate of the experiment. In Rydberg platforms, the repetition rate about
a Herz~\cite{Browaeys2020}, with possible improvements up to a few tens
of Herz. This is many orders of magnitude away from superconducting
processors (MHz), not to mention classical processors (GHz).

Improving the quality is another key aspect. Beyond the final
approximation ratio estimates that we were able to extract from our
simulations, our simulations also indicate one of the causes of the
degradation in solution quality. Indeed, a maximum independent set can
be regarded as the generalization to any graph of a (classical)
antiferromagnetic (AF) state: in a 1D chain a MIS is nothing but an
antiferromagnetic spin chain $0101\dots10$. Errors will cause
disruptions in the perfect AF order by flipping some of the spins,
resulting in AF domains separated by boundaries corresponding to
defects. The size of the corresponding IS will be that of the maximum
IS (the AF state) minus the number of boundary sites. To count the
number of defects, the relevant quantity to look at is the (AF)
correlation length $\xi$ of the chain: it gives the typical size of the
domains. For a fixed $\xi$, the number of boundary sites be
approximately $n/{\xi}$. Thus, the approximation ratio in 1D
will be 
{\begin{equation}
\alpha=\frac{{{|\mathrm{MIS}|}-
{n}/{\xi}}}{\left|\mathrm{MIS}\right|}=\frac{\kappa n
-{n}/{\xi}}{\kappa n}=
1-\frac{1}{\kappa\xi},\label{eq:approximation_ratio_corrlength}
\end{equation}}\unskip
where we used the fact that the size of the MIS is proportional to the
graph size. The same reasoning can be extended to two dimensions:
there, the number of domains is ${\sim} n/\xi^{2}$ and the size of their
boundary is ${\sim}\xi$, so that the size of the IS will be
$|\mathrm{MIS}|-{n}/{\xi^{2}\times\xi}$.
Equation~(\ref{eq:approximation_ratio_corrlength}) is consistent with
the numerical data: one can extract the correlation length from the
noisy data (Figure~\ref{fig:UDMIS}(c)). One does see that noise
shortens $\xi$, leading to a lower approximation ratio. The
relationship between $\alpha$ and $\xi$ we briefly derived is confirmed
by our data. In other words, this classical optimization example
emphasizes the importance for quantum processors to be able to generate
states with very long correlations lengths. Classical heuristics, in a
way, also come with a finite correlation length or short-sightedness in
that they solve subproblems with a given size $d$ exactly (at an
exponential price) and they then patch the results together. The
challenge of quantum computers is to ``solve'' subproblems that are
larger, and/or to solve them faster, than classical algorithms.

There are other promising avenues besides our work. One is to find
better algorithms than quantum annealing to find the ground state
of the problem at hand. Variational approaches, in the same vein as
the variational quantum eigensolver, have been explored. The quantum
approximate optimization algorithm (QAOA,~\cite{Farhi2014}) is,
like the Hamiltonian variational ansatz we encountered previously,
directly inspired by quantum annealing, and suffers from the same
issues as VQE.

One issue with the UDMIS problem is that it is, in the approximability
hierarchy, a quite easy problem. It thus looks promising to try and
tackle the MIS problem on more general graphs than unit-disk graphs. Ways to achieve this
for general 2D graphs with Rydberg atoms have been laid out in~\cite{Nguyen2023a}
(with the help of additional atoms adding up to $O(n^{2})$ atoms
to handle a graph of size $n$) and in~\cite{Dalyac2023} (by using
3D arrangements of atoms).

\subsubsection{Q-score: a benchmark for quantum
computers}\label{subsec:Q-score}

The above study of the UDMIS problem introduced the approximation ratio
$\alpha$ as a quantitative measure of the solution quality of a given
(quantum as well as classical) algorithm. This measure of success is
also, as we explained in Section~\ref{subsec:complexity-theory}, a
formal measure of the \emph{approximability} of hard computational
problems. 

Starting from the importance of this metric, we defined,
in~\cite{Martiel2021}, a benchmark for quantum computing architectures
based on the assessment of the approximation ratio reachable by quantum
processors when solving another combinatorial optimization problem,
namely the maximum cut (MaxCut) problem. As introduced in
Section~\ref{subsec:classical-many-body}, this problem consists in
finding the bipartition of vertices in a graph that maximizes the
number of edges between the vertices of each partition, and finds
applications in many fields.

The design of this benchmark obeyed three criteria: to be (i)
application-oriented, as opposed to low-level benchmarks like
randomized benchmarking~\cite{Magesan2011,Magesan2011a}, (ii) scalable,
namely efficiently computable for large problem instances, as opposed
to benchmarks like the quantum volume or cross-entropy benchmarking,
which require the exponential computation of circuit outcome
probabilities, and (iii) hardware-agnostic, namely not restricted to,
let alone favoring any technology, whether gate-based or analog. 

While points (i) and (iii) are easily met by the choice of the MaxCut
problem, (ii) poses an important challenge: to compute the
approximation ratio $\alpha$, one in principle needs to compute the
exact solution $S^{\star}$ to compute the denominator of
Equation~(\ref{eq:approximation_ratio}), thus incurring an exponential
run time. This issue is circumvented in our proposal by computing the
ratio of the \emph{averages} over graph classes. We chose the
Erdos--Renyi class to do this average because of the possibility to
reach hard graph instances by tuning the edge probability $p$ of the
Erdos--Renyi graphs. The advantage of performing averages is that
asymptotic scalings of average optimal costs $\langle
C(S^{\star})\rangle _{\mathcal{G}}$ can be computed efficiently,
as opposed to costs of individual solutions $S^{\star}$.

Generically, the so-obtained approximation ratio (in fact, a slight
modification thereof that we called $\beta$ instead of $\alpha$ to
subtract a trivial contribution) decreases with graph size due to the
increasing effect of decoherence. We call ``Q-score'' the graph size
above which $\beta$ falls below an arbitrary threshold of 20\%.

\begin{figure}
\includegraphics{fig25}
\caption{\label{fig:Q-score-computation.}Simulation of the Q-score: modified approximation ratio
$\beta$ as a function of the number of qubits (graph size) for a QAOA
algorithm with $p=1$ and $p=2$ layers, for a perfect (noiseless, blue
and cyan lines) and a noisy (red lines) quantum computer (with two
qubit topologies in the latter case: all-to-all (solid) and
nearest-neighbor on a grid (dashed)). The noise levels (2\% and 0.4\%
average depolarizing error rates for two and one-qubit gates,
respectively) are representative of NISQ devices in 2021. Reproduced
from~\cite{Martiel2021}. Copyright (2021) by IEEE.}
\end{figure}

Figure~\ref{fig:Q-score-computation.} illustrates a Q-score computation
by using a variational algorithm, QAOA, to solve the MaxCut problem:
while noiseless simulations show more or less constant approximation
ratios as the graph size (hence number of qubits) grows (with improved
ratios as the number $p$ of layers, and hence ansatz expressivity, increase),
noise leads to a gradual degradation of the ratio with size, even
more so when the connectivity is limited, which leads to deeper, and
hence more error-prone, circuits. Thus, the Q-score reflects both
the quality of hardware and that of the software (a better compilation
or a better algorithm can lead to higher Q-scores for the same hardware).

Since the publication of the Q-score proposal, the Q-score was computed
for three architectures: (i) a $d$-wave computer architecture, with
reported an experimental Q-score of 140 (and even 12,500 for a hybrid
quantum-classical approach,~\cite{vanderSchoot2022}), (ii) a Rydberg
processor, with \mbox{numerical} simulations predicting Q-scores above 
18~\cite{Coelho2022}, and (iii) a transmon quantum computer by the IQM
company, with a reported experimental Q-score of 11 (unpublished,
\url{https://www.meetiqm.com/newsroom/press-releases/iqm-achieves-20-qubit-benchmark-result}),
with error mitigation techniques.

\section{Conclusion}

The field of quantum computing for many-body physics stands at a crossroads.
By now, many proof-of-concept computations have shown that one could
indeed run many-body computations on the emerging hardware platforms,
but also that these demonstrations are still largely outperformed
by classical algorithms. This state of affairs is both due to hardware
imperfections and to more fundamental limitations of variational quantum
algorithms, which have been the main fuel of the NISQ era so far.
Resource estimates and a better knowledge of the generic cost landscape
unambiguously indicate that sophisticated approaches will be needed
to avoid the measurement variance problem and the barren plateau problem.
Whether these hurdles will prove insurmountable or not is an open
question. In any case, the design of resource-efficient algorithms
to generate complex many-body states will be a central ingredient. 

Given these difficulties on the one hand, and the recent demonstrations
of small-scale error correction on the other hand, one may be tempted
to directly aim for fault-tolerant quantum algorithms. It is at the
same time an optimistic move, and a pessimistic one. It is optimistic
because there are strings attached to this program: first of all,
the scale of the experimental effort required to obtain fault-tolerant
processors is such that this technology will probably not be available
before long years~\cite{Mohseni2024}. Second, even perfectly corrected quantum computers
come with open challenges, like the orthogonality catastrophe we discussed
in Section~\ref{fig:Quantitative-criteria-for}.

It is also a pessimistic move, because the scale and quality of current
processors, with hundreds of particles with large spatial correlations
(7 lattice spacings in 2021,~\cite{Scholl2020}), makes it very likely
that hitherto unreachable physical regimes are already accessible
with current or near-term hardware. Indeed, although the most advanced
classical methods can still rival with quantum advantage contentions,
as we showed in Section~\ref{subsec:Debunking-quantum-supremacy},
they also come with hard limitations, especially when it comes to
dynamics. Of course, the precise way to squeeze acceleration out of
these NISQ processors is still elusive. Our view is that it will probably
result from a subtle interplay between classical and quantum algorithms.
The hybrid algorithms we presented, that exchange classical and quantum
information to optimize the qubit orbital basis (Section~\ref{subsec:Optimizing-the-single-orbital}),
to compress state preparation circuits (Section~\ref{subsec:Matrix-product-states-trotter}),
or to sidestep the issues related to fermionic antisymmetry 
(Section~\ref{subsec:Rydberg-atoms-slave-spin}), are first steps in this direction.

Very recently, much work has been devoted to going beyond the limitations
of simple VQEs without directly turning to fault-tolerant algorithms
like quantum phase estimation. For instance,~\cite{Stair2021a} introduces
a quantum version of the projective variant of coupled cluster equations
(see Section~\ref{subsec:qchem}), with promising convergence properties
compared to VQE. Reference~\cite{Robledo-Moreno2024} do not carry any optimization
of the variational energy, but instead (i) start from a classical
CCSD computation to build their quantum circuit and (ii) do not attempt
to estimate energies but sample bitstrings from the obtained state,
to then construct restrictions of the many-body Hamiltonian to the
subspace of the most probable bistrings, before classically diagonalizing
this Hamiltonian using exact diagonalization techniques (similar to
those described in Section~\ref{subsec:Direct-diagonalization}).
 Another promising strand of methods~\cite{Jiang2024a} uses a
quantum processing step within an otherwise classical quantum Monte
Carlo code like AFQMC~\cite{Huggins2021} or FCI-QMC~\cite{Zhang2022a}.
There again, a careful investigation of what advantage the quantum
processor brings will be crucial~\cite{Mazzola2023}.

\section*{Declaration of interests}
The authors do not work for, advise, own shares in, or receive funds
from any organization that could benefit from this article, and have
declared no affiliations other than their research organizations.

\section*{Acknowledgments}
This manuscript was submitted for my Habilitation \`a Diriger des
Recherches (HDR). I would like to thank the reviewers Gr\'egoire
Misguich, Laurent Sanchez-Palencia and Eric Canc\`es, as well as the
other jury members Yuri Alexeev, Antoine Browaeys, Denis Lacroix and
Daniel Est\`eve, for a careful reading of the manuscript.

\back{}

\printbibliography
\refinput{crphys20241068-reference.tex}

\end{document}