michaelsalbergo [at] gmail
malbergo [at] fas.harvard.edu
Publications: Google Scholar
CV: Available upon request.
with Aavishkar Patel and Peter Lunts
'Strange' metals that do not follow the predictions of Fermi liquid theory are prevalent in materials that feature superconductivity arising from electron interactions.
In recent years, it has been hypothesized that spatial randomness in electron interactions must play a crucial role in strange metals for their hallmark linear-in-temperature
(T) resistivity to survive down to low temperatures where phonon and Umklapp processes are ineffective, as is observed in experiments. However, a clear picture of how this
happens has not yet been provided in a realistic model free from artificial constructions such as large-N limits and replica tricks. We study a realistic model of
two-dimensional metals with spatially random antiferromagnetic interactions in a non-perturbative regime, using numerically exact high-performance large-scale hybrid Monte
Carlo and exact averages over the quenched spatial randomness. Our simulations reproduce strange metals' key experimental signature of linear-in-T resistivity with a
'planckian' transport scattering rate Γ_tr ∼ k_B T/ℏ that is independent of coupling constants. We further find that strange metallicity in these systems is not associated with a
quantum critical point, and instead arises from a phase of matter with gapless order parameter fluctuations that lacks long-range correlations and spans an extended region of
parameter space: a feature that is also observed in several experiments. Our work paves the way for an eventual microscopic understanding of the role of spatial disorder in
determining important properties of correlated electron materials.
with Eric Vanden-Eijnden
We propose an algorithm, termed the Non-Equilibrium Transport Sampler (NETS), to sample from unnormalized probability distributions.
NETS can be viewed as a variant of annealed importance sampling (AIS) based on Jarzynski's equality, in which the stochastic differential
equation used to perform the non-equilibrium sampling is augmented with an additional learned drift term that lowers the impact of the
unbiasing weights used in AIS. We show that this drift is the minimizer of a variety of objective functions, which can all be estimated
in an unbiased fashion without backpropagating through solutions of the stochastic differential equations governing the sampling. We also
prove that some of these objectives control the Kullback-Leibler divergence of the estimated distribution from its target. NETS is shown to
be unbiased and, in addition, has a tunable diffusion coefficient which can be adjusted post-training to maximize the effective sample size.
We demonstrate the efficacy of the method on standard benchmarks, high-dimensional Gaussian mixture distributions, and a model from statistical
lattice field theory, for which it surpasses the performances of related work and existing baselines.
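A minimal numpy sketch of the plain AIS/Jarzynski baseline that NETS augments: the learned drift is omitted here, and the potentials U0, U1 and all hyperparameters are illustrative stand-ins rather than anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def U0(x):   # base potential: standard Gaussian
    return 0.5 * x**2

def U1(x):   # target potential: unnormalized bimodal mixture
    return -np.logaddexp(-0.5*(x - 3)**2, -0.5*(x + 3)**2)

def Ut(x, t):      # linear annealing between base and target
    return (1 - t)*U0(x) + t*U1(x)

def grad_Ut(x, t, eps=1e-5):   # finite differences keep the sketch generic
    return (Ut(x + eps, t) - Ut(x - eps, t)) / (2*eps)

n, K, h = 10_000, 200, 1e-2
x = rng.normal(size=n)         # exact samples from the base
logw = np.zeros(n)
ts = np.linspace(0.0, 1.0, K + 1)
for k in range(K):
    # Jarzynski weight update: work done by switching the potential
    logw += Ut(x, ts[k]) - Ut(x, ts[k + 1])
    # unadjusted Langevin step at the new potential; NETS augments this
    # drift with a learned term that flattens the weights
    x += -h*grad_Ut(x, ts[k + 1]) + np.sqrt(2*h)*rng.normal(size=n)

w = np.exp(logw - logw.max())
print("ESS =", w.sum()**2 / (w**2).sum(), "of", n)
```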
with Nicholas Boffi and Eric Vanden-Eijnden
Generative models based on dynamical transport of measure, such as diffusion models, flow matching models, and stochastic interpolants,
learn an ordinary or stochastic differential equation whose trajectories push initial conditions from a known base distribution onto the target.
While training is cheap, samples are generated via simulation, which is more expensive than one-step models like GANs. To close this gap,
we introduce flow map matching -- an algorithm that learns the two-time flow map of an underlying ordinary differential equation.
The approach leads to an efficient few-step generative model whose step count can be chosen a posteriori to smoothly trade off accuracy
for computational expense. Leveraging the stochastic interpolant framework, we introduce losses for both direct training of flow maps and
distillation from pre-trained (or otherwise known) velocity fields. Theoretically, we show that our approach unifies many existing few-step generative models,
including consistency models, consistency trajectory models, progressive distillation, and neural operator approaches, which can be obtained as particular cases of our formalism.
With experiments on CIFAR-10 and ImageNet 32x32, we show that flow map matching leads to high-quality samples with significantly reduced sampling cost compared to diffusion or stochastic interpolant methods.
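To give a feel for the structure of the distillation objective, here is a toy in which the velocity field b(x, t) = -x has the closed-form two-time flow map X_{s,t}(x) = exp(-(t - s)) x, so the Lagrangian residual vanishes only at the true map. The parametrization and names below are illustrative choices, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def b(x, t):                   # toy velocity field with a known flow map:
    return -x                  # b(x,t) = -x  =>  X_{s,t}(x) = exp(-(t-s)) x

def flow_map(x, s, t, rate):   # candidate two-time map; rate = 1 is exact
    return np.exp(-rate * (t - s)) * x

def lagrangian_loss(rate, n=4096, eps=1e-4):
    # Monte Carlo estimate of E|| d/dt X_{s,t}(x) - b(X_{s,t}(x), t) ||^2,
    # a distillation objective against the known velocity field
    x = rng.normal(size=n)
    s = rng.uniform(0.0, 1.0, size=n)
    t = s + rng.uniform(0.0, 1.0, size=n)*(1.0 - s)
    dXdt = (flow_map(x, s, t + eps, rate) - flow_map(x, s, t - eps, rate)) / (2*eps)
    return np.mean((dXdt - b(flow_map(x, s, t, rate), t))**2)

for rate in (0.5, 1.0, 2.0):
    print(rate, lagrangian_loss(rate))   # minimized at the true map, rate = 1
```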
with Yifan Chen, Mark Goldstein, Mengjian Hua, Nicholas Boffi, and Eric Vanden-Eijnden
We propose a framework for probabilistic forecasting of dynamical systems based on generative modeling.
Given observations of the system state over time, we formulate the forecasting problem as sampling from the conditional distribution of the future system state given its current state.
To this end, we leverage the framework of stochastic interpolants, which facilitates the construction of a generative model between an arbitrary base distribution and the target.
We design a fictitious, non-physical stochastic dynamics that takes as initial condition the current system state and produces as output a sample from the target conditional distribution in finite time and without bias.
This process therefore maps a point mass centered at the current state onto a probabilistic ensemble of forecasts.
We prove that the drift coefficient entering the stochastic differential equation (SDE) achieving this task is non-singular, and that it can be learned efficiently by quadratic regression over the time-series data.
We show that the drift and the diffusion coefficients of this SDE can be adjusted after training, and that a specific choice that minimizes the impact of the estimation error gives a Föllmer process.
We highlight the utility of our approach on several complex, high-dimensional forecasting problems, including the stochastically forced Navier-Stokes equations and video prediction on the KTH and CLEVRER datasets.
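A hedged one-dimensional sketch of the drift regression: on an AR(1) toy series the drift is linear in (x_t, x_0), so ordinary least squares stands in for the network regression used in the paper (the schedules and variable names are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# AR(1) series x_{k+1} = a x_k + noise; forecasting = sampling x1 given x0
a, n = 0.9, 50_000
x0 = rng.normal(size=n) / np.sqrt(1 - a**2)   # stationary draws
x1 = a*x0 + rng.normal(size=n)

gamma  = lambda t: np.sqrt(t*(1 - t))
dgamma = lambda t: (1 - 2*t) / (2*np.sqrt(t*(1 - t)))

t = 0.3                                       # regression at one fixed time
z  = rng.normal(size=n)
xt = (1 - t)*x0 + t*x1 + gamma(t)*z           # stochastic interpolant
v  = x1 - x0 + dgamma(t)*z                    # velocity regression target

# quadratic regression: in this Gaussian toy the drift is linear in
# (xt, x0), so least squares recovers it; the paper fits a network over all t
A = np.stack([xt, x0, np.ones(n)], axis=1)
coef, *_ = np.linalg.lstsq(A, v, rcond=None)
print("drift coefficients on (x_t, x_0, 1):", coef)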
with Nanye (Willis) Ma, Mark Goldstein, Nicholas Boffi, Eric Vanden-Eijnden, and Saining Xie
We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT). The interpolant framework, which
allows for connecting two distributions in a more flexible way than standard diffusion models, makes possible a modular study of various design choices impacting generative
models built on dynamical transport: using discrete vs. continuous time learning, deciding the objective for the model to learn, choosing the interpolant connecting the distributions,
and deploying a deterministic or stochastic sampler. By carefully introducing the above ingredients, SiT surpasses DiT uniformly across model sizes on the conditional ImageNet 256x256
benchmark using the exact same backbone, number of parameters, and GFLOPs. By exploring various diffusion coefficients, which can be tuned separately from learning, SiT achieves an FID-50K score of 2.06.
Submitted, 2023
with Eric Vanden-Eijnden
These lecture notes provide an introduction to recent advances in generative modeling methods based on the dynamical transport of measure,
by means of which samples from a simple base measure are mapped to samples from a target measure of interest. Special emphasis is put on the applications
of these methods to Monte Carlo (MC) sampling techniques, such as importance sampling and Markov chain Monte Carlo (MCMC) schemes. In this context,
it is shown how the maps can be learned variationally using data generated by MC sampling, and how they can in turn be used to improve such sampling in a positive feedback loop.
Lecture Notes from the 2022 Les Houches Summer School on Statistical Physics
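As a flavor of the MC applications discussed in the notes, here is a self-contained sketch of self-normalized importance sampling with a transport model, with a hand-set Gaussian standing in for a trained flow (all names and parameters are illustrative). The ESS diagnostic printed at the end is the kind of signal that can be fed back to improve the map.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):   # unnormalized target: two-component Gaussian mixture
    return np.logaddexp(-0.5*(x - 2)**2, -0.5*(x + 2)**2)

mu, s = 0.0, 2.5                          # stand-in for learned map parameters
x = mu + s*rng.normal(size=100_000)       # push base samples through the map
logq = -0.5*((x - mu)/s)**2 - np.log(s*np.sqrt(2*np.pi))   # model density
logw = log_target(x) - logq               # importance log-weights
w = np.exp(logw - logw.max()); w /= w.sum()

print("ESS =", 1.0/np.sum(w**2))          # diagnostic to feed back into training
print("E[x^2] ~", np.sum(w * x**2))       # reweighted estimate (exact value: 5)
```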
with Mark Goldstein, Nicholas M. Boffi, Rajesh Ranganath, and Eric Vanden-Eijnden
Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities.
Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic.
In this work, using the framework of stochastic interpolants, we formalize how to couple the base and the target densities.
This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models.
We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting.
We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.
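A minimal sketch of the square-loss regression under a dependent coupling: a corrupted copy of the target stands in for the super-resolution/inpainting couplings, and a linear model stands in for a network. Everything here is an illustrative assumption rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 8192, 16
x1 = rng.normal(size=(n, d))               # "clean" target samples
x0 = x1 + 0.5*rng.normal(size=(n, d))      # coupled, corrupted base samples

t  = rng.uniform(size=(n, 1))
xt = (1 - t)*x0 + t*x1                     # interpolant under the coupling
v  = x1 - x0                               # velocity regression target

W = np.zeros((d, d))                       # linear model standing in for a net
for _ in range(200):                       # plain gradient descent on the
    grad = 2*xt.T @ (xt @ W - v) / n       # same square loss as the
    W -= 0.1*grad                          # independent-coupling setting

print("trained square loss:", np.mean(np.sum((xt @ W - v)**2, axis=1)))
```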
with Nicholas M. Boffi, Michael Lindsey, and Eric Vanden-Eijnden
Given a set of K probability densities, we consider the multimarginal generative modeling problem of learning a joint distribution that recovers these densities as marginals. The structure of this joint distribution should identify multi-way correspondences among the prescribed marginals.
We formalize an approach to this task within a generalization of the stochastic interpolant framework, leading to efficient learning algorithms built upon dynamical transport of measure.
Our generative models are defined by velocity and score fields that can be characterized as the minimizers of simple quadratic objectives, and they are defined on a simplex that generalizes the time variable in the usual dynamical transport framework.
The resulting transport on the simplex is influenced by all marginals, and we show that multi-way correspondences can be extracted.
The identification of such correspondences has applications to style transfer, algorithmic fairness, and data decorruption.
In addition, the multimarginal perspective enables an efficient algorithm for reducing the dynamical transport cost in the ordinary two-marginal setting.
We demonstrate these capacities with several numerical examples.
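A toy sketch of the multimarginal interpolant with K = 3 one-dimensional marginals and barycentric weights on the simplex in place of the usual time variable. The learning of velocity and score fields is omitted, and the marginals and simplex path are illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

K, n = 3, 4096
samples = [rng.normal(loc=m, size=n) for m in (-4.0, 0.0, 4.0)]  # 3 marginals

def interpolant(alpha):
    # x(alpha) = sum_k alpha_k x_k with alpha on the probability simplex;
    # at a vertex e_k this reproduces the k-th marginal exactly
    return sum(a*xk for a, xk in zip(alpha, samples))

def alpha_path(t):
    # a path from vertex e_1 to vertex e_3 through the simplex interior,
    # so the middle marginal also influences the transport
    return np.array([(1 - t)**2, 2*t*(1 - t), t**2])

for t in (0.0, 0.5, 1.0):
    x = interpolant(alpha_path(t))
    print(f"t={t:.1f}  alpha={np.round(alpha_path(t), 2)}  mean={x.mean():+.2f}")
```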
with Ryan Abbott, Alex Botev, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Alexander G.D.G. Matthews, Sébastien Racanière, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, and Julian M. Urban
Applications of normalizing flows to the sampling of field configurations in lattice gauge theory have so far been explored almost exclusively in two space-time dimensions. We report new algorithmic developments of gauge-equivariant flow architectures
facilitating the generalization to higher-dimensional lattice geometries. Specifically, we discuss masked autoregressive transformations with tractable and unbiased Jacobian determinants, a key ingredient for scalable and asymptotically exact flow-based
sampling algorithms. For concreteness, results from a proof-of-principle application to SU(3) lattice gauge theory in four space-time dimensions are reported.
Preprint: ArXiv
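To illustrate the tractable-Jacobian ingredient, here is a checkerboard-masked affine coupling on a scalar lattice field: a simpler cousin of the masked autoregressive transforms in the paper, with the gauge-equivariant structure omitted (the toy "network" and parameters are mine).

```python
import numpy as np

rng = np.random.default_rng(0)

L = 8
mask = np.indices((L, L)).sum(axis=0) % 2      # checkerboard mask

def coupling(phi, w=0.1):
    # frozen sites (mask = 0) parametrize scale/shift for updated sites
    # (mask = 1); the Jacobian is then triangular, and its log-determinant
    # is just the sum of the log-scales on updated sites
    frozen = phi*(1 - mask)
    context = sum(np.roll(frozen, s, axis=ax) for s in (-1, 1) for ax in (0, 1))
    log_s, bias = w*np.tanh(context), w*context   # toy stand-in "network"
    phi_out = (1 - mask)*phi + mask*(phi*np.exp(log_s) + bias)
    return phi_out, np.sum(mask*log_s)            # exact log|det J|

phi = rng.normal(size=(L, L))
phi2, logdet = coupling(phi)
print("log|det J| =", logdet)
```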
with Nicholas M. Boffi and Eric Vanden-Eijnden
We introduce a class of generative models based on the stochastic interpolant framework proposed in Albergo & Vanden-Eijnden (2023) that unifies flow-based and diffusion-based methods. We first show how to construct a broad class of continuous-time stochastic processes whose time-dependent probability density function bridges two arbitrary densities exactly in finite time.
These 'stochastic interpolants' are built by combining data from the two densities with an additional latent variable, and the specific details of the construction can be leveraged to shape the resulting time-dependent density in a flexible way. We then show that the time-dependent density of the stochastic interpolant satisfies a first-order transport equation as well as a family of
forward and backward Fokker-Planck equations with tunable diffusion; upon consideration of the time evolution of an individual sample, this viewpoint immediately leads to both deterministic and stochastic generative models based on probability flow equations or stochastic differential equations with a tunable level of noise. The drift coefficients entering these models are time-dependent
velocity fields characterized as the unique minimizers of simple quadratic objective functions, one of which is a new objective for the score of the interpolant density. Remarkably, we show that minimization of these quadratic objectives leads to control of the likelihood for generative models built upon stochastic dynamics; by contrast, we show that generative models based upon a deterministic dynamics must, in addition,
control the Fisher divergence between the target and the model. Finally, we construct estimators for the likelihood and the cross-entropy of interpolant-based generative models, and demonstrate that such models recover the Schrödinger bridge between the two target densities when explicitly optimizing over the interpolant.
Published: JMLR
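A one-dimensional sketch of the interpolant construction and the two quadratic objectives, with illustrative choices of alpha, beta, gamma (the framework allows much more general schedules; here the candidate velocity and score are just evaluated, not trained).

```python
import numpy as np

rng = np.random.default_rng(0)

# x_t = alpha(t) x0 + beta(t) x1 + gamma(t) z with z ~ N(0,1), satisfying
# alpha(0) = beta(1) = 1 and alpha(1) = beta(0) = gamma(0) = gamma(1) = 0
alpha,  beta,  gamma  = lambda t: 1 - t, lambda t: t,   lambda t: np.sqrt(2*t*(1 - t))
dalpha, dbeta, dgamma = lambda t: -1.0,  lambda t: 1.0, lambda t: (1 - 2*t)/np.sqrt(2*t*(1 - t))

n  = 8192
x0 = rng.normal(size=n)                 # base samples
x1 = 2.0 + 0.5*rng.normal(size=n)       # target samples
z  = rng.normal(size=n)
t  = rng.uniform(0.05, 0.95, size=n)    # avoid the endpoint singularities
xt = alpha(t)*x0 + beta(t)*x1 + gamma(t)*z

def velocity_objective(b):
    # quadratic objective whose minimizer is the interpolant velocity
    return np.mean((b(xt, t) - (dalpha(t)*x0 + dbeta(t)*x1 + dgamma(t)*z))**2)

def score_objective(s):
    # quadratic objective whose minimizer satisfies gamma * score = -E[z | x_t]
    return np.mean((gamma(t)*s(xt, t) + z)**2)

# evaluate a crude candidate pair (no training, just the objectives)
print(velocity_objective(lambda x, t: 2.0 - x), score_objective(lambda x, t: -x))
```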
with Eric Vanden-Eijnden
A generative model based on a continuous-time normalizing flow between any pair of base and target probability densities is proposed. The velocity field of this flow is inferred from the probability current of a time-dependent density that interpolates between the base and the target in finite time. Unlike conventional normalizing flow inference methods based on the maximum likelihood principle, which require costly backpropagation through ODE solvers, our interpolant approach leads to a simple quadratic loss for the velocity itself, which is expressed in terms of expectations that are readily amenable to empirical estimation. The flow can be used to generate samples from either the base or target, and to estimate the likelihood at any time along the interpolant. In addition, the flow can be optimized to minimize the path length of the interpolant density, thereby paving the way for building optimal transport maps.
In situations where the base is a Gaussian density, we show that the velocity of our normalizing flow can also be used to construct a diffusion model to sample the target as well as estimate its score. However, our approach shows that we can bypass this diffusion completely and work at the level of the probability flow with greater simplicity, opening an avenue for methods based solely on ordinary differential equations as an alternative to those based on stochastic differential equations.
Benchmarking on density estimation tasks illustrates that the learned flow can match and surpass maximum likelihood continuous flows at a fraction of the conventional ODE training costs, and compares with diffusions on image generation on CIFAR-10 and ImageNet 32x32. The method scales ab-initio ODE flows to previously unreachable image resolutions.
Published: ICLR 2023
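A sketch of the sampling side: integrating the probability-flow ODE with forward Euler, on a Gaussian-to-Gaussian pair for which the minimizing velocity is available in closed form, so no training is needed (means, variances, and step counts are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian base N(0,1) to Gaussian target N(m, s^2) under the linear
# interpolant x_t = (1-t) x0 + t x1: the exact velocity lets the sketch
# skip training and just integrate dx/dt = v(x, t).
m, s = 3.0, 0.5

def v(x, t):
    a, b = 1.0 - t, t
    V = a*a + (b*s)**2                        # Var(x_t); E[x_t] = b*m
    return m + (b*s*s - a)*(x - b*m)/V        # E[x1 - x0 | x_t = x]

n, K = 100_000, 200
x = rng.normal(size=n)                        # draw from the base
for k in range(K):                            # forward Euler in time
    x += (1.0/K) * v(x, k/K)

print(f"target N({m}, {s}^2): sampled mean={x.mean():.3f}, std={x.std():.3f}")
```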
with Ryan Abbott, Aleksandar Botev, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Alexander G.D.G. Matthews, Sébastien Racanière, Ali Razavi, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, and Julian M. Urban
Machine learning methods based on normalizing flows have been shown to address important challenges, such as critical slowing-down and topological freezing, in the sampling of gauge field configurations in simple lattice field theories. A critical question is whether this success will translate to studies of QCD. This Proceedings presents a status update on advances in this area. In particular, it is illustrated how recently developed algorithmic components may be combined to construct flow-based sampling algorithms for QCD in four dimensions. The prospects and challenges for future use of this approach in at-scale applications are summarized.
Published: Lattice 2022
with Peter Lunts and Michael Lindsey
We numerically study the O(3) spin-fermion model, a minimal model of the onset of antiferromagnetic spin-density wave (SDW) order in a two-dimensional metal. We employ a Hybrid Monte Carlo (HMC) algorithm with a novel auto-tuning procedure, which learns the optimal HMC hyperparameters in an initial warmup phase. This allows us to study unprecedentedly large systems, even at criticality. At the quantum critical point, we find a critical scaling of the dynamical spin susceptibility χ(ω, q) that strongly violates the Hertz-Millis form, which is the first demonstrated instance of such a phenomenon in this model. The form that we do observe provides strong evidence that the universal scaling is actually governed by the fixed point near perfect hot-spot nesting of Schlief, Lunts, and Lee [Phys. Rev. X 7, 021010 (2017)], even away from perfect nesting. Our work provides a concrete link between controlled calculations of SDW metallic criticality in the long-wavelength and small nesting angle limits and a microscopic finite-size model at realistic, appreciable values of the nesting angle. Additionally, the HMC method we introduce is generic and can be used to study other fermionic models of quantum criticality, where there is a strong need to simulate large systems.
Published: Nature Communications
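A bare-bones numpy sketch of the HMC update (leapfrog integration plus Metropolis correction) on a stand-in quartic action; in the paper the action includes the fermion determinant, and the step size and trajectory length are auto-tuned during warmup.

```python
import numpy as np

rng = np.random.default_rng(0)

def S(phi):                                        # stand-in action
    return 0.5*np.sum(phi**2) + 0.25*np.sum(phi**4)

def grad_S(phi):
    return phi + phi**3

def hmc_step(phi, eps=0.1, n_leap=20):
    p = rng.normal(size=phi.shape)                 # refresh momenta
    H0 = 0.5*np.sum(p**2) + S(phi)
    ph, pp = phi.copy(), p - 0.5*eps*grad_S(phi)   # leapfrog: half kick
    for _ in range(n_leap - 1):
        ph += eps*pp                               # drift
        pp -= eps*grad_S(ph)                       # full kick
    ph += eps*pp
    pp -= 0.5*eps*grad_S(ph)                       # final half kick
    H1 = 0.5*np.sum(pp**2) + S(ph)
    if rng.uniform() < np.exp(H0 - H1):            # Metropolis correction
        return ph, True
    return phi, False

phi, acc = rng.normal(size=64), 0
for _ in range(2000):
    phi, a = hmc_step(phi)
    acc += a
print("acceptance rate:", acc / 2000)
```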
with Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, and Julian M. Urban
Recent results suggest that flow-based algorithms may provide efficient sampling of field distributions for lattice field theory applications, such as studies of quantum chromodynamics and the Schwinger model. In this work, we provide a numerical demonstration of robust flow-based sampling in the Schwinger model at the critical value of the fermion mass. In contrast, at the same parameters, conventional methods fail to sample all parts of configuration space, leading to severely underestimated uncertainties.
Published: Physical Review D
with Denis Boyda, Gurtej Kanwar, Sébastien Racanière, Danilo Jimenez Rezende, Kyle Cranmer, Daniel C. Hackett, and Phiala E. Shanahan
We develop a flow-based sampling algorithm for SU(N) lattice gauge theories that is gauge-invariant by construction. Our key contribution is constructing a class of flows on an SU(N) variable (or on a U(N) variable by a simple alternative) that respect matrix conjugation symmetry. We apply this technique to sample distributions of single SU(N) variables and to construct flow-based samplers for SU(2) and SU(3) lattice gauge theory in two dimensions.
Published: Physical Review D
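A quick numerical check of the symmetry these flows are built to respect: any map acting only on the spectrum of the group variable is equivariant under matrix conjugation, f(X U X†) = X f(U) X†. The matrix square below is a stand-in for the learned transformations of the eigenvalue phases, not the paper's construction.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

def random_su2():
    a = rng.normal(size=(2, 2)) + 1j*rng.normal(size=(2, 2))
    A = (a - a.conj().T)/2                 # anti-Hermitian
    A -= (np.trace(A)/2)*np.eye(2)         # traceless => expm(A) is in SU(2)
    return expm(A)

U, X = random_su2(), random_su2()
f = lambda M: M @ M                        # spectral map standing in for a flow
lhs = f(X @ U @ X.conj().T)                # f(X U X^dag)
rhs = X @ f(U) @ X.conj().T                # X f(U) X^dag
print("equivariance residual:", np.abs(lhs - rhs).max())
```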
with Gurtej Kanwar, Sébastien Racanière, Danilo Jimenez Rezende, Julian M. Urban, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, and Phiala E. Shanahan
Algorithms based on normalizing flows are emerging as promising machine learning approaches to sampling complicated probability distributions in a way that can be made asymptotically exact. In the context of lattice field theory, proof-of-principle studies have demonstrated the effectiveness of this approach for scalar theories, gauge theories, and statistical systems. This work develops approaches that enable flow-based sampling of theories with dynamical fermions, which is necessary for the technique to be applied to lattice field theory studies of the Standard Model of particle physics and many condensed matter systems. As a practical demonstration, these methods are applied to the sampling of field configurations for a two-dimensional theory of massless staggered fermions coupled to a scalar field via a Yukawa interaction.
Published: Physical Review D
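A schematic sketch of why dynamical fermions change the sampling target: the flow must model p(phi) ∝ exp(-S_B(phi)) det D(phi)², with the determinant computable exactly only on tiny lattices (at scale the paper uses other estimators). The operator D below is a generic illustration, not the paper's staggered discretization.

```python
import numpy as np

rng = np.random.default_rng(0)

V, m, g = 16, 0.5, 1.0                     # sites, fermion mass, Yukawa coupling

def fermion_matrix(phi):
    D = np.diag(m + g*phi)                 # mass + Yukawa coupling to phi
    for x in range(V):                     # schematic hopping terms
        D[x, (x + 1) % V] += 0.5
        D[x, (x - 1) % V] -= 0.5
    return D

def effective_action(phi):
    _, logabs = np.linalg.slogdet(fermion_matrix(phi))
    # two degenerate flavors: det(D)^2 = |det D|^2 > 0, so exp(-S_eff)
    # is a genuine probability density for the flow to target
    return 0.5*np.sum(phi**2) - 2.0*logabs

print("S_eff(phi) =", effective_action(rng.normal(size=V)))
```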
with Gurtej Kanwar, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Sébastien Racanière, Danilo Jimenez Rezende, and Phiala E. Shanahan
We define a class of machine-learned flow-based sampling algorithms for lattice gauge theories that are gauge invariant by construction. We demonstrate the application of this framework to U(1) gauge theory in two spacetime dimensions, and find that, at small bare coupling, the approach is orders of magnitude more efficient at sampling topological quantities than more traditional sampling procedures such as hybrid Monte Carlo and heat bath.
Published: Physical Review Letters
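A small check of the property the framework is built around: plaquettes of a 2D U(1) field are unchanged by arbitrary gauge transformations, so flows expressed through plaquette-level updates preserve gauge invariance (lattice size and conventions here are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

L = 8
theta = rng.uniform(0, 2*np.pi, size=(2, L, L))   # link angles theta[mu, x, y]

def plaquettes(th):
    # theta_x(n) + theta_y(n+x) - theta_x(n+y) - theta_y(n), mod 2*pi
    return np.mod(th[0] + np.roll(th[1], -1, axis=0)
                  - np.roll(th[0], -1, axis=1) - th[1], 2*np.pi)

# gauge transformation: theta_mu(n) -> theta_mu(n) + omega(n) - omega(n+mu)
omega = rng.uniform(0, 2*np.pi, size=(L, L))
gauged = theta.copy()
gauged[0] += omega - np.roll(omega, -1, axis=0)
gauged[1] += omega - np.roll(omega, -1, axis=1)

diff = np.abs(plaquettes(theta) - plaquettes(gauged))
print("max plaquette change:", np.minimum(diff, 2*np.pi - diff).max())
```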
with Danilo Jimenez Rezende, George Papamakarios, Sébastien Racanière, Gurtej Kanwar, Phiala E. Shanahan, and Kyle Cranmer
Normalizing flows are a powerful tool for building expressive distributions in high dimensions. So far, most of the literature has concentrated on learning flows on Euclidean spaces. Some problems, however, such as those involving angles, are defined on spaces with more complex geometries, such as tori or spheres. In this paper, we propose and compare expressive and numerically stable flows on such spaces. Our flows are built recursively on the dimension of the space, starting from flows on circles, closed intervals or spheres.
Published: ICML 2020
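A minimal example of a flow on the circle of the kind the recursive construction starts from: the Moebius map z -> (z - w)/(1 - conj(w) z) with |w| < 1 is a smooth bijection of the circle with a closed-form Jacobian (the specific parameter value is illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

def moebius(theta, w):
    # Moebius map restricted to the unit circle: a bijection for |w| < 1
    z = np.exp(1j*theta)
    return np.mod(np.angle((z - w) / (1 - np.conj(w)*z)), 2*np.pi)

w = 0.4 + 0.3j                     # learnable parameter, |w| < 1
theta = rng.uniform(0, 2*np.pi, size=5)

# Jacobian d(phi)/d(theta) in closed form vs. finite differences
z = np.exp(1j*theta)
exact = (1 - abs(w)**2) / np.abs(z - w)**2
eps = 1e-6
fd = np.angle(np.exp(1j*(moebius(theta + eps, w)
                         - moebius(theta - eps, w)))) / (2*eps)
print("max Jacobian mismatch:", np.max(np.abs(exact - fd)))
```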
with Dan Sehayek, Anna Golubeva, Bohdan Kulchytskyy, Giacomo Torlai, and Roger G. Melko
Generative modeling with machine learning has provided a new perspective on the data-driven task of reconstructing quantum states from a set of qubit measurements. As increasingly large experimental quantum devices are built in laboratories, the question of how these machine learning techniques scale with the number of qubits is becoming crucial. We empirically study the scaling of restricted Boltzmann machines (RBMs) applied to reconstruct ground-state wavefunctions of the one-dimensional transverse-field Ising model from projective measurement data. We define a learning criterion via a threshold on the relative error in the energy estimator of the machine. With this criterion, we observe that the number of RBM weight parameters required for accurate representation of the ground state in the worst case -- near criticality -- scales quadratically with the number of qubits. By pruning small parameters of the trained model, we find that the number of weights can be significantly reduced while still retaining an accurate reconstruction. This provides evidence that over-parametrization of the RBM is required to facilitate the learning process.
Published: Physical Review B -- Editors' Suggestion
Preprint: ArXiv
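A minimal RBM sketch showing the pieces the scaling study counts and trains: the n_v x n_h weight matrix, the visible free energy with hidden units summed out, and block Gibbs sampling (sizes and initialization are illustrative, not the paper's training setup).

```python
import numpy as np

rng = np.random.default_rng(0)

n_v, n_h = 8, 16                         # visible spins, hidden units
W = 0.1*rng.normal(size=(n_v, n_h))      # n_v * n_h weights: the counted quantity
b, c = np.zeros(n_v), np.zeros(n_h)

def free_energy(v):
    # -log of the unnormalized marginal p(v), hidden units summed out
    return -(v @ b) - np.sum(np.logaddexp(0, v @ W + c), axis=-1)

def gibbs_step(v):
    # block Gibbs sampling: v -> h -> v'
    h = (rng.uniform(size=n_h) < 1/(1 + np.exp(-(v @ W + c)))).astype(float)
    p_v = 1/(1 + np.exp(-(h @ W.T + b)))
    return (rng.uniform(size=n_v) < p_v).astype(float)

v = (rng.uniform(size=n_v) < 0.5).astype(float)
for _ in range(100):
    v = gibbs_step(v)
print("sampled configuration:", v, " F(v) =", free_energy(v))
```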
with G. Kanwar and P. E. Shanahan
A Markov chain update scheme using a machine-learned flow-based generative model is proposed for Monte Carlo sampling in lattice field theories. The generative model may be optimized (trained) to produce samples from a distribution approximating the desired Boltzmann distribution determined by the lattice action of the theory being studied. Training the model systematically improves autocorrelation times in the Markov chain, even in regions of parameter space where standard Markov chain Monte Carlo algorithms exhibit critical slowing down in producing decorrelated updates. Moreover, the model may be trained without existing samples from the desired distribution. The algorithm is compared with HMC and local Metropolis sampling for ϕ4 theory in two dimensions.
Published: Physical Review D
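The Markov chain update at the heart of this scheme is independence Metropolis with the model as proposal; below is a sketch with a deliberately imperfect Gaussian standing in for the trained flow (a perfect model would accept every proposal, eliminating autocorrelations).

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):                         # unnormalized target (phi^4-like, 1D)
    return -(0.5*x**2 + 0.25*x**4)

def log_q(x):                         # stand-in for the flow model density
    return -0.5*(x/0.8)**2

def propose():
    return 0.8*rng.normal()

x, chain, acc = propose(), [], 0
for _ in range(20_000):
    y = propose()
    # independence Metropolis: accept with min(1, p(y) q(x) / (p(x) q(y)))
    if np.log(rng.uniform()) < (log_p(y) - log_p(x)) + (log_q(x) - log_q(y)):
        x, acc = y, acc + 1
    chain.append(x)
print("acceptance rate:", acc / len(chain))
```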