Are AI weather models learning atmospheric physics? A sensitivity analysis of cyclone Xynthia

8 March 2025
colind88

Introduction

A significant portion of atmospheric predictability is determined by the initial condition uncertainty. Small errors in the initial condition will grow as a function of forecast lead time amplified by the nonlinear and chaotic nature of the atmosphere¹. Sensitivity analyses allow one to examine how a target forecast metric/variable of interest (K), e.g., precipitation over the Western US, responds to (infinitesimal) perturbations to the atmospheric patterns at the initial time (Xi) at an upstream location, i.e., the gradient ∂K/∂Xi. Sensitivity studies are useful for several applications, such as targeted observing², observing network design³, parameter estimation⁴, and data assimilation⁵. The sensitivities are a representation of the links between the atmospheric variables through space and time, and can be potentially used to understand the physical processes that impact the forecast evolution. The latter is explored here with the new generation of Artificial Intelligence (AI) data-driven weather models to examine the physicality of the relationship learned.

AI data-driven models are revolutionizing the field of weather forecasting, showing competitive forecast skill with state-of-the-art dynamical (i.e., physics-based) models^{6,7,8,9,10,11,12}, at a much lower real-time computational cost (after training). They are based on neural networks¹³, which are combinations of differentiable linear operators and nonlinear activation functions, optimized by means of the backpropagation algorithm¹⁴, which modern libraries support via auto-differentiation. The backpropagation algorithm, along with its associated libraries, enables the computation of gradients (or sensitivity fields) by applying the “chain rule”. This process backpropagates information from the output layer (target metric of interest) to the input layer (initial condition). Remarkably, this can be done in a matter of minutes using a single CPU, or even seconds with a GPU.

The aforementioned properties make AI models attractive tools for initial condition sensitivity studies¹⁵, and can overcome current limitations of dynamical-based methods, which typically rely on the adjoint of a linearized version of the model, which linearizes the evolution of perturbations along the initial trajectory¹⁶. This generally implies that this method cannot be used as a sensitivity tool for lead times larger than a few days, beyond which the assumption of linearity typically does not hold. Moreover, the formulation of the adjoint is not trivial, especially with process-based parameterizations, and requires running the non-linear dynamical model, increasing the computational burden of the methodology. In turn, sensitivities from AI data-driven models are not constrained by linearity assumptions and are simpler to compute. For instance, recently the predictability limit of the 2021 Pacific Northwest heatwave was efficiently explored using backpropagation to analyze forecast error at weather and sub-seasonal timescales¹⁵.

Here, the ability of an AI data-driven model to replicate the sensitivities of cyclone Xynthia is examined, using values from the adjoint of a physics-based model¹⁷ as reference. This study will help elucidate whether these models are using information from physically-realistic relationships to produce the forecasts, and therefore potentially be used as initial condition sensitivity tools.

Results

Case study: cyclone Xynthia

Strong winds and heavy precipitation associated with mid-latitude cyclones can have significant socio-economic impacts. As a result, understanding the dynamics behind these events and enhancing forecast accuracy are crucial for society. Moreover, climate change simulations project an increase in both the intensity and frequency of precipitation extremes linked to extratropical cyclones under future emission scenarios, compared to pre-industrial conditions¹⁸. Cyclones are complex phenomena, often driven by non-linear interactions between dynamic and thermodynamic processes^{19,20,21,22,23,24}. In this context, cyclone Xynthia provides an ideal case study to explore the potential of AI-driven models as tools for sensitivity analysis and to evaluate the physical realism of the relationships these models uncover.

Cyclone Xynthia was a high-impact event that caused substantial socio-economic damage in Western Europe, bringing high winds, extreme precipitation, flooding, and large ocean waves, and resulting in more than 50 deaths across several European countries. It originated on February 26th, over a low-pressure center situated south of the Azores Islands (996 hPa, see Fig. 1), on the edge of an atmospheric river (AR). ARs are elongated structures in the atmosphere transporting moisture from the tropics to the mid-latitudes, and are often associated with extreme precipitation². In particular, this AR had a length of ~6500 km in the southwest-to-northeast direction (from the Azores to Southern Iberia), with an average width of ~700 km, and carried up to ~30 kg of water vapor per square meter (Fig. 2d). The system rapidly intensified as it passed over the coast of Portugal and Northern Spain, reaching Western France on February 28th, bringing winds greater than 30 m/s and 7.5 m waves, and producing almost 2 m of rain. After March 1st, the intensity of the event decreased as it moved inland towards Central Europe. The evolution of cyclone Xynthia is shown in Fig. 1, and the reader is referred to the literature for more details²⁵.

**Fig. 1: Evolution of cyclone Xynthia.**

**Fig. 2: Comparison of physics-based and AI-based sensitivity fields for cyclone Xynthia.**

A sensitivity analysis of cyclone Xynthia using a dynamical and adjoint model was performed in a reference study¹⁷ (hereafter referred to as D14) to investigate the drivers and predictability of this extreme event. In that study, sensitivity fields of kinetic energy (KE) over the Bay of Biscay (highlighted by the magenta box in Fig. 1) were computed for a 36-h forecast, using February 26th at 12 UTC as the initial condition. These fields were obtained through the Coupled Ocean-Atmosphere Mesoscale Prediction System (COAMPS) adjoint model²⁶, which employs a non-hydrostatic dynamical core. The model was run on a nested domain with 45-km horizontal grid spacing for the coarse mesh and 15-km grid spacing for the fine mesh, and included a microphysics parameterization (predicting cloud water, rainwater, snow, and graupel concentrations) and a subgrid-scale parameterization for deep convection²⁷. The sensitivities from D14 are replotted here to facilitate comparison with the AI-based results.

The comparison presented is purely qualitative due to significant differences in how sensitivities are computed between the AI and D14 approaches. First, the AI model is initialized using the European Centre for Medium-Range Weather Forecast Reanalysis version 5 (ERA5²⁸) data, with initial condition sensitivities computed relative to this dataset, whereas the COAMPS model is initialized from the National Oceanic and Atmospheric Administration (NOAA) Global Forecast System (GFS). Second, the set of variables simulated differs between the models. Therefore, AI-based sensitivities are compared with the closest available proxies in COAMPS. For example, relative humidity from the AI model is compared to specific humidity in COAMPS, as shown in Fig. 3. Third, the vertical and spatial resolutions also vary. While COAMPS uses z-sigma vertical levels and a grid mesh of 45 km (coarse mesh) or 15 km (fine mesh) horizontal grid increments, the AI model simulates variables at 13 pressure levels on a 0.25° latitude-longitude grid. Fourth, the method for calculating kinetic energy differs slightly: D14 includes the vertical wind component, whereas the AI model considers only the zonal and meridional components (see section Methods). All of these factors should be taken into account when analyzing the results in the following sections.

**Fig. 3: Vertical cross-section of the sensitivity fields of kinetic energy over the Bay of Biscay for cyclone Xynthia.**

Additionally, cyclone Xynthia was included in the AI model’s training dataset (see section Data), but the model was not specifically optimized for 36-h forecasts during training (see section Methods), which is the lead time considered in this study. Even if the model had been optimized for this lead time, this would not pose a concern, since the primary goal is to validate the spatio-temporal relationships the model has learned rather than to measure forecast error. Running these kinds of experiments on the training dataset is a well-established practice in the literature^{29,30,31}. Moreover, AI models typically minimize an error function without incorporating explicit physical knowledge in their design. Cyclone Xynthia appeared in only 0.01% of the training samples and covered 0.06% of the global grid. This was a rare event, following an unusual trajectory for cyclones in that region²⁵, which typically move eastward toward southern Europe when originating near the Azores³². Given these factors, the contribution of Xynthia to the overall model loss and coefficient optimization is likely negligible. Therefore, Xynthia serves as a valid case study for analyzing spatio-temporal dependencies without concerns about overfitting. In fact, the demonstration of coherent spatio-temporal links would provide evidence of generalization, while the emergence of a non-physical pattern would suggest overfitting.

Forecasting cyclone Xynthia

Figure 1 shows the evolution of cyclone Xynthia as represented by ERA5²⁸ (first row), which is used as ground truth in this study, and as predicted by the Spherical Fourier Neural Operator (SFNO⁷) AI model (second row; see details in “Methods” section). The SFNO predicts a rapid intensification of Xynthia with winds carrying kinetic energy approaching 200 m²/s² and the low-pressure system moving towards the coast of Portugal by February 27th at 12 UTC, which is consistent with ERA5. By February 28th at 00 UTC, Xynthia reached the Bay of Biscay and the western coast of France. The SFNO accurately captures the strong pressure gradient in this area and predicts slightly less kinetic energy than ERA5, while still surpassing values of 200 m²/s² in the region. A comparison of the kinetic energy at the 36-h forecast time between SFNO and the Integrated Forecasting System (IFS), ECMWF’s state-of-the-art dynamical prediction model, shows root mean squared error (RMSE) values of similar magnitude, with 58 and 82 m²/s², respectively. These values were averaged over the Bay of Biscay on February 28th at 00 UTC (magenta box in Fig. 1), and ERA5 was used as the ground truth. The IFS forecast was initialized using its own analysis dataset, which may partly explain the slightly higher RMSE observed in IFS compared to SFNO when validated against ERA5.
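As a rough illustration of the box-averaged RMSE described above, the following sketch restricts two 2-D kinetic energy fields to a latitude-longitude box and computes the RMSE over the grid points inside it. The function name `box_rmse`, the toy grid, and the constant fields are assumptions made for this example; they are not the study's data or code, and the numbers do not reproduce the 58 and 82 m²/s² reported above.

```python
import numpy as np


def box_rmse(forecast, truth, lat, lon, lat_bounds, lon_bounds):
    """RMSE between two 2-D fields, restricted to a lat/lon box (bounds inclusive)."""
    in_lat = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    in_lon = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    box = np.ix_(in_lat, in_lon)          # open mesh selecting the box
    diff = forecast[box] - truth[box]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy 0.25-degree grid around the Bay of Biscay (hypothetical values).
lat = np.arange(40.0, 50.25, 0.25)
lon = np.arange(-10.0, 2.25, 0.25)

# Toy kinetic energy fields in m^2/s^2: the "forecast" is offset by a constant.
truth = np.full((lat.size, lon.size), 200.0)
forecast = truth + 10.0

# Box used in this article: 43-48 N, 6-0 W.
rmse = box_rmse(forecast, truth, lat, lon, (43.0, 48.0), (-6.0, 0.0))
print(rmse)
```

Because the mask is built from the coordinate vectors, the same function could be applied to fields on a different grid spacing without changing the box definition.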

Initial condition sensitivities

Figure 2 shows the sensitivity fields of kinetic energy (KE) over the Bay of Biscay (highlighted by the green box in Fig. 2; see the “Methods” for further details) for a forecast lead time of 36 h, with the initial condition set to 12 UTC on February 26th. Specifically, the 45-km physics-based sensitivities for water vapor (g/kg), 700-hPa potential temperature (m² s⁻² K⁻¹), and 700-hPa meridional wind (m/s) from D14 are compared to the AI-based sensitivities for integrated water vapor (kg/m²), 700-hPa air temperature (K), and 700-hPa meridional wind (m/s), respectively. The 700 hPa level is chosen because it exhibits prominent behavior during the Xynthia cyclone (see Fig. 3). To improve the visualization, the color scales for each variable are different. For the AI-based maps, there are two color scales: one for the sensitivities computed relative to the standardized version of the inputs (for direct comparison with the physics-based sensitivities) and one for the sensitivities computed relative to the non-standardized version of the inputs (which allow comparison among AI-sensitivities, indicating the relative importance of each variable to the KE; see Methods section). Positive values (shown in red) indicate areas where infinitesimal increases in the initial condition fields lead to an increase in KE at the final time, while negative values (shown in blue) indicate areas where infinitesimal decreases in the initial conditions lead to an increase in KE.

The results reveal that a small moisture filament, ~40 km wide, within the atmospheric river (AR) is the main contributor to cyclone development (Fig. 2a, d). This filament extends west of the Azores, in a southwest-northeast direction, toward the coast of Portugal, with maximum sensitivity values between the 500 and 1000 hPa levels (Fig. 3a). Temperature sensitivities align with this moisture filament (Fig. 2b), while wind sensitivities are concentrated in the short-wave troughs over the Central and North Atlantic. This suggests that stronger southerly winds at the initial time lead to increased kinetic energy at the final time (Fig. 2c). Additionally, negative water vapor sensitivities flank the positive moisture filament within the AR, indicating that an intensification of the winds results from increasing the moisture gradient in this region (Fig. 2a). The AI-based and physics-based sensitivities are remarkably similar, showing nearly identical spatial structures for the variables examined. For instance, for the meridional wind, the S-shaped sensitivity structure west of the coast of Portugal along the short-wave trough (Fig. 2c) is well captured by the AI model (Fig. 2f). Both the integrated water vapor and temperature fields also show elongated structures along the AR. However, while the AI-based model’s maximum sensitivity values are generally lower than those of the physics-based model, as clearly illustrated in the meridional wind panels (Fig. 2c, f), the aggregated sensitivities across the map (shown in the top-right values) are of similar magnitude. This can be explained by the non-zero influence of non-relevant features due to complex interactions within the neural network. Also, despite having a higher-resolution grid, the AI-based model displays coarser sensitivities than the physics-based ones. This is likely due to the effective resolution of AI models, which is documented to be lower than the resolution used during the model’s training³³.

The AI-based sensitivities for geopotential are also shown (Fig. 2g–i), and they are notably more pronounced than those for the other variables (see standardized scales). These sensitivities exhibit wave-like patterns, which, based on experience from adjoint sensitivity studies³⁴, may result from the projection of sensitivities onto a Rossby wave packet in the waveguide. This packet interacts with precipitation and diabatic processes, propagating downstream. As the lead time increases from 24 to 48 h, these wave-like patterns shift westward, moving further offshore and upstream of the response function. This westward shift is consistent with the expected eastward propagation of the signal over time, driven by the background westerly winds, and highlights the physical consistency of the relationships learned by the AI model.

Figure 3b shows the relative humidity (RH) sensitivities for a vertical cross-section connecting two locations in the northwest-southeast direction (see the inner panel in Fig. 3), crossing the atmospheric river (AR). The positions of these two locations were determined based on the sensitivity maximums shown in Fig. 2. These same two locations are used to plot the specific humidity (SH) sensitivities from the physics-based model in D14 (Fig. 3a). As in Fig. 2, the comparison between the approaches is qualitative, since SFNO (COAMPS) does not include SH (or RH) in its variable set. The x-axis represents the longitudes between the two locations, and the y-axis corresponds to pressure levels (refer to section Data for a complete list of pressure levels in SFNO). The areas of maximum sensitivity are located between longitudes −25° and −15° spanning from the surface to 450 hPa. The positive-negative dipole indicates that an increased moisture gradient within the AR at the initial time leads to a corresponding increase in kinetic energy over the Bay of Biscay at the final time. This dipole tilts against the sloping warm frontal zone. Overall, the vertical sensitivities align well with the physics-based fields, with only minor differences in the positions of the maximum values. These differences can be partially explained by the fact that the two models are fundamentally different, with distinct horizontal and vertical resolutions, as well as notable methodological differences in how predictions are generated.

Finally, a positive sensitivity is observed at the upper levels of the atmosphere, which may suggest a spurious or noisy correlation learned by the AI model. This is particularly noteworthy, since to our knowledge there is no physical principle supporting such a relationship between upper-level atmospheric conditions and low-level final-time kinetic energy. Identifying these non-physical links is important in AI-based weather modeling, as it could help guide the development of more reliable and physically consistent models in the future.

Evolution of sensitivity-based perturbations

Sensitivity-based perturbations, scaled to align with estimates of initial condition uncertainty (see Methods), are applied to the variables at the initial time. Figure 4 shows the difference between the perturbed and the control forecasts for the kinetic energy at 12, 24, and 36 h of forecast lead time. Two different perturbed forecasts are generated. The first one adds the sensitivity-based perturbation to the initial condition (positive perturbation, first row), while the second one changes the sign (negative perturbation, second row). The evolved kinetic energy perturbations grow in intensity from overall 5–25 m²/s² at 12 h of forecast lead time to 15–50 m²/s² at final time, and they follow the trajectory of the low-pressure system at each forecast lead time until it reaches the Bay of Biscay on February 28th at 00 UTC. The positive perturbations, which added moisture to the initial condition and increased the moisture and temperature gradients along the AR, simulate an intensification of the winds related to Xynthia. In contrast, negative perturbations produce almost the opposite response, overall decreasing the kinetic energy of Xynthia relative to the control forecast. This is somewhat expected since perturbation fields were built based on the sensitivity fields. These properties were also identified in D14 (see Figure 12 therein). The symmetry between the positive and negative perturbations at final time is evidence of quasi-linear perturbation growth over this time period.
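The link between this sign symmetry and quasi-linear growth can be made explicit with a second-order Taylor expansion. The notation below is introduced here for illustration and is not taken from D14: $K$ is the evolved kinetic energy at a grid point, $\mathbf{H}$ is its Hessian with respect to the initial condition, and $\Delta K_{\pm}$ are the responses to the positive and negative perturbations.

$$\Delta K_{\pm} = K(\mathbf{X} \pm \Delta\mathbf{X}) - K(\mathbf{X}) = \pm \frac{\partial K}{\partial \mathbf{X}} \cdot \Delta\mathbf{X} + \frac{1}{2}\,\Delta\mathbf{X}^{\top}\mathbf{H}\,\Delta\mathbf{X} + O(\lVert\Delta\mathbf{X}\rVert^{3})$$

$$\Delta K_{+} + \Delta K_{-} = \Delta\mathbf{X}^{\top}\mathbf{H}\,\Delta\mathbf{X} + O(\lVert\Delta\mathbf{X}\rVert^{4})$$

The linear terms cancel in the sum, so a nearly mirror-image response indicates that the curvature contribution is small relative to the linear one at this perturbation amplitude and lead time.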

**Fig. 4: Evolved perturbation fields for the kinetic energy at 12, 24 and 36 h of lead time.**

In D14, physics-based simulations driven by adjoint-based perturbations to the initial condition, with magnitudes comparable to initial condition uncertainty, were compared to a “control” forecast, suggesting that an even more extreme event than the one that actually happened was plausible (see Figures 6, 12, and 13 in D14). Figure 5 presents the control and perturbed AI forecasts at the 36-h forecast time over the Bay of Biscay, showing the predicted values of kinetic energy (in colors) and mean sea level pressure every 4 hPa (in contours). The forecast shows a low-level jet along the southern side of the low-pressure system, from the west coast of Spain to the west coast of France. Positive and negative perturbed forecasts increase and decrease the kinetic energy in the region, reaching extremes of 70 and −70 m²/s², respectively, while also modifying the low-pressure system. The response is comparable to the one shown in D14 and presents similar spatial structures (Fig. 12 in D14). Differences in the mean sea level pressure between the perturbed and control forecasts are represented by contours in Fig. 5 every 1 hPa, showing a north-to-south gradient for the evolved perturbation fields, which is consistent with the results from the dynamical model (Fig. 12 in D14). However, the magnitude of the change in pressure is smaller than the one from D14, which showed evolved perturbation fields every 2 hPa.

**Fig. 5: Control and perturbed forecasts, and evolved perturbations of kinetic energy for cyclone Xynthia at 36 h of lead time.**

Discussion

AI models are examined as initial condition sensitivity tools to explore the links learned between different fields as the forecast evolves. This provides a mechanism to assess the physical realism of the AI models, which is currently under-explored. So far, these models have produced physically consistent responses to simple dynamical tests³⁵, but have failed to preserve key atmospheric balances³³, and have presented error-growth patterns from small-amplitude initial condition perturbations that did not reflect the characteristic “butterfly” effect of dynamical models³⁶. Here, sensitivities based on the Spherical Fourier Neural Operator (SFNO) AI model are compared to those from a dynamical model, which are used as references of physically plausible links in the atmosphere. The case study is cyclone Xynthia, an impactful extreme event in Western Europe in 2010 whose sensitivity fields were previously examined by means of dynamical and adjoint models¹⁷.

The SFNO exhibits high sensitivities for the integrated water vapor, which is consistent with the results from D14, and is aligned with numerous other studies where moisture was identified as one key driver for cyclone development³⁷. Moreover, SFNO moisture sensitivity maxima are located over an anomalous warm filament of air within an atmospheric river, the same as in D14. Similar spatial structures between the dynamical and the AI model are found for other variables, such as temperature and the meridional wind velocity, and vertical sensitivities are also remarkably similar for relative humidity. Sensitivity-based perturbations simulate increased values of kinetic energy over the Bay of Biscay, similar to what is found in D14. The AI data-driven model was robust to a set of simple physical tests, such as shifting the perturbation fields 20° West, or 20° West and 25° North over areas of low sensitivity, showing little to no difference in the forecasts as compared to the control one (not shown). Moreover, the evolved kinetic energy perturbations always followed the path of the low-pressure system, showing no changes elsewhere in the domain. The sensitivities amplify and grow with an increased forecast lead time, and exhibit wave structures, similar to those usually appearing in the fields from adjoint models. In contrast to the dynamical models (in which geopotential is a derived quantity, not a model state variable), SFNO exhibits a strong dependence on the geopotential, partially masking the influence of the remainder of the variables. This might imply a certain lack of consistent inter-variable links in the model, as exemplified by the behavior of the evolved perturbation fields, where the sea level pressure did not show a change as big as in D14 under moisture-based-perturbation simulations. This strong dependence on geopotential has been further confirmed by conducting similar experiments for other cyclones (Typhoon Lupit²³; and Typhoon Nuri³⁸; not shown), and was also identified in other deep learning models for statistical downscaling, where this behavior was attributed to potential co-variabilities within the set of explanatory variables in the model^{29,30}.

The aforementioned properties outline the potential of AI data-driven models to learn physical relationships to some degree, and their ability to automatically identify plausible links between atmospheric variables over space and time. While improvements are still required to capture fully consistent and complex physical relationships, these models are already able to capture important relationships between atmospheric variables, and produce accurate forecasts based on them. One interesting conclusion of this study is that obtaining similar results from different modeling approaches (the adjoint of a physical model and an AI model) supports the robustness and generality of the results. That is, the general sensitivity characteristics are not model/technique dependent (although the specific details do vary with model). This study clearly exemplifies the potential of these tools for sensitivity studies, especially given the rapid computation of the gradients. Moreover, general properties of neural networks and the backpropagation algorithm can enable unprecedented sensitivity studies at longer timescales¹⁵ (>5 days), since they are not constrained by linear assumptions, as is the case with the adjoint model. This will be explored in future work.

Methods

AI data-driven model: Spherical Fourier Neural Operator (SFNO)

The Spherical Fourier Neural Operator (SFNO) is an AI data-driven model designed to forecast the next 6 h of weather at 0.25° of spatial resolution, given the same set of variables at initial time⁷. Forecasts at longer lead times can be produced by using the model outputs as inputs to the next iteration (auto-regression). SFNO is an updated version of FourCastNet⁶, which built on the Adaptive Fourier Neural Operator (AFNO³⁹) to perform the Fast Fourier Transform (FFT) with a Vision Transformer⁴⁰ backbone, therefore taking advantage of the benefits of self-attention to extract meaningful patterns from spatial data. This model configuration pioneered the use of AI data-driven models for weather forecasting, being the first ever to achieve forecast skill on par with the Integrated Forecasting System (IFS) from the European Centre for Medium-Range Weather Forecasts (ECMWF). It was later followed by graph neural network-based architectures (GraphCast⁸ and AIFS¹⁰) and other transformer-based ones (FuXi⁹ and Pangu-Weather⁴¹). SFNO builds on spherical harmonics as opposed to the FFT of AFNO. The (trained) model can be downloaded from the ECMWF ai-models GitHub repository: https://github.com/ecmwf-lab/ai-models. Once trained, 36-h forecasts for cyclone Xynthia were produced on the order of minutes with a single CPU, though significantly faster times (on the order of seconds) could be achieved using GPUs.

Computation of initial condition sensitivities with AI data-driven models

To compute the sensitivity of the kinetic energy (KE) over the Bay of Biscay at the final time (00 UTC, February 28th), relative to the input features at the initial time (12 UTC, February 26th), five steps are followed. The process is outlined below:

1. Standardization of input variables: The input variables are first standardized using the training mean and standard deviation. A complete list of input variables is provided in the “Data” section.
2. Model forecasting: The AI data-driven model’s auto-regression mechanism is then unfolded. Specifically, the model is iterated six times to generate a 36-h forecast, where the outputs from each step are used as inputs for the next.
3. De-standardization of prediction: The prediction is de-standardized by applying the training mean and standard deviation, bringing the results back to the original scale of the variables.
4. Kinetic energy calculation: The kinetic energy is calculated by summing the squares of the predicted zonal (u) and meridional (v) wind components at each grid point within the Bay of Biscay (43-48°N, 6-0°W, highlighted by the magenta area in Fig. 1). The result is then averaged over the total number of grid points (N), such that $KE = 0.5\sum_{i}(u_{i}^{2}+v_{i}^{2})/N$.
5. Sensitivity computation: Finally, the gradients, i.e., the partial derivatives of KE with respect to the input features at the initial time, are calculated using the chain rule and the backpropagation algorithm from PyTorch’s automatic differentiation package. The chain rule for sensitivity computation is outlined in Eq. (1). This step is conceptually similar to the gradient calculations used during model training for gradient descent.

The sensitivity fields were computed within minutes on a single CPU, though significantly faster times (on the order of seconds) could be achieved using GPUs.

$$\frac{\partial KE}{\partial \mathbf{X}}=\frac{\partial KE}{\partial \mathbf{Y}}\times \frac{\partial \mathbf{Y}}{\partial F_{6}}\times \frac{\partial F_{6}}{\partial F_{5}}\times \ldots \times \frac{\partial F_{1}}{\partial \mathbf{X}^{\prime}}\times \frac{\partial \mathbf{X}^{\prime}}{\partial \mathbf{X}}$$

(1)

where $\mathbf{X}$ and $\mathbf{X}^{\prime}$ are the raw and standardized input features, respectively, $F_{z}$ is the model output at each auto-regressive step $z$, and $\mathbf{Y}$ is the de-standardized predicted zonal and meridional wind fields over the Bay of Biscay. Gradients relative to the standardized version of the inputs can be easily computed by ignoring the last term in Equation (1).
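The five steps and Eq. (1) can be sketched with PyTorch’s automatic differentiation as follows. This is a minimal illustration under stated assumptions, not the study’s implementation: the one-layer convolutional `step` is a stand-in for a single 6-h model step (it is not the SFNO architecture), and the grid size, channel indices, target box, and normalization statistics are all made-up values.

```python
import torch

torch.manual_seed(0)
# Double precision keeps finite-difference checks of the gradients meaningful.
torch.set_default_dtype(torch.float64)

# Hypothetical toy dimensions: 3 channels on an 8 x 16 grid.
C, H, W = 3, 8, 16
U_CH, V_CH = 0, 1                    # assumed channel indices for u and v
box = (slice(2, 5), slice(4, 9))     # assumed target box (rows, columns)

# Stand-in for one 6-h step of the forecast model (NOT the SFNO).
step = torch.nn.Sequential(
    torch.nn.Conv2d(C, C, kernel_size=3, padding=1),
    torch.nn.Tanh(),
)

# Made-up per-channel training statistics.
mean = torch.tensor([1.0, -2.0, 0.5]).view(C, 1, 1)
std = torch.tensor([2.0, 4.0, 0.5]).view(C, 1, 1)


def kinetic_energy(x_raw, n_steps=6):
    x_std = (x_raw - mean) / std             # step 1: standardize inputs
    f = x_std.unsqueeze(0)                   # add a batch dimension
    for _ in range(n_steps):                 # step 2: 6 x 6 h = 36 h roll-out
        f = step(f)
    y = f.squeeze(0) * std + mean            # step 3: de-standardize
    u, v = y[U_CH][box], y[V_CH][box]
    return 0.5 * (u ** 2 + v ** 2).mean()    # step 4: box-averaged KE


x = torch.randn(C, H, W, requires_grad=True)   # toy initial condition
ke = kinetic_energy(x)
ke.backward()                                  # step 5: backpropagation, Eq. (1)

sens_raw = x.grad             # dKE/dX, relative to the raw inputs
sens_std = sens_raw * std     # dKE/dX', i.e. Eq. (1) without its last term
```

Since the last factor in Eq. (1) is 1/std for each variable, multiplying the raw-input gradient by the standard deviation recovers the gradient relative to the standardized inputs.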

Sensitivity-based perturbations

To facilitate comparison, the perturbations to the initial condition (ΔX) are derived using a strategy similar to that in D14 (Eqs. (2), (3)). First, the sensitivity fields are multiplied by the square of the difference between atmospheric features at the final (forecast, F) and initial (analysis, X) times (see term w in Eq. (2)). This scaling approach ensures that the sensitivities are comparable across variables and levels, preventing any bias that could otherwise lead to suboptimal perturbations. Additionally, this scaling reflects the largest forecast differences, grounded in the model’s behavior rather than arbitrary assumptions. Next, the resulting product is adjusted by a scaling factor, s (Eq. (3)). The magnitude of the initial perturbations is chosen to align with analysis error estimates, which typically show maximum values of 1 m/s in the wind field and 1 K in the temperature field. These error estimates are consistent with those found in radiosonde and drop sonde observations as well as those represented in data assimilation systems and thus provide a reasonable lower bound for initial condition uncertainty¹⁷. Based on this criterion, a unique scaling factor of 0.4 is applied to adjust the perturbations for every variable. Once the perturbation fields are computed, they are added to the initial condition and are evolved using the AI data-driven model. To establish the fairest comparison with D14, every variable in SFNO’s input set was perturbed, therefore following the same approach therein. Hence, for each input variable X:

$$w={\left(\mathbf{F}_{t_{0}+\Delta t}-\mathbf{X}_{t_{0}}\right)}^{2}$$

(2)

$$\Delta \mathbf{X}=s\,w\,\frac{\partial KE}{\partial \mathbf{X}}=0.4\,w\,\frac{\partial KE}{\partial \mathbf{X}}$$

(3)
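Equations (2) and (3) amount to a few array operations per input variable. The sketch below applies them to a single variable on a toy grid; the function name `sensitivity_perturbation` and the random fields are assumptions for illustration and stand in for the actual analysis, 36-h forecast, and sensitivity field.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 6)   # hypothetical lat x lon grid for one input variable

x0 = rng.normal(size=shape)           # analysis at t0 (toy values)
f36 = x0 + rng.normal(size=shape)     # forecast at t0 + 36 h (toy values)
sens = rng.normal(size=shape)         # dKE/dX for this variable (toy values)


def sensitivity_perturbation(forecast, analysis, sensitivity, s=0.4):
    """Scale a sensitivity field into an initial-condition perturbation."""
    w = (forecast - analysis) ** 2    # Eq. (2): squared forecast-minus-analysis difference
    return s * w * sensitivity        # Eq. (3): scaling factor s = 0.4


dx = sensitivity_perturbation(f36, x0, sens)
x_plus = x0 + dx     # positive perturbation experiment
x_minus = x0 - dx    # negative perturbation experiment
```

Because `w` is non-negative, the perturbation keeps the sign of the sensitivity at every grid point, so the positive experiment pushes each variable in the direction that increases the kinetic energy to first order.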

Data

Data for the initial condition is taken from ECMWF Reanalysis version 5 (ERA5²⁸), a reanalysis dataset at 0.25° of spatial resolution developed by the ECMWF, that provides atmospheric information at a large number of pressure levels and 1-h intervals for the period 1979-present. ERA5 is built by combining data from a global station network and “first guess” forecasts from Numerical Weather Prediction (NWP) models, by means of data assimilation algorithms, and is widely regarded as the most accurate representation of the atmosphere. SFNO was trained on ERA5 using data at 0, 6, 12, and 18 UTC from the period 1979–2015. SFNO uses the following set of 73 variables: surface air temperature; mean sea level pressure; surface pressure; integrated water vapor; surface zonal and meridional wind; zonal and meridional wind at 100 meters; and relative humidity, air temperature, zonal and meridional wind, and geopotential at the following pressure levels: 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa. The input variables are standardized using the spatially averaged mean and the spatial standard deviation from the training data.

Data availability

The datasets generated in this study, including model simulations and sensitivity fields, are available through the University of California San Diego Library⁴².

Code availability

The underlying code for this study is available in a GitHub repository: https://github.com/CW3E/AI-sensitivity.

References

Lorenz, E. N. The predictability of a flow which possesses many scales of motion. Tellus 21, 289–307 (1969).

Ralph, F. M., Dettinger, M. D., Rutz, J. J. & Waliser, D. E. Atmospheric Rivers, Vol. 1 (Springer, 2020).
Torn, R. D. & Hakim, G. J. Ensemble-based sensitivity analysis. Mon. Weather Rev. 136, 663–677 (2008).
Zhao, Q. & Lu, X. Parameter estimation in a three-dimensional marine ecosystem model using the adjoint technique. J. Mar. Syst. 74, 443–452 (2008).

Griffith, A. K. & Nichols, N. K. Adjoint methods in data assimilation for estimating model error. Flow. Turbul. Combust. 65, 469–488 (2000).

Pathak, J. et al. FourCastNet: a global data-driven high-resolution weather model using adaptive Fourier neural operators. arXiv preprint arXiv:2202.11214 (2022).
Bonev, B. et al. Spherical Fourier neural operators: learning stable dynamics on the sphere. In Proc. International Conference on Machine Learning, 2806–2823 (2023).
Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).

Chen, L. et al. FuXi: a cascade machine learning forecasting system for 15-day global weather forecast. npj Clim. Atmos. Sci. 6, 190 (2023).
Lang, S. et al. AIFS – ECMWF’s data-driven forecasting system. arXiv preprint arXiv:2406.01465 (2024).
Charlton-Perez, A. J. et al. Do AI models produce better weather forecasts than physics-based models? A quantitative evaluation case study of Storm Ciarán. npj Clim. Atmos. Sci. 7, 93 (2024).
Kochkov, D. et al. Neural general circulation models for weather and climate. Nature 632, 1060–1066 (2024).

Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
Hecht-Nielsen, R. Theory of the backpropagation neural network. In Neural Networks for Perception, 65–93 (Academic Press, 1992).
Vonich, P. T. & Hakim, G. J. Predictability limit of the 2021 Pacific Northwest heatwave from deep learning sensitivity analysis. Geophys. Res. Lett. 51, e2024GL110651 (2024).

Errico, R. M. What is an adjoint model? Bull. Am. Meteorol. Soc. 78, 2577–2592 (1997).

Doyle, J. D., Amerault, C., Reynolds, C. A. & Reinecke, P. A. Initial condition sensitivity and predictability of a severe extratropical cyclone using a moist adjoint. Mon. Weather Rev. 142, 320–342 (2014).

Hawcroft, M., Walsh, E., Hodges, K. & Zappa, G. Significantly increased extreme precipitation expected in Europe and North America from extratropical cyclones. Environ. Res. Lett. 13, 124006 (2018).

Hoskins, B. & Berrisford, P. A potential vorticity perspective of the storm of 15-16 October 1987. Weather 43, 122–129 (1988).

Ulbrich, U., Fink, A. H., Klawa, M. & Pinto, J. G. Three extreme storms over Europe in December 1999. Weather 56, 70–80 (2001).

Wernli, H., Dirren, S., Liniger, M. A. & Zillig, M. Dynamical aspects of the life cycle of the winter storm ‘Lothar’ (24–26 December 1999). Q. J. R. Meteorol. Soc. 128, 405–429 (2002).
Fink, A. H., Brücher, T., Ermert, V., Krüger, A. & Pinto, J. G. The European storm Kyrill in January 2007: synoptic evolution, meteorological impacts and some considerations with respect to climate change. Nat. Hazards Earth Syst. Sci. 9, 405–423 (2009).

Doyle, J., Reynolds, C. & Amerault, C. Diagnosing tropical cyclone sensitivity. Comput. Sci. Eng. 13, 31–39 (2010).

Dacre, H. F. A review of extratropical cyclones: observations and conceptual models over the past 100 years. Weather 75, 4–7 (2020).

Liberato, M. L. R. et al. Explosive development of winter storm Xynthia over the southeastern North Atlantic Ocean. Nat. Hazards Earth Syst. Sci. Discuss 1, 443–470 (2013).

Amerault, C., Zou, X. & Doyle, J. Tests of an adjoint mesoscale model with explicit moist physics on the cloud scale. Mon. Weather Rev. 136, 2120–2132 (2008).

Molinari, J. A general form of Kuo’s cumulus parameterization. Mon. Weather Rev. 113, 1411–1416 (1985).

Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).
Baño-Medina, J., Iturbide, M., Fernández, J. & Gutiérrez, J. M. Transferability and explainability of deep learning emulators for regional climate model projections: perspectives for future applications. Artif. Intell. Earth Syst. 3, e230099 (2024).

Balmaceda-Huarte, R., Baño-Medina, J., Olmo, M. E. & Bettolli, M. L. On the use of convolutional neural networks for downscaling daily temperatures over southern South America in a climate change scenario. Clim. Dyn. 62, 383–397 (2024).

González-Abad, J., Baño-Medina, J. & Gutiérrez, J. M. Using explainability to inform statistical downscaling based on deep learning beyond standard validation approaches. J. Adv. Model. Earth Syst. 15, e2023MS003641 (2023).

Gramcianinov, C. B. et al. Analysis of Atlantic extratropical storm tracks characteristics in 41 years of ERA5 and CFSR/CFSv2 databases. Ocean Eng. 216, 108111 (2020).
Bonavita, M. On some limitations of current machine learning weather prediction models. Geophys. Res. Lett. 51, e2023GL107377 (2024).

Doyle, J. D., Reynolds, C. A. & Amerault, C. Adjoint sensitivity analysis of high-impact extratropical cyclones. Mon. Weather Rev. 147, 4511–4532 (2019).

Hakim, G. J. & Masanam, S. Dynamical tests of a deep-learning weather prediction model. Artif. Intell. Earth Syst. 3, e230090 (2024).
Selz, T. & Craig, G. C. Can artificial intelligence-based weather prediction models simulate the butterfly effect? Geophys. Res. Lett. 50, e2023GL105747 (2023).

Kuo, Y. H., Shapiro, M. A. & Donall, E. G. The interaction between baroclinic and diabatic processes in a numerical simulation of a rapidly intensifying extratropical marine cyclone. Mon. Weather Rev. 119, 368–384 (1991).

Doyle, J. D., Reynolds, C. A., Amerault, C. & Moskaitis, J. Adjoint sensitivity and predictability of tropical cyclogenesis. J. Atmos. Sci. 69, 3535–3557 (2012).

Guibas, J. et al. Adaptive Fourier neural operators: efficient token mixers for transformers. In Proc. International Conference on Learning Representations (2022).
Dosovitskiy, A. et al. An image is worth 16 × 16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
Bi, K. et al. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).
Baño-Medina, J. et al. Are AI weather models learning atmospheric physics? A sensitivity analysis of cyclone Xynthia. UC San Diego Library Digital Collections https://doi.org/10.6075/J0QV3MWT (2025).


Acknowledgements

This work is supported by the Office of Naval Research (ONR) Award number N000142412731, the California Department of Water Resources Atmospheric River Program Phase IV (Grant 4600014942), and U.S. Army Corps of Engineers (USACE) Forecast Informed Reservoir Operations Phase 2 Award (USACE W912HZ192). A.S. was partly supported by the National Aeronautics and Space Administration (Grant 80NSSC22K0926). J.D.D. and C.A.R. gratefully acknowledge the support of the ONR Study of Air-Sea Fluxes and Atmospheric River Intensity (SAFARI) initiative, program element 0602435N.

Author information

Authors and Affiliations

Center for Western Weather and Water Extremes, Scripps Institution of Oceanography, University of California San Diego, San Diego, CA, USA

Jorge Baño-Medina, Agniv Sengupta, Duncan Watson-Parris & Luca Delle Monache

U.S. Naval Research Laboratory, Monterey, CA, USA

James D. Doyle & Carolyn A. Reynolds

Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA

Duncan Watson-Parris

Contributions

J.B., L.D.M., and A.S. conceived the idea of the study. J.B. performed the computations of the AI-based sensitivities and the AI simulation runs and produced all the figures. J.D.D. and C.A.R. produced the physics-based sensitivities. All the authors contributed to the writing of the manuscript and the analysis of the results.

Corresponding author

Correspondence to Jorge Baño-Medina.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Baño-Medina, J., Sengupta, A., Doyle, J.D. et al. Are AI weather models learning atmospheric physics? A sensitivity analysis of cyclone Xynthia. npj Clim Atmos Sci 8, 92 (2025). https://doi.org/10.1038/s41612-025-00949-6


Received: 29 October 2024
Accepted: 11 February 2025
Published: 07 March 2025
DOI: https://doi.org/10.1038/s41612-025-00949-6