
SAL Achieves Improved Classification Across 10 Benchmarks With a Novel Learning Method
Scientists are tackling the limitations of backpropagation, the standard learning algorithm for artificial neural networks, with a new method called Selective Adaptive Learning (SAL). Fanping Liu, Hua Yang, and Jiasi Zou, all from ROCK AI and Renmin University of China, detail how SAL combines selective parameter activation with adaptive area partitioning to overcome the biological implausibility and gradient interference common in dense neural networks. This research is significant because it demonstrates competitive performance on ten benchmark datasets, even in very deep and large-scale models, and offers a more biologically inspired approach to scalable neural training, potentially paving the way for more efficient and robust artificial intelligence systems.
Selective learning decouples gradients for neural networks
Specifically, SAL decomposes the parameter space into mutually exclusive, sample-dependent regions, effectively decoupling gradients across divergent semantic patterns and refining feedback alignment. This approach allows for stable convergence and improved classification performance across ten standard benchmarks. The team achieved competitive convergence rates with SAL, demonstrating its efficacy in training deep learning models. Experiments show that SAL's core mechanism, Learned-Frozen Decoupled Routing, dynamically partitions the input space, directing samples to specialized parameter subspaces.
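This routing step can be pictured with a short sketch. The code below is a minimal, hypothetical illustration rather than the authors' implementation: a frozen random projection scores each input, the arg-max picks one of n_areas weight blocks, and only that block participates in the forward pass. The class name RoutedLinear and all hyperparameters are placeholders.

```python
import torch
from torch import nn

class RoutedLinear(nn.Module):
    """Minimal sketch of area-based routing: each sample uses exactly one of
    n_areas weight blocks, chosen by a frozen random router. Names and
    details are illustrative, not the paper's implementation."""

    def __init__(self, in_dim, out_dim, n_areas=4):
        super().__init__()
        # One independent weight block ("area") per region of the input space.
        self.areas = nn.ModuleList([nn.Linear(in_dim, out_dim) for _ in range(n_areas)])
        # Frozen routing projection: scores each sample against the areas.
        self.router = nn.Linear(in_dim, n_areas, bias=False)
        self.router.weight.requires_grad_(False)

    def forward(self, x):
        area_idx = self.router(x).argmax(dim=-1)            # one area per sample
        out = torch.zeros(x.size(0), self.areas[0].out_features,
                          dtype=x.dtype, device=x.device)
        for k, block in enumerate(self.areas):
            mask = area_idx == k
            if mask.any():
                out[mask] = block(x[mask])                   # only this block is active
        return out
```

The explicit per-area loop keeps the one-area-per-sample behaviour visible; gradients then flow only to the block that was selected for a given sample.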
By isolating activation paths, the method reduces gradient interference and facilitates the use of fixed, asymmetric feedback connections, mitigating stability issues commonly found in deep Feedback Alignment networks. This decoupling enables scaling model capacity with limited computational cost, activating and updating only a single area per sample to promote global sparsity. This study unveils a biologically inspired training method that circumvents the weight symmetry requirement of backpropagation. SAL integrates an asymmetric error propagation scheme with local learning signals, allowing each layer to optimise against localised alignment objectives.
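The fixed, asymmetric feedback connections mentioned above are the hallmark of feedback alignment. The NumPy sketch below shows the generic FA update for a single hidden layer, where the output error is carried backward through a fixed random matrix B instead of the transpose of the forward weights; it illustrates the general idea, not SAL's specific update rule.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out, lr = 64, 256, 10, 1e-3

# Forward weights (trained) and a fixed random feedback matrix B (never trained).
W1 = rng.normal(0, 0.1, (d_hid, d_in))
W2 = rng.normal(0, 0.1, (d_out, d_hid))
B = rng.normal(0, 0.1, (d_hid, d_out))    # replaces W2.T in the backward pass

def fa_step(x, y):
    """One feedback-alignment update: tanh hidden layer, linear output,
    squared error. The error reaches the hidden layer through fixed B."""
    global W1, W2
    h = np.tanh(W1 @ x)
    y_hat = W2 @ h
    e = y_hat - y
    delta_h = (B @ e) * (1.0 - h ** 2)    # asymmetric feedback, not W2.T @ e
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(delta_h, x)
    return 0.5 * float(e @ e)

x = rng.normal(size=d_in)
y = np.eye(d_out)[3]                      # one-hot target
loss = fa_step(x, y)
```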
The resulting layer-wise decoupled update rules structurally permit parallel parameter updates, offering a pathway towards more efficient and scalable neural network training. Empirical evidence confirms that SAL outperforms or matches standard backpropagation baselines, particularly in deep regimes reaching up to 128 layers and large-scale models containing up to 1 billion parameters. Furthermore, the research establishes a connection between contemporary neuroscientific insights and artificial neural network design. By adopting the principle of functional specialisation observed in the cerebral cortex, SAL routes inputs to specialised sub-regions, ensuring synaptic updates remain locally relevant and globally coherent. This approach stabilises learning dynamics without relying on end-to-end signal propagation, offering a plausible alternative to backpropagation and contributing to the ongoing study of scalable neural training paradigms. The work opens new avenues for developing more efficient, biologically plausible, and scalable deep learning systems.
Selective learning via decoupled gradient alignment improves generalization
Specifically, SAL decomposes the parameter space into mutually exclusive regions dependent on the input sample, effectively decoupling gradients across divergent semantic patterns. This decoupling strategy mitigates interference and allows for refined feedback alignment, addressing the weight symmetry requirements typically associated with backpropagation (BP). Experiments employed a Learned-Frozen Decoupled Routing strategy, dynamically partitioning the input space into exclusive areas and directing samples to specialized parameter subspaces. Unlike Dynamic Sparse Training methods that sparsify weights based on magnitude, SAL activates and updates only a single area per sample, promoting global sparsity and reducing computational cost.
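A toy comparison (illustrative numbers only, not taken from the paper) makes the contrast with magnitude-based Dynamic Sparse Training concrete: both schemes update a similar fraction of parameters, but DST keeps the globally largest weights regardless of the input, while area-based selection activates one contiguous parameter block per sample.

```python
import torch

# Toy numbers for illustration only: 4 areas of 1,000 weights each.
n_areas, block_size = 4, 1000
weights = torch.randn(n_areas, block_size)

# Magnitude-based Dynamic Sparse Training: keep the globally largest weights,
# using the same mask for every sample.
budget = block_size
dst_mask = torch.zeros_like(weights, dtype=torch.bool)
dst_mask.view(-1)[weights.abs().view(-1).topk(budget).indices] = True

# Area-based selection (SAL-style, illustrative): a given sample activates
# exactly one contiguous block, so updates stay local to that area.
area_for_sample = 2
sal_mask = torch.zeros_like(weights, dtype=torch.bool)
sal_mask[area_for_sample] = True

# Both masks are ~25% dense, but they select very different parameters.
print(dst_mask.float().mean().item(), sal_mask.float().mean().item())
```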
The team engineered this approach to isolate activation paths, thereby reducing gradient interference and enabling the use of fixed, asymmetric feedback connections. This configuration is particularly beneficial in deep networks, mitigating stability issues commonly observed in Feedback Alignment (FA) networks. The study pioneered an asymmetric error propagation scheme integrated with local learning signals, allowing each layer to optimise against localised alignment objectives. This layer-wise decoupling of update rules eliminates the need for inter-layer synchronisation, structurally enabling parallel parameter updates.
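How layer-wise decoupling permits parallel updates can be sketched with a direct-feedback-style rule, shown below in NumPy. Each hidden layer receives the output error through its own fixed random projection and forms its update from purely local quantities; this is a generic stand-in under our own assumptions, not the paper's exact localised alignment objective.

```python
import numpy as np

rng = np.random.default_rng(1)
dims = [64, 256, 256, 10]   # toy sizes: input, two hidden layers, output
lr = 1e-3

# Forward weights (trained) and one fixed random feedback matrix per hidden
# layer, mapping the output error straight to that layer.
W = [rng.normal(0, 0.1, (dims[i + 1], dims[i])) for i in range(3)]
B = [rng.normal(0, 0.1, (dims[i + 1], dims[-1])) for i in range(2)]

def local_step(x, y):
    """One update in which each layer uses only locally available quantities."""
    h1 = np.tanh(W[0] @ x)
    h2 = np.tanh(W[1] @ h1)
    y_hat = W[2] @ h2
    e = y_hat - y                         # output error
    # Local error signals via fixed projections: no layer waits on another's
    # backward pass, so the three updates below could run in parallel.
    d2 = (B[1] @ e) * (1.0 - h2 ** 2)
    d1 = (B[0] @ e) * (1.0 - h1 ** 2)
    W[2] -= lr * np.outer(e, h2)
    W[1] -= lr * np.outer(d2, h1)
    W[0] -= lr * np.outer(d1, x)
    return 0.5 * float(e @ e)

loss = local_step(rng.normal(size=dims[0]), np.eye(dims[-1])[7])
```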
Furthermore, the method achieved numerical consistency and competitive accuracy even in deep regimes, scaling to models with up to 128 layers and 1 billion parameters. This innovative methodology enables scalable neural network training, offering a plausible alternative inspired by biological learning mechanisms and contributing to the understanding of efficient and robust deep learning systems. Experiments revealed that on the Digits dataset, SAL-16 achieved an accuracy of 71.63%, a 33.1 percentage point increase compared to the 38.53% attained by the baseline model. Similarly, on the Semeion dataset, SAL-16 reached 72.03%, representing a 36.87 percentage point improvement over the baseline’s 35.16%. These results suggest SAL effectively captures intrinsic data features, enhancing the model’s representation and classification capabilities.
The team measured convergence rates and classification performance using two-layer fully connected networks with a hidden layer size of 256. Models were trained for 25 epochs using stochastic gradient descent with a batch size of 16 and a learning rate of 0.0001, with results averaged over five independent runs. Varying the number of areas, denoted n_areas, in the first layer from 1 to 16, researchers observed a positive correlation between n_areas and performance across datasets including CIFAR-10, Digits, and MNIST. This indicates that tuning the n_areas parameter allows the model to effectively leverage its structural capacity.
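Translated into code, the reported setup corresponds roughly to the following PyTorch sketch. The factory build_model and its n_areas argument are placeholders standing in for either the baseline MLP or a SAL-style routed network; the train/test split is omitted for brevity.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Reported setup: two-layer MLPs, hidden size 256, SGD,
# batch size 16, learning rate 1e-4, 25 epochs, averaged over 5 runs.
HIDDEN, BATCH, LR, EPOCHS, SEEDS = 256, 16, 1e-4, 25, 5

def run_configuration(X, y, n_classes, n_areas, build_model):
    """Train one configuration and return accuracy averaged over seeds.
    build_model(in_dim, hidden, n_classes, n_areas) is a placeholder for
    either the baseline MLP or a SAL-style routed network; X is a float
    tensor of features, y a long tensor of class labels."""
    accuracies = []
    for seed in range(SEEDS):
        torch.manual_seed(seed)
        model = build_model(X.shape[1], HIDDEN, n_classes, n_areas)
        opt = optim.SGD(model.parameters(), lr=LR)
        loader = DataLoader(TensorDataset(X, y), batch_size=BATCH, shuffle=True)
        for _ in range(EPOCHS):
            for xb, yb in loader:
                opt.zero_grad()
                nn.functional.cross_entropy(model(xb), yb).backward()
                opt.step()
        with torch.no_grad():
            accuracies.append((model(X).argmax(1) == y).float().mean().item())
    return sum(accuracies) / len(accuracies)
```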
Further investigations into network depth revealed that SAL remains stable at depths where baseline models often degrade. On the Semeion dataset, tests with depths of 4, 16, 64, and 128 layers, employing residual connections and tanh activation functions, showed the baseline improving steadily with depth while SAL's performance scaled in proportion to model depth. The gap between the two widened as networks grew deeper, indicating SAL's ability to mitigate optimization difficulties at depth.
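The depth experiments rely on residual connections and tanh activations; a generic skeleton of such a block is sketched below. Where the routing machinery sits inside the block is not specified in this summary, so it is omitted, and the class and function names are illustrative.

```python
import torch
from torch import nn

class ResidualTanhBlock(nn.Module):
    """Residual block with tanh activation, the skeleton varied in the
    depth-scaling experiments (4, 16, 64, 128 layers). SAL-specific
    routing is omitted; names here are illustrative."""

    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        # The skip connection keeps optimization tractable at depth 64-128.
        return x + torch.tanh(self.fc(x))

def make_deep_mlp(in_dim, hidden, n_classes, depth):
    blocks = [nn.Linear(in_dim, hidden)]
    blocks += [ResidualTanhBlock(hidden) for _ in range(depth)]
    blocks += [nn.Linear(hidden, n_classes)]
    return nn.Sequential(*blocks)

baseline_128 = make_deep_mlp(in_dim=256, hidden=256, n_classes=10, depth=128)
```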
Measurements confirm that SAL also excels in large-scale models. Experiments varying network width, with hidden layer dimensions of 1024, 2048, and 4096, showed consistent improvements in convergence and learning efficiency. At a width of 4096, the SAL method involved approximately 1 billion activation parameters and over 4 billion total parameters, yet still delivered superior performance on datasets like Semeion. Data shows SAL consistently improves the convergence upper bound and learning efficiency across various network widths, demonstrating its scalability and adaptability to complex data structures.
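The reported figures are consistent with a simple back-of-the-envelope accounting in which each sample activates exactly one of several equally sized parameter areas. The arithmetic below uses only the two reported numbers; the implied area count is our inference, not a figure stated in the paper.

```python
# Back-of-the-envelope sparsity accounting (assumes one active area per sample
# and uniform area sizes; the implied area count is an inference, not reported).
total_params = 4e9       # "over 4 billion total parameters" at width 4096
active_params = 1e9      # "approximately 1 billion activation parameters"

n_areas_implied = total_params / active_params     # ~4 areas
active_fraction = active_params / total_params     # ~25% of weights per sample

print(f"implied areas: ~{n_areas_implied:.0f}, active fraction: ~{active_fraction:.0%}")
```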
👉 More information
🗞 SAL: Selective Adaptive Learning for Backpropagation-Free Training with Sparsification
🧠 arXiv: https://arxiv.org/abs/2601.21561
