Generalization Bounds for Two-Layer Neural Networks with ReLU
A rigorous mathematical examination of generalization bounds for two-layer neural networks with ReLU activation. This project derives tight, width-independent complexity bounds by exploiting the scale-invariant properties of ReLU networks, moving beyond traditional empirical Rademacher complexity measures that fail for overparameterized models.
Overview
This rigorous mathematical exercise derives and analyzes generalization bounds for two-layer neural networks with ReLU activation, progressively building from naive bounds to tight, width-independent complexity measures:
- Naive bound: Classical empirical Rademacher complexity approach (demonstrates why standard bounds fail for overparameterized models)
- Symmetrization inequality: Specialized for ReLU networks, exploiting structure
- Scale-invariant complexity measure: Leverages ReLU positive homogeneity property
- Width-independent bound: Final tight generalization bound that holds regardless of network width
Theoretical Foundation
Two-layer neural networks computing functions of the form:
where is the ReLU activation function and parameters are bounded by and .
Part A: Naive Width-Dependent Bound
Starting with standard empirical Rademacher complexity analysis:
This bound demonstrates the fundamental problem: it depends linearly on network width , making it vacuous for overparameterized networks where . This gap between theory and practice motivated the subsequent refinements.
Part B: Symmetrization Inequality for ReLU
Key insight exploiting ReLU structure: Derivation uses:
- ReLU decomposition:
- Distributional symmetry: and are identically distributed
- Rademacher averaging: Cancellation of terms through symmetry arguments
This establishes tighter bounds by leveraging the activation function's inherent structure.
Part C: Scale-Invariant Complexity Measure
Leveraging the positive homogeneity of ReLU ( for ), we introduce scale-invariant parameterization:
This transformation yields a width-independent bound:
where the complexity no longer scales linearly with , making it meaningful for overparameterized networks.
Course & Academic Context
- Course: Learning Theory: Two-Layer Neural Networks (Exam)
- Institution: Université Paris-Dauphine -- PSL, Department of MIDO
- Supervisor: Katia Meziani
- Program: Master 2 ISF (Initial Track)
- Academic Year: 2025/2026