In the first post of this series, 机器学习漫谈(1), we discussed generalization bounds for binary classification. For a concrete method, however, analyzing the error bound over its hypothesis space is fairly involved, and it becomes even harder for models such as neural networks, where $\hat{f}$ is obtained by optimizing with SGD.
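As a quick reminder, using generic notation here (an assumption, since the earlier post's exact symbols are not reproduced in this one): write $R(f)$ for the population risk and $\hat{R}_n(f)$ for the empirical risk on the $n$ training samples. The quantity a generalization bound controls is the gap between the two for the learned predictor $\hat{f}$:

$$
R(\hat{f}) \;=\; \underbrace{\hat{R}_n(\hat{f})}_{\text{training error}} \;+\; \underbrace{R(\hat{f}) - \hat{R}_n(\hat{f})}_{\text{generalization gap}} .
$$

Controlling the second term, typically uniformly over the hypothesis space, is exactly what becomes delicate for deep networks.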
For deep learning there are two main difficulties:
- the enormous number of parameters;
- the strong coupling between optimization and the generalization bound in deep networks.
The first difficulty means that the network is over-parameterized: its hypothesis space is correspondingly enormous, and the network can essentially fit the training data perfectly. This regime is hard to handle with classical tools for generalization bounds such as VC-dimension arguments.
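To see concretely why, recall the shape of a classical uniform-convergence bound, stated up to constants and logarithmic factors as a reminder rather than as a result from this post: with high probability over the sample,

$$
R(f) \;\le\; \hat{R}_n(f) \;+\; \tilde{O}\!\left(\sqrt{\frac{d_{\mathrm{VC}}(\mathcal{F})}{n}}\right)
\qquad \text{for all } f \in \mathcal{F}.
$$

For modern networks $d_{\mathrm{VC}}(\mathcal{F})$ grows with the number of weights, which is typically far larger than $n$, so the right-hand side is vacuous even though such networks often generalize well in practice.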
The second difficulty is that the $\hat{f}$ produced by a deep network depends heavily on the training algorithm (SGD): the initialization and the dynamics of the parameter updates all influence the $\hat{f}$ that is finally obtained.
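To make both difficulties concrete, here is a minimal toy sketch, written in PyTorch purely for illustration (an assumption; the post itself contains no code): an over-parameterized two-layer ReLU network trained by full-batch gradient descent can essentially interpolate purely random labels, and rerunning with a different seed yields a different $\hat{f}$ away from the training points.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
X = torch.randn(32, 10)             # 32 samples, 10 features
Y = (torch.rand(32) > 0.5).float()  # random 0/1 labels: no signal to learn

def fit(seed: int, width: int = 512, steps: int = 3000) -> nn.Module:
    """Train an over-parameterized MLP; only the seed differs between runs."""
    torch.manual_seed(seed)  # controls the random initialization
    model = nn.Sequential(nn.Linear(10, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):  # full-batch gradient descent
        opt.zero_grad()
        loss = loss_fn(model(X).squeeze(-1), Y)
        loss.backward()
        opt.step()
    acc = ((model(X).squeeze(-1) > 0).float() == Y).float().mean().item()
    print(f"seed={seed}: train loss {loss.item():.4f}, train acc {acc:.2f}")
    return model

m0, m1 = fit(seed=0), fit(seed=1)
probe = torch.randn(4, 10)             # unseen inputs
print(m0(probe).squeeze(-1).tolist())  # the two near-interpolating solutions
print(m1(probe).squeeze(-1).tolist())  # generally disagree off the training set
```

Both runs drive the training error toward zero on pure noise, which is exactly the regime where a VC-style bound has nothing to say, and the two learned functions differ, which is why the analysis cannot ignore the algorithm.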
Analysis methods for generalization bounds in deep learning
Parameter norm-based
Uniform stability-based
Other generalization theories
Another noteworthy line of research on generalization bounds is based on information theory^17. Mutual information (MI) is used to measure the information transfer and information loss in deep learning models and training algorithms. Beyond this, other techniques for bounding the generalization error include model compression, margin theory, path-length estimates, and the linear stability of optimization algorithms.
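As one representative example of this flavor, stated here as the standard sub-Gaussian mutual-information bound rather than as a result taken from this post: if the loss $\ell(w, Z)$ is $\sigma$-sub-Gaussian for every fixed $w$, and the algorithm maps the i.i.d. sample $S$ of size $n$ to weights $W$, then

$$
\bigl|\,\mathbb{E}\!\left[ R(W) - \hat{R}_n(W) \right]\bigr|
\;\le\;
\sqrt{\frac{2\sigma^2}{n}\, I(S; W)} .
$$

In words, the less the learned weights reveal about the particular training sample, the smaller the expected generalization gap; this is the intuition that the information-theoretic line of work formalizes.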
Footnotes:
1. Chao Ma, Lei Wu, and Weinan E. A priori estimates of the population risk for two-layer neural networks. arXiv preprint arXiv:1810.06397, 2018.
2. Chao Ma, Qingcan Wang, and Weinan E. A priori estimates of the population risk for residual networks. arXiv preprint arXiv:1903.02154, 2019.
3. Zhong Li, Chao Ma, and Lei Wu. Complexity measures for neural networks with general activation functions using path-based norms. arXiv preprint arXiv:2009.06132, 2020.
4. Weinan E, Chao Ma, and Lei Wu. The Barron space and the flow-induced function spaces for neural network models. Constructive Approximation, pages 1–38, 2021.
5. Noah Golowich, Alexander Rakhlin, and Ohad Shamir. Size-independent sample complexity of neural networks. In Conference on Learning Theory, pages 297–299. PMLR, 2018.
6. Peter Bartlett, Dylan J. Foster, and Matus Telgarsky. Spectrally-normalized margin bounds for neural networks. arXiv preprint arXiv:1706.08498, 2017.
7. Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, and James Stokes. Fisher-Rao metric, geometry, and complexity of neural networks. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 888–896. PMLR, 2019.
8. Zhuozhuo Tu, Fengxiang He, and Dacheng Tao. Understanding generalization in recurrent neural networks. In International Conference on Learning Representations, 2019.
9. William H. Rogers and Terry J. Wagner. A finite sample distribution-free performance bound for local discrimination rules. The Annals of Statistics, pages 506–514, 1978.
10. Olivier Bousquet, Yegor Klochkov, and Nikita Zhivotovskiy. Sharper bounds for uniformly stable algorithms. In Conference on Learning Theory, pages 610–626. PMLR, 2020.
11. Moritz Hardt, Ben Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. In International Conference on Machine Learning, pages 1225–1234. PMLR, 2016.
12. Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, and Ameet Talwalkar. Gradient descent on neural networks typically occurs at the edge of stability. arXiv preprint arXiv:2103.00065, 2021.
13. Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky. Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis. In Conference on Learning Theory, pages 1674–1703. PMLR, 2017.
14. Max Welling and Yee W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.
15. Yuchen Zhang, Percy Liang, and Moses Charikar. A hitting time analysis of stochastic gradient Langevin dynamics. In Conference on Learning Theory, pages 1980–2022. PMLR, 2017.
16. Wenlong Mou, Liwei Wang, Xiyu Zhai, and Kai Zheng. Generalization bounds of SGLD for non-convex learning: Two theoretical viewpoints. In Conference on Learning Theory, pages 605–638. PMLR, 2018.
17. Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810, 2017.