Musings on Machine Learning (2) – Analyzing Generalization Error Bounds for Deep Learning

[Header image: Kernel_Machine.svg]

In Musings on Machine Learning (1) we discussed generalization bounds for binary classification. For a concrete method, however, analyzing the error bound of its hypothesis space is fairly involved, and it becomes even harder for models such as neural networks, where $\hat{f}$ is obtained by optimization with SGD.
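As a quick recap of the notation (assumed to be consistent with part (1) of this series), the quantity to be bounded is the gap between the population risk and the empirical risk of the learned predictor $\hat{f}$:

$$
R(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\bigl[\ell(f(x),y)\bigr],\qquad
\hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n}\ell(f(x_i),y_i),\qquad
\mathrm{gen}(\hat{f}) = R(\hat{f}) - \hat{R}_n(\hat{f}).
$$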

For deep learning, there are two main difficulties:

  1. The enormous number of parameters
  2. The strong coupling between optimization and generalization in deep networks

The first difficulty means the network is over-parameterized: its hypothesis space is correspondingly huge, and the training data can typically be fit perfectly. This is hard to handle with classical tools for generalization bounds such as VC dimension, as the schematic bound below illustrates.
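Schematically (constants and exact log factors omitted), a VC-type bound says that with probability at least $1-\delta$, uniformly over $f$ in a class $\mathcal{F}$ with VC dimension $d_{\mathrm{VC}}$,

$$
R(f) \;\le\; \hat{R}_n(f) + O\!\left(\sqrt{\frac{d_{\mathrm{VC}}\,\log(n/d_{\mathrm{VC}}) + \log(1/\delta)}{n}}\right).
$$

For ReLU networks $d_{\mathrm{VC}}$ grows with the number of parameters (roughly $\tilde{O}(WL)$ for $W$ weights and depth $L$), so once $W \gg n$ the right-hand side becomes vacuous, even though the trained network may generalize well in practice.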

The second difficulty is that the $\hat{f}$ produced by a deep network depends heavily on the algorithm (SGD): the initialization and the dynamics of the parameter updates all influence the final $\hat{f}$.

Approaches to analyzing generalization bounds in deep learning

Parameter norm-based

Norm-based generalization bounds are a representative family of results; they use parameter norms to control the Rademacher complexity of the hypothesis space, and therefore do not depend explicitly on the number of parameters. A variety of parameter norms have been proposed and used, for example (the sketch after this list computes two of them for a toy network):

  • Path norm [1, 2, 3, 4]
  • $l^{p,q}$ norm [5]
  • Spectral norm [6]
  • Fisher-Rao norm [7, 8]
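As a concrete illustration, the sketch below computes two of these quantities (the product of per-layer spectral norms and the path norm) for a small bias-free ReLU MLP with random weights; the architecture and weights are placeholders, not taken from the papers above.

```python
import numpy as np

# Toy bias-free ReLU MLP; shapes and random weights are purely illustrative.
rng = np.random.default_rng(0)
d, h1, h2 = 16, 32, 32
weights = [rng.standard_normal((h1, d)) / np.sqrt(d),
           rng.standard_normal((h2, h1)) / np.sqrt(h1),
           rng.standard_normal((1, h2)) / np.sqrt(h2)]

def spectral_norm_product(ws):
    """Product of per-layer spectral norms (largest singular values)."""
    return float(np.prod([np.linalg.norm(w, 2) for w in ws]))

def path_norm(ws):
    """Sum over all input-output paths of the product of |weight| along the
    path, computed as 1^T |W_L| ... |W_1| 1 for a bias-free network."""
    m = np.abs(ws[0])
    for w in ws[1:]:
        m = np.abs(w) @ m
    return float(m.sum())

print("spectral-norm product:", spectral_norm_product(weights))
print("path norm:            ", path_norm(weights))
```

Both quantities are architecture-aware but do not count parameters directly, which is why bounds built on them can remain meaningful in the over-parameterized regime.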

Uniform stability-based

The uniform stability approach is also widely used to derive generalization bounds [9, 10]. These bounds depend on the optimization trajectory and typically take the form of "train faster, generalize better" [11]. However, applying uniform stability theory usually relies on smoothness assumptions on the loss, and recent work shows that gradient descent on DNNs, even with reasonable step sizes, cannot be analyzed using (even local) L-smoothness [12]. Although there have been attempts to bypass the L-smoothness condition, current analyses only go through for SGLD [13, 14, 15], not for genuine GD or SGD [16]. The basic definition and the typical shape of the resulting bound are sketched below.
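Schematically (constants omitted): an algorithm $A$ is $\varepsilon_n$-uniformly stable if for any two samples $S, S'$ differing in a single example,

$$
\sup_{z}\,\bigl|\ell(A(S),z) - \ell(A(S'),z)\bigr| \;\le\; \varepsilon_n
\quad\Longrightarrow\quad
\mathbb{E}\bigl[R(A(S)) - \hat{R}_n(A(S))\bigr] \;\le\; \varepsilon_n .
$$

For SGD on convex, $L$-Lipschitz, smooth losses one obtains, up to constants, $\varepsilon_n \lesssim \frac{L^2}{n}\sum_{t=1}^{T}\alpha_t$ for step sizes $\alpha_t$, which is exactly the "train faster, generalize better" statement: fewer steps and smaller step sizes give a smaller stability constant. The smoothness assumption is the part that breaks down for DNNs, as noted above.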

Other generalization theories

Another notable line of work on generalization bounds is information-theoretic [17]. Mutual information (MI) is used to quantify how much information a deep learning model and its training algorithm transmit and lose. Beyond this, other techniques for bounding the generalization error include model compression, margin theory, path-length estimates, and the linear stability of the optimization algorithm. A representative MI-based bound is sketched below.
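A commonly cited bound of this type (stated schematically here, and not among the footnotes above) controls the expected generalization gap of the learned weights $W = A(S)$ by the mutual information between $W$ and the training sample $S$, assuming the loss is $\sigma$-sub-Gaussian:

$$
\Bigl|\,\mathbb{E}\bigl[R(W) - \hat{R}_n(W)\bigr]\Bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S;W)} .
$$

Intuitively, the less the learned weights "remember" about the particular training sample, the smaller the generalization gap.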

Footnotes:

1. Chao Ma, Lei Wu, and Weinan E. A priori estimates of the population risk for two-layer neural networks. arXiv preprint arXiv:1810.06397, 2018.
2. Chao Ma, Qingcan Wang, and Weinan E. A priori estimates of the population risk for residual networks. arXiv preprint arXiv:1903.02154, 2019.
3. Zhong Li, Chao Ma, and Lei Wu. Complexity measures for neural networks with general activation functions using path-based norms. arXiv preprint arXiv:2009.06132, 2020.
4. Weinan E, Chao Ma, and Lei Wu. The Barron space and the flow-induced function spaces for neural network models. Constructive Approximation, pages 1–38, 2021.
5. Noah Golowich, Alexander Rakhlin, and Ohad Shamir. Size-independent sample complexity of neural networks. In Conference on Learning Theory, pages 297–299. PMLR, 2018.
6. Peter Bartlett, Dylan J Foster, and Matus Telgarsky. Spectrally-normalized margin bounds for neural networks. arXiv preprint arXiv:1706.08498, 2017.
7. Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, and James Stokes. Fisher-Rao metric, geometry, and complexity of neural networks. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 888–896. PMLR, 2019.
8. Zhuozhuo Tu, Fengxiang He, and Dacheng Tao. Understanding generalization in recurrent neural networks. In International Conference on Learning Representations, 2019.
9. William H Rogers and Terry J Wagner. A finite sample distribution-free performance bound for local discrimination rules. The Annals of Statistics, pages 506–514, 1978.
10. Olivier Bousquet, Yegor Klochkov, and Nikita Zhivotovskiy. Sharper bounds for uniformly stable algorithms. In Conference on Learning Theory, pages 610–626. PMLR, 2020.
11. Moritz Hardt, Ben Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. In International Conference on Machine Learning, pages 1225–1234. PMLR, 2016.
12. Jeremy M Cohen, Simran Kaur, Yuanzhi Li, J Zico Kolter, and Ameet Talwalkar. Gradient descent on neural networks typically occurs at the edge of stability. arXiv preprint arXiv:2103.00065, 2021.
13. Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky. Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis. In Conference on Learning Theory, pages 1674–1703. PMLR, 2017.
14. Max Welling and Yee W Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688. Citeseer, 2011.
15. Yuchen Zhang, Percy Liang, and Moses Charikar. A hitting time analysis of stochastic gradient Langevin dynamics. In Conference on Learning Theory, pages 1980–2022. PMLR, 2017.
16. Wenlong Mou, Liwei Wang, Xiyu Zhai, and Kai Zheng. Generalization bounds of SGLD for non-convex learning: Two theoretical viewpoints. In Conference on Learning Theory, pages 605–638. PMLR, 2018.
17. Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810, 2017.
