Robust Training

2024/5/6 19:32:06

Paper reading: DeepNet: Scaling Transformers to 1,000 Layers

1. Introduction
2. Core technical points
   1. Overall DeepNet architecture
   2. Analysis of parameter initialization
   3. Analysis of DeepNorm
3. Experiments
   1. Feasibility study
   2. Effectiveness study
4. Conclusion & thoughts

Paper link: https://arxiv.org/abs/2203.00555

1…