llama1-3 Model Architecture Explained - Zhang #209
Replies: 2 comments
-
… the number of input tokens, of size batch_size * seq_len (a shape sketch follows below).
0 replies
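A minimal sketch of the tensor shape the comment above refers to; the variable names and sizes are illustrative and not taken from the original article:

```python
import torch

# Hypothetical sizes, only to illustrate the [batch_size, seq_len] input shape.
batch_size, seq_len, vocab_size = 4, 128, 32000

# input_ids holds the token indices fed to the model:
# one row per sequence in the batch, one column per token position.
input_ids = torch.randint(0, vocab_size, (batch_size, seq_len))
print(input_ids.shape)  # torch.Size([4, 128])
```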
-
If MQA and GQA can reduce KV memory this way, doesn't that mean the hidden dimensions of Q and of K/V are different? In ordinary attention the Q, K, and V linear layers are the same size; with MQA and GQA, are the Q and K/V linear layers different sizes? (See the sketch after this comment.)
0 replies
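As a partial answer to the question above: yes, in MQA/GQA the K and V projections are smaller than the Q projection, since they only produce `num_kv_heads` heads. A minimal PyTorch sketch, with illustrative Llama-like dimensions that are not taken from the article:

```python
import torch
import torch.nn as nn

# In standard MHA, W_q, W_k, W_v all map hidden_size -> num_heads * head_dim.
# In GQA, W_q keeps that output size, but W_k / W_v map to
# num_kv_heads * head_dim, so the K/V linear layers are indeed smaller.
hidden_size = 4096
num_heads = 32                        # query heads
num_kv_heads = 8                      # KV heads (GQA); 1 would be MQA
head_dim = hidden_size // num_heads   # 128

q_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)     # 4096 -> 4096
k_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=False)  # 4096 -> 1024
v_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=False)  # 4096 -> 1024

x = torch.randn(2, 16, hidden_size)   # [batch_size, seq_len, hidden_size]
q = q_proj(x).view(2, 16, num_heads, head_dim)
k = k_proj(x).view(2, 16, num_kv_heads, head_dim)
v = v_proj(x).view(2, 16, num_kv_heads, head_dim)

# Each group of num_heads // num_kv_heads query heads shares one K/V head,
# e.g. by expanding K/V with torch.repeat_interleave(k, num_heads // num_kv_heads, dim=2)
# before the attention scores are computed.
print(q.shape, k.shape, v.shape)
```

Because only `num_kv_heads` K/V heads are produced, the KV cache stored per token shrinks by the same factor (here 32/8 = 4x), which is where the memory saving comes from.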
-
llama1-3 Model Architecture Explained - Zhang
Working on LLM inference deployment, vision algorithm development, model compression and deployment, and algorithm SDK development; a lifelong learning practitioner. How the Transformer / llama1-3 model architecture is implemented in code, with an analysis of the model structure.
https://www.armcvai.cn/2024-10-21/llama1-3-model.html