Input Parameters
Hidden size:
Intermediate size:
Vocab size:
Number of key-value heads:
Number of attention (query) heads:
Number of hidden layers:
Include bias?
No
Yes
Is MoE?
No
Yes
Total number of experts:
Number of active experts (shared + specific):
Calculate
Transformer Total Parameter Count Calculator
Enter the model hyperparameters in the console and press Calculate (currently works for the classic transformer/LLM architecture with GQA and GLU).
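The count behind such a calculator can be sketched as follows. This is an assumed formula, not necessarily the one this tool implements: it models GQA attention (Q and O projections of size hidden × hidden; K and V of size hidden × kv_heads × head_dim), a GLU feed-forward block (gate, up, down), two norms per layer, and an untied LM head. The function name and the MoE router term are illustrative assumptions.

```python
def transformer_params(hidden_size, intermediate_size, vocab_size,
                       num_kv_heads, num_q_heads, num_layers,
                       bias=False, moe=False, num_experts=1,
                       num_active_experts=1):
    """Total (and active) parameter count for a decoder-only
    transformer with GQA attention and a GLU feed-forward block."""
    d_head = hidden_size // num_q_heads
    # Attention: Q and O are hidden x hidden; K and V are
    # hidden x (kv_heads * head_dim) each.
    attn = 2 * hidden_size * hidden_size \
         + 2 * hidden_size * num_kv_heads * d_head
    if bias:
        attn += 2 * hidden_size + 2 * num_kv_heads * d_head
    # GLU feed-forward: gate and up (hidden -> intermediate), down.
    ffn = 3 * hidden_size * intermediate_size
    if bias:
        ffn += 2 * intermediate_size + hidden_size
    norms = 2 * hidden_size  # pre-attention and pre-FFN norms
    # MoE replicates the FFN per expert; a simple linear router
    # (hidden x num_experts) is assumed here.
    ffn_total = ffn * num_experts if moe else ffn
    ffn_active = ffn * num_active_experts if moe else ffn
    router = hidden_size * num_experts if moe else 0
    per_layer_total = attn + ffn_total + router + norms
    per_layer_active = attn + ffn_active + router + norms
    # Token embedding, untied LM head, final norm (assumption:
    # tied embeddings would remove one vocab x hidden term).
    embed = 2 * vocab_size * hidden_size + hidden_size
    total = num_layers * per_layer_total + embed
    active = num_layers * per_layer_active + embed
    return total, active
```

For a dense model the two returned values are equal; for an MoE model, `active` counts only the experts actually used per token.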