Input Parameters
Hidden size:
Intermediate size:
Vocab size:
Number of key-value heads:
Number of attention (query) heads:
Number of hidden layers:
Include bias?
No
Yes
Is MoE?
No
Yes
Total number of experts:
Number of active experts (shared + specific):
Calculate
Transformer Total Parameter Count Calculator
Enter the model hyperparameters in the console and press Calculate (currently works for the classic transformer/LLM architecture with GQA and GLU).
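The count behind such a calculator can be sketched as follows. This is an assumed formula, not necessarily the one this tool implements: it models GQA attention (Q and O projections of size hidden × hidden; K and V of size hidden × kv_heads × head_dim), a GLU feed-forward block (gate, up, down), two norms per layer, and an untied LM head. The function name and the MoE router term are illustrative assumptions.

```python
def transformer_params(hidden_size, intermediate_size, vocab_size,
                       num_kv_heads, num_q_heads, num_layers,
                       bias=False, moe=False, num_experts=1,
                       num_active_experts=1):
    """Total (and active) parameter count for a decoder-only
    transformer with GQA attention and a GLU feed-forward block."""
    d_head = hidden_size // num_q_heads
    # Attention: Q and O are hidden x hidden; K and V are
    # hidden x (kv_heads * head_dim) each.
    attn = 2 * hidden_size * hidden_size \
         + 2 * hidden_size * num_kv_heads * d_head
    if bias:
        attn += 2 * hidden_size + 2 * num_kv_heads * d_head
    # GLU feed-forward: gate and up (hidden -> intermediate), down.
    ffn = 3 * hidden_size * intermediate_size
    if bias:
        ffn += 2 * intermediate_size + hidden_size
    norms = 2 * hidden_size  # pre-attention and pre-FFN norms
    # MoE replicates the FFN per expert; a simple linear router
    # (hidden x num_experts) is assumed here.
    ffn_total = ffn * num_experts if moe else ffn
    ffn_active = ffn * num_active_experts if moe else ffn
    router = hidden_size * num_experts if moe else 0
    per_layer_total = attn + ffn_total + router + norms
    per_layer_active = attn + ffn_active + router + norms
    # Token embedding, untied LM head, final norm (assumption:
    # tied embeddings would remove one vocab x hidden term).
    embed = 2 * vocab_size * hidden_size + hidden_size
    total = num_layers * per_layer_total + embed
    active = num_layers * per_layer_active + embed
    return total, active
```

For a dense model the two returned values are equal; for an MoE model, `active` counts only the experts actually used per token.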