Source code: https://github.com/lopuhin/transformer-lm
Vocabulary size: 50,000 tokens.
Model parameters:

```json
{
  "batch_size": 32,
  "epochs": 10,
  "g_accum_gradients": 1,
  "hparams": {
    "gradient_checkpointing": false,
    "n_ctx": 64,
    "n_embed": 768,
    "n_head": 12,
    "n_hidden": 768,
    "n_layer": 8,
    "n_vocab": 50000
  },
  "lr": 0.00025
}
```
© Anastasiia Lopukhina, Konstantin Lopukhin