Quoting Andrej Karpathy
Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPUv3 back then, for a total cost of approx. $43K. It achieves 0.256525 CORE score, which is an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc.
As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year.
— Andrej Karpathy (https://twitter.com/karpathy/status/2017703360393318587)
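
A quick back-of-the-envelope check of the arithmetic in the quote. Note the implied ~$24/hour rate for an 8XH100 node is an assumption inferred from "3.04 hours (~$73)", not a figure stated directly.

```python
# Sketch verifying the cost figures quoted above.
# Assumption: the 2025/2026 cost is taken as the quoted ~$73 total;
# Karpathy does not state the hourly H100 rate explicitly.

tpu_chips = 32
tpu_hours = 168          # 7 days
tpu_rate = 8.0           # $/hour per TPU v3 chip (2019 pricing, per the quote)
cost_2019 = tpu_chips * tpu_hours * tpu_rate
print(f"2019 GPT-2 cost: ${cost_2019:,.0f}")        # ~$43,008, i.e. "approx. $43K"

cost_now = 73.0          # quoted cost for 3.04 hours on one 8XH100 node
reduction = cost_2019 / cost_now
print(f"Cost reduction: {reduction:.0f}X")          # ~589X, i.e. "~600X"

years = 7
annual_factor = reduction ** (1 / years)
print(f"Implied annual drop: {annual_factor:.2f}X")  # ~2.49X, i.e. "~2.5X every year"
```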
Tags: andrej-karpathy (https://simonwillison.net/tags/andrej-karpathy), gpt-2 (https://simonwillison.net/tags/gpt-2), generative-ai (https://simonwillison.net/tags/generative-ai), ai (https://simonwillison.net/tags/ai), llms (https://simonwillison.net/tags/llms), openai (https://simonwillison.net/tags/openai)