ggml.ai joins Hugging Face to ensure the long-term progress of Local AI (https://github.com/ggml-org/llama.cpp/discussions/19759)
I don’t normally cover acquisition news like this, but I have some thoughts.
It’s hard to overstate the impact Georgi Gerganov has had on the local model space. Back in March 2023, his release of llama.cpp (https://github.com/ggml-org/llama.cpp) made it possible to run a local LLM on consumer hardware. The original README (https://github.com/ggml-org/llama.cpp/blob/775328064e69db1ebd7e19ccb59d2a7fa6142470/README.md?plain=1#L7) said:
The main goal is to run the model using 4-bit quantization on a MacBook. […] This was hacked in an evening - I have no idea if it works correctly.
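The core trick that made a MacBook viable was blockwise 4-bit quantization: storing each small block of weights as one float scale plus 4-bit integers, roughly quartering the memory footprint compared to 16-bit floats. Here’s a simplified sketch of the idea, deliberately not the exact ggml Q4_0 format:

```python
# A simplified illustration of 4-bit block quantization (not the exact
# ggml Q4_0 on-disk format): each block of weights becomes one float
# scale plus 4-bit integers in the range [-8, 7].
import numpy as np

def quantize_block(weights: np.ndarray) -> tuple[float, np.ndarray]:
    """Map a block of float weights to one scale plus 4-bit integers."""
    scale = float(np.abs(weights).max()) / 7.0
    if scale == 0.0:
        return 0.0, np.zeros_like(weights, dtype=np.int8)
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return scale, q

def dequantize_block(scale: float, q: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from the quantized block."""
    return scale * q.astype(np.float32)

block = np.random.randn(32).astype(np.float32)  # ggml also works in 32-weight blocks
scale, q = quantize_block(block)
error = np.abs(block - dequantize_block(scale, q)).max()
print(f"max reconstruction error in this block: {error:.4f}")
```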
I wrote about trying llama.cpp out at the time in Large language models are having their Stable Diffusion moment (https://simonwillison.net/2023/Mar/11/llama/#llama-cpp):
I used it to run the 7B LLaMA model on my laptop last night, and then this morning upgraded to the 13B model—the one that Facebook claim is competitive with GPT-3.
Meta’s original LLaMA release (https://github.com/meta-llama/llama/tree/llama_v1) depended on PyTorch and their FairScale (https://github.com/facebookresearch/fairscale) PyTorch extension for running on multiple GPUs, and required CUDA and NVIDIA hardware. Georgi’s work opened that up to a much wider range of hardware and kicked off the local model movement that has continued to grow since then.
Hugging Face are already responsible for the incredibly influential Transformers (https://github.com/huggingface/transformers) library used by the majority of LLM releases today. They’ve proven themselves good stewards of that open source project, which makes me optimistic for the future of llama.cpp and related projects.
This section from the announcement looks particularly promising:
Going forward, our joint efforts will be geared towards the following objectives:
• Towards seamless “single-click” integration with the transformers (https://github.com/huggingface/transformers) library. The transformers framework has established itself as the ‘source of truth’ for AI model definitions. Improving the compatibility between the transformers and the ggml ecosystems is essential for wider model support and quality control.
• Better packaging and user experience of ggml-based software. As we enter the phase in which local inference becomes a meaningful and competitive alternative to cloud inference, it is crucial to improve and simplify the way in which casual users deploy and access local models. We will work towards making llama.cpp ubiquitous and readily available everywhere, and continue partnering with great downstream projects.
Given the influence of Transformers, this closer integration could lead to model releases that are compatible with the GGML ecosystem out of the box. That would be a big win for the local model ecosystem.
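There’s already a glimpse of what that compatibility can look like: transformers can load a GGUF checkpoint directly via its gguf_file argument, dequantizing it into a regular PyTorch model. A minimal sketch - the repo and filename here are illustrative placeholders, and loading GGUF requires the gguf Python package:

```python
# Sketch: loading a quantized GGUF file through transformers.
# Repo and filename are illustrative; requires `pip install gguf`.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"      # example GGUF repo
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"      # a 4-bit quantized file

# The quantized weights are dequantized into a standard Transformers model.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)

inputs = tokenizer("The local model movement", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```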
I’m also excited to see investment in “packaging and user experience of ggml-based software”. This has mostly been left to tools like Ollama (https://ollama.com) and LM Studio (https://lmstudio.ai). ggml-org released LlamaBarn (https://github.com/ggml-org/LlamaBarn) last year - “a macOS menu bar app for running local LLMs” - and I’m hopeful that further investment in this area will result in more high-quality open source tools for running local models from the team best placed to deliver them.
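For a sense of the developer-level experience that packaging work would smooth over: llama.cpp’s llama-server already exposes an OpenAI-compatible HTTP API, so a locally running model can be scripted against with standard clients. A minimal sketch, assuming llama-server has been started with a model on its default port:

```python
# Sketch: querying a local llama-server through its OpenAI-compatible API.
# Assumes something like `llama-server -m model.gguf` is already running.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server's default address
    api_key="unused",                     # the local server ignores the key
)

response = client.chat.completions.create(
    model="local",  # llama-server serves whichever model it was started with
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)
print(response.choices[0].message.content)
```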
Via @ggerganov (https://twitter.com/ggerganov/status/2024839991482777976)
Tags: open-source (https://simonwillison.net/tags/open-source), transformers (https://simonwillison.net/tags/transformers), ai (https://simonwillison.net/tags/ai), generative-ai (https://simonwillison.net/tags/generative-ai), llama (https://simonwillison.net/tags/llama), local-llms (https://simonwillison.net/tags/local-llms), llms (https://simonwillison.net/tags/llms), hugging-face (https://simonwillison.net/tags/hugging-face), llama-cpp (https://simonwillison.net/tags/llama-cpp)