Ollama

Run large language models with llama.cpp.

Note: certain models that can be run with this project are intended for research and/or non-commercial use only.

Features

  • Download and run popular large language models
  • Switch between multiple models on the fly
  • Hardware acceleration where available (Metal, CUDA)
  • Fast inference server written in Go, powered by llama.cpp
  • REST API to use from your application (Python and TypeScript SDKs coming soon)

Install

  • Download for macOS
  • Download for Windows (coming soon)
  • Docker: docker run -p 11434:11434 ollama/ollama

You can also build the binary from source.

Quickstart

Run the model that started it all.

ollama run llama

Example models

💬 Chat

Have a conversation.

ollama run vicuna "Why is the sky blue?"

🗺️ Instructions

Ask questions. Get answers.

ollama run orca "Write an email to my boss."

👩‍💻 Code completion

Sometimes you just need a little help writing code.

ollama run replit "Give me react code to render a button"

📖 Storytelling

Venture into the unknown.

ollama run nous-hermes "Once upon a time"

Advanced usage

Run a local model

ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin

Building

make

To run it, start the server:

./ollama server &

Finally, run a model!

./ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin

API Reference

POST /api/pull

Download a model

curl -X POST http://localhost:11434/api/pull -d '{"model": "orca"}'

POST /api/generate

Complete a prompt

curl -X POST http://localhost:11434/api/generate -d '{"model": "orca", "prompt": "hello!", "stream": true}'
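Until the SDKs ship, calling the API from Python takes only the standard library. Below is a minimal sketch, assuming the server streams newline-delimited JSON objects and that each object carries a piece of the completion in a `response` field; the field name and framing are assumptions for illustration, not confirmed by this README.

```python
import json
import urllib.request


def generate(model, prompt, host="http://localhost:11434"):
    """POST to /api/generate and yield streamed completion chunks.

    Assumes a newline-delimited JSON stream where each line is an
    object with a `response` field (an assumption, see lead-in).
    """
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": True}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            yield json.loads(line).get("response", "")


def join_stream(lines):
    """Concatenate the `response` fields of received NDJSON lines."""
    return "".join(
        json.loads(line).get("response", "") for line in lines if line.strip()
    )
```

With the server running, something like `print("".join(generate("orca", "hello!")))` would print the full completion; `join_stream` does the same reassembly on lines you have already collected.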