ollama

Commit Graph

Author	SHA1	Message	Date
Daniel Hiltgen	d88c527be3	Build multiple CPU variants and pick the best This reduces the built-in linux version to not use any vector extensions which enables the resulting builds to run under Rosetta on MacOS in Docker. Then at runtime it checks for the actual CPU vector extensions and loads the best CPU library available	2 years ago
Daniel Hiltgen	052b33b81b	DRY out the Dockefile.build	2 years ago
Daniel Hiltgen	8da7bef05f	Support multiple variants for a given llm lib type In some cases we may want multiple variants for a given GPU type or CPU. This adds logic to have an optional Variant which we can use to select an optimal library, but also allows us to try multiple variants in case some fail to load. This can be useful for scenarios such as ROCm v5 vs v6 incompatibility or potentially CPU features.	2 years ago
Jeffrey Morgan	b24e8d17b2	Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896 ) * increase minimum cuda overhead and fix minimum overhead for multi-gpu * fix multi gpu overhead * limit overhead to 10% of all gpus * better wording * allocate fixed amount before layers * fixed only includes graph alloc	2 years ago
Jeffrey Morgan	f83881390f	revert submodule back to `328b83de23b33240e28f4e74900d1d06726f5eb1`	2 years ago
Daniel Hiltgen	ac70ab6761	Merge pull request #1914 from dhiltgen/smarter_cuda_detection Smarter GPU Management library detection	2 years ago
Daniel Hiltgen	3c49c3ab0d	Harden GPU mgmt library lookup When there are multiple management libraries installed on a system not every one will be compatible with the current driver. This change improves our management library algorithm to build up a set of discovered libraries based on glob patterns, and then try all of them until we're able to load one without error.	2 years ago
Daniel Hiltgen	9754ae4c89	Support optional override of the target archictures This can help speed up incremental builds when you're only testing one archicture, like amd64. E.g. BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:	2 years ago
Jeffrey Morgan	224fbf2795	update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed	2 years ago
Jeffrey Morgan	2c6e8f5248	Update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6` (#1885 ) * update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6` * unblock condition variable in `update_slots` when closing server	2 years ago
Jeffrey Morgan	34344d801c	clean up cmake `build` directory when cross compiling macOS builds	2 years ago
Robin Glauser	e868c8a5c7	Update api.md (#1878 ) Fixed assistant in the example response.	2 years ago
Jeffrey Morgan	c336693f07	calculate overhead based number of gpu devices (#1875 )	2 years ago
Daniel Hiltgen	e89dc1d54b	Merge pull request #1874 from dhiltgen/correct_cuda_min Set corret CUDA minimum compute capability version	2 years ago
Daniel Hiltgen	1961a81f03	Set corret CUDA minimum compute capability version If you attempt to run the current CUDA build on compute capability 5.2 cards, you'll hit the following failure: cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported	2 years ago
Jeffrey Morgan	8a8c7e7f8d	only build for metal on `arm64`	2 years ago
Jeffrey Morgan	6df83e6daa	update rough cuda overhead estimate to 15% + 384MiB	2 years ago
Michael Yang	62023177f6	Merge pull request #1614 from jmorganca/mxyng/fix-set-template fix: set template without triple quotes	2 years ago
Jeffrey Morgan	6164f378f2	revert cuda overhead to 20%	2 years ago
Jeffrey Morgan	f387e9631b	use runner if cuda alloc won't fit	2 years ago
Jeffrey Morgan	6566387ae3	add `TODO` for cuda overhead	2 years ago
Jeffrey Morgan	37708931fb	update cuda overhead to 20% to fix crashes when switching between models and large context sizes	2 years ago
Jeffrey Morgan	f6cb0a553c	update cuda overhead to 15% or 400MiB	2 years ago
Jeffrey Morgan	2680078c13	fix build on linux	2 years ago
Jeffrey Morgan	f1b7e5f560	update overhead to 15%	2 years ago
Jeffrey Morgan	cb534e6ac2	use 10% vram overhead for cuda	2 years ago
Jeffrey Morgan	58ce2d8273	better estimate scratch buffer size	2 years ago
Jeffrey Morgan	18ddf6d57d	fix windows build	2 years ago
Michael Yang	61e6502449	Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt fix(cmd): history in alt prompt	2 years ago
Jeffrey Morgan	08f1e18965	Offload layers to GPU based on new model size estimates (#1850 ) * select layers based on estimated model memory usage * always account for scratch vram * dont load +1 layers * better estmation for graph alloc * Update gpu/gpu_darwin.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go * add overhead for cuda memory * Update llm/llm.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * fix build error on linux * address comments --------- Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2 years ago
Bruce MacDonald	7e8f7c8358	remove ggml automatic re-pull (#1856 )	2 years ago
Bruce MacDonald	3f3eb19a3b	document response in modelfile template variables (#1428 )	2 years ago
Daniel Hiltgen	059ae4585e	Merge pull request #1834 from dhiltgen/old_cuda Detect very old CUDA GPUs and fall back to CPU	2 years ago
Daniel Hiltgen	6347f501ca	Merge pull request #1828 from dhiltgen/fix_llava Accept windows paths for image processing	2 years ago
Jeffrey Morgan	5feec959ad	dont use `-Wall` in static build (#1833 )	2 years ago
Jeffrey Morgan	dbdd50b283	add `-DCMAKE_SYSTEM_NAME=Darwin` cmake flag (#1832 )	2 years ago
Daniel Hiltgen	d74ce6bd4f	Detect very old CUDA GPUs and fall back to CPU If we try to load the CUDA library on an old GPU, it panics and crashes the server. This checks the compute capability before we load the library so we can gracefully fall back to CPU mode.	2 years ago
Guilherme Baptista	57942b4676	Update README.md - Community Integrations - Ollama for Ruby (#1830 )	2 years ago
Daniel Hiltgen	e0d05b0f1e	Accept windows paths for image processing This enhances our regex to support windows style paths. The regex will match invalid path specifications, but we'll still validate file existence and filter out mismatches	2 years ago
Daniel Hiltgen	2d9dd14f27	Merge pull request #1697 from dhiltgen/win_docs Add windows native build instructions	2 years ago
Jeffrey Morgan	1caa56128f	add cuda lib path for nvidia container toolkit	2 years ago
Michael Yang	0101e76dbe	Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05 fix: allow extension origins (still needs explicit listing), fixes #1686	2 years ago
Michael Yang	2ef9352b94	fix(cmd): history in alt mode	2 years ago
Michael Yang	5580ae2472	fix: set template without triple quotes	2 years ago
Bruce MacDonald	3a9f447141	only pull gguf model if already exists (#1817 )	2 years ago
Patrick Devine	9c2941e61b	switch api for ShowRequest to use the name field (#1816 )	2 years ago
Patrick Devine	238ac5e765	Add unit tests for Parser (#1815 )	2 years ago
Bruce MacDonald	4f4980b66b	simplify ggml update logic (#1814 ) - additional information is now available in show response, use this to pull gguf before running - make gguf updates cancellable	2 years ago
Patrick Devine	22e93efa41	add show info command and fix the modelfile	2 years ago
Patrick Devine	2909dce894	split up interactive generation	2 years ago


1760 Commits (d88c527be392ff4a05648f6e2cbd8f69241714ca)
