1760 Commits (d88c527be392ff4a05648f6e2cbd8f69241714ca)
 

Author SHA1 Message Date
Daniel Hiltgen d88c527be3 Build multiple CPU variants and pick the best 2 years ago
Daniel Hiltgen 052b33b81b DRY out the Dockefile.build 2 years ago
Daniel Hiltgen 8da7bef05f Support multiple variants for a given llm lib type 2 years ago
Jeffrey Morgan b24e8d17b2 Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896) 2 years ago
Jeffrey Morgan f83881390f revert submodule back to `328b83de23b33240e28f4e74900d1d06726f5eb1` 2 years ago
Daniel Hiltgen ac70ab6761 Merge pull request #1914 from dhiltgen/smarter_cuda_detection 2 years ago
Daniel Hiltgen 3c49c3ab0d Harden GPU mgmt library lookup 2 years ago
Daniel Hiltgen 9754ae4c89 Support optional override of the target archictures 2 years ago
Jeffrey Morgan 224fbf2795 update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed 2 years ago
Jeffrey Morgan 2c6e8f5248 Update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6` (#1885) 2 years ago
Jeffrey Morgan 34344d801c clean up cmake `build` directory when cross compiling macOS builds 2 years ago
Robin Glauser e868c8a5c7 Update api.md (#1878) 2 years ago
Jeffrey Morgan c336693f07 calculate overhead based number of gpu devices (#1875) 2 years ago
Daniel Hiltgen e89dc1d54b Merge pull request #1874 from dhiltgen/correct_cuda_min 2 years ago
Daniel Hiltgen 1961a81f03 Set corret CUDA minimum compute capability version 2 years ago
Jeffrey Morgan 8a8c7e7f8d only build for metal on `arm64` 2 years ago
Jeffrey Morgan 6df83e6daa update rough cuda overhead estimate to 15% + 384MiB 2 years ago
Michael Yang 62023177f6 Merge pull request #1614 from jmorganca/mxyng/fix-set-template 2 years ago
Jeffrey Morgan 6164f378f2 revert cuda overhead to 20% 2 years ago
Jeffrey Morgan f387e9631b use runner if cuda alloc won't fit 2 years ago
Jeffrey Morgan 6566387ae3 add `TODO` for cuda overhead 2 years ago
Jeffrey Morgan 37708931fb update cuda overhead to 20% to fix crashes when switching between models and large context sizes 2 years ago
Jeffrey Morgan f6cb0a553c update cuda overhead to 15% or 400MiB 2 years ago
Jeffrey Morgan 2680078c13 fix build on linux 2 years ago
Jeffrey Morgan f1b7e5f560 update overhead to 15% 2 years ago
Jeffrey Morgan cb534e6ac2 use 10% vram overhead for cuda 2 years ago
Jeffrey Morgan 58ce2d8273 better estimate scratch buffer size 2 years ago
Jeffrey Morgan 18ddf6d57d fix windows build 2 years ago
Michael Yang 61e6502449 Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt 2 years ago
Jeffrey Morgan 08f1e18965 Offload layers to GPU based on new model size estimates (#1850) 2 years ago
Bruce MacDonald 7e8f7c8358 remove ggml automatic re-pull (#1856) 2 years ago
Bruce MacDonald 3f3eb19a3b document response in modelfile template variables (#1428) 2 years ago
Daniel Hiltgen 059ae4585e Merge pull request #1834 from dhiltgen/old_cuda 2 years ago
Daniel Hiltgen 6347f501ca Merge pull request #1828 from dhiltgen/fix_llava 2 years ago
Jeffrey Morgan 5feec959ad dont use `-Wall` in static build (#1833) 2 years ago
Jeffrey Morgan dbdd50b283 add `-DCMAKE_SYSTEM_NAME=Darwin` cmake flag (#1832) 2 years ago
Daniel Hiltgen d74ce6bd4f Detect very old CUDA GPUs and fall back to CPU 2 years ago
Guilherme Baptista 57942b4676 Update README.md - Community Integrations - Ollama for Ruby (#1830) 2 years ago
Daniel Hiltgen e0d05b0f1e Accept windows paths for image processing 2 years ago
Daniel Hiltgen 2d9dd14f27 Merge pull request #1697 from dhiltgen/win_docs 2 years ago
Jeffrey Morgan 1caa56128f add cuda lib path for nvidia container toolkit 2 years ago
Michael Yang 0101e76dbe Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05 2 years ago
Michael Yang 2ef9352b94 fix(cmd): history in alt mode 2 years ago
Michael Yang 5580ae2472 fix: set template without triple quotes 2 years ago
Bruce MacDonald 3a9f447141 only pull gguf model if already exists (#1817) 2 years ago
Patrick Devine 9c2941e61b switch api for ShowRequest to use the name field (#1816) 2 years ago
Patrick Devine 238ac5e765 Add unit tests for Parser (#1815) 2 years ago
Bruce MacDonald 4f4980b66b simplify ggml update logic (#1814) 2 years ago
Patrick Devine 22e93efa41 add show info command and fix the modelfile 2 years ago
Patrick Devine 2909dce894 split up interactive generation 2 years ago