4366 Commits (a3f7dd3e98df803695f1ae165bc61a1b52142449)
 

Author SHA1 Message Date
Parth Sareen aea6fb9b58
tools: remove newline stripping (#10869) 10 months ago
RAPID ARCHITECT 012cf65340
readme: add AWS Strands Agents SDK example to community integrations (#10865) 10 months ago
Min Yoo a45231af47
readme: Add macLlama to community integrations (#10790) 10 months ago
Daniel Hiltgen 2307fc2bcd
tests: drop llama3.2-vision embedding tests (#10837) 10 months ago
frob 6623898198
docs: remove unsupported quantizations (#10842) 10 months ago
frob eda472df1b
server: add hint to the error message when model path access fails (#10843) 10 months ago
Jesse Gross f18e0cb550
ml: Improve slog formatting for BackendMemory 10 months ago
Parth Sareen e8b981fa5d
tools: refactor tool call parsing and enable streaming (#10415) 10 months ago
Parth Sareen 884d26093c
llama: add minimum memory for grammar (#10820) 10 months ago
Jesse Gross 1f371ea92f
ml: Panic rather than return error on tensor allocation failure 11 months ago
Jesse Gross 73d6a82cce
ollamarunner: Memory usage reporting 12 months ago
Jesse Gross 6db8a3771c
ggml: Report graph memory for failed allocations 11 months ago
Daniel Hiltgen d950ff12c0
sched: fix runner leak during reloading unload (#10819) 10 months ago
Michael Yang adff143bcd
fix: mllama quality (#10807) 10 months ago
Bruce MacDonald fbe6ae285a
server: improve tensor quantization fallback logic (#10806) 10 months ago
Daniel Hiltgen fdd4d479a3
integration: add qwen2.5-vl (#10815) 10 months ago
Michael Yang 61aeaf7e81
remove support for multiple ggufs in a single file (#10722) 10 months ago
Daniel Hiltgen 7359b02707
win: detect background upgrade in progress (#10785) 10 months ago
Michael Yang c890011322
feat: port qwen2 model (#10782) 10 months ago
Michael Yang e0ed984cde
feat: qwen3 dense and sparse models (#10708) 10 months ago
Michael Yang 139f84cf21
fix cmakelists (#10804) 10 months ago
Michael Yang 375839ea2d
chore: disable debug in binary libraries (#10788) 10 months ago
Michael Yang 69b2fe9282
fix: qwen25vl assign samebatch in multimodal input (#10789) 10 months ago
Michael Yang 9ed8bf14cb
ml: add more rope options (#10775) 10 months ago
DarkCaster e6a800ca11
llama: fix incorrect initialization of C.struct_common_sampler_cparams.penalty_present (#10779) 10 months ago
Michael Yang ff180c3466
fix llama and mistral3 models (#10774) 11 months ago
Jesse Gross 3fe74fba42
llm: Use first layer as memory buffer in estimation 11 months ago
Daniel Hiltgen 1a0cfd080a
avoid kv truncation during create (#10761) 11 months ago
Jesse Gross 94ab428e3f
ggml: Seperate tensor load from backend creation 12 months ago
Jesse Gross d755577473
llm: Estimate projector memory correctly for Ollama engine 11 months ago
Jesse Gross a2cc8571c5
llm: Consistently track unassigned model data 11 months ago
Ronald Wilson 7edfdd2f5f
readme: add TinyNotepad to community integrations (#10763) 11 months ago
Michael Yang 333e360422
model: handle multiple eos tokens (#10577) 11 months ago
Daniel Hiltgen 27da2cddc5
Fix lingering Q4_0 help reference (#10720) 11 months ago
Bruce MacDonald feb8923ada
cmd: add ellipses to truncated show metadata (#10717) 11 months ago
Jesse Gross fe623c2cf4
ollamarunner: Multi-modal worst case graph 12 months ago
Jesse Gross 3c14461d5d
ollamarunner: Separate text and multimodal graphs 11 months ago
Jesse Gross 499ae7311f
ollamarunner: Base cached tokens on current prompt 11 months ago
Michael Yang ef202789fa
fix pixel values padding (#10718) 11 months ago
Michael Yang 55760195e6
fix mllama conversion (#10716) 11 months ago
Bruce MacDonald bd68d3ae50
ggml: update qwen25vl vision size estimate (#10711) 11 months ago
Daniel Hiltgen ff80718e9c
fix crash in old clients with quantization progress (#10710) 11 months ago
Bruce MacDonald 0aa8b371dd
model: add Qwen2.5-VL support (#10385) 11 months ago
Michael Yang 23125648b8
chore: update mllama to use ollama engine (#10637) 11 months ago
tej 0478d440f0
Fixed over vram allcation dure to small initial layer sizes. 11 months ago
Parth Sareen 8cc33f4c2b
llama: fix memory leak for grammar (#10696) 11 months ago
Jeffrey Morgan f46df4e5d2
llama: fix defrag patch to defragment when no slots are available (#10695) 11 months ago
Daniel Hiltgen c6bcdc4223
Revert "remove cuda v11 (#10569)" (#10692) 11 months ago
Jeffrey Morgan 4b903f088a
llama: fix crash on snowflake embedding model (#10690) 11 months ago
Jeffrey Morgan c7f4ae7b9c
server: add webp image input support (#10653) 11 months ago