48 Commits (main)

Author SHA1 Message Date
Michael Yang fcec04bf42 gptoss: fix memory calc (#11700) 8 months ago
Jesse Gross 8253ad4d2b ggml: Prevent kv cache quanitization on gpt-oss 8 months ago
Michael Yang fa7776fd24 gpt-oss (#11672) 8 months ago
Jeffrey Morgan ba04902670 fs/ggml: add multiplier in graph estimates (#11208) 9 months ago
Jeffrey Morgan 3944602f51 fs/ggml: add missing architecture to OllamaEngineRequired() (#11206) 9 months ago
Michael Yang 73b642e6f3 add new gemma model (#11204) 9 months ago
Michael Yang 0a066cfd91 Reapply "feat: incremental gguf parser (#10822)" (#11114) (#11119) 9 months ago
Jeffrey Morgan a6e64fbdf2 Revert "feat: incremental gguf parser (#10822)" (#11114) 10 months ago
Michael Yang a6fbfc880c gguf: fix write order (#11068) 10 months ago
Michael Yang 6b04cad7e8 feat: incremental gguf parser (#10822) 10 months ago
Jesse Gross 94ab428e3f ggml: Seperate tensor load from backend creation 12 months ago
Bruce MacDonald bd68d3ae50 ggml: update qwen25vl vision size estimate (#10711) 11 months ago
Bruce MacDonald 0aa8b371dd model: add Qwen2.5-VL support (#10385) 11 months ago
Michael Yang 23125648b8 chore: update mllama to use ollama engine (#10637) 11 months ago
Daniel Hiltgen 9d6df90805 Follow up to #10363 (#10647) 11 months ago
Daniel Hiltgen af31ccefc0 fix data race in WriteGGUF (#10598) 11 months ago
Daniel Hiltgen 424810450f Move quantization to new backend (#10363) 11 months ago
Jesse Gross 7073600797 ggml: Reduce log level of "key not found" 11 months ago
Michael Yang a7835c6716 fix: write gguf padding (#10510) 11 months ago
Devon Rifkin 6ed8898590 ggml: fix crash for array head counts 11 months ago
Michael Yang f0ad49ea17 memory 11 months ago
Michael Yang f0c66e6dea llama4 1 year ago
Michael Yang ced7d0e53d fix parameter count 11 months ago
Michael Yang a0dba0f8ae default slice values 11 months ago
Michael Yang 5e20b170a7 update comment 11 months ago
Michael Yang d26c18e25c fix token type 11 months ago
Michael Yang 8d376acc9b zero means zero 11 months ago
Michael Yang 5d0279164c generic ggml.array 11 months ago
Michael Yang 4892872c18 convert: change to colmajor 11 months ago
Michael Yang 2fec73eef6 fix write gguf padding 12 months ago
Bruce MacDonald 6bd0a983cd model: support for mistral-small in the ollama runner 1 year ago
Michael Yang 3b96a93672 fs: move ml.Config to fs package 1 year ago
Jesse Gross f66216e399 ggml: Support heterogeneous KV cache layer sizes in memory estimation 1 year ago
Michael Yang 8d76fa23ef count non-repeating vision layers 1 year ago
Michael Yang 65b88c544f fix divide by zero 1 year ago
Michael Yang a422ba39c9 roughly count gemma3 graph 1 year ago
Michael Yang d2ec22371e count all vision tensors 1 year ago
Michael Yang 033cec232a count gemma3 vision tensors 1 year ago
Patrick Devine 4bed739259 add verbose mode to the show command (#9640) 1 year ago
Daniel Hiltgen ab39e08eb9 llm: auto detect models that require Ollama Engine (#1) 1 year ago
Patrick Devine 5f74d1fd47 gemma2 impl 1 year ago
Daniel Hiltgen 1fdb351c37 New engine: vision models and auto-fallback (#9113) 1 year ago
Michael Yang 53d2990d9b model: add bos token if configured 1 year ago
Michael Yang b16367b4b2 fix: add back bf16 support 1 year ago
Michael Yang 58245413f4 next ollama runner (#7913) 1 year ago