Daniel Hiltgen
20c3266e94
Reduce default parallelism to 1 ( #11330 )
The current scheduler algorithm of picking the parallelism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm. This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined. Removal of the dynamic
logic will come in a follow-up.
9 months ago
Daniel Hiltgen
34088dbcfb
API/CLI context enhancements ( #11331 )
* API: expose context size of loaded models
* CLI: add context UX
This adds a column to the ps output showing the model's context size.
9 months ago
Parth Sareen
43107b15b9
add `tool_name` to api.md ( #11326 )
9 months ago
Parth Sareen
1f91cb0c8c
template: add tool result compatibility ( #11294 )
9 months ago
Daniel Hiltgen
12d8ad0d38
ci: modularization ( #11324 )
switch a few constants to variables
9 months ago
Jesse Gross
592d21e7db
Revert "ggml: Temporarily disable reporting UUIDs"
The root cause was an unclean upgrade - this code is fine.
This reverts commit 45f216a9c7 .
9 months ago
Jeffrey Morgan
5a08b01f5b
readme: update Ollama icon size
9 months ago
Daniel Hiltgen
4f473e224c
int: add performance integration tests ( #11173 )
usage example:
go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
9 months ago
Daniel Hiltgen
9d60bb44cf
doc: add NVIDIA blackwell to supported list ( #11307 )
9 months ago
Vincent RAMPAL
f371260e75
Update base image to Ubuntu 24.04 LTS ( #9681 )
9 months ago
Daniel Hiltgen
c9e6d7719e
doc: Update link for mac install ( #11288 )
Favor the dmg now.
9 months ago
Daniel Hiltgen
2c4ce40334
mimic logs for layers on new engine ( #11278 )
This adds some extra logs to make the new engine a bit more consistent
with the llama engine.
9 months ago
XuKecheng
5d8c173529
readme: add NativeMind to community integrations ( #11242 )
9 months ago
Jeffrey Morgan
44b17d2bfa
tools: fix parsing tool calls with empty arguments, missing required fields ( #11233 )
9 months ago
Attogram Project
3b8b692218
readme: add ollama-bash-toolshed to community integrations ( #11224 )
9 months ago
Michael Yang
4129af9205
chore: cleanup comments + unused vars ( #11225 )
9 months ago
Jesse Gross
45f216a9c7
ggml: Temporarily disable reporting UUIDs
This is causing segfaults, so disable it. Currently UUIDs are only
used for debugging purposes, although they are planned to be used in
additional ways in the future.
Bug #11211
9 months ago
Michael Yang
d0b32def60
skip quantizing per_layer_token_embd ( #11207 )
this tensor isn't compatible with cuda when quantized to q4_K so skip it
9 months ago
Daniel Hiltgen
11ffc36157
ci: multi-stage release process ( #11001 )
9 months ago
Jeffrey Morgan
ba04902670
fs/ggml: add multiplier in graph estimates ( #11208 )
9 months ago
Jeffrey Morgan
3944602f51
fs/ggml: add missing architecture to OllamaEngineRequired() ( #11206 )
9 months ago
Michael Yang
73b642e6f3
add new gemma model ( #11204 )
* update patches
* cherry pick metal mean kernel
* cherry pick cuda mean kernel
* gemma3n
9 months ago
Daniel Hiltgen
ad118d8b13
ci: arm sbsa fixes ( #11194 )
9 months ago
Daniel Hiltgen
f08534137b
ci: include dependencies
9 months ago
Daniel Hiltgen
4b4a90f233
ci: pick up arm sbsa cuda libs ( #11192 )
9 months ago
Daniel Hiltgen
03274a6b2f
ci: recombine linux amd64 binaries ( #11188 )
Glue the rocm and archive builds back together.
9 months ago
Devon Rifkin
cc6463ebca
Merge pull request #10238 from ollama/drifkin/array-head-count-simple
ggml: fix crash for array head counts
9 months ago
Daniel Hiltgen
405d2f628f
ci: rocm parallel builds on windows ( #11187 )
The preset CMAKE_HIP_FLAGS isn't getting used on Windows.
This passes the parallel flag in through the C/CXX flags, along
with suppression for some log spew warnings to quiet down the build.
9 months ago
Devon Rifkin
a3f7dd3e98
Merge branch 'main' into drifkin/array-head-count-simple
9 months ago
Daniel Hiltgen
c85c0ebf89
CI: switch windows to vs 2022 ( #11184 )
* CI: switch windows to vs 2022
* ci: fix regex match
9 months ago
Daniel Hiltgen
10a8e04a8d
avoid context overflow ( #11175 )
For smaller context models, make sure we do not exceed the training size.
9 months ago
Daniel Hiltgen
1c6669e64c
Re-remove cuda v11 ( #10694 )
* Re-remove cuda v11
Revert the revert - drop v11 support, requiring drivers newer than Feb 23
This reverts commit c6bcdc4223 .
* Simplify layout
With only one version of the GPU libraries, we can simplify things down somewhat. (Jetsons still require special handling)
* distinct sbsa variant for linux arm64
This avoids accidentally trying to load the sbsa cuda libraries on
a jetson system which results in crashes.
* temporarily prevent rocm+cuda mixed loading
9 months ago
Devon Rifkin
b2b270ad5d
Merge branch 'main' into drifkin/array-head-count-simple
9 months ago
AJ
2bb69b40c7
readme: add ai-hub to community integrations ( #11169 )
9 months ago
Daniel Hiltgen
65bff664cb
build speedups ( #11142 )
Enable parallel building of the GPU architectures.
9 months ago
Michael Yang
c088ac0e79
convert: utility for merging tensors ( #11069 )
9 months ago
Michael Yang
0a066cfd91
Reapply "feat: incremental gguf parser ( #10822 )" ( #11114 ) ( #11119 )
* Reapply "feat: incremental gguf parser (#10822 )" (#11114 )
This reverts commit a6e64fbdf2 .
* fix older ggufs
9 months ago
Jesse Gross
87b7af6cee
ggml: Check return status for computation.
We don't check the return status after computing the graph, which
can silently lead to bad outputs if we try to keep going and future
computation succeeds. This appears to happen in certain cases on
Apple M2 devices.
Fixes #11070
9 months ago
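The general pattern behind "Check return status for computation" above is to turn a non-success status into an error instead of continuing. The sketch below is an assumption-laden illustration: the `status` type, its constants, and `runGraph` are hypothetical names and do not reflect the actual ggml bindings.

```go
package main

import "fmt"

// status is a hypothetical stand-in for a backend's computation status.
type status int

const (
	statusSuccess status = iota
	statusFailed
)

// runGraph is a hypothetical wrapper. It invokes compute and reports an
// error for any non-success status, so the caller cannot silently keep
// going with bad outputs.
func runGraph(compute func() status) error {
	if s := compute(); s != statusSuccess {
		return fmt.Errorf("graph computation failed with status %d", s)
	}
	return nil
}

func main() {
	err := runGraph(func() status { return statusFailed })
	if err != nil {
		fmt.Println("stopping:", err)
	}
}
```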
Daniel Hiltgen
f2527b08fb
int: add coverage for older models ( #11137 )
Verified these fail on 0.9.1 and pass on HEAD.
9 months ago
Jeffrey Morgan
8bcb3125c1
benchmark: remove unused benchmark test ( #11120 )
Removes a test under benchmark/ that is unused
10 months ago
Jeffrey Morgan
6baf1e31e2
Revert "Revert "ggml: Export GPU UUIDs" ( #11115 )" ( #11117 )
Reverts PR #11115. The original change was mistakenly reverted instead of #10822.
10 months ago
Jeffrey Morgan
ed567ef43b
Revert "ggml: Export GPU UUIDs" ( #11115 )
This reverts commit aaa7818000 .
10 months ago
Jeffrey Morgan
a6e64fbdf2
Revert "feat: incremental gguf parser ( #10822 )" ( #11114 )
This reverts commit 6b04cad7e8 .
10 months ago
曹家巧
60cfa2a203
cache: fix comment function name in cache.go ( #11110 )
10 months ago
Jeffrey Morgan
55bbf3b4a1
tools: return empty arguments object instead of null ( #11113 )
10 months ago
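The difference between a null and an empty arguments object, as in "tools: return empty arguments object instead of null" above, follows from how Go's `encoding/json` encodes maps: a nil map is written as `null`, while an empty non-nil map is written as `{}`. The sketch below shows that behavior. The `toolCall` type and the `normalize` and `encode` helpers are hypothetical and are not Ollama's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall is a hypothetical stand-in, not Ollama's actual type.
type toolCall struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// normalize replaces a nil argument map with an empty one so that it
// serializes as {} rather than null.
func normalize(tc toolCall) toolCall {
	if tc.Arguments == nil {
		tc.Arguments = map[string]any{}
	}
	return tc
}

// encode marshals a tool call to JSON, returning "" on error.
func encode(tc toolCall) string {
	b, err := json.Marshal(tc)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	tc := toolCall{Name: "get_time"}
	fmt.Println(encode(tc))            // {"name":"get_time","arguments":null}
	fmt.Println(encode(normalize(tc))) // {"name":"get_time","arguments":{}}
}
```

Clients that index into the arguments object without a null check are the ones most likely to benefit from the `{}` form.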
Jeffrey Morgan
6bda1d2479
tools: fix parsing tool calls without any parameters ( #11101 )
Fixes an issue where tool calls that don't expect any parameters were
not being parsed. This also fixes two additional issues: one where
2+ tool calls would not be correctly parsed, and one where tool calls
with invalid parameters would still get parsed.
10 months ago
Jeffrey Morgan
9e125d884c
model: treat 'user defined' tokens as special tokens ( #11077 )
10 months ago
Michael Yang
a6fbfc880c
gguf: fix write order ( #11068 )
* ggml: test write gguf order
* ggml: fix write tensor order
10 months ago
NGC13009
502028968d
readme: add ollama-launcher to community integrations ( #11080 )
10 months ago
Phil
5a8eb0e151
readme: add GPTranslate to community integrations ( #11071 )
10 months ago