ollama

Commit Graph

Author	SHA1	Message	Date
Michael Yang	fa7776fd24	gpt-oss (#11672) * bf16 * tests * gpt-oss * enable gptoss for engine * rough estimate * convert to mxfp4 * handle safetensors U8 * clamp glu/linear * update tokenizer * MXFP4 support This implements the Open Compute Microscaling (MX) FP4 format as a tensor type with backend implementations focusing on mulmat and mulmatid on CPU, CUDA, and Metal. * Unit tests for MXFP4 support This exercises various operations and shapes on both CPU and GPU (if detected on the system) * cuda graph * unit test adjustments * cuda: optimize memory access Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4 * mac: fix crash on old macos versions cblas_sgemm is only supported on v13.3 and up, however bf16 is only supported on v14+ so we were falling back to ggml-blas and crashing on bf16 tensors. Checking for the function being null seems to be the simplest way to conditionally avoid registering the backend. * server: Minimum context length for gptoss This model requires a minimum context length of 8192 to function effectively. Users can set higher values through all normal mechanisms but lower values will be silently reset. * ggml: Multiply by numParallel for gptoss sliding window When computing the graph size estimate, the context size is already multiplied by numParallel so estimates reflect that. However, since sliding window models use a smaller, fixed context size, they need to manually take numParallel into account. * gpt-oss integration includes harmony parser and thinking levels, etc. * fix sync * fix tests * fix lint --------- Co-authored-by: Daniel Hiltgen <daniel@ollama.com> Co-authored-by: Jesse Gross <jesse@ollama.com> Co-authored-by: Devon Rifkin <drifkin@drifkin.net>	8 months ago
Michael Yang	6c733bf0a6	s#x/exp/maps#maps# (#11506)	8 months ago
Michael Yang	4129af9205	chore: cleanup comments + unused vars (#11225)	9 months ago
Michael Yang	73b642e6f3	add new gemma model (#11204) * update patches * cherry pick metal mean kernel * cherry pick cuda mean kernel * gemma3n	9 months ago
Michael Yang	c088ac0e79	convert: utility for merging tensors (#11069)	9 months ago
Michael Yang	45f56355d5	feat: uneven splits (#11048) The current splitDim function only operates on tensors that are split evenly which isn't always the case, e.g. a QKV tensor. This change allows the function to be used for arbitrary splits	10 months ago
Michael Yang	adff143bcd	fix: mllama quality (#10807) * fix mllama convert - transform attn_gate and ffn_gate - swap attention heads for vision models * fix mllama the mlp gate which was applied in the wrong place	10 months ago
Jesse Gross	94ab428e3f	ggml: Separate tensor load from backend creation Currently, when the backend is created, the tensors are loaded at the same time, which is a slow operation. This separates them to be two steps: - Create backend, including enumerating tensors and memory allocation - Loading tensor data This allows more flexibility in managing model loading.	12 months ago
Michael Yang	333e360422	model: handle multiple eos tokens (#10577) * get eos_token_id from generation_config.json * refactor * include both ids and strings in trace * comments * remove special case for gemma3 special vocab (#10743)	11 months ago
Michael Yang	55760195e6	fix mllama conversion (#10716) cross attention Q and K projections need to have their heads swapped, similar to non-cross attention Q and K tensors	11 months ago
Bruce MacDonald	0aa8b371dd	model: add Qwen2.5-VL support (#10385)	11 months ago
Michael Yang	23125648b8	chore: update mllama to use ollama engine (#10637)	11 months ago
Michael Yang	b585a58121	chore: remove unused ZipReader type (#10621)	11 months ago
Daniel Hiltgen	424810450f	Move quantization to new backend (#10363) * Move quantization logic to GGML via new backend This moves the model aware logic to Go code and calls GGML's quantization code for model creation. * Remove "add model quantizations" This is no longer needed now that quantization is implemented in Go+GGML code directly.	11 months ago
湛露先生	7e5c8eee5c	file close check and close. (#10554) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	11 months ago
Michael Yang	7ba9fa9c7d	fixes for maverick	11 months ago
Michael Yang	8bf11b84c1	chunked attention	12 months ago
Michael Yang	f0c66e6dea	llama4	1 year ago
Michael Yang	dc1e81f027	convert: use -1 for read all	11 months ago
Michael Yang	4892872c18	convert: change to colmajor	11 months ago
Michael Yang	2fec73eef6	fix write gguf padding	12 months ago
Bruce MacDonald	6bd0a983cd	model: support for mistral-small in the ollama runner Mistral is a popular research lab making open source models. This updates the forward pass of llama architecture models to support both llama models and mistral models by accounting for additional metadata present in mistral models, and finding the correct dimensions for the output projection.	1 year ago
Bruce MacDonald	9876c9faa4	chore(all): replace instances of interface with any (#10067) Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.	1 year ago
Bruce MacDonald	61a8825216	convert: return name of unsupported architecture (#9862) When a model's architecture cannot be converted return the name of the unsupported arch in the error message.	1 year ago
Patrick Devine	80c7ce381b	fix: change default context size for gemma3 (#9744)	1 year ago
jmorganca	83f0ec8269	all: address linter errors	1 year ago
Michael Yang	63a394068c	use 2d pooling	1 year ago
Patrick Devine	2e54d72fc3	fix gemma3 1b conversion	1 year ago
Michael Yang	6b32a2d549	compat with upstream gguf	1 year ago
Michael Yang	d368c039f0	skip repacking vision tensors	1 year ago
Patrick Devine	9b54267e69	fix configs	1 year ago
Michael Yang	46bb0169c4	update model	1 year ago
Patrick Devine	c62861f4fa	fix conversion	1 year ago
Michael Yang	0df1800436	set non-causal attention	1 year ago
Patrick Devine	631fecc6d9	temporary work around for converting spm	1 year ago
Michael Yang	4b037a97dc	add gemma vision encoder	1 year ago
Patrick Devine	5f74d1fd47	gemma2 impl	1 year ago
Michael Yang	58245413f4	next ollama runner (#7913) feat: add new Ollama engine using ggml through cgo This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this. - `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go` - `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go` - `ml.Tensor` defines the interface for a tensor and tensor operations. This is the first implementation of the new engine. Follow up PRs will implement more features: - non-greedy sampling (#8410) - integration with Ollama and KV caching (#8301) - more model support (#9080) with more coming soon Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	1 year ago
Josh	93a8daf285	convert: import support for command-r models from safetensors (#6063) --------- Co-authored-by: Patrick Devine <patrick@infrahq.com>	1 year ago
Bruce MacDonald	f6f3713001	convert: qwen2 from safetensors (#8408) Add native support for converting Qwen2 family models (including Qwen2.5) from safetensors to gguf format so we can run it.	1 year ago
Stefan Weil	abfdc4710f	all: fix typos in documentation, code, and comments (#7021)	1 year ago
Michael Yang	4456012956	fix unmarshaling merges	1 year ago
Patrick Devine	c7cb0f0602	image processing for llama3.2 (#6963) Co-authored-by: jmorganca <jmorganca@gmail.com> Co-authored-by: Michael Yang <mxyng@pm.me> Co-authored-by: Jesse Gross <jesse@ollama.com>	1 year ago
Patrick Devine	84b84ce2db	catch when model vocab size is set correctly (#6714)	2 years ago
Patrick Devine	608e87bf87	Fix gemma2 2b conversion (#6645)	2 years ago
Patrick Devine	6c1c1ad6a9	throw an error when encountering unsupported tensor sizes (#6538)	2 years ago
Michael Yang	60e47573a6	more tokenizer tests	2 years ago
Michael Yang	eae3af6807	clean up convert tokenizer	2 years ago
Michael Yang	3eb08377f8	detect chat template from configs that contain lists	2 years ago
Patrick Devine	0c819e167b	convert safetensor adapters into GGUF (#6327)	2 years ago


92 Commits (main)