15 Commits (6dcc5dfb9c0a033e4e8dde627d55580600418fb6)

Author SHA1 Message Date
Michael Yang 333e360422 model: handle multiple eos tokens (#10577) 11 months ago
Michael Yang 54055a6dae fix test 11 months ago
Parth Sareen a53d744b01 llama: remove model loading for grammar (#10096) 12 months ago
Parth Sareen 42a14f7f63 sample: add error handling for empty logits (#9740) 1 year ago
Parth Sareen 108fe02165 sample: make mutations in transforms explicit (#9743) 1 year ago
Parth Sareen 5c0b663969 sample: separate softmax and temperature transforms (#9732) 1 year ago
ParthSareen 4aeb67ef4c sample: do all sorting in topK 1 year ago
ParthSareen 3ba91634c1 sample: simplify top_k=0 sorting 1 year ago
ParthSareen 1b7433b71e sample: use container/heap for top_k 1 year ago
Parth Sareen 7e34f4fbfa sample: add numerical stability to temperature/softmax transform (#9631) 1 year ago
Jeffrey Morgan e093db92c4 sample: temporarily use grammars for constrained generation in new engine (#9586) 1 year ago
Parth Sareen 0682dae027 sample: improve ollama engine sampler performance (#9374) 1 year ago
Parth Sareen c245b0406f sample: remove transforms from greedy sampling (#9377) 1 year ago
Parth Sareen 0b7e1676eb sample: add sampling package for new engine (#8410) 1 year ago
Michael Yang 58245413f4 next ollama runner (#7913) 1 year ago