ndjson

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Where is Dublin? Answer in a six words"
}'
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.15898Z","response":"Located","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.183229Z","response":" on","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.206942Z","response":" the","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.230918Z","response":" east","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.254533Z","response":" coast","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.278113Z","response":" Ireland","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.301689Z","response":".","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.3255Z","response":"","done":true,"done_reason":"stop","context":[128006,9125,128007,271,38766,1303,33025,2696,25,6790,220,2366,18,271,128009,128006,882,128007,271,9241,374,33977,30,22559,304,264,4848,4339,128009,128006,78191,128007,271,48852,389,279,11226,13962,14990,13],"total_duration":2392671125,"load_duration":575523041,"prompt_eval_count":34,"prompt_eval_duration":1649000000,"eval_count":8,"eval_duration":167000000}

I was playing around with the Ollama API to explore its capabilities and noticed that the HTTP response was streaming JSON, which prompted me to look into the response headers.

curl -v http://localhost:11434/api/generate -d '{
                                                                                                                            "model": "llama3.2",
                                                                                                                            "prompt": "Where is Dublin? Answer in a six words"
                                                                                                                          }'
* Host localhost:11434 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:11434...
* connect to ::1 port 11434 from ::1 port 49217 failed: Connection refused
*   Trying 127.0.0.1:11434...
* Connected to localhost (127.0.0.1) port 11434
> POST /api/generate HTTP/1.1
> Host: localhost:11434
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Length: 250
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 250 bytes
< HTTP/1.1 200 OK
< Content-Type: application/x-ndjson
< Date: Tue, 14 Jan 2025 17:49:29 GMT
< Transfer-Encoding: chunked
<
...

The content type is application/x-ndjson, and a quick search suggested that it is newline-delimited JSON, a format that can be used in streaming protocols. The Transfer-Encoding is also chunked, which fits well for sending LLM responses over the wire as they are generated.
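Because the body arrives in chunks, a chunk boundary does not have to line up with a line boundary, so a client usually buffers bytes until it sees a newline. Here is a minimal sketch of that idea in Python. The function name iter_ndjson and the sample chunks are my own illustrations, not part of Ollama or any library, and the objects are trimmed versions of the ones shown above.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per newline-terminated line."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A chunk may end mid-line, so only complete lines are parsed.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # Parse a final line that arrived without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


# Simulated network chunks (hypothetical); the second JSON line is
# split across two chunks on purpose.
chunks = [
    b'{"response":"Located","done":false}\n{"respon',
    b'se":" on","done":false}\n',
    b'{"response":"","done":true}\n',
]
objects = list(iter_ndjson(chunks))
print([obj["response"] for obj in objects])
```

With a real HTTP client, the chunks would come from whatever streaming iterator that client exposes instead of a hard-coded list.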

curl http://localhost:11434/api/generate -d '{
                                                                                                                                   "model": "llama3.2",
                                                                                                                                   "prompt": "Where is Dublin? Answer in a six words"
                                                                                                                                 }' | jq .response
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1399    0  1149  100   250   3291    716 --:--:-- --:--:-- --:--:--  3997
"Located"
" on"
" the"
" east"
" coast"
" Ireland"
"."
""

jq can also handle newline-delimited JSON.
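For comparison, roughly the same extraction can be sketched in Python: read each line, pull out the response field, and join the fragments until done is true. The helper name collect_response is hypothetical, and the input below is a trimmed copy of the output above with the timestamps and statistics removed.

```python
import json

# Trimmed copy of the streamed lines shown earlier.
ndjson_text = """\
{"model":"llama3.2","response":"Located","done":false}
{"model":"llama3.2","response":" on","done":false}
{"model":"llama3.2","response":" the","done":false}
{"model":"llama3.2","response":" east","done":false}
{"model":"llama3.2","response":" coast","done":false}
{"model":"llama3.2","response":" Ireland","done":false}
{"model":"llama3.2","response":".","done":false}
{"model":"llama3.2","response":"","done":true,"done_reason":"stop"}
"""


def collect_response(text: str) -> str:
    """Join the "response" fragments up to the object marked done."""
    parts = []
    # Split on "\n" only: str.splitlines() also breaks on other
    # separator characters that may legally appear inside JSON strings.
    for line in text.split("\n"):
        if not line.strip():
            continue
        obj = json.loads(line)
        parts.append(obj["response"])
        if obj.get("done"):
            break
    return "".join(parts)


answer = collect_response(ndjson_text)
print(answer)
```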

JSON Streaming formats

While researching JSON streaming further, I found several other approaches to streaming JSON objects. Notable ones are ndjson, jsonl, and json-seq. All of these formats are useful for processing and parallelising large JSON datasets without loading the entire dataset into memory.
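To make the difference concrete, here is a small sketch that writes and reads the same records in two of these framings. As far as I understand, ndjson and jsonl both put one JSON value per line, while json-seq (RFC 7464) starts each record with the ASCII Record Separator character (0x1E) and ends it with a line feed. The function names and sample records are my own, and this is an illustration rather than a complete parser for either format.

```python
import json

RS = "\x1e"  # ASCII Record Separator that prefixes each json-seq record


def parse_ndjson(text: str) -> list:
    """Parse one JSON value per "\\n"-terminated line."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def parse_json_seq(text: str) -> list:
    """Parse records that each start with RS and end with a line feed."""
    return [json.loads(rec) for rec in text.split(RS) if rec.strip()]


# Hypothetical sample records, modelled on the responses above.
records = [
    {"response": "Located", "done": False},
    {"response": "", "done": True},
]

ndjson_text = "".join(json.dumps(r) + "\n" for r in records)
json_seq_text = "".join(RS + json.dumps(r) + "\n" for r in records)

print(parse_ndjson(ndjson_text) == records)
print(parse_json_seq(json_seq_text) == records)
```

Either way, a reader can handle one record at a time, which is what keeps memory use low.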

Syntax

It’s quite interesting to see the different use cases of different variations of JSON formats.