GENXT Confidential LLM API
    • Server attestation (POST)
    • Generate a completion (POST)
    • Generate a chat completion (POST)
    • List Local Models (GET)
    • Show Model Information (POST)
    • Generate Embeddings (POST)
    • OpenAI compatible endpoints (POST)

      Generate a completion

      POST
      https://api.genxt.ai/api/generate
      This endpoint generates a text completion based on a given prompt using a specified AI model. It can optionally accept base64-encoded images for multimodal model interactions. The endpoint supports streaming responses where the result is delivered in a series of messages; this can be disabled to receive a single response object.
      Request Example
      Shell
      curl --location --request POST 'https://api.genxt.ai/api/generate' \
      --header 'Content-Type: application/json' \
      --header 'Authorization: Bearer ********************' \
      --data-raw '{
          "model": "string",
          "prompt": "string",
          "images": [
              "string"
          ],
          "options": {},
          "stream": true
      }'
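A concrete request body, built in Python for clarity, might look like the following. The model name "llama3" is an assumption for illustration; use a model reported by the List Local Models endpoint on your deployment.

```python
import json

# Hypothetical request body for POST /api/generate.
# "llama3" is an assumed model name -- substitute one available on your server.
payload = {
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "options": {"temperature": 0.7},  # model options per the model's documentation
    "stream": False,                  # request a single JSON response object
}
body = json.dumps(payload)
print(body)
```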
      Response Example
      {
          "model": "string",
          "created_at": "2019-08-24T14:15:22Z",
          "response": "string",
          "done": true,
          "total_duration": 0,
          "load_duration": 0,
          "eval_count": 0,
          "eval_duration": 0,
          "context": [
              0
          ]
      }
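Because eval_count is a token count and eval_duration is reported in nanoseconds, generation throughput can be derived directly from the response statistics. A minimal sketch using illustrative numbers (not real measurements):

```python
# Illustrative response statistics (not real measurements).
response = {
    "eval_count": 290,            # tokens in the generated response
    "eval_duration": 4709213000,  # nanoseconds spent generating
}

# tokens per second = eval_count / eval_duration * 1e9
tokens_per_second = response["eval_count"] / response["eval_duration"] * 1e9
print(round(tokens_per_second, 2))
```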

      Request

      Authorization
      Provide your bearer token in the Authorization header when making requests to protected resources.
      Example:
      Authorization: Bearer ********************
      Body Params application/json
      model
      string 
      required
      The name of the AI model to use for generating the completion.
      prompt
      string 
      optional
      The text prompt from which the model generates a completion.
      images
      array[string <base64>]
      optional
      A list of base64-encoded images to be processed by multimodal models like llava.
      options
      object 
      optional
      Optional parameters for model configuration as defined in the model's documentation.
      stream
      boolean 
      optional
      If true, responses are streamed as a series of JSON objects; if false, a single JSON response is returned.
      Default:
      true
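When stream is true, the body arrives as a series of JSON objects, each carrying a fragment of the output in its response field, with done set to true on the final object. A minimal sketch of assembling such a stream, assuming newline-delimited JSON objects (the sample chunks below are illustrative, not real server output):

```python
import json

def assemble_stream(lines):
    """Concatenate the "response" fragments from a streamed /api/generate
    reply, stopping at the object marked done: true."""
    parts = []
    for line in lines:
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(parts)

# Illustrative stream of chunks as the server might deliver them.
chunks = [
    '{"model": "llama3", "response": "The sky ", "done": false}',
    '{"model": "llama3", "response": "is blue.", "done": false}',
    '{"model": "llama3", "response": "", "done": true, "eval_count": 5}',
]
print(assemble_stream(chunks))  # -> The sky is blue.
```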

      Responses

      🟢 200 Successfully generated completion with model statistics and response data.
      application/json
      Body
      model
      string 
      optional
      created_at
      string <date-time>
      optional
      response
      string 
      optional
      done
      boolean 
      optional
      total_duration
      integer <int64>
      optional
      Total time taken for the generation process in nanoseconds.
      load_duration
      integer <int64>
      optional
      Time spent loading the model into memory, in nanoseconds.
      eval_count
      integer <int64>
      optional
      Number of tokens in the generated response.
      eval_duration
      integer <int64>
      optional
      Time spent in nanoseconds generating the response.
      context
      array[integer]
      optional
      An encoding of the conversational context, useful for maintaining state across requests.
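To maintain state, the context array from one response can be carried into a follow-up request. Note this assumes the endpoint accepts a context field in the request body, mirroring the response field as similar Ollama-style APIs do; that field is not listed in the request schema above, so verify it against your deployment. A sketch:

```python
import json

# The "context" value here is an illustrative placeholder -- in practice it is
# the opaque array returned by a previous /api/generate response.
first_reply = {
    "model": "llama3",
    "response": "The sky is blue.",
    "done": True,
    "context": [1, 2, 3],
}

# Assumed follow-up body: carrying the returned context forward continues the
# conversation (an assumption -- "context" is not in the documented request schema).
follow_up = {
    "model": "llama3",
    "prompt": "Why?",
    "context": first_reply["context"],
    "stream": False,
}
print(json.dumps(follow_up))
```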