Skip to main content

LLM Parameters

temperature

  • Type: Optional, float, 0.0 to 2.0
  • Default: 1.0
  • Description: This setting influences the variety in the model's responses. Lower values lead to more predictable and typical responses, while higher values encourage more diverse and less common responses. At 0, the model always gives the same response for a given input.

top_p

  • Type: Optional, float, 0.0 to 1.0
  • Default: 1.0
  • Description: This setting limits the model's choices to a percentage of likely tokens: only the top tokens whose probabilities add up to P. A lower value makes the model's responses more predictable, while the default setting allows for a full range of token choices. Think of it like a dynamic Top-K.

top_k

  • Type: Optional, integer, 0 or above
  • Default: 0
  • Description: This limits the model's choice of tokens at each step, making it choose from a smaller set. A value of 1 means the model will always pick the most likely next token, leading to predictable results. By default this setting is disabled, making the model to consider all choices.

frequency_penalty

  • Type: Optional, float, -2.0 to 2.0
  • Default: 0
  • Description: This limits the model's choice of tokens at each step, making it choose from a smaller set. A value of 1 means the model will always pick the most likely next token, leading to predictable results. By default this setting is disabled, making the model to consider all choices.

presence_penalty

  • Type: Optional, float, -2.0 to 2.0
  • Default: 0.0
  • Description: Adjusts how often the model repeats specific tokens already used in the input. Higher values make such repetition less likely, while negative values do the opposite. Token penalty does not scale with the number of occurrences. Negative values will encourage token reuse.

repeatation_penalty

  • Type: Optional, float, 0.0 to 2.0
  • Default: 1.0
  • Description: Helps to reduce the repetition of tokens from the input. A higher value makes the model less likely to repeat tokens, but too high a value can make the output less coherent (often with run-on sentences that lack small words). Token penalty scales based on original token's probability.

min_p

  • Type: Optional, float, 0.0 to 1.0
  • Default: 0.0
  • Description: Represents the minimum probability for a token to be considered, relative to the probability of the most likely token. (The value changes depending on the confidence level of the most probable token.) If your Min-P is set to 0.1, that means it will only allow for tokens that are at least 1/10th as probable as the best possible option.

top_a

  • Type: Optional, float, 0.0 to 1.0
  • Default: 0.0
  • Description: Consider only the top tokens with "sufficiently high" probabilities based on the probability of the most likely token. Think of it like a dynamic Top-P. A lower Top-A value focuses the choices based on the highest probability token but with a narrower scope. A higher Top-A value does not necessarily affect the creativity of the output, but rather refines the filtering process based on the maximum probability.

seed

  • Type: Optional, integer
  • Description: If specified, the inferencing will sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed for some models.

max_tokens

  • Type: Optional, integer, 1 or above
  • Description: This sets the upper limit for the number of tokens the model can generate in response. It won't produce more than this limit. The maximum value is the context length minus the prompt length.

logit_bias

  • Type: Optional, map
  • Description: Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.

logprobs

  • Type: Optional, boolean
  • Description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.

top_logprobs

  • Type: Optional, integer
  • Description:An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

response_format

  • Type: Optional, map
  • Description: Forces the model to produce specific output format. Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Note: when using JSON mode, you should also instruct the model to produce JSON yourself via a system or user message.

stop

  • Type: Optional, array
  • Description:Stop generation immediately if the model encounter any token specified in the stop array.