GGUF

Inference with GGUF models

Text-generation model

Command Usage

usage: nexa run [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-hf] [-pf] [-st] model_path

Options

  • -t, --temperature: Temperature for sampling

  • -m, --max_new_tokens: Maximum number of new tokens to generate

  • -k, --top_k: Top-k sampling parameter

  • -p, --top_p: Top-p sampling parameter

  • -sw, --stop_words: List of stop words for early stopping

  • -hf, --huggingface: Load the model from Hugging Face Hub; pass the Hugging Face repo_id as model_path

  • -pf, --profiling: Enable profiling logs for the inference process

  • -st, --streamlit: Run the inference in Streamlit UI
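The options above can be combined in a single invocation. The following is a minimal sketch, not part of the Nexa SDK: `build_run_command` is a hypothetical helper that only assembles an argument list, and `MODEL_PATH` is a placeholder. It assumes the CLI parses arguments the way Python's argparse does (as the usage format suggests), so `model_path` is placed before `-sw`, which accepts a variable number of values and could otherwise consume the positional argument.

```python
import shlex


def build_run_command(model_path, *, temperature=None, max_new_tokens=None,
                      top_k=None, top_p=None, stop_words=None,
                      huggingface=False, profiling=False, streamlit=False):
    """Return an argv list for `nexa run` (hypothetical helper, not an SDK API)."""
    cmd = ["nexa", "run"]
    # Valued options are emitted only when the caller sets them.
    for flag, value in (("-t", temperature), ("-m", max_new_tokens),
                        ("-k", top_k), ("-p", top_p)):
        if value is not None:
            cmd += [flag, str(value)]
    # Boolean switches take no value.
    for flag, enabled in (("-hf", huggingface), ("-pf", profiling),
                          ("-st", streamlit)):
        if enabled:
            cmd.append(flag)
    # Assumption: argparse-style parsing. The positional argument goes
    # before -sw so the variable-length stop-word list cannot swallow it.
    cmd.append(model_path)
    if stop_words:
        cmd += ["-sw", *stop_words]
    return cmd


# "MODEL_PATH" is a placeholder, not a real model name.
cmd = build_run_command("MODEL_PATH", temperature=0.7,
                        max_new_tokens=256, stop_words=["User:"])
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once `nexa` is installed and `MODEL_PATH` is replaced with a real model.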

Streamlit Interface

Image-generation model

Command Usage

usage: nexa gen-image [-h] [-i2i] [-ns NUM_INFERENCE_STEPS] [-np NUM_IMAGES_PER_PROMPT] [-H HEIGHT] [-W WIDTH] [-g GUIDANCE_SCALE] [-o OUTPUT] [-s RANDOM_SEED] [--lora_dir LORA_DIR] [--wtype WTYPE] [--control_net_path CONTROL_NET_PATH] [--control_image_path CONTROL_IMAGE_PATH] [--control_strength CONTROL_STRENGTH] [-st] model_path

Options

  • -i2i, --img2img: Run image-to-image generation instead of text-to-image generation

  • -ns, --num_inference_steps: Number of inference steps

  • -np, --num_images_per_prompt: Number of images to generate per prompt

  • -H, --height: Height of the output image

  • -W, --width: Width of the output image

  • -g, --guidance_scale: Guidance scale for diffusion

  • -o, --output: Output path for the generated image

  • -s, --random_seed: Random seed for image generation

  • --lora_dir: Path to directory containing LoRA files

  • --wtype: Weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)

  • --control_net_path: Path to the ControlNet model

  • --control_image_path: Path to the conditioning image for ControlNet

  • --control_strength: Strength with which ControlNet is applied

  • -st, --streamlit: Run the inference in Streamlit UI
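As an illustration of how the image-generation options fit together, here is a small sketch that assembles a `nexa gen-image` argument list and rejects a `--wtype` value outside the set listed above. `build_gen_image_command` and `MODEL_PATH` are hypothetical names introduced for this example; the numeric values are arbitrary and not recommended defaults.

```python
import shlex

# Weight types taken from the --wtype option description above.
WTYPES = {"f32", "f16", "q4_0", "q4_1", "q5_0", "q5_1", "q8_0"}


def build_gen_image_command(model_path, *, img2img=False, steps=None,
                            num_images=None, height=None, width=None,
                            guidance_scale=None, output=None, seed=None,
                            wtype=None):
    """Return an argv list for `nexa gen-image` (hypothetical helper)."""
    if wtype is not None and wtype not in WTYPES:
        raise ValueError(f"unsupported wtype: {wtype!r}")
    cmd = ["nexa", "gen-image"]
    if img2img:
        cmd.append("-i2i")
    # Each valued option is added only when a value was supplied.
    for flag, value in (("-ns", steps), ("-np", num_images),
                        ("-H", height), ("-W", width),
                        ("-g", guidance_scale), ("-o", output),
                        ("-s", seed), ("--wtype", wtype)):
        if value is not None:
            cmd += [flag, str(value)]
    cmd.append(model_path)
    return cmd


# "MODEL_PATH" is a placeholder; the numbers are illustrative only.
cmd = build_gen_image_command("MODEL_PATH", steps=20, height=512,
                              width=512, seed=42, wtype="q4_0")
print(shlex.join(cmd))
```

Fixing `-s` makes a run easier to reproduce when comparing other settings, such as the number of inference steps or the guidance scale.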

Streamlit Interface

Vision-language model

Command Usage

usage: nexa vlm [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-pf] [-st] model_path

Options

  • -t, --temperature: Temperature for sampling

  • -m, --max_new_tokens: Maximum number of new tokens to generate

  • -k, --top_k: Top-k sampling parameter

  • -p, --top_p: Top-p sampling parameter

  • -sw, --stop_words: List of stop words for early stopping

  • -pf, --profiling: Enable profiling logs for the inference process

  • -st, --streamlit: Run the inference in Streamlit UI
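For scripted use, the command can be launched from Python. The sketch below is an assumption-laden example, not an SDK API: `vlm_command` and `run_if_available` are hypothetical helpers, and `MODEL_PATH` is a placeholder. It builds a `nexa vlm` argument list with profiling enabled and runs a command only when its executable is found on `PATH`.

```python
import shlex
import shutil
import subprocess


def vlm_command(model_path, *, temperature=None, max_new_tokens=None,
                profiling=False):
    """Return an argv list for `nexa vlm` (hypothetical helper)."""
    cmd = ["nexa", "vlm"]
    if temperature is not None:
        cmd += ["-t", str(temperature)]
    if max_new_tokens is not None:
        cmd += ["-m", str(max_new_tokens)]
    if profiling:
        cmd.append("-pf")
    cmd.append(model_path)
    return cmd


def run_if_available(cmd):
    """Run cmd if its executable is on PATH; return the exit code or None."""
    if shutil.which(cmd[0]) is None:
        return None
    # No shell is involved, so arguments are passed through unmodified.
    return subprocess.run(cmd, check=False).returncode


# "MODEL_PATH" is a placeholder, not a real model name.
cmd = vlm_command("MODEL_PATH", temperature=0.2, profiling=True)
print(shlex.join(cmd))
```

Calling `run_if_available(cmd)` would then start the session, or return `None` on a machine where `nexa` is not installed.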

Streamlit Interface

Automatic Speech Recognition model

Command Usage

usage: nexa asr [-h] [-o OUTPUT_DIR] [-b BEAM_SIZE] [-l LANGUAGE] [--task TASK] [-c COMPUTE_TYPE] [-st] model_path

Options

  • -o, --output_dir: Output directory for transcriptions

  • -b, --beam_size: Beam size to use for transcription

  • -l, --language: Language spoken in the audio, given as a language code such as 'en' or 'fr'

  • --task: Task to execute (transcribe or translate)

  • -c, --compute_type: Type to use for computation (e.g., default, float16, int8, int8_float16)

  • -st, --streamlit: Run the inference in Streamlit UI
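The speech-recognition options can be sketched the same way. In the example below, `build_asr_command` is a hypothetical helper and `MODEL_PATH` is a placeholder. The `--task` check uses the two values named above; `--compute_type` is passed through unchecked, because the list above gives examples rather than a complete set.

```python
import shlex

# Task values taken from the --task option description above.
TASKS = {"transcribe", "translate"}


def build_asr_command(model_path, *, output_dir=None, beam_size=None,
                      language=None, task=None, compute_type=None,
                      streamlit=False):
    """Return an argv list for `nexa asr` (hypothetical helper)."""
    if task is not None and task not in TASKS:
        raise ValueError(f"task must be one of {sorted(TASKS)}")
    cmd = ["nexa", "asr"]
    # Each valued option is added only when a value was supplied.
    for flag, value in (("-o", output_dir), ("-b", beam_size),
                        ("-l", language), ("--task", task),
                        ("-c", compute_type)):
        if value is not None:
            cmd += [flag, str(value)]
    if streamlit:
        cmd.append("-st")
    cmd.append(model_path)
    return cmd


# "MODEL_PATH" is a placeholder; beam size 5 is an arbitrary example.
cmd = build_asr_command("MODEL_PATH", beam_size=5, language="en",
                        task="transcribe", compute_type="int8")
print(shlex.join(cmd))
```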

Streamlit Interface
