GGUF

Inference with GGUF models

Text-generation model

Command Usage

usage: nexa run [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-hf] [-pf] [-st] model_path

Options

-t, --temperature: Temperature for sampling
-m, --max_new_tokens: Maximum number of new tokens to generate
-k, --top_k: Top-k sampling parameter
-p, --top_p: Top-p sampling parameter
-sw, --stop_words: List of stop words for early stopping
-hf, --huggingface: Load model from Hugging Face Hub, use Hugging Face repo_id as model_path
-pf, --profiling: Enable profiling logs for the inference process
-st, --streamlit: Run the inference in Streamlit UI
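
As a rough illustration of how the options above combine, the following Python sketch assembles a `nexa run` command line and prints it without executing anything. The helper name `build_run_command`, the model name `my-model`, and all option values are hypothetical placeholders. Placing `-sw` after `model_path` is an assumption based on the usage line, where `-sw` accepts several values and might otherwise absorb the model path.

```python
import shlex


def build_run_command(model_path, temperature=None, max_new_tokens=None,
                      top_k=None, top_p=None, stop_words=(),
                      profiling=False, streamlit=False):
    """Assemble an argv list for `nexa run` using the short flags listed above."""
    cmd = ["nexa", "run"]
    if temperature is not None:
        cmd += ["-t", str(temperature)]
    if max_new_tokens is not None:
        cmd += ["-m", str(max_new_tokens)]
    if top_k is not None:
        cmd += ["-k", str(top_k)]
    if top_p is not None:
        cmd += ["-p", str(top_p)]
    if profiling:
        cmd.append("-pf")
    if streamlit:
        cmd.append("-st")
    cmd.append(model_path)
    # Assumption: -sw takes a variable number of values, so it goes last
    # to keep it from swallowing the positional model_path.
    if stop_words:
        cmd += ["-sw", *stop_words]
    return cmd


# "my-model" and the sampling values are placeholders, not recommendations.
cmd = build_run_command("my-model", temperature=0.7, max_new_tokens=256,
                        stop_words=["END"])
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run` once a real model path is substituted.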

Streamlit Interface

Image-generation model

Command Usage

usage: nexa gen-image [-h] [-i2i] [-ns NUM_INFERENCE_STEPS] [-np NUM_IMAGES_PER_PROMPT] [-H HEIGHT] [-W WIDTH] [-g GUIDANCE_SCALE] [-o OUTPUT] [-s RANDOM_SEED] [--lora_dir LORA_DIR] [--wtype WTYPE] [--control_net_path CONTROL_NET_PATH] [--control_image_path CONTROL_IMAGE_PATH] [--control_strength CONTROL_STRENGTH] [-st] model_path

Options

-i2i, --img2img: Run image-to-image generation
-ns, --num_inference_steps: Number of inference steps
-np, --num_images_per_prompt: Number of images to generate per prompt
-H, --height: Height of the output image
-W, --width: Width of the output image
-g, --guidance_scale: Guidance scale for diffusion
-o, --output: Output path for the generated image
-s, --random_seed: Random seed for image generation
--lora_dir: Path to directory containing LoRA files
--wtype: Weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
--control_net_path: Path to the ControlNet model
--control_image_path: Path to the conditioning image for ControlNet
--control_strength: Strength with which ControlNet is applied
-st, --streamlit: Run the inference in Streamlit UI
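
The image-generation options can be combined in the same way. The sketch below maps keyword arguments onto the long option names listed above and prints the resulting `nexa gen-image` command without running it. The helper `build_gen_image_command`, the model name `my-image-model`, the output file name, and the numeric values are all hypothetical placeholders.

```python
import shlex

# Switches that take no value, and options that expect one (names from the list above).
SWITCHES = {"img2img", "streamlit"}
VALUED = {"num_inference_steps", "num_images_per_prompt", "height", "width",
          "guidance_scale", "output", "random_seed", "lora_dir", "wtype",
          "control_net_path", "control_image_path", "control_strength"}


def build_gen_image_command(model_path, **options):
    """Assemble an argv list for `nexa gen-image`.

    Keyword names mirror the long option names, so height=512
    becomes `--height 512`.
    """
    cmd = ["nexa", "gen-image"]
    for name, value in options.items():
        if name in SWITCHES:
            if value:
                cmd.append(f"--{name}")
        elif name in VALUED:
            cmd += [f"--{name}", str(value)]
        else:
            raise ValueError(f"unknown option: {name}")
    cmd.append(model_path)
    return cmd


# Placeholder values chosen only for illustration.
cmd = build_gen_image_command("my-image-model", height=512, width=512,
                              num_inference_steps=20, random_seed=42,
                              output="out.png")
print(shlex.join(cmd))
```

Rejecting unknown keyword names keeps a typo from silently producing a flag the CLI does not document.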

Streamlit Interface

Vision-language model

Command Usage

usage: nexa vlm [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-pf] [-st] model_path

Options

-t, --temperature: Temperature for sampling
-m, --max_new_tokens: Maximum number of new tokens to generate
-k, --top_k: Top-k sampling parameter
-p, --top_p: Top-p sampling parameter
-sw, --stop_words: List of stop words for early stopping
-pf, --profiling: Enable profiling logs for the inference process
-st, --streamlit: Run the inference in Streamlit UI
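
For scripted use, a thin wrapper can build the `nexa vlm` command and start it only when the executable is actually installed. This is a minimal sketch under stated assumptions: `vlm_command` and `launch` are hypothetical helper names, `my-vlm-model` is a placeholder, and only a few of the options above are wired in.

```python
import shlex
import shutil
import subprocess


def vlm_command(model_path, temperature=None, max_new_tokens=None, profiling=False):
    """Assemble an argv list for `nexa vlm` from a subset of the options above."""
    cmd = ["nexa", "vlm"]
    if temperature is not None:
        cmd += ["-t", str(temperature)]
    if max_new_tokens is not None:
        cmd += ["-m", str(max_new_tokens)]
    if profiling:
        cmd.append("-pf")
    cmd.append(model_path)
    return cmd


def launch(cmd):
    """Run cmd if its executable is on PATH; otherwise report it and return None."""
    if shutil.which(cmd[0]) is None:
        print("executable not found on PATH; would run:", shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=False).returncode


# Placeholder model name and values; nothing is executed here.
cmd = vlm_command("my-vlm-model", temperature=0.2, max_new_tokens=128, profiling=True)
print(shlex.join(cmd))
# launch(cmd)  # uncomment to start the session when nexa is installed
```

Passing the command as a list rather than a single string avoids shell quoting problems with model paths that contain spaces.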

Streamlit Interface

Automatic Speech Recognition model

Command Usage

usage: nexa asr [-h] [-o OUTPUT_DIR] [-b BEAM_SIZE] [-l LANGUAGE] [--task TASK] [-c COMPUTE_TYPE] [-st] model_path

Options

-o, --output_dir: Output directory for transcriptions
-b, --beam_size: Beam size to use for transcription
-l, --language: Language spoken in the audio, given as a language code such as 'en' or 'fr'
--task: Task to execute (transcribe or translate)
-c, --compute_type: Type to use for computation (e.g., default, float16, int8, int8_float16)
-st, --streamlit: Run the inference in Streamlit UI
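
The speech-recognition options can be assembled along the same lines. The sketch below builds a `nexa asr` command and checks `--task` against the two values named in the option list before printing the result. The helper `build_asr_command`, the model name `my-asr-model`, the output directory, and the beam size are hypothetical placeholders.

```python
import shlex

# The two tasks named in the option list above.
VALID_TASKS = ("transcribe", "translate")


def build_asr_command(model_path, output_dir=None, beam_size=None,
                      language=None, task=None, compute_type=None):
    """Assemble an argv list for `nexa asr` from the options listed above."""
    if task is not None and task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    cmd = ["nexa", "asr"]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if beam_size is not None:
        cmd += ["-b", str(beam_size)]
    if language is not None:
        cmd += ["-l", language]
    if task is not None:
        cmd += ["--task", task]
    if compute_type is not None:
        cmd += ["-c", compute_type]
    cmd.append(model_path)
    return cmd


# Placeholder values chosen only for illustration.
cmd = build_asr_command("my-asr-model", output_dir="transcripts", beam_size=5,
                        language="en", task="transcribe", compute_type="int8")
print(shlex.join(cmd))
```

Validating the task up front surfaces a misspelled value before the model is loaded.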

Streamlit Interface
