⬆️ Upload Model

🤝 Share your model and connect with developers, researchers, and users for support and collaboration.

To upload models, you will need an account on Nexa AI Hub. Currently, you can upload models through the web interface. After uploading, you control which files to include in your model and how to tag it to make it more discoverable; read more below.

Upload Method

Step 0. To upload models to the Hub, visit Nexa Hub and make sure you have registered a Nexa AI Hub account.

Step 1. Click "Upload your model"

Step 2. Fill in the model name, parameters, model type, and license (optional)

Step 3. Fill in the model tag name; see the tag section below for how to tag your model

Step 4. Edit the model tag to make your model more discoverable

Step 5. Upload files

Step 6. Edit the README to add a description of your model

About Model Tag

Model Tag Name

We recommend choosing descriptive model tag names. For official models in the Model Hub, we include the model precision in the tag name.

| Model Precision | Bits per Weight (BPW) Approximation |
| --- | --- |
| gguf-q2_K | 2 |
| gguf-q3_K_L | 3 |
| gguf-q3_K_M | 3 |
| gguf-q3_K_S | 3 |
| gguf-q4_0 | 4 |
| gguf-q4_1 | 4 |
| gguf-q4_K_M | 4 |
| gguf-q4_K_S | 4 |
| gguf-q5_0 | 5 |
| gguf-q5_1 | 5 |
| gguf-q5_K_M | 5 |
| gguf-q5_K_S | 5 |
| gguf-q6_K | 6 |
| gguf-q8_0 | 8 |
| onnx-int4 | 4 |
| onnx-int8 | 8 |
| onnx-bf16 | 16 |
| onnx-fp16 | 16 |
| onnx-fp32 | 32 |

File Format Tag

  • GGUF

GGUF is an optimized binary format designed for efficient model loading and saving, particularly suited for inference tasks. It is compatible with GGML and other executors. Developed by @ggerganov, the creator of llama.cpp (a widely-used C/C++ LLM inference framework), GGUF forms the foundation of the Nexa SDK's GGML component.

  • ONNX

ONNX is an open standard format for representing machine learning models. It establishes a common set of operators and a unified file format, enabling AI developers to use models across various frameworks, tools, runtimes, and compilers. ONNX offers particular performance advantages on devices with limited RAM (mobile and IoT devices). The Nexa SDK's ONNX component is built on the onnxruntime framework.

RAM

This metric indicates the minimum random access memory (RAM) necessary for local model execution. You can estimate the required RAM (in bytes) with the formula:

RAM = Parameters × BPW / 8
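As a sketch of the formula above, the snippet below estimates RAM from a model's parameter count and its bits-per-weight (BPW) value from the table. The function name and the example model size are illustrative, not part of the Hub's API.

```python
def estimate_ram_bytes(parameters: int, bpw: float) -> float:
    """Approximate RAM (in bytes) needed to load the model weights.

    RAM = Parameters * BPW / 8, where BPW is bits per weight
    and dividing by 8 converts bits to bytes.
    """
    return parameters * bpw / 8

# Example: a hypothetical 7B-parameter model tagged gguf-q4_K_M (~4 BPW)
ram_gb = estimate_ram_bytes(7_000_000_000, 4) / 1e9
print(f"~{ram_gb:.1f} GB")  # roughly 3.5 GB of RAM for the weights alone
```

Note that this covers only the model weights; actual usage at inference time is somewhat higher.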

File Size

Displays the total storage space required for the model.
