
A Step-by-Step Coding Guide to Efficiently Fine-Tune Qwen3-14B Using Unsloth AI on Google Colab with Mixed Datasets and LoRA Optimization


Fine-tuning large language models (LLMs) like Qwen3-14B often requires significant resources, time, and memory, which can hinder rapid experimentation and deployment. Unsloth AI facilitates fast and efficient fine-tuning of state-of-the-art models while minimizing GPU memory usage, utilizing advanced techniques such as 4-bit quantization and Low-Rank Adaptation (LoRA). This tutorial provides a practical implementation on Google Colab for fine-tuning Qwen3-14B using a combination of reasoning and instruction-following datasets.

Installing Required Libraries

We begin by installing essential libraries for fine-tuning the Qwen3 model using Unsloth AI. This installation is optimized for Google Colab to ensure compatibility and reduce overhead:

%%capture
import os
# Outside Colab, a plain install resolves dependencies normally; inside
# Colab, we pin versions and use --no-deps to avoid clashing with
# preinstalled packages.
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install --no-deps unsloth
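
Before moving on, an optional sanity check confirms that the core packages import cleanly with the expected pinned versions:

# Optional: verify that the pinned packages import cleanly.
import torch, trl, peft
print(torch.__version__, trl.__version__, peft.__version__)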

Loading the Qwen3-14B Model

Next, we load the Qwen3-14B model with FastLanguageModel from the Unsloth library, requesting 4-bit quantized weights so the 14B model fits in Colab's GPU memory:

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-14B",
    max_seq_length = 2048,    # maximum context length during training
    load_in_4bit = True,      # 4-bit quantization keeps the model within Colab GPU memory
    load_in_8bit = False,
    full_finetuning = False,  # we train LoRA adapters, not the full weights
)
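
As an optional sanity check (assuming a CUDA GPU is attached to the Colab runtime), you can confirm how little memory the 4-bit model reserves:

# Optional: report how much GPU memory the 4-bit model reserves.
gpu = torch.cuda.get_device_properties(0)
reserved_gb = round(torch.cuda.max_memory_reserved() / 1024**3, 3)
print(f"{gpu.name}: {reserved_gb} GB reserved after loading")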

Applying LoRA for Efficient Fine-Tuning

We then apply LoRA, injecting small trainable adapters into the attention and MLP projection layers while the 4-bit base weights stay frozen:

model = FastLanguageModel.get_peft_model(
    model,
    r = 32,                   # LoRA rank: dimensionality of the adapters
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,          # scaling factor applied to adapter updates
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # trade compute for memory
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
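
Because get_peft_model returns a PEFT-wrapped model, PEFT's print_trainable_parameters() helper should be available to confirm that only the adapter weights, a small fraction of the 14B total, will be updated:

# Should report a trainable share on the order of 1% of all parameters.
model.print_trainable_parameters()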

Loading Datasets

We load two datasets from the Hugging Face Hub: the chain-of-thought split of OpenMathReasoning-mini for mathematical reasoning, and FineTome-100k for general instruction following:

from datasets import load_dataset

reasoning_dataset = load_dataset("unsloth/OpenMathReasoning-mini", split="cot")
non_reasoning_dataset = load_dataset("mlabonne/FineTome-100k", split="train")
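
Before converting anything, it helps to inspect the dataset sizes and the reasoning dataset's raw fields (the column names here match the conversion function defined in the next step):

# Sizes and raw fields of the two datasets.
print(len(reasoning_dataset), len(non_reasoning_dataset))
print(reasoning_dataset[0]["problem"][:200])
print(reasoning_dataset[0]["generated_solution"][:200])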

Generating Conversations for Fine-Tuning

This function converts the reasoning dataset's raw problem/solution pairs into chat-style conversations with alternating user and assistant turns:

def generate_conversation(examples):
    problems  = examples["problem"]
    solutions = examples["generated_solution"]
    conversations = []
    for problem, solution in zip(problems, solutions):
        conversations.append([
            {"role": "user", "content": problem},
            {"role": "assistant", "content": solution},
        ])
    return {"conversations": conversations}

Preparing the Fine-Tuning Dataset

We render both datasets into plain-text chat strings using the tokenizer's chat template. The reasoning dataset is first mapped through generate_conversation, while the instruction dataset is normalized from ShareGPT format with standardize_sharegpt:

reasoning_conversations = tokenizer.apply_chat_template(
    reasoning_dataset.map(generate_conversation, batched=True)["conversations"],
    tokenize=False,
)

from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(non_reasoning_dataset)

non_reasoning_conversations = tokenizer.apply_chat_template(
    dataset["conversations"],
    tokenize=False,
)

import pandas as pd

# Target roughly 25% general chat data in the final mix. The ratio
# chat_percentage / (1 - chat_percentage) converts that share of the total
# into a sample size relative to the reasoning set.
chat_percentage = 0.25
non_reasoning_subset = pd.Series(non_reasoning_conversations).sample(
    int(len(reasoning_conversations) * (chat_percentage / (1.0 - chat_percentage))),
    random_state=2407,
)

data = pd.concat([
    pd.Series(reasoning_conversations),
    pd.Series(non_reasoning_subset)
])
data.name = "text"

Creating a Hugging Face Dataset

We convert the mixed data into a Hugging Face Dataset and shuffle it with a fixed seed for reproducibility:

from datasets import Dataset

combined_dataset = Dataset.from_pandas(pd.DataFrame(data))
combined_dataset = combined_dataset.shuffle(seed=3407)
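
Each row of the shuffled dataset is now a fully rendered chat string, including the Qwen3 template's special role tokens, which you can confirm by printing one example:

# Print the first formatted training example (truncated).
print(combined_dataset[0]["text"][:300])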

Setting Up the Trainer

The trainer is configured with TRL's SFTTrainer. With per_device_train_batch_size=2 and gradient_accumulation_steps=4, the effective batch size is 8; max_steps=30 keeps the run short for demonstration, and the 8-bit AdamW optimizer further trims memory use:

from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=combined_dataset,
    eval_dataset=None,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=30,
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        report_to="none",
    )
)

Starting the Training Process

We now launch the fine-tuning run:

trainer.train()
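
trainer.train() returns a transformers TrainOutput object; capturing it (a minor variant of the call above) lets you log the run's metrics:

# Variant of the call above that keeps the returned metrics.
train_output = trainer.train()
print(train_output.metrics)   # includes train_runtime and train_loss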

Saving the Fine-Tuned Model

Finally, we save the fine-tuned model and tokenizer. Since training used LoRA, this writes only the lightweight adapter weights and tokenizer files, not a full 14B checkpoint:

model.save_pretrained("qwen3-finetuned-colab")
tokenizer.save_pretrained("qwen3-finetuned-colab")
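
Testing the Fine-Tuned Model (Optional)

As a last step, you can sanity-check the result with a quick generation while the LoRA adapters are still attached. This is a minimal sketch: FastLanguageModel.for_inference switches Unsloth into its faster inference mode, and the prompt is purely illustrative:

# Enable Unsloth's optimized inference mode.
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Solve for x: 2x + 3 = 11"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))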

In conclusion, Unsloth AI makes fine-tuning large LLMs like Qwen3-14B feasible with limited resources. This tutorial illustrated how to load a 4-bit quantized version of the model, apply structured chat templates, mix multiple datasets for better generalization, and train using TRL’s SFTTrainer. Unsloth’s tools significantly lower the barrier to fine-tuning at scale.

Check out the Colab notebook for the complete code.
