A Step-by-Step Coding Guide to Efficiently Fine-Tune Qwen3-14B Using Unsloth AI on Google Colab with Mixed Datasets and LoRA Optimization
Fine-tuning large language models (LLMs) like Qwen3-14B typically demands significant GPU memory and time, which can hinder rapid experimentation and deployment. Unsloth AI enables fast, memory-efficient fine-tuning of state-of-the-art models through techniques such as 4-bit quantization and Low-Rank Adaptation (LoRA). This tutorial walks through a practical Google Colab implementation for fine-tuning Qwen3-14B on a mix of reasoning and instruction-following datasets.
Installing Required Libraries
We begin by installing essential libraries for fine-tuning the Qwen3 model using Unsloth AI. This installation is optimized for Google Colab to ensure compatibility and reduce overhead:
%%capture
import os
# On Colab, install pinned dependencies with --no-deps to avoid resolver conflicts;
# elsewhere, a plain `pip install unsloth` pulls in everything needed.
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install --no-deps unsloth
Loading the Qwen3-14B Model
Next, we load the Qwen3-14B model using FastLanguageModel from the Unsloth library, optimized for efficient fine-tuning:
from unsloth import FastLanguageModel
import torch
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-14B",
    max_seq_length = 2048,   # maximum context length used during training
    load_in_4bit = True,     # 4-bit quantization keeps the 14B model within Colab GPU memory
    load_in_8bit = False,
    full_finetuning = False, # we train LoRA adapters rather than all weights
)
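As an optional sanity check (not part of the original workflow, and assuming a CUDA-backed Colab runtime), you can confirm how much GPU memory the quantized model occupies after loading:
# Optional: report GPU memory usage after loading (assumes a CUDA runtime)
gpu = torch.cuda.get_device_properties(0)
reserved_gb = torch.cuda.max_memory_reserved(0) / 1024**3
print(f"{gpu.name}: {reserved_gb:.1f} GiB reserved of {gpu.total_memory / 1024**3:.1f} GiB total")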
Applying LoRA for Efficient Fine-Tuning
LoRA is applied to the Qwen3 model to inject trainable adapters into specific transformer layers:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,                  # LoRA rank: higher ranks add capacity at the cost of memory
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,         # scaling factor applied to the adapter updates
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # Unsloth's checkpointing further reduces memory
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
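To see how small the trainable footprint is relative to the 14B base model, you can print the adapter parameter count. This assumes the object returned by get_peft_model exposes the standard PEFT helper:
# Assumes standard PEFT behavior: reports trainable (LoRA) vs. total parameters
model.print_trainable_parameters()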
Loading Datasets
We load two datasets from the Hugging Face Hub: the chain-of-thought split of unsloth/OpenMathReasoning-mini for mathematical reasoning, and mlabonne/FineTome-100k for general instruction following:
from datasets import load_dataset
reasoning_dataset = load_dataset("unsloth/OpenMathReasoning-mini", split="cot")
non_reasoning_dataset = load_dataset("mlabonne/FineTome-100k", split="train")
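A quick inspection (an optional step, not in the original walkthrough) confirms the sizes and the column names the next function relies on:
# Optional: verify row counts and the columns used below ("problem", "generated_solution")
print(reasoning_dataset)
print(non_reasoning_dataset)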
Generating Conversations for Fine-Tuning
This function transforms the reasoning dataset's raw problem/solution pairs into the list-of-messages chat format that the tokenizer's chat template expects; we apply it with dataset.map in the next step:
def generate_conversation(examples):
    problems = examples["problem"]
    solutions = examples["generated_solution"]
    conversations = []
    for problem, solution in zip(problems, solutions):
        conversations.append([
            {"role": "user", "content": problem},
            {"role": "assistant", "content": solution},
        ])
    return {"conversations": conversations}
Preparing the Fine-Tuning Dataset
We prepare the fine-tuning dataset by converting both datasets into a consistent chat-template format. Note that generate_conversation must be applied to the reasoning dataset first, since it has no conversations column of its own:
reasoning_conversations = tokenizer.apply_chat_template(
    reasoning_dataset.map(generate_conversation, batched=True)["conversations"],
    tokenize=False,
)

from unsloth.chat_templates import standardize_sharegpt

# Normalize FineTome's ShareGPT-style records to the standard role/content schema
dataset = standardize_sharegpt(non_reasoning_dataset)
non_reasoning_conversations = tokenizer.apply_chat_template(
    dataset["conversations"],
    tokenize=False,
)
import pandas as pd

# Sample the non-reasoning data so that chat examples make up ~25% of the final mix
chat_percentage = 0.25
non_reasoning_subset = pd.Series(non_reasoning_conversations).sample(
    int(len(reasoning_conversations) * (chat_percentage / (1.0 - chat_percentage))),
    random_state=2407,
)
data = pd.concat([
    pd.Series(reasoning_conversations),
    pd.Series(non_reasoning_subset)
])
data.name = "text"  # SFTTrainer will read this column as the training text
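As an optional check (not in the original tutorial), you can verify the resulting proportions before training:
# Report the actual composition of the mixed dataset
n_reasoning = len(reasoning_conversations)
n_chat = len(non_reasoning_subset)
print(f"reasoning: {n_reasoning}  chat: {n_chat}  chat share: {n_chat / (n_reasoning + n_chat):.1%}")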
Creating a Hugging Face Dataset
We convert the prepared data into a Hugging Face Dataset:
from datasets import Dataset
combined_dataset = Dataset.from_pandas(pd.DataFrame(data))
combined_dataset = combined_dataset.shuffle(seed=3407)
Setting Up the Trainer
We initialize TRL's SFTTrainer with the hyperparameters below; max_steps=30 keeps this run short for demonstration purposes, and you would raise it substantially for a real fine-tune:
from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=combined_dataset,
    eval_dataset=None,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size: 2 x 4 = 8 sequences
        warmup_steps=5,
        max_steps=30,                   # short demo run; increase for a real fine-tune
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",             # 8-bit optimizer states save additional memory
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        report_to="none",
    )
)
Starting the Training Process
We commence the fine-tuning of the Qwen3-14B model:
trainer_stats = trainer.train()  # returns a TrainOutput with summary metrics
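After training finishes, the returned object exposes the run's summary statistics (a standard transformers TrainOutput; exact field names may vary by version):
# Inspect summary metrics such as train_runtime and train_loss
print(trainer_stats.metrics)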
Saving the Fine-Tuned Model
Finally, we save the tokenizer and the trained LoRA adapter weights. Note that with PEFT, save_pretrained stores only the small adapter, not the full 14B base model:
model.save_pretrained("qwen3-finetuned-colab")
tokenizer.save_pretrained("qwen3-finetuned-colab")
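To use the result, you can reload the saved adapter and generate. This is a minimal sketch under the assumption that the same Unsloth environment is available; the prompt is illustrative only:
# Minimal inference sketch (assumes the same Unsloth runtime as above)
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "qwen3-finetuned-colab",  # the directory saved above
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enables Unsloth's faster inference path

messages = [{"role": "user", "content": "Solve for x: 2x + 3 = 11."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))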
In conclusion, Unsloth AI makes fine-tuning large LLMs like Qwen3-14B feasible with limited resources. This tutorial illustrated how to load a 4-bit quantized version of the model, apply structured chat templates, mix multiple datasets for better generalization, and train using TRL’s SFTTrainer. Unsloth’s tools significantly lower the barrier to fine-tuning at scale.
Check out the Colab Notebook for the complete code.