Free Developer Tool

Runs in browser

Fine-Tuning Dataset Formatter

Convert CSV or TSV data into JSONL format for fine-tuning OpenAI, Anthropic, and other LLMs. Auto-detect columns, map fields, validate rows, and download a ready-to-use training file.

100% Browser-Based

Step 1 — CSV / TSV Input

How to Use This Tool

1. Paste or Upload CSV

Paste CSV or TSV data into the text area, or drag and drop a .csv or .tsv file. The parser handles quoted fields, commas inside quotes, and escaped double quotes.
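As a sketch of that parsing behavior, Python's standard csv module handles the same edge cases (quoted fields, commas inside quotes, doubled double quotes):

```python
import csv
import io

# Sample data with a quoted field containing a comma and an escaped ("") quote.
raw = 'prompt,completion\n"Say ""hi"", please",Hello!\n'

rows = list(csv.reader(io.StringIO(raw)))
# rows[0] is the header row; rows[1] is the parsed data row.
print(rows[1])  # ['Say "hi", please', 'Hello!']
```

For TSV input, the same call takes `delimiter="\t"`.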

2. Map Columns

Select your Target Format and map each CSV column to the correct JSONL role. Columns named prompt, question, or input are auto-detected as the User field.
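The auto-detection can be sketched as a simple name-matching heuristic. Only prompt, question, and input are named above; the assistant-side aliases here are assumptions for illustration:

```python
# Aliases named in the tool's docs for the User field.
USER_ALIASES = {"prompt", "question", "input"}
# Hypothetical aliases for the Assistant field (an assumption).
ASSISTANT_ALIASES = {"completion", "answer", "output"}

def auto_map(headers):
    """Guess which CSV column maps to which JSONL role by header name."""
    mapping = {}
    for h in headers:
        key = h.strip().lower()
        if key in USER_ALIASES:
            mapping["user"] = h
        elif key in ASSISTANT_ALIASES:
            mapping["assistant"] = h
    return mapping

print(auto_map(["Question", "Answer", "notes"]))  # {'user': 'Question', 'assistant': 'Answer'}
```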

3. Download or Copy

Review the preview and validation warnings, then Download .jsonl for a ready-to-upload training file or Copy to Clipboard to paste elsewhere.

Fine-Tuning Format Reference

OpenAI Chat Format

Used for fine-tuning gpt-4o-mini, gpt-3.5-turbo, and other chat models. Each line is a complete conversation with a messages array.

{"messages": [
  {"role": "system", "content": "..."},
  {"role": "user", "content": "..."},
  {"role": "assistant", "content": "..."}
]}
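To illustrate, a minimal Python sketch that serializes one row as an OpenAI chat-format JSONL line (the system prompt here is a placeholder):

```python
import json

def to_openai_line(system, user, assistant):
    """Serialize one training example as a single JSONL line."""
    record = {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]}
    # json.dumps keeps the record on one line: newlines in content become \n escapes.
    return json.dumps(record, ensure_ascii=False)

line = to_openai_line("You are a helpful assistant.",
                      "What is JSONL?",
                      "JSON Lines: one JSON object per line.")
print(line)
```

Writing one such line per CSV row, joined with newlines, produces the complete .jsonl training file.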

Anthropic Format

Used for fine-tuning Claude models. The system prompt is a top-level field, while user/assistant turns are in a messages array.

{"system": "...",
 "messages": [
  {"role": "user", "content": "..."},
  {"role": "assistant", "content": "..."}
]}

Generic Chat JSONL

Simple prompt/completion pairs for any model or framework that ingests JSONL. Useful for datasets that don't target a specific provider API.

{"prompt": "...", "completion": "..."}

Alpaca Format

Popular for open-source instruction-tuned models (LLaMA, Mistral, etc.). Uses instruction, optional input context, and output.

{"instruction": "...",
 "input": "...",
 "output": "..."}

Dataset Quality Tips

Issue: Empty required fields
Why it matters: Training on blank rows teaches the model nothing
Fix: Remove or fill rows flagged by warnings

Issue: Very long rows
Why it matters: Exceeds context windows; gets truncated during training
Fix: Split into smaller examples

Issue: Inconsistent style
Why it matters: Mixed formality confuses learned behavior
Fix: Normalize tone before fine-tuning

Issue: Too few examples
Why it matters: OpenAI recommends 50–100 examples minimum; more is better
Fix: Augment with paraphrased or synthetic rows
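The first two kinds of checks can be sketched like this (the character threshold is an arbitrary assumption standing in for a real context-window limit, not the tool's actual cutoff):

```python
MAX_CHARS = 8000  # assumed rough proxy for a context-window limit

def validate_row(row, required=("user", "assistant")):
    """Return a list of warning strings for one mapped row."""
    warnings = []
    for field in required:
        if not row.get(field, "").strip():
            warnings.append(f"empty required field: {field}")
    total = sum(len(v) for v in row.values())
    if total > MAX_CHARS:
        warnings.append(f"row is very long ({total} chars)")
    return warnings

print(validate_row({"user": "Hi", "assistant": ""}))  # ['empty required field: assistant']
```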

Common Use Cases

Customer Support Datasets

Export your support ticket history as CSV and convert it to OpenAI Chat format to fine-tune a model that mirrors your team's tone and domain expertise.

Code Instruction Tuning

Convert code review or Q&A spreadsheets into Alpaca format for open-source models. Map the task description to Instruction and the solution to Output.

Multilingual Translation

Use Generic Chat JSONL to convert bilingual CSV data (source / target columns) into prompt-completion pairs for translation fine-tuning tasks.
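For example, source/target columns can be folded into prompt-completion pairs. The prompt template below is an assumption for illustration, not something the tool fixes:

```python
import csv
import io
import json

raw = "source,target\nGood morning,Guten Morgen\n"

lines = []
for row in csv.DictReader(io.StringIO(raw)):
    # Hypothetical template: wrap the source sentence in a translation instruction.
    lines.append(json.dumps({
        "prompt": f"Translate to German: {row['source']}",
        "completion": row["target"],
    }, ensure_ascii=False))

print(lines[0])
```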


Build Production AI Pipelines

From dataset curation to model fine-tuning and deployment — we design and build LLM workflows that are reliable, cost-effective, and production-ready.

Talk to Our AI Team