Free Developer Tool

Runs in browser

Fine-Tuning Dataset Formatter

Convert CSV or TSV data into JSONL format for fine-tuning OpenAI, Anthropic, and other LLMs. Auto-detect columns, map fields, validate rows, and download a ready-to-use training file.

100% Browser-Based

Step 1 — CSV / TSV Input

How to Use This Tool

1. Paste or Upload CSV

Paste CSV or TSV data into the text area, or drag and drop a .csv or .tsv file. The parser handles quoted fields, commas inside quotes, and escaped double quotes.
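As a sketch of that parsing behavior, Python's standard csv module handles the same edge cases (quoted fields, commas inside quotes, doubled double quotes):

```python
import csv
import io

# Sample data with a quoted field containing a comma and an escaped ("") quote.
raw = 'prompt,completion\n"Say ""hi"", please",Hello!\n'

rows = list(csv.reader(io.StringIO(raw)))
# rows[0] is the header row; rows[1] is the parsed data row.
print(rows[1])  # ['Say "hi", please', 'Hello!']
```

For TSV input, the same call takes `delimiter="\t"`.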

2. Map Columns

Select your Target Format and map each CSV column to the correct JSONL role. Columns named prompt, question, or input are auto-detected as the User field.
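The auto-detection can be sketched as a simple name-matching heuristic. Only prompt, question, and input are named above; the assistant-side aliases here are assumptions for illustration:

```python
# Aliases named in the tool's docs for the User field.
USER_ALIASES = {"prompt", "question", "input"}
# Hypothetical aliases for the Assistant field (an assumption).
ASSISTANT_ALIASES = {"completion", "answer", "output"}

def auto_map(headers):
    """Guess which CSV column maps to which JSONL role by header name."""
    mapping = {}
    for h in headers:
        key = h.strip().lower()
        if key in USER_ALIASES:
            mapping["user"] = h
        elif key in ASSISTANT_ALIASES:
            mapping["assistant"] = h
    return mapping

print(auto_map(["Question", "Answer", "notes"]))  # {'user': 'Question', 'assistant': 'Answer'}
```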

3. Download or Copy

Review the preview and validation warnings, then Download .jsonl for a ready-to-upload training file or Copy to Clipboard to paste elsewhere.

Fine-Tuning Format Reference

OpenAI Chat Format

Used for fine-tuning gpt-4o-mini, gpt-3.5-turbo, and other chat models. Each line is a complete conversation with a messages array.

{"messages": [
  {"role": "system", "content": "..."},
  {"role": "user", "content": "..."},
  {"role": "assistant", "content": "..."}
]}
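To illustrate, a minimal Python sketch that serializes one row as an OpenAI chat-format JSONL line (the system prompt here is a placeholder):

```python
import json

def to_openai_line(system, user, assistant):
    """Serialize one training example as a single JSONL line."""
    record = {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]}
    # json.dumps keeps the record on one line: newlines in content become \n escapes.
    return json.dumps(record, ensure_ascii=False)

line = to_openai_line("You are a helpful assistant.",
                      "What is JSONL?",
                      "JSON Lines: one JSON object per line.")
print(line)
```

Writing one such line per CSV row, joined with newlines, produces the complete .jsonl training file.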

Anthropic Format

Used for fine-tuning Claude models. The system prompt is a top-level field, while user/assistant turns are in a messages array.

{"system": "...",
 "messages": [
  {"role": "user", "content": "..."},
  {"role": "assistant", "content": "..."}
]}

Generic Chat JSONL

Simple prompt/completion pairs for any model or framework that ingests JSONL. Useful for datasets that don't target a specific provider API.

{"prompt": "...", "completion": "..."}

Alpaca Format

Popular for open-source instruction-tuned models (LLaMA, Mistral, etc.). Uses instruction, optional input context, and output.

{"instruction": "...",
 "input": "...",
 "output": "..."}

Dataset Quality Tips

Issue: Empty required fields
Why it matters: Training on blank rows teaches the model nothing
Fix: Remove or fill rows flagged by warnings

Issue: Very long rows
Why it matters: Exceeds context windows; gets truncated during training
Fix: Split into smaller examples

Issue: Inconsistent style
Why it matters: Mixed formality confuses learned behavior
Fix: Normalize tone before fine-tuning

Issue: Too few examples
Why it matters: OpenAI recommends 50–100 examples minimum; more is better
Fix: Augment with paraphrased or synthetic rows
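The first two kinds of checks can be sketched like this (the character threshold is an arbitrary assumption standing in for a real context-window limit, not the tool's actual cutoff):

```python
MAX_CHARS = 8000  # assumed rough proxy for a context-window limit

def validate_row(row, required=("user", "assistant")):
    """Return a list of warning strings for one mapped row."""
    warnings = []
    for field in required:
        if not row.get(field, "").strip():
            warnings.append(f"empty required field: {field}")
    total = sum(len(v) for v in row.values())
    if total > MAX_CHARS:
        warnings.append(f"row is very long ({total} chars)")
    return warnings

print(validate_row({"user": "Hi", "assistant": ""}))  # ['empty required field: assistant']
```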

Common Use Cases

Customer Support Datasets

Export your support ticket history as CSV and convert it to OpenAI Chat format to fine-tune a model that mirrors your team's tone and domain expertise.

Code Instruction Tuning

Convert code review or Q&A spreadsheets into Alpaca format for open-source models. Map the task description to Instruction and the solution to Output.

Multilingual Translation

Use Generic Chat JSONL to convert bilingual CSV data (source / target columns) into prompt-completion pairs for translation fine-tuning tasks.
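For example, source/target columns can be folded into prompt-completion pairs. The prompt template below is an assumption for illustration, not something the tool fixes:

```python
import csv
import io
import json

raw = "source,target\nGood morning,Guten Morgen\n"

lines = []
for row in csv.DictReader(io.StringIO(raw)):
    # Hypothetical template: wrap the source sentence in a translation instruction.
    lines.append(json.dumps({
        "prompt": f"Translate to German: {row['source']}",
        "completion": row["target"],
    }, ensure_ascii=False))

print(lines[0])
```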


Build Production AI Pipelines

From dataset curation to model fine-tuning and deployment — we design and build LLM workflows that are reliable, cost-effective, and production-ready.

Talk to Our AI Team