CSV Format Guide
Everything you need to know about CSV files
What is CSV?
CSV (Comma-Separated Values) is a simple, widely-used file format for storing tabular data. Each line represents a row, and values within each row are separated by commas (or other delimiters).
Example CSV:
name,email,age,city
John Doe,john@example.com,30,New York
Jane Smith,jane@example.com,25,Los Angeles
Bob Wilson,bob@example.com,35,Chicago
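As a minimal sketch, the example above could be read with Python's standard csv module. The names SAMPLE and read_records are illustrative assumptions, not part of any tool mentioned in this guide.

```python
import csv
import io

# The example data from above, held in a string for illustration.
SAMPLE = (
    "name,email,age,city\n"
    "John Doe,john@example.com,30,New York\n"
    "Jane Smith,jane@example.com,25,Los Angeles\n"
    "Bob Wilson,bob@example.com,35,Chicago\n"
)

def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

records = read_records(SAMPLE)
for record in records:
    # Every parsed value is a string, so the age is converted explicitly.
    print(record["name"], int(record["age"]), record["city"])
```

Because the format carries no type information, any conversion (such as int for the age column) has to be done by the reader.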
Advantages
- Human-readable plain text
- Universal compatibility
- Small file size
- Easy to create and edit
Limitations
- No data type information
- No single binding standard, so dialects vary
- Limited to flat data
- Encoding ambiguity
CSV Structure
Basic Rules
- Header Row: Optional first row containing column names
- Data Rows: Each subsequent line is a data record
- Fields: Values separated by a delimiter (usually a comma)
- Line Endings: CRLF (\r\n) or LF (\n)
RFC 4180 Standard
CSV has no single binding standard, but RFC 4180 (an informational RFC) documents the most common conventions:
1. Each record is on a separate line
2. Last record may or may not have an ending line break
3. First line may be a header
4. Each line should have the same number of fields
5. Fields containing commas, quotes, or line breaks must be quoted
6. Double-quotes within quoted fields are escaped by doubling them
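The conventions above can be sketched with Python's csv.writer, which handles quoting and quote-doubling when configured as shown. This is one possible setup, not the only valid one. The helper name to_csv and the sample rows are assumptions made for illustration.

```python
import csv
import io

def to_csv(rows):
    """Serialize rows to CSV text using RFC 4180-style conventions."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only the fields that need it
        doublequote=True,           # escape " inside a field as ""
        lineterminator="\r\n",      # CRLF line endings
    )
    writer.writerows(rows)
    return buffer.getvalue()

rows = [
    ["name", "note"],
    ["Jane Smith", 'She said "Hello"'],
    ["Bob Wilson", "Chicago, IL"],
]
output = to_csv(rows)
print(output)
```

Only the two fields containing a quote or a comma come out quoted. The header and the plain name fields are written as-is.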
Delimiters & Variants
| Format | Delimiter | Extension | Use Case |
|---|---|---|---|
| CSV | , (comma) | .csv | Most common format |
| TSV | \t (tab) | .tsv, .txt | Data with commas in values |
| SSV | ; (semicolon) | .csv | European locales (decimal comma) |
| PSV | \| (pipe) | .txt | Data with commas and semicolons |
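As a rough illustration, the variants in the table differ only in the delimiter passed to the parser. The sketch below uses Python's csv.reader. The parse helper and the sample data are hypothetical.

```python
import csv
import io

def parse(text, delimiter):
    """Parse delimited text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

comma_rows = parse("name,price\nWidget,3.50\n", ",")
tab_rows = parse("name\tprice\nWidget\t3.50\n", "\t")
# With a semicolon delimiter, a decimal comma needs no quoting.
semicolon_rows = parse("name;price\nWidget;3,50\n", ";")
pipe_rows = parse("name|price\nWidget|3.50\n", "|")

for rows in (comma_rows, tab_rows, semicolon_rows, pipe_rows):
    print(rows)
```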
Quoting Rules
Quoting is essential when your data contains special characters that would otherwise break the CSV structure.
When to Quote
Field contains the delimiter:
"Hello, World",other,values
Field contains line breaks:
"Line 1 Line 2",other,values
Field contains quotes (escape with double quotes):
"She said ""Hello""",other,values
Character Encoding
Character encoding determines how text characters are stored as bytes. Using the wrong encoding can cause garbled text or data loss.
Recommended: UTF-8
- Supports all Unicode characters
- Backwards compatible with ASCII
- Most widely supported encoding
- Web standard
Other Common Encodings
- UTF-16: Windows default for some apps
- ISO-8859-1: Western European
- Windows-1252: Legacy Windows
- ASCII: Basic English only
Tip: For Excel compatibility, save as UTF-8 with BOM (Byte Order Mark) or use UTF-16 LE encoding.
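One way to apply the tip above in Python is the utf-8-sig codec, which writes a BOM when encoding and drops one, if present, when decoding. The function names below are illustrative assumptions.

```python
import codecs

def encode_for_excel(text):
    """Encode CSV text as UTF-8 with a leading BOM."""
    return text.encode("utf-8-sig")

def decode_csv_bytes(data):
    """Decode UTF-8 bytes, dropping a BOM if one is present."""
    return data.decode("utf-8-sig")

original = "name,city\r\nZoë,Zürich\r\n"
payload = encode_for_excel(original)
has_bom = payload.startswith(codecs.BOM_UTF8)
round_trip = decode_csv_bytes(payload)
print(has_bom, round_trip == original)
```

Decoding with utf-8-sig also works for UTF-8 input that has no BOM, so it is a reasonable default when the source of a file is unknown.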
Best Practices
Creating CSV Files
- Always include a header row
- Use a consistent delimiter throughout
- Use UTF-8 encoding
- Quote all fields with special characters
- Avoid leading/trailing whitespace
- Use consistent date/number formats
Reading CSV Files
- Detect delimiter automatically when possible
- Handle quoted fields properly
- Trim whitespace from values when it is not significant to the data
- Validate row length consistency
- Handle empty values gracefully
- Be prepared for encoding issues
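Several of these reading practices can be combined in a short sketch. It uses csv.Sniffer from Python's standard library to guess the delimiter, falls back to a comma when sniffing fails (an assumption, not a rule), trims values, and reports rows whose field count differs from the first row. The helper names are hypothetical.

```python
import csv
import io

def detect_delimiter(sample, candidates=",;\t|", default=","):
    """Guess the delimiter from a sample, or return the default."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return default  # assumption: comma is the safest fallback

def read_checked(text):
    """Return (delimiter, trimmed rows, 1-based numbers of ragged rows)."""
    delimiter = detect_delimiter(text[:4096])
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    # Skip completely blank lines and trim whitespace around each value.
    rows = [[value.strip() for value in row] for row in reader if row]
    width = len(rows[0]) if rows else 0
    ragged = [n for n, row in enumerate(rows, start=1) if len(row) != width]
    return delimiter, rows, ragged

delimiter, rows, ragged = read_checked(
    "name;age;city\nJohn; 30 ;New York\nJane;25;\n"
)
print(repr(delimiter), rows, ragged)
```

The empty last value on the third line is kept as an empty string rather than dropped, so the row still has the expected width. Delimiter sniffing is a heuristic and can fail on small or irregular samples, which is why the sketch keeps a fallback.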
Common Issues
1. Garbled Characters
Usually caused by an encoding mismatch. Try opening the file with a different encoding (UTF-8, Windows-1252, etc.) or use our encoding converter.
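Trying encodings in turn can be sketched as below. The order of encodings is an assumption (UTF-8 first, then Windows-1252), and decode_with_fallback is a hypothetical helper, not the converter mentioned above.

```python
def decode_with_fallback(data, encodings=("utf-8-sig", "cp1252")):
    """Try each encoding in order and return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it never fails,
    # though the text may still be wrong for the real source encoding.
    return data.decode("latin-1"), "latin-1"

legacy_bytes = "name,city\nRenée,Orléans\n".encode("cp1252")
text, used = decode_with_fallback(legacy_bytes)
print(used)
print(text)
```

A successful decode does not prove the guess was right. Single-byte encodings accept almost any input, so it is worth checking the result visually.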
2. Wrong Column Alignment
Happens when delimiter detection fails or fields contain unquoted delimiters. Use our CSV validator to detect issues.
3. Excel Changes Numbers
Excel automatically reinterprets values when it opens a CSV file (e.g., "1-2" becomes a date). To prevent this, prefix the value with an apostrophe or import the column as text.
4. Missing Quotes
Fields with commas, quotes, or newlines must be quoted. Use our CSV cleaner to fix formatting issues.
Useful Tools
Need help with a specific CSV task? Browse our tools