CSV Format Guide

Everything you need to know about CSV files

What is CSV?

CSV (Comma-Separated Values) is a simple, widely used plain-text format for storing tabular data. Each line represents a row, and the values within a row are separated by commas (or another delimiter, such as a tab or semicolon).

Example CSV:

name,email,age,city
John Doe,john@example.com,30,New York
Jane Smith,jane@example.com,25,Los Angeles
Bob Wilson,bob@example.com,35,Chicago
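
As a minimal sketch, the example above can be read with Python's standard csv module. The names SAMPLE and read_records are illustrative, not part of any tool described here. Note that every value comes back as a string, which is the "no data type information" limitation in practice.

```python
import csv
import io

# The example CSV from above, held in memory for illustration.
SAMPLE = """name,email,age,city
John Doe,john@example.com,30,New York
Jane Smith,jane@example.com,25,Los Angeles
Bob Wilson,bob@example.com,35,Chicago
"""

def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # newline="" leaves line-ending handling to the csv module.
    return list(csv.DictReader(io.StringIO(text, newline="")))

records = read_records(SAMPLE)
for record in records:
    # Values are always strings; convert explicitly where a number is needed.
    print(record["name"], int(record["age"]) + 1)
```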

Advantages

  • Human-readable plain text
  • Universal compatibility
  • Small file size
  • Easy to create and edit

Limitations

  • No data type information
  • No binding standard (RFC 4180 is informational only)
  • Limited to flat data
  • Encoding ambiguity

CSV Structure

Basic Rules

  • Header Row: Optional first row containing column names
  • Data Rows: Each subsequent line is a data record
  • Fields: Values separated by a delimiter (usually a comma)
  • Line Endings: CRLF (\r\n, the RFC 4180 convention) or LF (\n)

RFC 4180 Standard

CSV has no formal standard, but RFC 4180 (an informational RFC) documents the most common conventions:

  1. Each record is on a separate line
  2. The last record may or may not have an ending line break
  3. The first line may be a header
  4. Each line should have the same number of fields
  5. Fields containing commas, quotes, or line breaks must be quoted
  6. Double quotes within quoted fields are escaped by doubling them
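
A short sketch of the quoting and escaping conventions in the list above, using Python's csv.writer with its default minimal quoting. The helper name to_csv_line is hypothetical.

```python
import csv
import io

def to_csv_line(fields):
    """Serialize one record, quoting only the fields that need it."""
    buf = io.StringIO(newline="")
    # CRLF is the line ending described by RFC 4180.
    csv.writer(buf, lineterminator="\r\n").writerow(fields)
    return buf.getvalue()

line = to_csv_line(["Hello, World", 'She said "Hello"', "plain"])
print(line)
# The first field is quoted because it contains a comma, the second
# because it contains quotes (which are doubled); the third is left bare.
```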

Delimiters & Variants

Format    Delimiter        Extension     Use Case
CSV       , (comma)        .csv          Most common format
TSV       \t (tab)         .tsv, .txt    Data with commas in values
SSV       ; (semicolon)    .csv          European locales (decimal comma)
PSV       | (pipe)         .txt          Data with commas and semicolons
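
Because several variants share the .csv extension, the delimiter often has to be guessed from the content. The sketch below uses csv.Sniffer from Python's standard library, restricted to the four delimiters listed above. The names CANDIDATES, detect_delimiter, read_rows, and SSV_SAMPLE are illustrative. Sniffing is a heuristic and can fail on short or irregular samples, so treat the result as a guess.

```python
import csv
import io

CANDIDATES = ",;\t|"  # comma, semicolon, tab, pipe

def detect_delimiter(sample):
    """Guess the delimiter of a CSV sample.

    Raises csv.Error if the Sniffer cannot decide.
    """
    return csv.Sniffer().sniff(sample, delimiters=CANDIDATES).delimiter

def read_rows(text):
    """Parse text with whichever delimiter was detected."""
    delimiter = detect_delimiter(text)
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

# Semicolon-separated sample with decimal commas, as in European locales.
SSV_SAMPLE = "name;price\nWidget;1,50\nGadget;2,75\n"
rows = read_rows(SSV_SAMPLE)
print(rows)
```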

Quoting Rules

Quoting is essential when your data contains special characters that would otherwise break the CSV structure.

When to Quote

Field contains the delimiter:

"Hello, World",other,values

Field contains line breaks:

"Line 1
Line 2",other,values

Field contains quotes (escape with double quotes):

"She said ""Hello""",other,values

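To check how a parser treats the three cases above, the sketch below feeds them to Python's csv.reader. The constant name QUOTED is illustrative. The surrounding quotes are removed, the embedded line break is preserved, and the doubled quotes collapse back to single ones.

```python
import csv
import io

# The three quoted examples from above, one record each.
QUOTED = (
    '"Hello, World",other,values\r\n'
    '"Line 1\nLine 2",other,values\r\n'
    '"She said ""Hello""",other,values\r\n'
)

# newline="" matters here: it stops Python from rewriting the line break
# that sits inside the quoted field of the second record.
rows = list(csv.reader(io.StringIO(QUOTED, newline="")))

for row in rows:
    print(repr(row[0]))
```
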
Character Encoding

Character encoding determines how text characters are stored as bytes. Using the wrong encoding can cause garbled text or data loss.

Recommended: UTF-8

  • Supports all Unicode characters
  • Backwards compatible with ASCII
  • Most widely supported encoding
  • Web standard

Other Common Encodings

  • UTF-16: Windows default for some apps
  • ISO-8859-1: Western European
  • Windows-1252: Legacy Windows
  • ASCII: Basic English only

Tip: For Excel compatibility, save as UTF-8 with BOM (Byte Order Mark) or use UTF-16 LE encoding.
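
One way to follow this tip in Python is the "utf-8-sig" codec, which writes the three BOM bytes before the data. This is a sketch under assumptions: the function name write_excel_friendly, the file name, and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile

def write_excel_friendly(path, rows):
    """Write rows as UTF-8 with a leading BOM (bytes EF BB BF)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)

# Hypothetical output location and data, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "cities.csv")
write_excel_friendly(path, [["city", "country"], ["Zürich", "Schweiz"]])

with open(path, "rb") as f:
    raw = f.read()
print(raw[:3])  # the BOM bytes
```

When reading such a file back, decoding with "utf-8-sig" strips the BOM again, so it does not end up inside the first header name.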

Best Practices

Creating CSV Files

  • Always include a header row
  • Use a consistent delimiter throughout
  • Use UTF-8 encoding
  • Quote every field that contains the delimiter, a quote, or a line break
  • Avoid leading/trailing whitespace
  • Use consistent date/number formats

Reading CSV Files

  • Detect delimiter automatically when possible
  • Handle quoted fields properly
  • Trim whitespace from values only where it is not significant (RFC 4180 treats spaces as part of the field)
  • Validate row length consistency
  • Handle empty values gracefully
  • Be prepared for encoding issues
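
The row-length check from the list above could look like the following sketch. The function name find_ragged_rows is hypothetical, and the first row is assumed to define the expected field count.

```python
import csv
import io

def find_ragged_rows(text, delimiter=","):
    """Return (line_number, field_count) for each row whose length
    differs from the first row's."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    first = next(reader, None)
    if first is None:
        return []  # empty input: nothing to validate
    expected = len(first)
    problems = []
    for row in reader:
        # Blank lines parse as [] and are reported with a count of 0.
        if len(row) != expected:
            # line_num is the number of physical lines read so far.
            problems.append((reader.line_num, len(row)))
    return problems

problems = find_ragged_rows("a,b,c\n1,2,3\n4,5\n6,7,8,9\n")
print(problems)
```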

Common Issues

1. Garbled Characters

Usually caused by an encoding mismatch. Try opening the file with a different encoding (UTF-8, Windows-1252, etc.) or use our encoding converter.
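
A simple way to try several encodings in code is to attempt strict decoding in order and stop at the first success. This is only a sketch: the function name decode_with_fallback and the order of candidates are assumptions, and a decode that succeeds is not proof that the guess is right, since many byte sequences are valid in more than one encoding.

```python
def decode_with_fallback(raw, encodings=("utf-8-sig", "cp1252")):
    """Decode bytes with the first candidate encoding that works.

    Returns (text, encoding_used).
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # ISO-8859-1 maps every byte to a character, so it never fails,
    # but the result may still be wrong for the actual data.
    return raw.decode("latin-1"), "latin-1"

text, used = decode_with_fallback("café,naïve\n".encode("cp1252"))
print(used, text)
```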

2. Wrong Column Alignment

This happens when delimiter detection fails or when fields contain unquoted delimiters. Use our CSV validator to detect issues.

3. Excel Changes Numbers

Excel automatically interprets data (e.g., "1-2" becomes a date). To prevent this, import the column as text, or prefix the value with an apostrophe when typing it into a cell.

4. Missing Quotes

Fields with commas, quotes, or newlines must be quoted. Use our CSV cleaner to fix formatting issues.
