Base64 Documentation

Learn about Base64 encoding, how it works, and common use cases.

What is Base64 Encoding?

Base64 is a group of binary-to-text encoding schemes that represent binary data as an ASCII string by translating it into a radix-64 representation. The term "Base64" originates from a specific MIME content transfer encoding.

Base64 is commonly used when binary data must be stored in, or transferred over, media designed to handle ASCII text. Encoding ensures that the data arrives intact, without being modified in transit.

How Base64 Encoding Works

Base64 takes every 3 bytes (24 bits) of binary data and encodes them into 4 ASCII characters (6 bits each). The process works as follows:

  1. Take 3 bytes (24 bits) of binary data
  2. Split into four 6-bit groups
  3. Map each 6-bit group to a Base64 character from the alphabet
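  4. If the input length is not divisible by 3, fill the last group with zero bits and append = padding: == when 1 byte is left over, = when 2 bytes are left over, so the output length is always a multiple of 4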
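
The steps above can be sketched in Python. This is a minimal illustration rather than a production encoder; the names ALPHABET and encode_base64 are hypothetical, and the result is cross-checked against the standard library's base64 module.

```python
import base64
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_base64(data: bytes) -> str:
    """Encode bytes to standard Base64 by hand (illustrative, not optimized)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, zero-filling missing bytes
        n = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Extract four 6-bit groups, most significant first
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A chunk of k bytes yields k + 1 significant characters; the rest is '='
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

for sample in (b"Man", b"Ma", b"M", b"Hello, World!"):
    # Cross-check the hand-written encoder against the standard library
    assert encode_base64(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, "->", encode_base64(sample))
```

The inputs b"Ma" and b"M" exercise the two padding cases, producing TWE= and TQ== respectively.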

The Base64 Alphabet

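Value  Char    Value  Char    Value  Char    Value  Char
0      A       16     Q       32     g       48     w
1      B       17     R       33     h       49     x
2      C       18     S       34     i       50     y
3      D       19     T       35     j       51     z
4      E       20     U       36     k       52     0
5      F       21     V       37     l       53     1
6      G       22     W       38     m       54     2
7      H       23     X       39     n       55     3
8      I       24     Y       40     o       56     4
9      J       25     Z       41     p       57     5
10     K       26     a       42     q       58     6
11     L       27     b       43     r       59     7
12     M       28     c       44     s       60     8
13     N       29     d       45     t       61     9
14     O       30     e       46     u       62     +
15     P       31     f       47     v       63     /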

Encoding Example

Let's encode the string Man:

  • ASCII bytes: 77 97 110 → Binary: 01001101 01100001 01101110
  • Split into 6-bit groups: 010011 010110 000101 101110
  • Values: 19, 22, 5, 46 → Characters: T, W, F, u
  • Result: TWFu
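
The same walk-through can be reproduced with bit strings. The helper trace_encode below is a hypothetical name, and the sketch assumes ASCII input whose length is a multiple of 3, so no padding is involved.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def trace_encode(text: str) -> dict:
    """Return the intermediate values of Base64-encoding an ASCII string."""
    data = text.encode("ascii")
    if len(data) % 3 != 0:
        raise ValueError("this sketch only handles lengths divisible by 3")
    # Concatenate the 8-bit binary form of every byte
    bits = "".join(f"{byte:08b}" for byte in data)
    # Re-split the bit string into 6-bit groups
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    values = [int(group, 2) for group in groups]
    return {
        "bytes": list(data),
        "groups": groups,
        "values": values,
        "result": "".join(ALPHABET[v] for v in values),
    }

steps = trace_encode("Man")
print(steps["bytes"])   # [77, 97, 110]
print(steps["groups"])  # ['010011', '010110', '000101', '101110']
print(steps["values"])  # [19, 22, 5, 46]
print(steps["result"])  # TWFu
```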

URL-Safe Base64

Standard Base64 uses + and / characters, which have special meaning in URLs. URL-safe Base64 replaces these with - and _ respectively, and often omits the trailing = padding.

This variant is used in JWT (JSON Web Tokens), URL parameters, and filename-safe contexts.
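
A short Python sketch of the difference between the two alphabets, using the standard library's base64 module. The helper names b64url_encode and b64url_decode are hypothetical; the decoder restores stripped padding, because urlsafe_b64decode expects input whose length is a multiple of 4.

```python
import base64

def b64url_encode(data: bytes) -> str:
    """URL-safe Base64 with the trailing '=' padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Decode URL-safe Base64, restoring any padding that was stripped."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

# These three bytes map to the values 62, 63, 63, 62, the only ones that differ
raw = bytes([0xFB, 0xFF, 0xFE])
print(base64.b64encode(raw).decode("ascii"))  # +//+
print(b64url_encode(raw))                     # -__-

# Round trip with padding omitted
print(b64url_encode(b"M"))    # TQ
print(b64url_decode("TQ"))    # b'M'
```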

Character Sets and Encoding

When encoding text to Base64, you must first convert the text to bytes. The choice of character set (charset) determines how each character maps to bytes:

  • UTF-8: The universal standard, supports all Unicode characters. Multi-byte for non-ASCII.
  • UTF-16 LE: Uses 2 bytes per character in the Basic Multilingual Plane and 4 bytes (a surrogate pair) for all other characters, in little-endian order. Common in Windows environments.
  • ISO-8859-1 (Latin-1): Single-byte, covers Western European languages.
  • GBK: Simplified Chinese character encoding, an extension of GB2312.
  • Big5: Traditional Chinese character encoding, used in Taiwan and Hong Kong.
  • Shift-JIS: Japanese character encoding, widely used in legacy systems.
  • CP850: IBM code page 850, used in DOS and legacy Western European systems.
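
To illustrate why the charset matters, the sketch below encodes the single character é under three of the charsets listed above. The helper name encode_text is hypothetical; the codec names are the ones Python uses for these charsets.

```python
import base64

def encode_text(text: str, charset: str) -> str:
    """Convert text to bytes with the given charset, then Base64-encode."""
    return base64.b64encode(text.encode(charset)).decode("ascii")

e_acute = "\u00e9"  # the character é

print(encode_text(e_acute, "utf-8"))       # w6k=  (bytes C3 A9)
print(encode_text(e_acute, "utf-16-le"))   # 6QA=  (bytes E9 00)
print(encode_text(e_acute, "iso-8859-1"))  # 6Q==  (byte  E9)

# Decoding with the wrong charset silently yields garbled text
garbled = base64.b64decode("w6k=").decode("iso-8859-1")
print(garbled)  # Ã©
```

The decoder therefore has to know which charset the encoder used; Base64 itself carries no such information.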

Data URLs

Data URLs allow you to embed file contents directly in HTML, CSS, or JavaScript using Base64 encoding:

data:[<mediatype>][;base64],<data>

Examples:

<!-- HTML image embedding -->
<img src="data:image/png;base64,iVBORw0KGgo...">

/* CSS background */
.icon {
  background-image: url(data:image/svg+xml;base64,PHN2Zy4uLg==);
}
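
A data URL can also be generated programmatically. The following is a minimal Python sketch; make_data_url is a hypothetical helper, and the caller is assumed to supply the correct media type for the data.

```python
import base64

def make_data_url(data: bytes, media_type: str) -> str:
    """Build a Base64 data URL of the form data:<mediatype>;base64,<data>."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{payload}"

# Reproduces the payload from the CSS example above
print(make_data_url(b"<svg...", "image/svg+xml"))
# data:image/svg+xml;base64,PHN2Zy4uLg==
```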

Code Examples

JavaScript

// Encode string to Base64 (UTF-8)
function encodeToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = '';
  bytes.forEach(b => binary += String.fromCharCode(b));
  return btoa(binary);
}

// Decode Base64 to string (UTF-8)
function decodeFromBase64(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new TextDecoder('utf-8').decode(bytes);
}

console.log(encodeToBase64('Hello, World!')); // SGVsbG8sIFdvcmxkIQ==
console.log(decodeFromBase64('SGVsbG8sIFdvcmxkIQ==')); // Hello, World!

Python

import base64

# Encode
text = "Hello, World!"
encoded = base64.b64encode(text.encode('utf-8')).decode('ascii')
print(encoded)  # SGVsbG8sIFdvcmxkIQ==

# Decode
decoded = base64.b64decode(encoded).decode('utf-8')
print(decoded)  # Hello, World!

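# URL-safe Base64 (identical output here: this input produces no + or / to replace,
# and urlsafe_b64encode keeps the = padding)
url_safe = base64.urlsafe_b64encode(text.encode('utf-8')).decode('ascii')
print(url_safe)  # SGVsbG8sIFdvcmxkIQ==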

PHP

<?php
// Encode
$text = "Hello, World!";
$encoded = base64_encode($text);
echo $encoded; // SGVsbG8sIFdvcmxkIQ==

// Decode
$decoded = base64_decode($encoded);
echo $decoded; // Hello, World!
?>

Java

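import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Example {
    public static void main(String[] args) {
        // Encode (StandardCharsets.UTF_8 avoids the checked exception of getBytes("UTF-8"))
        String text = "Hello, World!";
        String encoded = Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded); // SGVsbG8sIFdvcmxkIQ==

        // Decode
        byte[] decoded = Base64.getDecoder().decode(encoded);
        String result = new String(decoded, StandardCharsets.UTF_8);
        System.out.println(result); // Hello, World!
    }
}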

Common Use Cases

1. Embedding Images in HTML

Instead of referencing external image files, you can embed images directly in HTML to reduce HTTP requests:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==" alt="1x1 pixel">

2. Email Attachments (MIME)

SMTP was originally specified for 7-bit ASCII text only. Binary attachments are therefore encoded in Base64 using MIME (Multipurpose Internet Mail Extensions).
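
MIME limits encoded lines to 76 characters, so Base64 bodies in email are wrapped. A small Python sketch using base64.encodebytes, which produces this line-wrapped form; the attachment bytes here are made-up stand-in data, not a real file.

```python
import base64

# Stand-in for binary file contents (hypothetical data)
attachment = bytes(range(256)) * 2

# encodebytes() emits newline-terminated lines of at most 76 characters
wrapped = base64.encodebytes(attachment).decode("ascii")
lines = wrapped.splitlines()

print(len(lines[0]))                     # 76
print(max(len(line) for line in lines))  # 76

# decodebytes() ignores the line breaks and restores the original bytes
assert base64.decodebytes(wrapped.encode("ascii")) == attachment
```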

3. HTTP Basic Authentication

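The HTTP Basic authentication scheme sends the credentials as username:password, encoded in Base64. This is an encoding, not encryption: anyone who can read the header can decode it.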

Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
# Decoded: username:password
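
A sketch of how such a header value can be built and read back in Python. The function name basic_auth_header is hypothetical, and UTF-8 is assumed for the credentials.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an Authorization header for HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

header_value = basic_auth_header("username", "password")
print(header_value)  # Basic dXNlcm5hbWU6cGFzc3dvcmQ=

# Decoding needs no key, which is why Basic auth should only be used over HTTPS
token = header_value.split(" ", 1)[1]
print(base64.b64decode(token).decode("utf-8"))  # username:password
```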

4. JSON Web Tokens (JWT)

JWTs use URL-safe Base64 encoding (without padding) for all three dot-separated segments: header, payload, and signature.

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.dozjgNryP4J3jVmNHl0w5N_XgL0n3I9PlFUP0THsR8U
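
The header and payload of the token above can be inspected with a few lines of Python. This sketch only decodes the segments and does not verify the signature, so it must not be used to trust a token; decode_segment is a hypothetical helper name.

```python
import base64
import json

def decode_segment(segment: str) -> dict:
    """Decode one unpadded URL-safe Base64 JWT segment into a dict."""
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))

token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJzdWIiOiIxMjM0NTY3ODkwIn0"
    ".dozjgNryP4J3jVmNHl0w5N_XgL0n3I9PlFUP0THsR8U"
)
header_b64, payload_b64, signature_b64 = token.split(".")

print(decode_segment(header_b64))   # {'alg': 'HS256', 'typ': 'JWT'}
print(decode_segment(payload_b64))  # {'sub': '1234567890'}
# signature_b64 holds raw bytes, not JSON; checking it requires the signing key
```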

Size Overhead

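Base64 encoding increases the data size by approximately 33% (every 3 bytes become 4 characters). When the output is wrapped into 76-character lines with CRLF line endings, as in MIME, the overhead rises to about 37%. For large files, this overhead should be considered when deciding whether to use Base64 encoding.

The figures below are for unwrapped output:

Original Size    Base64 Size    Overhead
1 KB             ~1.34 KB       ~34%
100 KB           ~133 KB        ~33%
1 MB             ~1.33 MB       ~33%
10 MB            ~13.3 MB       ~33%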
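
The encoded length of padded, unwrapped Base64 follows directly from the 3-bytes-to-4-characters rule. A small Python sketch, with base64_length as a hypothetical helper name, checked against the standard library:

```python
import base64

def base64_length(n_bytes: int) -> int:
    """Length of padded Base64 output (no line breaks) for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)  # 4 characters per started 3-byte group

for size in (1, 3, 1024, 100 * 1024, 1024 * 1024):
    encoded = base64_length(size)
    # Cross-check the formula against real encoder output (all-zero input)
    assert encoded == len(base64.b64encode(bytes(size)))
    overhead = (encoded - size) / size * 100
    print(f"{size:>8} bytes -> {encoded:>8} chars ({overhead:.1f}% overhead)")
```

For very small inputs the padding dominates: a single byte becomes 4 characters.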