System Design

URL Shortener

The mechanism behind shortening long URLs into compact, shareable links.

URL Shortening Process

URL shortening is a technique for mapping a long URL to a shorter one that redirects to the original URL. This process typically involves:

  1. Input: User submits a long URL to be shortened
  2. Generation: System generates a unique short code
  3. Storage: The system stores a mapping between the short code and original URL
  4. Redirection: When the short URL is visited, the system redirects to the original URL

Figure: The URL shortening process flow
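
The four steps above can be sketched as a small in-memory service. This is a minimal illustration under stated assumptions, not part of any particular system: the class name InMemoryShortener, the Hash used as the store, and the counter-based code generator (Ruby's Integer#to_s(36), standing in for the Base62 encoding described below) are all hypothetical choices.

```ruby
# Hypothetical in-memory shortener illustrating the four steps.
# A real service would use a persistent database instead of a Hash.
class InMemoryShortener
  def initialize
    @urls_by_code = {} # step 3: short code => original URL
    @next_id = 1
  end

  # Steps 1-3: accept a long URL, generate a unique code, store the mapping.
  def shorten(long_url)
    code = @next_id.to_s(36) # placeholder generator; unique because the counter only grows
    @next_id += 1
    @urls_by_code[code] = long_url
    code
  end

  # Step 4: resolve a code to a redirect, or to 404 if the code is unknown.
  def resolve(code)
    url = @urls_by_code[code]
    url ? [302, url] : [404, nil]
  end
end

shortener = InMemoryShortener.new
code = shortener.shorten("https://example.com/very/long/path")
shortener.resolve(code)     # => [302, "https://example.com/very/long/path"]
shortener.resolve("zzzzz")  # => [404, nil]
```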

HTTP Redirect Status Codes

When a user visits a shortened URL, the server responds with an HTTP redirect. There are two main options:

301 Redirect (Moved Permanently)

A 301 redirect indicates that the resource has been permanently moved to a new location. Browsers cache this redirect, meaning subsequent requests for the same short URL might skip the redirector server entirely.

Pros:

  • Better for SEO as search engines update their indexes
  • Faster subsequent visits due to browser caching
  • Reduces load on the redirector server over time

Cons:

  • Cannot track click metrics effectively once cached
  • Difficult to update the destination URL if needed

302 Redirect (Found)

A 302 redirect indicates that the resource is temporarily available at a different location. By default, browsers do not cache this redirect (unless the response carries explicit caching headers), so every request goes through the redirector server.

Pros:

  • Allows for accurate click tracking and analytics
  • Destination URL can be easily changed
  • Enables A/B testing or conditional redirects

Cons:

  • Slightly slower user experience on repeated visits
  • Increased load on the redirector server
  • Less optimal for SEO in some cases

Most URL shortening services use 302 redirects because they value analytics and the ability to update destination URLs over the marginal SEO benefits of 301 redirects.
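
In code, the difference between the two options comes down to the status code and caching headers the redirector returns. The following is a minimal sketch, not any particular service's implementation: the helper name redirect_response is hypothetical, the response is shaped like a Rack-style [status, headers, body] triple, and sending Cache-Control: no-store with the 302 is an assumed choice that makes the no-caching intent explicit.

```ruby
# Hypothetical helper: builds a Rack-style [status, headers, body] triple.
def redirect_response(destination, permanent: false)
  if permanent
    # 301: cacheable by default, so repeat visits may never reach this server
    [301, { "location" => destination }, []]
  else
    # 302: "no-store" tells browsers and proxies not to cache the redirect,
    # so every click reaches the server and can be counted
    [302, { "location" => destination, "cache-control" => "no-store" }, []]
  end
end

status, headers, _body = redirect_response("https://example.com/landing")
# status              => 302
# headers["location"] => "https://example.com/landing"
```

Returning a 302 with no-store means every click reaches the server, where it can be logged before the redirect is sent; passing permanent: true trades that visibility for fewer requests.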

Hash-Based URL Shortening

One approach to generating short codes is to apply a hash function such as MD5 or SHA-256 directly to the original URL.

require 'digest'

# Hash the full URL; the hex digest length depends on the algorithm chosen
def generate_hash_code(url)
  Digest::MD5.hexdigest(url)        # 32 hex characters (128 bits)
  # Digest::SHA1.hexdigest(url)     # 40 hex characters (160 bits)
  # Digest::SHA256.hexdigest(url)   # 64 hex characters (256 bits)
end

original_url = "https://example.com/very/long/path/with/parameters?id=123&section=products"

hash_code = generate_hash_code(original_url)
# => "b5c7deff9fbd8c3180798220aec304dc"  # 32 characters long

Using the full hash output has several important characteristics:

  • Negligible accidental collisions: With a 128-bit (MD5) or 256-bit (SHA-256) output, the probability that two different URLs collide by chance is extremely low, making it safe for most practical applications (MD5 collisions can be constructed deliberately, however, so it should not be trusted against adversarial input)
  • Not actually "short": The full hash (32 characters for MD5, 40 for SHA-1, 64 for SHA-256) isn't particularly short, defeating the main purpose of URL shortening
  • Determinism: The same URL always produces the same hash, which makes deduplication and caching possible but removes the ability to have multiple short links to the same destination
  • Non-sequential: Codes have no meaningful order, so new rows land at random positions in an ordered index (such as a B-tree) instead of being appended at the end, which makes database optimization more difficult
  • Not user-friendly: Hexadecimal hash codes are not readable or memorable, and their length makes them difficult to share verbally

Because of these limitations, most URL shorteners either use truncated hashes (with collision detection) or numeric ID-based approaches with encoding, like Base62, to create genuinely short and more user-friendly URLs.
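
The truncated-hash option can be sketched as follows. All of the specifics are hypothetical assumptions for illustration: codes are the first 7 hex characters of the MD5 digest (16^7 ≈ 268 million possible codes), a Ruby Hash stands in for the database, and a collision is resolved by appending an attempt counter to the URL and hashing again.

```ruby
require 'digest'

CODE_LENGTH = 7 # hypothetical code length, in hex characters

# Returns a short code for url, storing the mapping in store (code => url).
def shorten_with_hash(url, store)
  attempt = 0
  loop do
    # The first attempt hashes the URL itself; retries hash "url|attempt"
    input = attempt.zero? ? url : "#{url}|#{attempt}"
    code = Digest::MD5.hexdigest(input)[0, CODE_LENGTH]

    case store[code]
    when nil # free slot: claim it
      store[code] = url
      return code
    when url # this URL was already shortened: reuse its code
      return code
    else     # taken by a different URL: collision, try again
      attempt += 1
    end
  end
end
```

In a real database the lookup and insert must be atomic, for example by relying on a unique constraint on the code column and retrying when the insert fails; otherwise two concurrent requests could claim the same code.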

Base62 Encoding

A more user-friendly approach is to use Base62 encoding, which uses alphanumeric characters (a-z, A-Z, 0-9) to represent numeric values more compactly.

Why Base62, not Base64?

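Base64 includes two additional characters, '+' and '/', which have special meanings in URLs: '/' separates path segments, and '+' is decoded as a space in form-encoded query strings. Standard Base64 output can also end in '=' padding, which separates keys from values in query strings. Base62 avoids these problematic characters entirely, making it ideal for URL shortening.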

Base62 Character Set (62 chars)

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

Base64 Character Set (64 chars)

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/
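
The URL-safety problem is easy to demonstrate with Ruby's standard library. In this sketch the two input bytes (0xFB, 0xFF) and the Base62 string aB3xYz are arbitrary examples: Array#pack("m0") produces strict Base64, and URI.encode_www_form_component shows what each string becomes when placed in a query parameter.

```ruby
require 'uri'

# Standard Base64 of two sample bytes; pack("m0") is core Ruby's strict Base64
base64_code = [[0xFB, 0xFF].pack("C*")].pack("m0")
# => "+/8="

# An arbitrary Base62 string: letters and digits only
base62_code = "aB3xYz"

# Percent-encode each as a query-string value
URI.encode_www_form_component(base64_code) # => "%2B%2F8%3D"
URI.encode_www_form_component(base62_code) # => "aB3xYz"
```

The Base64 string grows from 4 to 10 characters once escaped, while the Base62 string passes through unchanged.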

Base62 Encoding Algorithm

Base62 encoding converts a numeric ID into a string of alphanumeric characters by repeatedly dividing the number by 62 and mapping each remainder to a character:

def encode_base62(num)
  raise ArgumentError, "ID must be non-negative" if num.negative?

  chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
  base = chars.length # 62
  result = ""

  # Repeatedly divide by 62; each remainder selects the next (less significant)
  # digit, so it is prepended to the result
  while num > 0
    result = chars[num % base] + result
    num /= base
  end

  result.empty? ? "0" : result
end

# Database ID: 10,000,000
short_code = encode_base62(10_000_000)
# => "fxSK"

With Base62 encoding, we can represent large numbers very compactly:

ID (decimal) | Base62 representation | Length
1,000 | G8 | 2 chars
1,000,000 | 4C92 | 4 chars
1,000,000,000 | 15ftgG | 6 chars
62^6 - 1 (≈ 56.8 billion) | zzzzzz | 6 chars
62^7 - 1 (≈ 3.5 trillion) | zzzzzzz | 7 chars
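
The lengths in the table follow from counting: a k-character Base62 code can take 62^k distinct values (0 through 62^k - 1), so a positive ID n needs the smallest k with 62^k > n:

```latex
\mathrm{length}(n) = \min\{\, k : 62^k > n \,\} = \left\lceil \log_{62}(n+1) \right\rceil, \qquad n \ge 1
```

For example, 62^6 = 56,800,235,584, so every ID below roughly 56.8 billion fits in six characters, and 62^7 = 3,521,614,606,208 extends that to roughly 3.5 trillion with seven.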

Advantages of Base62 Encoding

  • URL-safe characters only
  • More compact representation than hexadecimal
  • Can easily convert between IDs and codes
  • Human-readable and easier to memorize than purely random strings
  • Supports sequential IDs, which are more efficient for database storage
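
Converting a code back to its ID is the mirror image of encode_base62 above: multiply the running total by 62 and add each character's position in the alphabet. A minimal sketch, assuming the same 0-9, A-Z, a-z alphabet order; decode_base62 and BASE62_ALPHABET are hypothetical names.

```ruby
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# Inverse of the encoder: fold each character's value into a base-62 total,
# most significant character first.
def decode_base62(code)
  code.each_char.reduce(0) do |num, char|
    digit = BASE62_ALPHABET.index(char)
    raise ArgumentError, "invalid Base62 character: #{char.inspect}" if digit.nil?

    num * 62 + digit
  end
end

decode_base62("G8") # => 1000
decode_base62("10") # => 62
decode_base62("zz") # => 3843
```

Because decoding needs only the alphabet, a service can turn an incoming short code straight into a primary-key lookup without a separate code-to-ID index.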

Comparison of the two approaches

The table below shows the differences between the two main approaches to URL shortening:

Hash + collision resolution | Base62 conversion
Fixed short URL length. | Short URL length is not fixed; it grows with the ID.
Does not need a unique ID generator. | Depends on a unique ID generator.
Collisions are possible and must be resolved. | Collisions are impossible because each ID is unique.
The next available short URL cannot be guessed because it does not depend on an ID. | If the ID increments by 1 for each new entry, the next short URL is easy to guess, which can be a security concern.

Key Takeaways

  • Two Main Approaches: URL shortening typically employs either hash-based methods or sequence-based encoding with Base62, each with distinct advantages.
  • Hash-Based Method: Provides fixed-length URLs but requires collision handling when different URLs produce the same hash.
  • Base62 Encoding: Avoids collisions by using unique IDs but produces URLs of varying lengths that increase with the underlying ID.
  • Redirection Strategy: Most services use 302 (temporary) redirects to enable analytics and flexible destination URL updating, despite the SEO advantages of 301 redirects.
  • Security Considerations: Sequential IDs can make short URLs predictable, letting anyone enumerate links by stepping through consecutive codes and potentially exposing private destinations, while hash-based codes are much harder to guess.