Security & Performance

Input Validation and Sanitization


Introduction to Input Validation and Sanitization

Input validation and sanitization are critical security practices that protect your web applications from malicious data and attacks. Every piece of data that enters your application from external sources—whether from users, APIs, or databases—should be validated and sanitized before being processed or stored.

Key Principle: Never trust user input. Always validate and sanitize data on the server side, even if you have client-side validation. Client-side validation can be bypassed by attackers.

Input validation ensures that data conforms to expected formats and rules, while sanitization removes or encodes potentially harmful characters. Together, they form a crucial defense layer against SQL injection, XSS attacks, and other security vulnerabilities.

Server-Side Validation: The First Line of Defense

Server-side validation is mandatory for security because client-side validation can be easily bypassed. Attackers can disable JavaScript, modify HTTP requests, or use automated tools to send malicious data directly to your server.

<?php
// Client-side validation alone is NOT enough
// This must ALWAYS be validated on the server
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Validate email
    $email = $_POST['email'] ?? '';
    if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
        die('Invalid email address');
    }
    
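    // Validate age (an integer from 18 to 120; is_numeric() would also accept "1e2" or "18.5")
    $age = filter_var($_POST['age'] ?? '', FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 18, 'max_range' => 120],
    ]);
    if ($age === false) {
        die('Invalid age');
    }
    
    // Validate username (\z anchors at the very end; $ would also accept a trailing newline)
    $username = $_POST['username'] ?? '';
    if (!is_string($username) || !preg_match('/^[a-zA-Z0-9_]{3,20}\z/', $username)) {
        die('Invalid username format');
    }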
}
?>

Key principles for server-side validation include checking data types, validating lengths, enforcing format requirements, and ensuring values fall within acceptable ranges. Never rely solely on client-side validation for security.
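The four checks named above (type, length, format, range) can be combined in one function that collects errors instead of stopping at the first failure. The following is a minimal sketch, not a definitive implementation: the function name validate_registration(), the field names, and the 254-character email limit are assumptions chosen for illustration.

```php
<?php
/**
 * Validate a registration payload and return an array of error messages.
 * Hypothetical helper: field names and limits are illustrative assumptions.
 */
function validate_registration(array $input): array
{
    $errors = [];

    // Type: treat arrays and other non-string values as empty input.
    $email    = is_string($input['email'] ?? null) ? $input['email'] : '';
    $username = is_string($input['username'] ?? null) ? $input['username'] : '';
    $age      = is_scalar($input['age'] ?? null) ? (string) $input['age'] : '';

    // Length and format: 254 is an assumed upper bound for email addresses.
    if (strlen($email) > 254 || filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errors['email'] = 'Invalid email address';
    }

    // Format: \z anchors at the very end, so a trailing newline is rejected.
    if (preg_match('/^[a-zA-Z0-9_]{3,20}\z/', $username) !== 1) {
        $errors['username'] = 'Invalid username format';
    }

    // Range: only integer strings between 18 and 120 pass.
    $range = ['options' => ['min_range' => 18, 'max_range' => 120]];
    if (filter_var($age, FILTER_VALIDATE_INT, $range) === false) {
        $errors['age'] = 'Invalid age';
    }

    return $errors;
}

// Usage: an empty array means every check passed.
$errors = validate_registration([
    'email'    => 'user@example.com',
    'username' => 'jane_doe',
    'age'      => '1e2', // numeric, but not an integer string
]);
print_r($errors); // only the "age" key is reported
```

Returning the full error list lets a form show every problem at once, which is usually friendlier than die() on the first invalid field.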

Whitelisting vs Blacklisting

When validating input, you can use two approaches: whitelisting (allowing only known good values) or blacklisting (blocking known bad values). Whitelisting is generally more secure because it's impossible to anticipate all possible attack vectors.

Best Practice: Always prefer whitelisting over blacklisting. Define what is allowed rather than trying to block what is forbidden.

<?php
// WHITELIST APPROACH (Recommended)
$allowed_colors = ['red', 'green', 'blue', 'yellow'];
$color = $_POST['color'] ?? '';

if (in_array($color, $allowed_colors, true)) {
    // Color is valid
    echo "Selected color: " . htmlspecialchars($color);
} else {
    // Invalid color
    echo "Invalid color selection";
}

// BLACKLIST APPROACH (Less secure)
$forbidden_words = ['script', 'alert', 'onerror'];
$input = $_POST['comment'] ?? '';

foreach ($forbidden_words as $word) {
    if (stripos($input, $word) !== false) {
        die('Forbidden content detected');
    }
}
// Problem: Attackers can find ways around blacklists
?>

Whitelisting works by explicitly defining acceptable values, formats, or patterns. For example, if you expect a country code, you can validate against an array of valid ISO country codes. For username formats, you can use regular expressions to allow only alphanumeric characters and underscores.
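As a rough illustration of the country-code case, the sketch below normalizes the input and checks it against a fixed list. The function name validate_country_code() and the four-entry list are assumptions; a real application would load the complete ISO 3166-1 alpha-2 list.

```php
<?php
/**
 * Return the normalized country code if it is on the allowlist, otherwise null.
 * Hypothetical helper with a deliberately short list for illustration.
 */
function validate_country_code($value): ?string
{
    $allowed = ['DE', 'FR', 'JP', 'US'];

    if (!is_string($value)) {
        return null;
    }
    $code = strtoupper(trim($value));

    // The third argument enables strict comparison and avoids type juggling.
    return in_array($code, $allowed, true) ? $code : null;
}

var_dump(validate_country_code('us')); // string(2) "US"
var_dump(validate_country_code('XX')); // NULL
```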

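Warning: Blacklists are easily bypassed. Attackers can use encoding, obfuscation, or variations that aren't in your blacklist. For example, a case-sensitive check for "script" misses "<ScRiPt>", and even the case-insensitive check above does nothing against a payload such as "<svg onload=confirm(1)>", which contains none of the forbidden words.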
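To make the weakness concrete, the sketch below wraps the word list from the blacklist example in a hypothetical passes_blacklist() function and runs a few well-known XSS test strings through it. The payloads are illustrative, not an exhaustive list.

```php
<?php
// Same idea as the word blacklist: reject input containing a forbidden word.
function passes_blacklist(string $input): bool
{
    $forbidden_words = ['script', 'alert', 'onerror'];
    foreach ($forbidden_words as $word) {
        if (stripos($input, $word) !== false) {
            return false;
        }
    }
    return true;
}

$payloads = [
    '<ScRiPt>alert(1)</ScRiPt>',         // blocked: stripos() ignores case
    '<svg onload=confirm(1)>',           // passes: none of the listed words appear
    '<details open ontoggle=confirm(1)>' // passes for the same reason
];

foreach ($payloads as $payload) {
    printf("%-36s %s\n", $payload, passes_blacklist($payload) ? 'passes' : 'blocked');
}

// Encoding at output time neutralizes the markup regardless of its wording.
echo htmlspecialchars('<svg onload=confirm(1)>', ENT_QUOTES, 'UTF-8'), "\n";
// &lt;svg onload=confirm(1)&gt;
```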

Data Type Validation

Ensuring that data matches the expected type is fundamental to input validation. PHP provides several functions for type validation and casting.

<?php
// Integer validation
$user_id = $_POST['user_id'] ?? '';
if (filter_var($user_id, FILTER_VALIDATE_INT) === false) {
    die('Invalid user ID');
}
$user_id = (int)$user_id; // Type cast

// Float validation
$price = $_POST['price'] ?? '';
if (filter_var($price, FILTER_VALIDATE_FLOAT) === false) {
    die('Invalid price');
}
$price = (float)$price;

// Boolean validation
$subscribe = $_POST['subscribe'] ?? '';
$subscribe = filter_var($subscribe, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
if ($subscribe === null) {
    die('Invalid boolean value');
}

// Email validation
$email = $_POST['email'] ?? '';
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
    die('Invalid email');
}

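// URL validation (FILTER_VALIDATE_URL accepts any scheme, so also require http or https)
$website = $_POST['website'] ?? '';
if (!filter_var($website, FILTER_VALIDATE_URL)
    || !in_array(strtolower((string) parse_url($website, PHP_URL_SCHEME)), ['http', 'https'], true)) {
    die('Invalid URL');
}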

// IP address validation
$ip = $_POST['ip'] ?? '';
if (!filter_var($ip, FILTER_VALIDATE_IP)) {
    die('Invalid IP address');
}
?>

Always validate that numeric fields actually contain numbers before using them in calculations or database queries. Use strict comparison operators when checking validation results to avoid type juggling issues.
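The reason for strict comparison is that some valid values are falsy. A short sketch with the input "0" shows the difference between a loose and a strict check of the filter_var() result:

```php
<?php
$input = '0';

// Loose check: filter_var() returns int(0), which is falsy, so a valid value is rejected.
$loose_ok = (bool) filter_var($input, FILTER_VALIDATE_INT);

// Strict check: only the boolean false signals a validation failure.
$strict_ok = filter_var($input, FILTER_VALIDATE_INT) !== false;

var_dump($loose_ok);  // bool(false)
var_dump($strict_ok); // bool(true)

// The same applies to floats: "0.0" validates to float(0), which is also falsy.
var_dump(filter_var('0.0', FILTER_VALIDATE_FLOAT) !== false); // bool(true)
```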

Sanitizing User Input

Sanitization removes or encodes potentially dangerous characters from user input. While validation rejects bad input, sanitization cleans input to make it safe for use in different contexts.

<?php
// HTML output sanitization
$user_comment = $_POST['comment'] ?? '';
$safe_comment = htmlspecialchars($user_comment, ENT_QUOTES, 'UTF-8');
echo "<p>" . $safe_comment . "</p>";

// Strip HTML tags completely
$name = $_POST['name'] ?? '';
$clean_name = strip_tags($name);

// Remove excess whitespace
$text = $_POST['text'] ?? '';
$text = trim($text);
$text = preg_replace('/\s+/', ' ', $text);

// Database input: do not try to "clean" values for SQL; bind them with prepared statements.
// (FILTER_SANITIZE_STRING is deprecated as of PHP 8.1 and should not be used.)
$search = trim($_POST['search'] ?? '');

// URL encoding for use in URLs
$param = $_POST['param'] ?? '';
$safe_param = urlencode($param);
$url = "https://example.com/search?q=" . $safe_param;

// Email sanitization
$email = $_POST['email'] ?? '';
$email = filter_var($email, FILTER_SANITIZE_EMAIL);
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
    // Email is valid after sanitization
}
?>

Context Matters: The sanitization method depends on where you'll use the data. HTML output requires htmlspecialchars(), URLs need urlencode(), and SQL requires prepared statements.
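A single value handled for four different contexts might look like the sketch below. The SQL part assumes the pdo_sqlite extension is available and uses a hypothetical "notes" table; it is skipped when the extension is missing. The JavaScript part assumes the value is emitted inside a script block.

```php
<?php
$value = 'Tom & "Jerry" <b>';

// HTML context: encode markup-significant characters.
$html = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// URL query context: percent-encode the value.
$url = 'https://example.com/search?q=' . urlencode($value);

// JavaScript context: JSON-encode, escaping characters that matter inside HTML.
$js = json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo $html, "\n"; // Tom &amp; &quot;Jerry&quot; &lt;b&gt;
echo $url, "\n";  // https://example.com/search?q=Tom+%26+%22Jerry%22+%3Cb%3E
echo $js, "\n";

// SQL context: bind the value instead of encoding it.
// Assumes pdo_sqlite; the "notes" table is hypothetical.
if (extension_loaded('pdo_sqlite')) {
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE notes (body TEXT)');

    $stmt = $pdo->prepare('INSERT INTO notes (body) VALUES (:body)');
    $stmt->execute([':body' => $value]);

    // The stored value is exactly the original input; nothing was altered.
    $stored = $pdo->query('SELECT body FROM notes')->fetchColumn();
    var_dump($stored === $value); // bool(true)
}
```

Note that the database stores the raw value. Encoding happens only when the value is written into a particular output context.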

File Upload Validation

File uploads are particularly dangerous and require extensive validation. Never trust the file extension or MIME type provided by the client—both can be easily spoofed.

<?php
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_FILES['upload'])) {
    $file = $_FILES['upload'];
    
    // Check for upload errors
    if ($file['error'] !== UPLOAD_ERR_OK) {
        die('Upload failed with error code: ' . $file['error']);
    }
    
    // Validate file size (5MB max)
    $max_size = 5 * 1024 * 1024;
    if ($file['size'] > $max_size) {
        die('File too large');
    }
    
    // Validate MIME type
    $allowed_types = ['image/jpeg', 'image/png', 'image/gif'];
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    $mime = finfo_file($finfo, $file['tmp_name']);
    finfo_close($finfo);
    
    if (!in_array($mime, $allowed_types, true)) {
        die('Invalid file type');
    }
    
    // Validate file extension
    $allowed_ext = ['jpg', 'jpeg', 'png', 'gif'];
    $ext = strtolower(pathinfo($file['name'], PATHINFO_EXTENSION));
    
    if (!in_array($ext, $allowed_ext, true)) {
        die('Invalid file extension');
    }
    
    // Generate safe filename
    $new_filename = bin2hex(random_bytes(16)) . '.' . $ext;
    // In production, point this at a directory outside the web root
    $upload_path = 'uploads/' . $new_filename;
    
    // Move uploaded file
    if (!move_uploaded_file($file['tmp_name'], $upload_path)) {
        die('Failed to move uploaded file');
    }
    
    echo "File uploaded successfully: " . htmlspecialchars($new_filename);
}
?>

Security Critical: Never use the original filename provided by the user. Generate a new random filename to prevent directory traversal attacks and file overwriting.
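One way to go a step further is to take the extension from the detected MIME type rather than from the client's filename, so that nothing user-controlled ends up in the stored name. The helper below is a sketch under that assumption; safe_upload_name() is a hypothetical function, and the MIME type passed in would come from finfo as in the upload example.

```php
<?php
/**
 * Build a random storage name whose extension is derived from the detected
 * MIME type. Returns null for types that are not on the allowlist.
 */
function safe_upload_name(string $detected_mime): ?string
{
    $extensions = [
        'image/jpeg' => 'jpg',
        'image/png'  => 'png',
        'image/gif'  => 'gif',
    ];

    if (!isset($extensions[$detected_mime])) {
        return null;
    }

    // 16 random bytes become 32 hex characters.
    return bin2hex(random_bytes(16)) . '.' . $extensions[$detected_mime];
}

echo safe_upload_name('image/png'), "\n";        // 32 hex characters followed by .png
var_dump(safe_upload_name('application/x-php')); // NULL
```

This also closes the gap where a file detected as image/jpeg is stored with a .png extension because the client named it that way.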

Additional file upload security measures include storing uploaded files outside the web root, implementing virus scanning, using proper file permissions (644 for files, 755 for directories), and logging all upload attempts for security monitoring.

Regular Expressions for Validation

Regular expressions (regex) provide powerful pattern matching for complex validation scenarios. They're essential for validating formats like phone numbers, postal codes, and custom data structures.

<?php
// Note: \z anchors at the very end of the string. A plain $ would also match
// just before a trailing newline, so "admin\n" would pass.

// Username validation (alphanumeric and underscore, 3-20 chars)
$username = $_POST['username'] ?? '';
if (!preg_match('/^[a-zA-Z0-9_]{3,20}\z/', $username)) {
    die('Invalid username format');
}

// Password strength validation
$password = $_POST['password'] ?? '';
// At least 8 chars, 1 uppercase, 1 lowercase, 1 number, 1 special char
// (any non-alphanumeric character counts as special; no characters are forbidden)
if (!preg_match('/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z\d]).{8,}\z/s', $password)) {
    die('Password too weak');
}

// Phone number validation (US format)
$phone = $_POST['phone'] ?? '';
if (!preg_match('/^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\z/', $phone)) {
    die('Invalid phone number');
}

// Credit card validation (basic format check)
$card = $_POST['card'] ?? '';
$card = preg_replace('/[\s-]/', '', $card); // Remove spaces and dashes
if (!preg_match('/^\d{13,19}\z/', $card)) {
    die('Invalid card number format');
}

// Date validation (YYYY-MM-DD format only; impossible dates such as 2023-02-31 still pass)
$date = $_POST['date'] ?? '';
if (!preg_match('/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\z/', $date)) {
    die('Invalid date format');
}

// Hex color code validation
$color = $_POST['color'] ?? '';
if (!preg_match('/^#[0-9a-fA-F]{6}\z/', $color)) {
    die('Invalid color code');
}

// URL slug validation
$slug = $_POST['slug'] ?? '';
if (!preg_match('/^[a-z0-9]+(?:-[a-z0-9]+)*\z/', $slug)) {
    die('Invalid URL slug');
}
?>
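A pitfall worth seeing in action when anchoring validation patterns in PHP: without the D modifier, $ also matches just before a trailing newline. The short comparison below uses a made-up input with a newline appended.

```php
<?php
$input = "admin\n"; // trailing newline, for example from a crafted request

var_dump(preg_match('/^[a-zA-Z0-9_]{3,20}$/', $input));  // int(1): accepted
var_dump(preg_match('/^[a-zA-Z0-9_]{3,20}$/D', $input)); // int(0): rejected
var_dump(preg_match('/^[a-zA-Z0-9_]{3,20}\z/', $input)); // int(0): rejected
```

Either the D modifier or the \z anchor makes the pattern match only when the string truly ends where the pattern does.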

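Performance Tip: PHP's PCRE extension caches compiled patterns automatically, so there is nothing to compile by hand; prefer fixed pattern strings over patterns assembled dynamically on every request. Use non-capturing groups (?:) when you don't need to extract matched substrings.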
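The effect of a non-capturing group is easiest to see in the matches array. This small sketch runs a slug through a capturing and a non-capturing version of the same pattern:

```php
<?php
$slug = 'my-post-title';

preg_match('/^([a-z0-9]+)(-[a-z0-9]+)*$/', $slug, $capturing);
preg_match('/^[a-z0-9]+(?:-[a-z0-9]+)*$/', $slug, $non_capturing);

print_r($capturing);     // full match plus two captured substrings: "my" and "-title"
print_r($non_capturing); // full match only
```

Both patterns accept the same strings; the second simply skips the bookkeeping for substrings that validation never reads.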

Validation Libraries and Frameworks

Modern PHP frameworks and libraries provide robust validation systems that simplify input validation while maintaining security. These tools offer pre-built validators, error handling, and customizable validation rules.

<?php
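// Laravel Validation Example (inside a controller action that receives a Request $request)
use Illuminate\Support\Facades\Validator;

// Use $request->all() rather than $_POST: it includes uploaded files, which the avatar rule needs
$validator = Validator::make($request->all(), [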
    'name' => 'required|string|max:255',
    'email' => 'required|email|unique:users,email',
    'age' => 'required|integer|min:18|max:120',
    'password' => 'required|string|min:8|confirmed',
    'phone' => 'nullable|regex:/^[0-9]{10}$/',
    'website' => 'nullable|url',
    'avatar' => 'nullable|image|max:2048',
]);

if ($validator->fails()) {
    return response()->json($validator->errors(), 422);
}

$validated = $validator->validated();

// Symfony Validation Example
use Symfony\Component\Validator\Validation;
use Symfony\Component\Validator\Constraints as Assert;

$validator = Validation::createValidator();
$violations = $validator->validate($email, [
    new Assert\NotBlank(),
    new Assert\Email(['mode' => 'strict']),
]);

if (count($violations) > 0) {
    foreach ($violations as $violation) {
        echo $violation->getMessage();
    }
}
?>

Framework Benefits: Validation frameworks provide consistent error messages, reusable validation rules, database-aware validation (unique checks), and internationalization support.

Custom Validation Functions

Creating reusable validation functions helps maintain consistency across your application and makes code more maintainable.

<?php
class InputValidator {
    public static function validateUsername($username) {
        if (strlen($username) < 3 || strlen($username) > 20) {
            return false;
        }
        // \z rejects a trailing newline, which $ would let through
        return preg_match('/^[a-zA-Z0-9_]+\z/', $username) === 1;
    }
    
    public static function validatePassword($password) {
        return strlen($password) >= 8 &&
               preg_match('/[A-Z]/', $password) &&
               preg_match('/[a-z]/', $password) &&
               preg_match('/[0-9]/', $password) &&
               preg_match('/[@$!%*?&]/', $password);
    }
    
    public static function sanitizeHTML($input) {
        return htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    
    public static function validateDate($date, $format = 'Y-m-d') {
        $d = DateTime::createFromFormat($format, $date);
        return $d && $d->format($format) === $date;
    }
    
    public static function validateCreditCard($number) {
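        // Luhn algorithm; reject empty or implausibly sized input first,
        // because an empty string would otherwise sum to 0 and pass
        $number = preg_replace('/[^0-9]/', '', $number);
        if (strlen($number) < 13 || strlen($number) > 19) {
            return false;
        }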
        $sum = 0;
        $length = strlen($number);
        for ($i = $length - 1; $i >= 0; $i--) {
            $digit = (int)$number[$i];
            if (($length - $i) % 2 === 0) {
                $digit *= 2;
                if ($digit > 9) $digit -= 9;
            }
            $sum += $digit;
        }
        return $sum % 10 === 0;
    }
}

// Usage
$username = $_POST['username'] ?? '';
if (!InputValidator::validateUsername($username)) {
    die('Invalid username');
}
?>
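The round-trip technique used by validateDate() catches dates that a format-only regular expression lets through. The standalone sketch below contrasts the two; matches_date_format() and is_real_date() are hypothetical names used only for this comparison.

```php
<?php
// Format-only check: accepts any day from 01 to 31 for every month.
function matches_date_format(string $date): bool
{
    return preg_match('/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\z/', $date) === 1;
}

// Calendar check: parse, then require the round trip to reproduce the input.
function is_real_date(string $date, string $format = 'Y-m-d'): bool
{
    $parsed = DateTime::createFromFormat($format, $date);
    return $parsed !== false && $parsed->format($format) === $date;
}

var_dump(matches_date_format('2023-02-31')); // bool(true)
var_dump(is_real_date('2023-02-31'));        // bool(false)
var_dump(is_real_date('2024-02-29'));        // bool(true)
```

DateTime rolls 31 February over into March, so the formatted result no longer equals the input and the check fails.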

Exercise: Create a validation class for a user registration form that validates username, email, password, age, and phone number. Include both validation and sanitization methods. Test with various valid and invalid inputs.

Best Practices Summary

Effective input validation and sanitization require a layered approach. Always validate on the server side; never trust client-side validation alone. Use whitelisting over blacklisting whenever possible. Validate data types explicitly before using values in operations or queries.

Sanitize output based on context—use htmlspecialchars() for HTML output, prepared statements for SQL, and appropriate encoding for URLs and JavaScript. For file uploads, validate MIME types using server-side detection, check file sizes, and generate new filenames to prevent attacks.

Use validation frameworks when available to leverage battle-tested code and consistent error handling. Create reusable validation functions for common patterns in your application. Log validation failures for security monitoring and debugging.

Security Layers: Input validation is one layer of defense. Combine it with prepared statements, output encoding, CSRF tokens, and rate limiting for comprehensive security.

Remember that validation rules should be as strict as possible while still allowing legitimate use cases. Regularly review and update validation logic as new attack vectors emerge. Document validation rules clearly so developers understand what inputs are acceptable and why certain restrictions exist.