Input Validation and Sanitization
Introduction to Input Validation and Sanitization
Input validation and sanitization are critical security practices that protect your web applications from malicious data and attacks. Every piece of data that enters your application from external sources—whether from users, APIs, or databases—should be validated and sanitized before being processed or stored.
Input validation ensures that data conforms to expected formats and rules, while sanitization removes or encodes potentially harmful characters. Together, they form a crucial defense layer against SQL injection, XSS attacks, and other security vulnerabilities.
Server-Side Validation: The First Line of Defense
Server-side validation is mandatory for security because client-side validation can be easily bypassed. Attackers can disable JavaScript, modify HTTP requests, or use automated tools to send malicious data directly to your server.
<?php
// Client-side validation alone is NOT enough.
// Every field must be validated again on the server.
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Validate email
    $email = $_POST['email'] ?? '';
    if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
        die('Invalid email address');
    }

    // Validate age: whole numbers from 18 to 120 only
    // (is_numeric() would also accept values such as "18.5" or "1e2")
    $age = filter_var($_POST['age'] ?? '', FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 18, 'max_range' => 120],
    ]);
    if ($age === false) {
        die('Invalid age');
    }

    // Validate username (\z instead of $ so a trailing newline is rejected)
    $username = $_POST['username'] ?? '';
    if (!preg_match('/^[a-zA-Z0-9_]{3,20}\z/', $username)) {
        die('Invalid username format');
    }
}
?>
Key principles for server-side validation include checking data types, validating lengths, enforcing format requirements, and ensuring values fall within acceptable ranges. Never rely solely on client-side validation for security.
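The four checks named above (type, length, format, range) can also be gathered into one function that reports every problem at once instead of stopping at the first failure. The following is a minimal sketch, not a definitive implementation: the function name validateRegistration(), the field names, and the 254-character email cap are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: validates a registration payload and returns
 * error messages keyed by field name (an empty array means valid).
 */
function validateRegistration(array $input): array
{
    $errors = [];

    // Type: every field must arrive as a string (rejects e.g. email[]=x).
    foreach (['email', 'age', 'username'] as $field) {
        if (!isset($input[$field]) || !is_string($input[$field])) {
            $errors[$field] = 'Missing or malformed field';
        }
    }
    if ($errors !== []) {
        return $errors;
    }

    // Length and format: cap the length, then check the email syntax.
    if (strlen($input['email']) > 254
        || filter_var($input['email'], FILTER_VALIDATE_EMAIL) === false) {
        $errors['email'] = 'Invalid email address';
    }

    // Range: whole numbers from 18 to 120 only.
    $age = filter_var($input['age'], FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 18, 'max_range' => 120],
    ]);
    if ($age === false) {
        $errors['age'] = 'Age must be a whole number from 18 to 120';
    }

    // Format: 3-20 letters, digits, or underscores.
    if (preg_match('/^[a-zA-Z0-9_]{3,20}\z/', $input['username']) !== 1) {
        $errors['username'] = 'Invalid username format';
    }

    return $errors;
}
```

A caller would typically pass $_POST to this function and respond with HTTP status 422 and the error list when the returned array is not empty.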
Whitelisting vs Blacklisting
When validating input, you can use two approaches: whitelisting (allowing only known good values) or blacklisting (blocking known bad values). Whitelisting is generally more secure because it's impossible to anticipate all possible attack vectors.
<?php
// WHITELIST APPROACH (Recommended)
$allowed_colors = ['red', 'green', 'blue', 'yellow'];
$color = $_POST['color'] ?? '';
if (in_array($color, $allowed_colors, true)) {
    // Color is valid
    echo "Selected color: " . htmlspecialchars($color);
} else {
    // Invalid color
    echo "Invalid color selection";
}

// BLACKLIST APPROACH (Less secure)
$forbidden_words = ['script', 'alert', 'onerror'];
$input = $_POST['comment'] ?? '';
foreach ($forbidden_words as $word) {
    if (stripos($input, $word) !== false) {
        die('Forbidden content detected');
    }
}
// Problem: attackers can work around blacklists. For example,
// <svg onload=confirm(1)> contains none of the forbidden words.
?>
Whitelisting works by explicitly defining acceptable values, formats, or patterns. For example, if you expect a country code, you can validate against an array of valid ISO country codes. For username formats, you can use regular expressions to allow only alphanumeric characters and underscores.
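The country code case could look roughly like the sketch below. The constant ALLOWED_COUNTRY_CODES and the function validateCountryCode() are hypothetical names, and the list is a deliberately short subset; a real application would load the full ISO 3166-1 alpha-2 list from a maintained source.

```php
<?php
declare(strict_types=1);

// Illustrative subset only, not the complete ISO 3166-1 list.
const ALLOWED_COUNTRY_CODES = ['CA', 'DE', 'FR', 'GB', 'JP', 'US'];

/**
 * Hypothetical helper: returns the normalized country code,
 * or null when the input is not on the whitelist.
 */
function validateCountryCode(mixed $input): ?string
{
    // Reject arrays and other non-string input outright.
    if (!is_string($input)) {
        return null;
    }

    // Normalize case and surrounding whitespace before comparing.
    $code = strtoupper(trim($input));

    // Strict comparison against the known-good values.
    return in_array($code, ALLOWED_COUNTRY_CODES, true) ? $code : null;
}
```

Because the function returns the value taken from the whitelist comparison rather than the raw input, anything that is not an exact match never reaches later code.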
Data Type Validation
Ensuring that data matches the expected type is fundamental to input validation. PHP provides several functions for type validation and casting.
<?php
// Integer validation
$user_id = $_POST['user_id'] ?? '';
if (filter_var($user_id, FILTER_VALIDATE_INT) === false) {
    die('Invalid user ID');
}
$user_id = (int)$user_id; // Type cast

// Float validation
$price = $_POST['price'] ?? '';
if (filter_var($price, FILTER_VALIDATE_FLOAT) === false) {
    die('Invalid price');
}
$price = (float)$price;

// Boolean validation
$subscribe = $_POST['subscribe'] ?? '';
$subscribe = filter_var($subscribe, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
if ($subscribe === null) {
    die('Invalid boolean value');
}

// Email validation
$email = $_POST['email'] ?? '';
if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
    die('Invalid email');
}

// URL validation (FILTER_VALIDATE_URL accepts any scheme, so restrict it)
$website = $_POST['website'] ?? '';
$scheme = strtolower((string) parse_url($website, PHP_URL_SCHEME));
if (filter_var($website, FILTER_VALIDATE_URL) === false
    || !in_array($scheme, ['http', 'https'], true)) {
    die('Invalid URL');
}

// IP address validation
$ip = $_POST['ip'] ?? '';
if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
    die('Invalid IP address');
}
?>
Always validate that numeric fields actually contain numbers before using them in calculations or database queries. Use strict comparison operators when checking validation results to avoid type juggling issues.
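The reason strict comparison matters is that filter_var() returns the converted value on success, and some valid values are falsy. A small sketch of the difference follows; the function name parseQuantity() and the 0 to 100 range are assumptions chosen for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: parses a quantity field as an integer from 0 to 100.
 * Returns the integer, or null when the input is not valid.
 */
function parseQuantity(string $raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 100],
    ]);

    // Strict check: 0 is a valid result; only boolean false means failure.
    return $value === false ? null : $value;
}

// Loose check: filter_var() returns int 0 for "0", and !0 is true,
// so the valid input "0" is wrongly treated as a failure.
$looseCheckRejectsZero = !filter_var('0', FILTER_VALIDATE_INT);
```

With the strict check, parseQuantity('0') yields 0, while out-of-range or non-integer input such as '101' or '4.2' yields null.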
Sanitizing User Input
Sanitization removes or encodes potentially dangerous characters from user input. While validation rejects bad input, sanitization cleans input to make it safe for use in different contexts.
<?php
// HTML output sanitization
$user_comment = $_POST['comment'] ?? '';
$safe_comment = htmlspecialchars($user_comment, ENT_QUOTES, 'UTF-8');
echo "<p>" . $safe_comment . "</p>";

// Strip HTML tags completely
$name = $_POST['name'] ?? '';
$clean_name = strip_tags($name);

// Remove excess whitespace
$text = $_POST['text'] ?? '';
$text = trim($text);
$text = preg_replace('/\s+/', ' ', $text);

// Database input: do not try to "clean" the value for SQL.
// FILTER_SANITIZE_STRING is deprecated as of PHP 8.1; pass the value
// to a prepared statement as a bound parameter instead.
$search = trim($_POST['search'] ?? '');

// URL encoding for use in URLs
$param = $_POST['param'] ?? '';
$safe_param = urlencode($param);
$url = "https://example.com/search?q=" . $safe_param;

// Email sanitization
$email = $_POST['email'] ?? '';
$email = filter_var($email, FILTER_SANITIZE_EMAIL);
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
    // Email is valid after sanitization
}
?>
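Since the right encoding depends on where the value ends up, one option is to give each output context its own small helper. The sketch below is one possible arrangement; the function names forHtml(), forUrlComponent(), and forInlineScript() are hypothetical, while the PHP functions and flags they call are standard.

```php
<?php
declare(strict_types=1);

/** Encode text for an HTML element body or a quoted attribute value. */
function forHtml(string $value): string
{
    // ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning ''.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

/** Encode a single query-string value or path segment. */
function forUrlComponent(string $value): string
{
    return rawurlencode($value);
}

/** Encode a value as a JavaScript literal for an inline <script> block. */
function forInlineScript(mixed $value): string
{
    // The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes,
    // so the value cannot close the surrounding script element.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}
```

A template would then call the helper that matches the position of the value, for example forHtml() inside a paragraph and forUrlComponent() when building a link.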
File Upload Validation
File uploads are particularly dangerous and require extensive validation. Never trust the file extension or MIME type provided by the client—both can be easily spoofed.
<?php
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_FILES['upload'])) {
    $file = $_FILES['upload'];

    // Check for upload errors
    if ($file['error'] !== UPLOAD_ERR_OK) {
        die('Upload failed with error code: ' . $file['error']);
    }

    // Validate file size (5MB max)
    $max_size = 5 * 1024 * 1024;
    if ($file['size'] > $max_size) {
        die('File too large');
    }

    // Validate MIME type by inspecting the file contents on the server
    $allowed_types = ['image/jpeg', 'image/png', 'image/gif'];
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($file['tmp_name']);
    if (!in_array($mime, $allowed_types, true)) {
        die('Invalid file type');
    }

    // Validate file extension against a whitelist
    $allowed_ext = ['jpg', 'jpeg', 'png', 'gif'];
    $ext = strtolower(pathinfo($file['name'], PATHINFO_EXTENSION));
    if (!in_array($ext, $allowed_ext, true)) {
        die('Invalid file extension');
    }

    // Generate safe filename (the client-supplied name is discarded)
    $new_filename = bin2hex(random_bytes(16)) . '.' . $ext;
    $upload_path = 'uploads/' . $new_filename;

    // Move uploaded file
    if (!move_uploaded_file($file['tmp_name'], $upload_path)) {
        die('Failed to move uploaded file');
    }

    echo "File uploaded successfully: " . htmlspecialchars($new_filename);
}
?>
Additional file upload security measures include storing uploaded files outside the web root, implementing virus scanning, using proper file permissions (644 for files, 755 for directories), and logging all upload attempts for security monitoring.
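One way to combine several of these measures is to derive both the stored name and the extension on the server, so that nothing from the client-supplied filename is reused. The sketch below assumes a hypothetical helper buildUploadPath() and a storage directory chosen by the application (for example a path outside the web root); neither name comes from a library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds a storage path for an upload from the
 * server-detected MIME type. Returns null for types not on the whitelist.
 * The client-supplied filename is never used.
 */
function buildUploadPath(string $detectedMime, string $storageDir): ?string
{
    // Whitelist mapping each accepted MIME type to the extension to store.
    $extensions = [
        'image/jpeg' => 'jpg',
        'image/png'  => 'png',
        'image/gif'  => 'gif',
    ];
    if (!isset($extensions[$detectedMime])) {
        return null;
    }

    // 16 random bytes give an unguessable, collision-resistant name.
    $name = bin2hex(random_bytes(16)) . '.' . $extensions[$detectedMime];

    return rtrim($storageDir, '/') . '/' . $name;
}
```

After move_uploaded_file() succeeds with the returned path, a call such as chmod($path, 0644) would apply the file permission mentioned above.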
Regular Expressions for Validation
Regular expressions (regex) provide powerful pattern matching for complex validation scenarios. They're essential for validating formats like phone numbers, postal codes, and custom data structures.
<?php
// Note: the D modifier makes $ match only at the very end of the string.
// Without it, $ also matches just before a trailing newline.

// Username validation (alphanumeric and underscore, 3-20 chars)
$username = $_POST['username'] ?? '';
if (!preg_match('/^[a-zA-Z0-9_]{3,20}$/D', $username)) {
    die('Invalid username format');
}

// Password strength validation
$password = $_POST['password'] ?? '';
// At least 8 chars with 1 uppercase, 1 lowercase, 1 digit, and 1 of @$!%*?&
// (characters outside letters, digits, and @$!%*?& are not accepted)
if (!preg_match('/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/D', $password)) {
    die('Password too weak');
}

// Phone number validation (US format)
$phone = $_POST['phone'] ?? '';
if (!preg_match('/^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$/D', $phone)) {
    die('Invalid phone number');
}

// Credit card validation (basic format check)
$card = $_POST['card'] ?? '';
$card = preg_replace('/[\s-]/', '', $card); // Remove spaces and dashes
if (!preg_match('/^\d{13,19}$/D', $card)) {
    die('Invalid card number format');
}

// Date validation (YYYY-MM-DD); checkdate() rejects impossible
// dates such as 2023-02-31 that the pattern alone would accept
$date = $_POST['date'] ?? '';
if (!preg_match('/^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/D', $date, $m)
    || !checkdate((int) $m[2], (int) $m[3], (int) $m[1])) {
    die('Invalid date format');
}

// Hex color code validation
$color = $_POST['color'] ?? '';
if (!preg_match('/^#[0-9a-fA-F]{6}$/D', $color)) {
    die('Invalid color code');
}

// URL slug validation
$slug = $_POST['slug'] ?? '';
if (!preg_match('/^[a-z0-9]+(?:-[a-z0-9]+)*$/D', $slug)) {
    die('Invalid URL slug');
}
?>
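One subtlety when anchoring validation patterns: in PHP's PCRE engine, a plain $ matches not only at the end of the string but also just before a final newline. The short demonstration below uses a made-up input value to show the difference between $, $ with the D modifier, and \z.

```php
<?php
declare(strict_types=1);

// Example input with a trailing newline, e.g. sent as %0A in a request.
$input = "admin\n";

// Without the D modifier, $ also matches just before a final newline,
// so this input is accepted (preg_match returns 1).
$withDollar = preg_match('/^[a-zA-Z0-9_]{3,20}$/', $input);

// With the D modifier, $ matches only at the very end of the string,
// so the input is rejected (preg_match returns 0).
$withDollarD = preg_match('/^[a-zA-Z0-9_]{3,20}$/D', $input);

// \z always means "end of string", regardless of modifiers.
$withZ = preg_match('/^[a-zA-Z0-9_]{3,20}\z/', $input);
```

For validation, either the D modifier or \z is the safer choice, because a value that passed the check is then guaranteed to contain only the allowed characters.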
Validation Libraries and Frameworks
Modern PHP frameworks and libraries provide robust validation systems that simplify input validation while maintaining security. These tools offer pre-built validators, error handling, and customizable validation rules.
<?php
// Laravel Validation Example
use Illuminate\Support\Facades\Validator;

$validator = Validator::make($_POST, [
    'name' => 'required|string|max:255',
    'email' => 'required|email|unique:users,email',
    'age' => 'required|integer|min:18|max:120',
    'password' => 'required|string|min:8|confirmed',
    'phone' => 'nullable|regex:/^[0-9]{10}$/',
    'website' => 'nullable|url',
    'avatar' => 'nullable|image|max:2048',
]);
if ($validator->fails()) {
    return response()->json($validator->errors(), 422);
}
$validated = $validator->validated();

// Symfony Validation Example (independent of the Laravel example above)
use Symfony\Component\Validator\Validation;
use Symfony\Component\Validator\Constraints as Assert;

$email = $_POST['email'] ?? '';
$validator = Validation::createValidator();
$violations = $validator->validate($email, [
    new Assert\NotBlank(),
    new Assert\Email(['mode' => 'strict']),
]);
if (count($violations) > 0) {
    foreach ($violations as $violation) {
        echo $violation->getMessage();
    }
}
?>
Custom Validation Functions
Creating reusable validation functions helps maintain consistency across your application and makes code more maintainable.
<?php
class InputValidator {
    public static function validateUsername(string $username): bool {
        // 3-20 characters: letters, digits, and underscores only
        return preg_match('/^[a-zA-Z0-9_]{3,20}\z/', $username) === 1;
    }

    public static function validatePassword(string $password): bool {
        return strlen($password) >= 8 &&
            preg_match('/[A-Z]/', $password) === 1 &&
            preg_match('/[a-z]/', $password) === 1 &&
            preg_match('/[0-9]/', $password) === 1 &&
            preg_match('/[@$!%*?&]/', $password) === 1;
    }

    public static function sanitizeHTML(string $input): string {
        return htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }

    public static function validateDate(string $date, string $format = 'Y-m-d'): bool {
        // Round-trip check: rejects overflow dates such as 2023-02-31
        $d = DateTime::createFromFormat($format, $date);
        return $d !== false && $d->format($format) === $date;
    }

    public static function validateCreditCard(string $number): bool {
        // Strip separators, then require a plausible length so that
        // an empty string cannot pass the checksum
        $number = preg_replace('/[^0-9]/', '', $number);
        $length = strlen($number);
        if ($length < 13 || $length > 19) {
            return false;
        }

        // Luhn algorithm: double every second digit from the right
        $sum = 0;
        for ($i = $length - 1; $i >= 0; $i--) {
            $digit = (int)$number[$i];
            if (($length - $i) % 2 === 0) {
                $digit *= 2;
                if ($digit > 9) $digit -= 9;
            }
            $sum += $digit;
        }
        return $sum % 10 === 0;
    }
}

// Usage
$username = $_POST['username'] ?? '';
if (!InputValidator::validateUsername($username)) {
    die('Invalid username');
}
?>
Best Practices Summary
Effective input validation and sanitization require a layered approach. Always validate on the server side; never trust client-side validation alone. Prefer whitelisting to blacklisting whenever possible. Validate data types explicitly before using values in operations or queries.
Encode or sanitize output based on context: use htmlspecialchars() for HTML output, prepared statements for SQL, and appropriate encoding for URLs and JavaScript. For file uploads, validate MIME types using server-side detection, check file sizes, and generate new filenames so that user-supplied names cannot cause path traversal or overwrite existing files.
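As an illustration of the prepared-statement advice, the following sketch uses PDO with an in-memory SQLite database. It assumes the pdo_sqlite extension is available; the users table, its contents, and the function findUserIdByEmail() are invented for the demonstration.

```php
<?php
declare(strict_types=1);

// Assumption: pdo_sqlite is installed. The in-memory database and the
// "users" table exist only for this demonstration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)');
$pdo->exec("INSERT INTO users (email) VALUES ('alice@example.com'), ('bob@example.com')");

/**
 * Hypothetical lookup: the email is sent as a bound parameter,
 * never concatenated into the SQL text.
 */
function findUserIdByEmail(PDO $pdo, string $email): ?int
{
    $stmt = $pdo->prepare('SELECT id FROM users WHERE email = :email');
    $stmt->execute([':email' => $email]);
    $id = $stmt->fetchColumn();

    // fetchColumn() returns false when no row matched.
    return $id === false ? null : (int) $id;
}

// A normal lookup finds the row.
$found = findUserIdByEmail($pdo, 'alice@example.com');

// A classic injection payload is treated as plain data and matches nothing.
$injected = findUserIdByEmail($pdo, "' OR '1'='1");
```

Because the query text and the data travel separately, the payload cannot change the structure of the statement, whatever characters it contains.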
Use validation frameworks when available to leverage battle-tested code and consistent error handling. Create reusable validation functions for common patterns in your application. Log validation failures for security monitoring and debugging.
Remember that validation rules should be as strict as possible while still allowing legitimate use cases. Regularly review and update validation logic as new attack vectors emerge. Document validation rules clearly so developers understand what inputs are acceptable and why certain restrictions exist.