Dart Advanced Features

Regular Expressions in Dart


Introduction to Regular Expressions

Regular expressions (regex) are patterns used to match, search, and manipulate text. In Dart, the RegExp class implements them with the same syntax and semantics as JavaScript (ECMAScript) regular expressions. Whether you need to validate input, extract data, or transform strings, regex is an essential tool.

Creating a RegExp

void main() {
  // Basic form: pass the pattern to the RegExp constructor
  final digitPattern = RegExp(r'\d+');

  // Optional named flags change how the pattern is applied
  final caseInsensitive = RegExp(r'hello', caseSensitive: false);
  final multiLine = RegExp(r'^\w+', multiLine: true);  // ^ and $ also match at line breaks
  final dotAll = RegExp(r'start.*end', dotAll: true);  // . also matches newlines

  // Unicode mode enables Unicode property escapes such as \p{L}
  final unicode = RegExp(r'\p{L}+', unicode: true);  // One or more letters in any script
}
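Note: Always use raw strings (r'...') for regex patterns. In a normal string literal, Dart treats the backslash as an escape character, and an unrecognized escape such as '\d' silently becomes just 'd', so you would have to write '\\d' instead. A normal string also treats $ as the start of an interpolation, which clashes with the $ end anchor. Raw strings pass backslashes and dollar signs through literally.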
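To see why the raw-string rule matters, the short program below compares a normal and a raw string used as a pattern. It also uses RegExp.escape, a static method on RegExp, for the related case where part of a pattern comes from runtime text that may contain metacharacters such as '.'. The file name is just sample data.

```dart
void main() {
  // A normal string drops the backslash of the unknown escape \d,
  // so '\d+' is just the two characters "d+".
  print('\d+');   // d+
  print(r'\d+');  // \d+

  // As a result, only the raw-string pattern matches digits.
  print(RegExp('\d+').hasMatch('42'));   // false
  print(RegExp(r'\d+').hasMatch('42'));  // true

  // When pattern text comes from a variable, RegExp.escape makes
  // metacharacters literal: '.' then matches only a dot.
  const fileName = 'file.txt';
  print(RegExp(fileName).hasMatch('fileXtxt'));                 // true
  print(RegExp(RegExp.escape(fileName)).hasMatch('fileXtxt'));  // false
}
```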

Basic Pattern Syntax

Here is a quick reference of the most commonly used regex patterns in Dart:

Common Pattern Elements

void main() {
  // Character classes
  RegExp(r'\d');      // Any digit (0-9)
  RegExp(r'\D');      // Any non-digit
  RegExp(r'\w');      // Word character (a-z, A-Z, 0-9, _)
  RegExp(r'\W');      // Non-word character
  RegExp(r'\s');      // Whitespace (space, tab, newline, and others)
  RegExp(r'\S');      // Non-whitespace
  RegExp(r'[aeiou]');  // Any lowercase vowel
  RegExp(r'[^aeiou]'); // Any character except a lowercase vowel
  RegExp(r'[a-zA-Z]'); // Any ASCII letter

  // Quantifiers
  RegExp(r'a*');      // Zero or more 'a'
  RegExp(r'a+');      // One or more 'a'
  RegExp(r'a?');      // Zero or one 'a'
  RegExp(r'a{3}');    // Exactly 3 'a's
  RegExp(r'a{2,5}');  // Between 2 and 5 'a's
  RegExp(r'a{2,}');   // 2 or more 'a's

  // Anchors
  RegExp(r'^start');  // Matches at the start of the input (or of a line with multiLine: true)
  RegExp(r'end$');    // Matches at the end of the input (or of a line with multiLine: true)
  RegExp(r'\bhello\b'); // Word boundary

  // Alternation
  RegExp(r'cat|dog'); // Matches 'cat' or 'dog'
}
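Some of these elements behave differently depending on the constructor flags. The sketch below runs ^ and \b against a made-up three-line string to show the effect of multiLine and caseSensitive.

```dart
void main() {
  const text = 'cat\nCategory\ncatalog';

  // Without multiLine, ^ matches only at the start of the whole input.
  print(RegExp(r'^cat').allMatches(text).length);  // 1

  // With multiLine, ^ also matches right after each line break.
  print(RegExp(r'^cat', multiLine: true).allMatches(text).length);  // 2

  // caseSensitive: false lets 'Category' match too.
  final anyCase = RegExp(r'^cat', multiLine: true, caseSensitive: false);
  print(anyCase.allMatches(text).length);  // 3

  // \b demands a word boundary, so the 'cat' in 'catalog' is rejected.
  print(RegExp(r'\bcat\b').allMatches(text).length);  // 1
}
```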

Matching: hasMatch, firstMatch, allMatches

Dart’s RegExp provides three primary matching methods. Each serves a different purpose depending on whether you need a boolean check, the first match, or all matches.

The Three Matching Methods

void main() {
  final pattern = RegExp(r'\d+');
  final text = 'Order 42 has 3 items worth 150 dollars';

  // hasMatch — returns true/false
  print(pattern.hasMatch(text));  // true
  print(pattern.hasMatch('no numbers here'));  // false

  // firstMatch — returns the first RegExpMatch or null
  final first = pattern.firstMatch(text);
  if (first != null) {
    print('Match: ${first.group(0)}');  // Match: 42
    print('Start: ${first.start}');      // Start: 6
    print('End: ${first.end}');          // End: 8
  }

  // allMatches — returns all matches as an Iterable
  final all = pattern.allMatches(text);
  print('Found ${all.length} matches:');
  for (final match in all) {
    print('  "${match.group(0)}" at position ${match.start}');
  }
  // Found 3 matches:
  //   "42" at position 6
  //   "3" at position 13
  //   "150" at position 27
}
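Because hasMatch searches the whole input, a validation check needs ^ and $ anchors, or it will accept any string that merely contains a match. The example below shows this, along with stringMatch, a RegExp method that returns the text of the first match (or null) directly.

```dart
void main() {
  final digits = RegExp(r'\d+');
  final onlyDigits = RegExp(r'^\d+$');

  // hasMatch succeeds if the pattern matches anywhere in the input...
  print(digits.hasMatch('abc123'));      // true
  // ...so anchor with ^ and $ when the whole string must match.
  print(onlyDigits.hasMatch('abc123'));  // false
  print(onlyDigits.hasMatch('123'));     // true

  // stringMatch is a shortcut for firstMatch(input)?.group(0).
  print(digits.stringMatch('Order 42'));  // 42
  print(digits.stringMatch('none'));      // null
}
```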

Capture Groups

Capture groups allow you to extract specific parts of a match. Enclose the part you want to capture in parentheses (...).

Using Capture Groups

void main() {
  // Pattern with capture groups for date parsing
  final datePattern = RegExp(r'(\d{4})-(\d{2})-(\d{2})');
  final text = 'Event date: 2024-03-15';

  final match = datePattern.firstMatch(text);
  if (match != null) {
    print('Full match: ${match.group(0)}');  // 2024-03-15
    print('Year: ${match.group(1)}');          // 2024
    print('Month: ${match.group(2)}');         // 03
    print('Day: ${match.group(3)}');           // 15
  }

  // Non-capturing group (?:...) — groups without capturing
  final timePattern = RegExp(r'(\d{1,2}):(\d{2})(?:\s)?(AM|PM)?', caseSensitive: false);
  final time = 'Meeting at 2:30 PM';

  final timeMatch = timePattern.firstMatch(time);
  if (timeMatch != null) {
    print('Hour: ${timeMatch.group(1)}');    // 2
    print('Minute: ${timeMatch.group(2)}');  // 30
    print('Period: ${timeMatch.group(3)}');  // PM
  }
}
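Groups are numbered by the position of their opening parenthesis, non-capturing groups are skipped, and a group that takes no part in the match yields null. The sketch below reuses the time pattern above on input without AM/PM and inspects the match with groupCount and groups(), both members of Dart's Match class.

```dart
void main() {
  final timePattern =
      RegExp(r'(\d{1,2}):(\d{2})(?:\s)?(AM|PM)?', caseSensitive: false);

  // The optional (AM|PM)? group takes no part in this match.
  final match = timePattern.firstMatch('Standup at 9:15')!;

  print(match.groupCount);         // 3 (the (?:\s) group is not counted)
  print(match.groups([1, 2, 3]));  // [9, 15, null]
  print(match.group(3) ?? 'no AM/PM given');  // no AM/PM given
}
```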

Named Capture Groups

Named groups make your regex more readable by giving meaningful names to captured parts, using the syntax (?<name>...).

Named Groups for Clarity

void main() {
  final urlPattern = RegExp(
    r'(?<protocol>https?)://(?<host>[\w.-]+)(?::(?<port>\d+))?(?<path>/[\w/.-]*)?'
  );

  final url = 'https://api.example.com:8080/v2/users';
  final match = urlPattern.firstMatch(url);

  if (match != null) {
    print('Protocol: ${match.namedGroup('protocol')}');  // https
    print('Host: ${match.namedGroup('host')}');          // api.example.com
    print('Port: ${match.namedGroup('port')}');          // 8080
    print('Path: ${match.namedGroup('path')}');          // /v2/users
  }

  // Parse multiple URLs
  final text = '''
Visit http://example.com or https://secure.site.org:443/login
  ''';

  for (final m in urlPattern.allMatches(text)) {
    print('${m.namedGroup('protocol')}://${m.namedGroup('host')}');
  }
  // http://example.com
  // https://secure.site.org
}
Tip: Prefer named groups over numbered groups for complex patterns. match.namedGroup('host') is far more readable than match.group(2), and code that reads groups by name keeps working even if you later add or remove groups earlier in the expression, whereas group numbers shift.
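RegExpMatch also exposes groupNames, which makes it easy to collect every named group into a map, and a pattern can refer back to a named group with \k<name>. In this sketch, namedGroupsOf is a hypothetical helper, not part of dart:core.

```dart
/// Hypothetical helper: collects every named group of [match] into a map.
Map<String, String?> namedGroupsOf(RegExpMatch match) => {
      for (final name in match.groupNames) name: match.namedGroup(name),
    };

void main() {
  final setting = RegExp(r'(?<key>\w+)=(?<value>\w+)');
  final groups = namedGroupsOf(setting.firstMatch('mode=dark')!);
  print(groups['key']);    // mode
  print(groups['value']);  // dark

  // \k<word> must repeat exactly what the named group captured,
  // which finds an accidentally doubled word.
  final doubled = RegExp(r'\b(?<word>\w+)\s+\k<word>\b');
  print(doubled.stringMatch('this is is a test'));  // is is
}
```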

Lookahead and Lookbehind

Lookahead and lookbehind assertions match a position without consuming characters. They check whether a pattern exists ahead or behind without including it in the match result.

Lookahead and Lookbehind

void main() {
  // Positive lookahead (?=...) — match only if followed by pattern
  final beforeDollar = RegExp(r'\d+(?=\s*dollars)');
  print(beforeDollar.firstMatch('Price is 50 dollars')?.group(0));  // 50
  print(beforeDollar.firstMatch('Count is 50 items')?.group(0));    // null

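  // Negative lookahead (?!...): match only if NOT followed by pattern.
  // The \b anchors stop the engine from backtracking to a shorter number;
  // without them, the '5' of '50' would match because it is followed by '0'.
  final notDollar = RegExp(r'\b\d+\b(?!\s*dollars)');
  final matches = notDollar.allMatches('50 dollars and 30 items');
  for (final m in matches) {
    print(m.group(0));  // 30 (50 is followed by dollars, so excluded)
  }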

  // Positive lookbehind (?<=...) — match only if preceded by pattern
  final afterDollar = RegExp(r'(?<=\$)\d+');
  print(afterDollar.firstMatch('Price: \$150')?.group(0));  // 150

  // Negative lookbehind (?<!...) — match only if NOT preceded by pattern
  final notAfterHash = RegExp(r'(?<!#)\b\w+\b');
  final tags = 'hello #world foo #bar';
  for (final m in notAfterHash.allMatches(tags)) {
    print(m.group(0));  // hello, foo (words not preceded by #)
  }
}
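Warning: Lookbehind support depends on where your code runs. The Dart VM's regex engine supports lookbehind, but when Dart is compiled to JavaScript, RegExp is backed by the browser's engine, and older browsers (notably older versions of Safari) do not support lookbehind at all. If you target the web, test lookbehind patterns in every browser you support, or rewrite them with a capture group, for example RegExp(r'\$(\d+)') read via group(1) instead of (?<=\$)\d+.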
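Several positive lookaheads anchored at ^ can each check one independent rule against the whole string, which is a common way to express password policies. The policy below (at least 8 characters, one uppercase letter, one digit) is a hypothetical example, not a recommendation.

```dart
/// Hypothetical policy: 8+ characters, one uppercase letter, one digit.
/// Each (?=...) re-checks from the start without consuming input.
final strongPassword = RegExp(r'^(?=.*[A-Z])(?=.*\d).{8,}$');

void main() {
  print(strongPassword.hasMatch('Secret123'));   // true
  print(strongPassword.hasMatch('secret123'));   // false (no uppercase)
  print(strongPassword.hasMatch('SecretPass'));  // false (no digit)
  print(strongPassword.hasMatch('Sec1'));        // false (too short)
}
```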

replaceAll and replaceAllMapped

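String replacement is one of the most practical uses of regex. replaceAll substitutes a fixed string for every match; the replacement is inserted literally, so $1-style group references are not supported. replaceAllMapped instead calls a function for each match, giving you access to the match data for dynamic replacement.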

Simple and Mapped Replacement

void main() {
  // Simple replaceAll with regex
  final text = 'Hello   World   Dart';
  final cleaned = text.replaceAll(RegExp(r'\s+'), ' ');
  print(cleaned);  // Hello World Dart

  // replaceAllMapped — dynamic replacement using match data
  final template = 'I have 3 cats and 12 dogs';
  final doubled = template.replaceAllMapped(
    RegExp(r'\d+'),
    (match) => '${int.parse(match.group(0)!) * 2}',
  );
  print(doubled);  // I have 6 cats and 24 dogs

  // Format phone numbers
  final phone = '5551234567';
  final formatted = phone.replaceAllMapped(
    RegExp(r'(\d{3})(\d{3})(\d{4})'),
    (m) => '(${m.group(1)}) ${m.group(2)}-${m.group(3)}',
  );
  print(formatted);  // (555) 123-4567

  // Convert camelCase to snake_case
  final camel = 'myVariableName';
  final snake = camel.replaceAllMapped(
    RegExp(r'[A-Z]'),
    (m) => '_${m.group(0)!.toLowerCase()}',
  );
  print(snake);  // my_variable_name
}
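Since replaceAll inserts its replacement text as-is, group references only work through a callback. The sketch below contrasts the two methods on a sample name and also uses replaceFirstMapped, which applies the callback to the first match only.

```dart
void main() {
  final fullName = RegExp(r'(\w+) (\w+)');

  // replaceAll: '$2, $1' is inserted literally, not as group references.
  print('Ada Lovelace'.replaceAll(fullName, r'$2, $1'));  // $2, $1

  // replaceAllMapped: the callback reads the groups from the match.
  print('Ada Lovelace'.replaceAllMapped(fullName, (m) => '${m[2]}, ${m[1]}'));
  // Lovelace, Ada

  // replaceFirstMapped: only the first match is replaced.
  print('a1 b2 c3'.replaceFirstMapped(RegExp(r'\d'), (m) => '<${m[0]}>'));
  // a<1> b2 c3
}
```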

Splitting with Regex

The String.split() method accepts a Pattern, which includes RegExp. This allows complex splitting logic beyond simple delimiter strings.

Advanced String Splitting

void main() {
  // Split on any whitespace (handles multiple spaces, tabs, etc.)
  final messy = 'Hello\t\tWorld   Dart\nFlutter';
  final words = messy.split(RegExp(r'\s+'));
  print(words);  // [Hello, World, Dart, Flutter]

  // Split on punctuation
  final sentence = 'Hello,World;Dart.Flutter!Rocks';
  final parts = sentence.split(RegExp(r'[,;.!]'));
  print(parts);  // [Hello, World, Dart, Flutter, Rocks]

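  // Extract CSV fields by matching instead of splitting, keeping quoted fields intact
  // (simplified: fields must start with a word character or a quote; empty fields are skipped)
  final csv = 'name,age,"city, state",score';
  final fields = RegExp(r'(?:^|,)("(?:[^"]*)"|\w[^,]*)')
      .allMatches(csv)
      .map((m) => m.group(1)!.replaceAll('"', ''))
      .toList();
  print(fields);  // [name, age, city, state, score] (4 fields: "city, state" is one field)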

  // Split and keep delimiters
  final expr = '3+5*2-1';
  final tokens = expr.split(RegExp(r'(?=[+*-])|(?<=[+*-])'));
  print(tokens);  // [3, +, 5, *, 2, -, 1]
}
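A delimiter at the very start or end of the string produces an empty string at that end of the result. The example below shows this with padded input and two common fixes: trimming before splitting, or filtering out empty entries.

```dart
void main() {
  const padded = '  Dart  Flutter ';

  // Leading and trailing whitespace yield empty strings at both ends.
  print(padded.split(RegExp(r'\s+')));  // [, Dart, Flutter, ]

  // Fix 1: trim first.
  print(padded.trim().split(RegExp(r'\s+')));  // [Dart, Flutter]

  // Fix 2: drop empty entries afterwards.
  print(padded.split(RegExp(r'\s+')).where((s) => s.isNotEmpty).toList());
  // [Dart, Flutter]
}
```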

Practical Example: Email Validation

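Full RFC 5322 compliance is impractical to express in a single regex, but a pragmatic pattern accepts the addresses people actually use. The pattern below is the one the WHATWG HTML standard defines for <input type="email">; it deliberately rejects some exotic forms that RFC 5322 allows, such as quoted local parts.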

Email Validator

class EmailValidator {
  // Same pattern the WHATWG HTML standard uses for <input type="email">
  static final _pattern = RegExp(
    r'^[a-zA-Z0-9.!#$%&\x27*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$',
  );

  static bool isValid(String email) => _pattern.hasMatch(email);

  static String? validate(String email) {
    if (email.isEmpty) return 'Email is required';
    // Check the cheap length limit before running the regex
    if (email.length > 254) return 'Email too long';
    if (!isValid(email)) return 'Invalid email format';
    return null;  // Valid
  }
}

void main() {
  final testEmails = [
    'user@example.com',
    'first.last@company.co.uk',
    'invalid@',
    '@no-local.com',
    'spaces in@email.com',
    'valid+tag@gmail.com',
  ];

  for (final email in testEmails) {
    final error = EmailValidator.validate(email);
    print('$email: ${error ?? "VALID"}');
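  }
  // user@example.com: VALID
  // first.last@company.co.uk: VALID
  // invalid@: Invalid email format
  // @no-local.com: Invalid email format
  // spaces in@email.com: Invalid email format
  // valid+tag@gmail.com: VALID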
}
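The pattern in EmailValidator accepts a domain without any dot, such as user@localhost, which is valid on local networks but often unwanted in sign-up forms. One possible stricter variant, sketched below under that assumption, changes the final * on the domain labels to + so at least one ".label" is required; strictEmail is a hypothetical name.

```dart
// Pattern from EmailValidator (domain labels end with `)*$`).
final htmlEmail = RegExp(
  r'^[a-zA-Z0-9.!#$%&\x27*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$',
);

// Hypothetical stricter variant: `)+$` requires at least one ".label".
final strictEmail = RegExp(
  r'^[a-zA-Z0-9.!#$%&\x27*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+$',
);

void main() {
  print(htmlEmail.hasMatch('user@localhost'));      // true
  print(strictEmail.hasMatch('user@localhost'));    // false
  print(strictEmail.hasMatch('user@example.com'));  // true
}
```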

Practical Example: Template Engine

Regex is perfect for building a simple template engine that replaces placeholders with values.

Simple Template Engine

/// A simple template engine using regex for placeholder replacement.
class TemplateEngine {
  // Matches {{variableName}} or {{ variableName }}
  static final _placeholder = RegExp(r'\{\{\s*(\w+)\s*\}\}');

  // Matches {{#if condition}}...{{/if}}
  static final _conditional = RegExp(
    r'\{\{#if\s+(\w+)\}\}(.*?)\{\{/if\}\}',
    dotAll: true,
  );

  static String render(String template, Map<String, dynamic> data) {
    // Process conditionals first
    var result = template.replaceAllMapped(_conditional, (match) {
      final condition = match.group(1)!;
      final content = match.group(2)!;
      final value = data[condition];
      // Treat null, false, and '' as falsy; anything else renders the block
      if (value != null && value != false && value != '') {
        return content;
      }
      return '';
    });

    // Then replace placeholders
    result = result.replaceAllMapped(_placeholder, (match) {
      final key = match.group(1)!;
      return data[key]?.toString() ?? '';
    });

    return result;
  }
}

void main() {
  final template = '''
Hello {{ name }}!
{{#if isPremium}}Welcome back, premium member!{{/if}}
Your balance is \${{ balance }}.
  ''';

  print(TemplateEngine.render(template, {
    'name': 'Edrees',
    'isPremium': true,
    'balance': '250.00',
  }));
  // Hello Edrees!
  // Welcome back, premium member!
  // Your balance is $250.00.
}
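The engine above replaces unknown keys with an empty string, which can hide typos in a template. As a design alternative, the sketch below leaves unknown placeholders in the output so they are easy to spot; fillKnown is a hypothetical function that reuses the same placeholder pattern.

```dart
// Same placeholder syntax as TemplateEngine: {{name}} or {{ name }}.
final placeholder = RegExp(r'\{\{\s*(\w+)\s*\}\}');

/// Hypothetical variant: unknown keys keep their original {{...}} text.
String fillKnown(String template, Map<String, Object?> data) {
  return template.replaceAllMapped(placeholder, (m) {
    final key = m.group(1)!;
    return data.containsKey(key) ? '${data[key]}' : m.group(0)!;
  });
}

void main() {
  print(fillKnown('Hi {{ name }}, your code is {{ code }}', {'name': 'Edrees'}));
  // Hi Edrees, your code is {{ code }}
}
```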

Practical Example: Log Parser

Parsing structured log files is a classic use case for regex with named capture groups.

Log File Parser

class LogEntry {
  final DateTime timestamp;
  final String level;
  final String component;
  final String message;

  LogEntry(this.timestamp, this.level, this.component, this.message);

  @override
  String toString() => '[$level] $component: $message';
}

class LogParser {
  // Matches: [2024-03-15 14:30:00] ERROR (Auth): Login failed
  static final _pattern = RegExp(
    r'\[(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})\]\s+'
    r'(?<level>DEBUG|INFO|WARN|ERROR|FATAL)\s+'
    r'\((?<component>\w+)\):\s+(?<message>.+)',
  );

  static LogEntry? parse(String line) {
    final match = _pattern.firstMatch(line);
    if (match == null) return null;

    final dateStr = match.namedGroup('date')!;
    final timeStr = match.namedGroup('time')!;
    final timestamp = DateTime.parse('$dateStr $timeStr');

    return LogEntry(
      timestamp,
      match.namedGroup('level')!,
      match.namedGroup('component')!,
      match.namedGroup('message')!,
    );
  }

  static List<LogEntry> parseAll(String logText) {
    return logText
        .split('\n')
        .map(parse)
        .whereType<LogEntry>()
        .toList();
  }
}

void main() {
  final logText = '''
[2024-03-15 14:30:00] INFO (Auth): User logged in
[2024-03-15 14:30:05] ERROR (Database): Connection timeout
[2024-03-15 14:30:10] WARN (Cache): Cache miss for key user_123
  ''';

  final entries = LogParser.parseAll(logText);
  final errors = entries.where((e) => e.level == 'ERROR');
  print('Errors found: ${errors.length}');
  for (final error in errors) {
    print(error);  // [ERROR] Database: Connection timeout
  }
}
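For larger logs you may only need aggregate counts. The sketch below assumes the same log format, uses a simplified pattern that captures just the level, and splits lines with LineSplitter from dart:convert, which handles \n, \r\n, and \r line endings; countLevels is a hypothetical helper.

```dart
import 'dart:convert';

// Simplified pattern: bracketed timestamp, then the level (assumed format).
final levelPattern = RegExp(r'^\[[^\]]+\]\s+(DEBUG|INFO|WARN|ERROR|FATAL)\b');

/// Hypothetical helper: counts log lines per level.
Map<String, int> countLevels(String logText) {
  final counts = <String, int>{};
  for (final line in const LineSplitter().convert(logText)) {
    final level = levelPattern.firstMatch(line)?.group(1);
    if (level != null) counts[level] = (counts[level] ?? 0) + 1;
  }
  return counts;
}

void main() {
  const log = '[2024-03-15 14:30:00] INFO (Auth): User logged in\r\n'
      '[2024-03-15 14:30:05] ERROR (Database): Connection timeout\r\n'
      '[2024-03-15 14:31:00] ERROR (Auth): Login failed\r\n';
  print(countLevels(log));  // {INFO: 1, ERROR: 2}
}
```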
Tip: Dart joins adjacent string literals automatically, so a long pattern can be split across several raw-string segments, as LogParser does above. Put a comment above each segment to explain its purpose; this makes the overall pattern much easier to maintain.

Performance Considerations

Regex can be powerful but also slow if misused. Here are key performance tips:

Regex Performance Best Practices

void main() {
  final lines = ['order 42', 'no digits', 'item 7'];

  // BAD: constructing a new RegExp inside a loop
  for (final line in lines) {
    if (RegExp(r'\d+').hasMatch(line)) {  // New RegExp object every iteration
      print(line);
    }
  }

  // GOOD: create the RegExp once and reuse it
  final digitPattern = RegExp(r'\d+');
  for (final line in lines) {
    if (digitPattern.hasMatch(line)) {  // Reuses the same compiled pattern
      print(line);
    }
  }

  // BAD: nested quantifiers cause catastrophic backtracking
  // final bad = RegExp(r'(a+)+b');  // Can take exponential time on 'a' * 30 + 'c'

  // GOOD: an equivalent pattern without nested quantifiers
  final good = RegExp(r'a+b');
  print(good.hasMatch('a' * 30 + 'c'));  // false, and it returns quickly

  // Dart's RegExp, like JavaScript's, has no possessive quantifiers or
  // atomic groups, so prevent backtracking by rewriting the pattern instead
}
Warning: Catastrophic backtracking occurs when a regex has nested quantifiers like (a+)+ and the input does not match. The regex engine tries exponentially many combinations before giving up. Always avoid nested repetition patterns and test your regex with worst-case inputs.
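To observe the difference yourself, the sketch below times a failing match of (a+)+b and a+b with Stopwatch on inputs of growing length. Absolute timings depend on the platform and machine, so no expected numbers are shown; on a backtracking engine, the nested pattern's time should grow much faster with n than the flat one's. microsFor is a hypothetical helper.

```dart
/// Hypothetical helper: microseconds taken by one hasMatch call.
int microsFor(RegExp pattern, String input) {
  final stopwatch = Stopwatch()..start();
  pattern.hasMatch(input);
  stopwatch.stop();
  return stopwatch.elapsedMicroseconds;
}

void main() {
  final nested = RegExp(r'(a+)+b');  // nested quantifier
  final flat = RegExp(r'a+b');       // matches the same inputs

  for (final n in [14, 16, 18, 20]) {
    final input = 'a' * n + 'c';  // no 'b', so every attempt fails
    print('n=$n  nested: ${microsFor(nested, input)} us  '
        'flat: ${microsFor(flat, input)} us');
  }
}
```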

Summary

Regular expressions are a versatile and powerful tool in Dart. Key takeaways:

  • Use raw strings (r'...') for regex patterns to avoid double-escaping.
  • hasMatch for boolean checks, firstMatch for the first result, allMatches for all results.
  • Capture groups (...) and named groups (?<name>...) extract specific parts of matches.
  • Lookahead (?=...) and lookbehind (?<=...) assert position without consuming characters.
  • replaceAllMapped enables dynamic, context-aware string replacement.
  • Compile regex once and reuse; avoid nested quantifiers to prevent catastrophic backtracking.