Java Regular Expressions

Introduction

Regular expressions (regex) are a powerful tool for pattern matching and text processing. In Java, regular expressions are supported through the java.util.regex package, which provides classes like Pattern and Matcher to work with regex patterns. In this blog post, we’ll explore the basics of regular expressions in Java, including syntax, common patterns, and practical examples.

What are Regular Expressions?

A regular expression is a sequence of characters that defines a search pattern. It can be used to match strings based on certain patterns, such as specific characters, words, or sequences of characters. Regular expressions are widely used in text processing, data validation, and search operations to identify and extract information from text.

Basic Syntax

In Java, regular expressions are represented as strings, where special characters and metacharacters are used to define patterns. Here are some common metacharacters and their meanings:

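  • .: Matches any single character except a line terminator (unless the Pattern.DOTALL flag is set).
  • ^: Matches the beginning of the input, or the beginning of each line when the Pattern.MULTILINE flag is set.
  • $: Matches the end of the input, or the end of each line when the Pattern.MULTILINE flag is set.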
  • []: Matches any single character within the brackets. For example, [abc] matches either ‘a’, ‘b’, or ‘c’.
  • [^]: Matches any single character not within the brackets. For example, [^abc] matches any character except ‘a’, ‘b’, or ‘c’.
  • |: Matches either the pattern on the left or the pattern on the right.
  • \: Escapes a metacharacter or introduces a special sequence.
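As a quick check of the metacharacters listed above, the following sketch runs each one against a short sample string. The class name MetacharacterDemo and the helper found are made up for illustration. The helper uses Matcher.find(), so a pattern counts as a hit if it matches anywhere in the input.

```java
import java.util.regex.Pattern;

class MetacharacterDemo {
    // Hypothetical helper: true when the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("c.t", "cat"));        // true: '.' stands in for 'a'
        System.out.println(found("^cat", "concat"));    // false: "cat" is not at the start
        System.out.println(found("cat$", "concat"));    // true: "cat" is at the end
        System.out.println(found("[abc]", "xyz"));      // false: no 'a', 'b', or 'c'
        System.out.println(found("[^abc]", "abcd"));    // true: 'd' is outside the set
        System.out.println(found("cat|dog", "hotdog")); // true: the right alternative matches
        System.out.println(found("a\\.b", "a.b"));      // true: the escaped '.' is literal
        System.out.println(found("a\\.b", "axb"));      // false: 'x' is not a literal '.'
    }
}
```

Note the doubled backslash in "a\\.b": the Java compiler consumes one backslash, so the regex engine sees a\.b.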

Common special sequences include:

  • \d: Matches any digit (equivalent to [0-9]).
  • \D: Matches any non-digit character (equivalent to [^0-9]).
  • \s: Matches any whitespace character.
  • \S: Matches any non-whitespace character.
  • \w: Matches any word character (equivalent to [a-zA-Z0-9_]).
  • \W: Matches any non-word character.
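The special sequences above can be tried out with a few one-liners. This is only a sketch: the class and method names are invented for the example, and it uses String.replaceAll and String.split (which accept regular expressions) for brevity rather than Pattern and Matcher.

```java
class SpecialSequenceDemo {
    // Replaces every digit (\d) with '#'.
    static String maskDigits(String input) {
        return input.replaceAll("\\d", "#");
    }

    // Splits on runs of whitespace (\s+) after trimming the ends.
    static String[] splitOnWhitespace(String input) {
        return input.trim().split("\\s+");
    }

    // Removes every non-word character (\W), keeping letters, digits, and '_'.
    static String stripNonWord(String input) {
        return input.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("Room 42B"));                  // Room ##B
        System.out.println(String.join(",",
                splitOnWhitespace("  one \t two\nthree ")));         // one,two,three
        System.out.println(stripNonWord("user_name@example.com"));   // user_nameexamplecom
    }
}
```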

To specify the number of occurrences of a character or pattern, you can use quantifiers. A quantifier applies to the element immediately before it, which can be a single character, a character class, or a group:

  • *: Matches zero or more occurrences of the preceding element.
  • +: Matches one or more occurrences of the preceding element.
  • ?: Matches zero or one occurrence of the preceding element.
  • {n}: Matches exactly n occurrences of the preceding element.
  • {n,}: Matches n or more occurrences of the preceding element.
  • {n,m}: Matches between n and m occurrences (inclusive) of the preceding element.
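To make the quantifier rules concrete, here is a small sketch that checks a few inputs against quantified patterns. The class QuantifierDemo and its helper whole are illustrative names. The helper relies on the static Pattern.matches method, which requires the whole input to match.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true only when the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("ab*", "a"));          // true: zero 'b' characters is fine
        System.out.println(whole("ab+", "a"));          // false: at least one 'b' is required
        System.out.println(whole("colou?r", "color"));  // true: the 'u' is optional
        System.out.println(whole("\\d{4}", "2024"));    // true: exactly four digits
        System.out.println(whole("\\d{4}", "20245"));   // false: five digits
        System.out.println(whole("\\d{2,}", "7"));      // false: fewer than two digits
        System.out.println(whole("\\d{2,4}", "123"));   // true: between two and four digits
        System.out.println(whole("ab{2}", "abb"));      // true: {2} applies to 'b' only
        System.out.println(whole("(ab){2}", "abab"));   // true: {2} applies to the group
    }
}
```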

Common Patterns

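Here are some common regex patterns that you might encounter in Java. They are shown as plain regex; inside a Java string literal, each backslash must be doubled (for example, \d becomes \\d). These patterns check only the shape of the text, so the date pattern, for instance, also accepts 2024-99-99:

  • Email addresses: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
  • URLs: ^(https?|ftp)://[^\s/$.?#].[^\s]*$
  • Dates (YYYY-MM-DD): ^\d{4}-\d{2}-\d{2}$
  • Phone numbers ((XXX) XXX-XXXX): ^\(\d{3}\) \d{3}-\d{4}$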

You can use online regex testers like regex101 to test and debug your regular expressions.
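As a rough sketch of how two of the patterns above could be used from Java, the example below compiles the email and date patterns once and wraps them in small helper methods. The class and method names are assumptions made for this example, and the "looksLike" prefix is deliberate: the checks are purely structural.

```java
import java.util.regex.Pattern;

class CommonPatternDemo {
    // Same patterns as in the list above, with backslashes doubled for Java.
    private static final Pattern EMAIL =
            Pattern.compile("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");
    private static final Pattern DATE =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2}$");

    static boolean looksLikeEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    static boolean looksLikeDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("jane.doe@example.com")); // true
        System.out.println(looksLikeEmail("jane.doe@example"));     // false: no top-level domain
        System.out.println(looksLikeDate("2024-02-30"));            // true: shape only, not a calendar check
        System.out.println(looksLikeDate("2024-2-3"));              // false: month and day need two digits
    }
}
```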

Pattern Matching

In Java, you can use the java.util.regex.Pattern and Matcher classes to work with regular expressions.

Pattern methods

  • Pattern.compile(String regex): A static method that compiles the given regular expression into a Pattern. A compiled Pattern is immutable and can be reused to create any number of matchers.
  • pattern.matcher(CharSequence input): An instance method that creates a Matcher that will match the given input against this pattern. The matches() method of the Matcher class then checks whether the entire input matches the pattern.
  • Pattern.matches(String regex, CharSequence input): A static convenience method that compiles the given regular expression and matches it against the entire input. It returns true if the input matches the pattern, false otherwise.

Here’s an example of how to compile a regex pattern and match it against a string:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexExample {
    public static void main(String[] args) {
        String text = "(600) 123-4567";
        String regex = "^\\(\\d{3}\\) \\d{3}-\\d{4}$";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        if (matcher.matches()) {
            System.out.println("Phone number is valid.");
        } else {
            System.out.println("Phone number is invalid.");
        }
    }
}
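For comparison, here is a sketch of the static Pattern.matches shortcut next to a compiled pattern that is reused across several inputs. It reuses the phone-number pattern from the example above; the class name, constants, and helper methods are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

class PatternMethodsDemo {
    private static final String PHONE_REGEX = "^\\(\\d{3}\\) \\d{3}-\\d{4}$";

    // Compiled once and shared by every call to countValid.
    private static final Pattern PHONE = Pattern.compile(PHONE_REGEX);

    // One-off check: Pattern.matches compiles the regex on each call.
    static boolean isPhoneOneOff(String input) {
        return Pattern.matches(PHONE_REGEX, input);
    }

    // Repeated checks: reuse the compiled Pattern, create one Matcher per input.
    static int countValid(List<String> inputs) {
        int valid = 0;
        for (String input : inputs) {
            if (PHONE.matcher(input).matches()) {
                valid++;
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        System.out.println(isPhoneOneOff("(600) 123-4567")); // true
        System.out.println(countValid(
                List.of("(600) 123-4567", "600-123-4567", "(600) 000-1111"))); // 2
    }
}
```

The static shortcut is convenient for a single check. When the same pattern is applied many times, compiling it once avoids repeating the compilation work.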

Grouping and Capturing

You can use parentheses () to group patterns together and capture parts of the matched text. Here’s an example of how to extract the area code and phone number from a phone number string:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexExample {
    public static void main(String[] args) {
        String text = "(600) 123-4567";
        String regex = "^\\((\\d{3})\\) (\\d{3}-\\d{4})$";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        if (matcher.matches()) {
            String areaCode = matcher.group(1);
            String phoneNumber = matcher.group(2);

            System.out.println("Group count: " + matcher.groupCount()); // 2
            // start() and end() without arguments refer to the whole match
            System.out.println("Match start index: " + matcher.start()); // 0
            System.out.println("Match end index: " + matcher.end()); // 14
            System.out.println("Full match: " + matcher.group(0)); // (600) 123-4567
            System.out.println("Area code: " + areaCode); // 600
            System.out.println("Phone number: " + phoneNumber); // 123-4567

            // to simply print each group
            for (int i = 0; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        } else {
            System.out.println("Phone number is invalid.");
        }
    }
}

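Group zero denotes the entire match, so the expression matcher.group(0) is equivalent to matcher.group(). The Matcher class also provides the start(), end(), and groupCount() methods: called without arguments, start() and end() return the indices of the whole match, and groupCount() returns the number of capturing groups in the pattern, not counting group zero.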

In this example, we use parentheses to group the area code and phone number patterns and capture them using the group method of the Matcher class.
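Matcher also has start(int group) and end(int group) overloads that report where an individual group was captured. The sketch below wraps them in a hypothetical groupSpan helper (the class name is likewise made up) and applies it to the same phone-number pattern.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Hypothetical helper: returns {start, end} of the given group.
    // The end index is exclusive, as with String.substring.
    static int[] groupSpan(String regex, String input, int group) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Input does not match: " + input);
        }
        return new int[] { matcher.start(group), matcher.end(group) };
    }

    public static void main(String[] args) {
        String regex = "^\\((\\d{3})\\) (\\d{3}-\\d{4})$";
        String text = "(600) 123-4567";

        System.out.println(Arrays.toString(groupSpan(regex, text, 0))); // [0, 14]
        System.out.println(Arrays.toString(groupSpan(regex, text, 1))); // [1, 4]
        System.out.println(Arrays.toString(groupSpan(regex, text, 2))); // [6, 14]
    }
}
```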

Matcher.find() method

The find() method of the Matcher class is used to find the next subsequence of the input sequence that matches the pattern. It returns true if a match is found, false otherwise. The start() and end() methods can be used to get the start and end indices of the matched subsequence.

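import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexFindExample {
    public static void main(String[] args) {
        String text = "(600) 123-4567 (600) 000-1111";
        // No ^ or $ anchors, so the pattern can match anywhere in the text
        String regex = "\\((\\d{3})\\) (\\d{3}-\\d{4})";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);

        // Each call to find() advances to the next match
        while (matcher.find()) {
            String areaCode = matcher.group(1);
            String phoneNumber = matcher.group(2);
            System.out.println("Area code: " + areaCode); // 600, then 600
            System.out.println("Phone number: " + phoneNumber); // 123-4567, then 000-1111
        }
    }
}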

Difference between matches() and find(): The matches() method attempts to match the entire input sequence against the pattern, while the find() method searches for the next subsequence of the input sequence that matches the pattern. This is why the pattern in the find() example omits the ^ and $ anchors.
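The difference is easiest to see side by side. In this sketch, the class MatchesVsFindDemo and its helper methods are illustrative names; the same \d+ pattern is checked with matches(), with a single find(), and with a find() loop that collects every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // True only if the whole input consists of digits.
    static boolean matchesWhole(String input) {
        return DIGITS.matcher(input).matches();
    }

    // True if a run of digits appears anywhere in the input.
    static boolean findsAnywhere(String input) {
        return DIGITS.matcher(input).find();
    }

    // Collects every run of digits by calling find() until it returns false.
    static List<String> allMatches(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("12345"));        // true
        System.out.println(matchesWhole("order 12345"));  // false: "order " is not digits
        System.out.println(findsAnywhere("order 12345")); // true
        System.out.println(allMatches("a1b22c333"));      // [1, 22, 333]
    }
}
```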

Conclusion

Regular expressions are a powerful tool for pattern matching and text processing in Java. By understanding the basic syntax, common patterns, and practical examples, you can leverage regex to perform complex text operations with ease. If you have any questions or feedback, feel free to leave a comment below. Happy coding!