Regular Expressions
Introduction
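Regular expressions are implemented in the Regex module. It is built on top of Erlang's :re module and uses PCRE (Perl Compatible Regular Expressions) syntax.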
Syntax
Although there are lower-level options, regular expressions are usually compiled using the ~r sigil:
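One of those lower-level options is Regex.compile/2. It builds a regex from an ordinary string at runtime and reports an invalid pattern as an error tuple. The sketch below uses illustrative pattern strings and variable names:

```elixir
# Build a regex from an ordinary string at runtime.
{:ok, hello_regex} = Regex.compile("hello")
true = Regex.match?(hello_regex, "say hello")

# An invalid pattern (here an unclosed group) is reported as an error tuple.
{:error, _reason} = Regex.compile("(hello")

# Regex.compile!/2 returns the regex directly and raises Regex.CompileError
# when the pattern is invalid.
strict_hello_regex = Regex.compile!("hello")
"hello" = Regex.source(strict_hello_regex)
```

Regex.compile/2 is mainly useful when the pattern text comes from outside the program, such as user input or configuration.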
hello_pattern = ~r/hello/
The pattern can be made case-insensitive by adding the i modifier after the closing slash:
insensitive_hello_pattern = ~r/hello/i
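Here is a small check of what the modifier changes. The last lines assume that passing the options string "i" to Regex.compile!/2 has the same effect as the sigil modifier:

```elixir
hello_pattern = ~r/hello/
insensitive_hello_pattern = ~r/hello/i

# Without the modifier, letter case must match exactly.
false = "HELLO there" =~ hello_pattern

# With the i modifier, any mix of upper and lower case matches.
true = "HELLO there" =~ insensitive_hello_pattern
true = "HeLLo there" =~ insensitive_hello_pattern

# The same modifier passed as an options string to Regex.compile!/2.
compiled_insensitive = Regex.compile!("hello", "i")
true = Regex.match?(compiled_insensitive, "Hello")
```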
The pattern goes between the two forward slashes. For instance, the following pattern matches most unsigned decimal numbers, with or without a fractional part:
float_pattern = ~r/(\d*\.\d+|\d+)/
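To see which strings this pattern treats as a number, the sketch below anchors it with ^ and $ so that the whole string must match. The anchored variant whole_float_pattern is a hypothetical name introduced only for this illustration:

```elixir
# Hypothetical anchored variant: ^ and $ force the entire string to be a number.
whole_float_pattern = ~r/^(\d*\.\d+|\d+)$/

candidates = ["0", "3.14", ".5", "3.", "-1.5", "1e10"]

# Pair each candidate with whether the anchored pattern accepts it.
results = Enum.map(candidates, fn s -> {s, s =~ whole_float_pattern} end)
```

Integers and numbers with digits after the point are accepted. A trailing point ("3."), a sign ("-1.5"), or an exponent ("1e10") is not covered, which is why the pattern matches only "most" floats.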
String interpolation is supported for dynamic pattern construction:
city = "Copenhagen"
country = "Denmark"
location_pattern = ~r/#{city} is in #{country}/
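Interpolated text is inserted as regex syntax, not as literal text. "Copenhagen" and "Denmark" contain no special characters, but other values might. The sketch below uses a hypothetical price string and Regex.escape/1 to make the interpolated value match literally:

```elixir
# "$" and "." are regex metacharacters, so the raw text is not matched literally.
price = "$3.50"
raw_pattern = ~r/#{price}/

# Regex.escape/1 backslash-escapes every metacharacter in the value.
escaped_pattern = ~r/#{Regex.escape(price)}/

# "$" in the raw pattern anchors to the end of the string, so nothing can follow it.
false = "It costs $3.50" =~ raw_pattern
true = "It costs $3.50" =~ escaped_pattern
```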
Matching
Matching is a boolean operation that checks whether a pattern occurs anywhere in a string. It is available as the Regex.match?/2 function and through the =~ operator:
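The two forms take their arguments in opposite order. The sketch below compares them on one of the test sentences:

```elixir
hello_pattern = ~r/hello/
sentence = "he said hello to the world"

# Regex.match?/2 takes the regex first; =~ takes the string on the left.
true = Regex.match?(hello_pattern, sentence)
true = sentence =~ hello_pattern
false = Regex.match?(hello_pattern, "end of days")

# =~ also accepts a plain string on the right, which checks for a substring.
true = sentence =~ "hello"
```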
test_suite = [
  "Hello, world!",
  "he said hello to the world",
  "end of days"
]
contains_hello =
  test_suite
  |> Enum.map(fn test ->
    %{
      sensitive: test =~ hello_pattern,
      insensitive: test =~ insensitive_hello_pattern
    }
  end)
test_suite = [
  "0",
  "3.14",
  "there were 2 flowers",
  "regular expression",
  "2.25 was the price"
]
contains_float =
  test_suite
  |> Enum.map(fn test -> test =~ float_pattern end)
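The =~ checks above only report whether a number occurs somewhere in each string. The sketch below uses Regex.run/3 with the capture: :first option to show which substring was actually found:

```elixir
float_pattern = ~r/(\d*\.\d+|\d+)/

test_suite = ["0", "3.14", "there were 2 flowers", "regular expression", "2.25 was the price"]

# capture: :first keeps only the full match; nil means no match anywhere.
matched_parts =
  Enum.map(test_suite, fn test -> Regex.run(float_pattern, test, capture: :first) end)
```

The pattern is not anchored, so "there were 2 flowers" matches on its "2".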
text =
  "The Danish Queen lives in Copenhagen. Copenhagen is in Denmark. It is a medium-sized capital."
states_origin =
  text
  |> String.split(".")
  |> Enum.any?(fn sentence -> sentence =~ location_pattern end)
Capturing Anonymous Fields
The whole matched substring and the substring matched by each parenthesized group in the pattern are called fields. Regex.run/2 returns them as a list: the full match first, followed by the groups in the order they appear in the pattern. It returns nil when there is no match:
test_suite = [
  "a + b",
  "a * b",
  " a / b",
  "a+b"
]
test_suite
|> Enum.map(fn test -> Regex.run(~r/([^ ]+) (\+|\-|\*|\/) ([^ ]+)/, test) end)
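Regex.run/3 also takes options that change what is returned. The sketch below binds the same pattern to an illustrative variable and shows the default result, capture: :all_but_first, and return: :index:

```elixir
operation_pattern = ~r/([^ ]+) (\+|\-|\*|\/) ([^ ]+)/

# Default: the full match comes first, then each group in order.
["a + b", "a", "+", "b"] = Regex.run(operation_pattern, "a + b")

# capture: :all_but_first drops the full match and keeps only the groups.
["a", "/", "b"] = Regex.run(operation_pattern, " a / b", capture: :all_but_first)

# return: :index reports {start, length} byte offsets instead of substrings.
[{1, 5}, {1, 1}, {3, 1}, {5, 1}] = Regex.run(operation_pattern, " a / b", return: :index)

# Without spaces around the operator there is no match at all.
nil = Regex.run(operation_pattern, "a+b")
```

The leading space in " a / b" is not part of the match because [^ ]+ cannot start on a space.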
Capturing Named Fields
Each field can be named using the (?<name>...) syntax. Regex.named_captures/2 then returns a map from each name to the substring it captured, so the fields can be looked up by name instead of by position:
test_suite
|> Enum.map(fn test ->
  Regex.named_captures(~r/(?<lhs>[^ ]+) (?<op>\+|\-|\*|\/) (?<rhs>[^ ]+)/, test)
end)
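Here is what the resulting maps look like. The group names lhs, op, and rhs are chosen for illustration:

```elixir
named_pattern = ~r/(?<lhs>[^ ]+) (?<op>\+|\-|\*|\/) (?<rhs>[^ ]+)/

# The map keys are the group names as strings.
%{"lhs" => "a", "op" => "*", "rhs" => "b"} = Regex.named_captures(named_pattern, "a * b")

# No match yields nil rather than an empty map.
nil = Regex.named_captures(named_pattern, "a+b")

# Regex.names/1 lists the names declared in the pattern.
["lhs", "op", "rhs"] = Regex.names(named_pattern)
```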
Replacing Fields
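Regex.replace/3 replaces each match with a replacement string that may refer to the fields by index. Inside an Elixir string literal, "\\1", "\\2", and so on refer to the groups, and "\\0" refers to the whole match. The following swaps the two operands: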
test_suite
|> Enum.map(fn test ->
  Regex.replace(~r/([^ ]+) (\+|\-|\*|\/) ([^ ]+)/, test, "\\3 \\2 \\1")
end)
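A few details of replacement strings are easy to miss. The sketch below uses illustrative inputs to show "\\0", what happens when nothing matches, and the global: false option:

```elixir
operation_pattern = ~r/([^ ]+) (\+|\-|\*|\/) ([^ ]+)/

# "\\0" is the whole match; here it is wrapped in brackets.
"[a + b]" = Regex.replace(operation_pattern, "a + b", "[\\0]")

# Strings without a match come back unchanged.
"a+b" = Regex.replace(operation_pattern, "a+b", "\\3 \\2 \\1")

# Every match is replaced by default; global: false replaces only the first.
"x-x-x" = Regex.replace(~r/a/, "a-a-a", "x")
"x-a-a" = Regex.replace(~r/a/, "a-a-a", "x", global: false)
```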
Alternatively, the replacement can be an anonymous function. It receives the whole match followed by each group as separate arguments and returns the replacement string:
test_suite
|> Enum.map(fn test ->
  Regex.replace(~r/([^ ]+) (\+|\-|\*|\/) ([^ ]+)/, test, fn _full, lhs, op, rhs ->
    "(#{lhs}) #{op} (#{rhs})"
  end)
end)
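A function is handy when the replacement has to be computed from the match. The sketch below uses a pattern without groups, so the function receives only the whole match, and every number in an illustrative sentence is doubled:

```elixir
# No groups in the pattern: the function gets just the matched digits.
doubled =
  Regex.replace(~r/\d+/, "2 apples and 10 pears", fn number ->
    # Convert the matched text to an integer, double it, and turn it back into a string.
    Integer.to_string(String.to_integer(number) * 2)
  end)
```

The function is called once per match, and its return value replaces that match.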