This lesson is in the early stages of development (Alpha version)

Regular Expressions for Biologists: Glossary

Key Points

Introduction
  • Regular expressions are a way of describing patterns in text.

  • Most text editors and many other tools include a regular expression engine for performing these kinds of searches.

  • Regular expressions are often offered as a mode of find/replace that can be turned on and off by the user.

Regex Fundamentals
  • Wrap characters in [] to define a set of valid matches for a given position.

  • Use - between two characters to define a range of characters to match.

  • Use ^ at the start of a set to invert it, indicating that the given characters should be excluded from a match.
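
The three points above can be tried out with a minimal sketch. It assumes Python's built-in re module as the regular expression engine; the nucleotide string and the sample name are made up for illustration.

```python
import re

# Hypothetical example data, not taken from the lesson.
sequence = "ATGCNNRYATGC"

# [ACGT] defines a set: any one of A, C, G, or T is valid at this position.
unambiguous = re.findall(r"[ACGT]", sequence)

# [^ACGT] inverts the set: any character that is NOT A, C, G, or T.
ambiguous = re.findall(r"[^ACGT]", sequence)

# [0-9] uses a range: any single digit from 0 to 9.
digits = re.findall(r"[0-9]", "sample_07b")

print(unambiguous)
print(ambiguous)
print(digits)
```

Note that ^ only inverts a set when it is the first character inside the brackets.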

Tokens and Wildcards
  • Use the \b token to match a word boundary, and ^ and $ to match the beginning and end of a line respectively.

  • \ has special meaning in regular expressions, and \\ should be used to specify a literal backslash in a pattern.

  • . describes a position that can match any single character.

  • When composing a regular expression, it is good practice to be as specific as possible about what you want to match.
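
As a rough illustration of these tokens, the sketch below again assumes Python's re module; all of the strings are invented examples. One Python-specific detail: ^ and $ only match at every line (rather than only at the start and end of the whole string) when the re.MULTILINE flag is set, whereas many text editors behave this way by default.

```python
import re

text = "cat concatenate cat."

# \b requires a word boundary, so "cat" inside "concatenate" is skipped.
whole_words = re.findall(r"\bcat\b", text)
# Without boundaries, every occurrence of the letters c-a-t is found.
anywhere = re.findall(r"cat", text)

# ^ and $ anchor the pattern to the start and end of each line.
lines = ">seq1\nATGC\n>seq2\nGGCC"
headers = re.findall(r"^>.*$", lines, flags=re.MULTILINE)

# \\ in the pattern matches one literal backslash in the text.
backslashes = re.findall(r"\\", r"C:\data\reads")

# . accepts any character; a set is more specific about what may appear.
codons = "ATG AAG ACG A-G"
any_middle = re.findall(r"A.G", codons)
specific = re.findall(r"A[ACGT]G", codons)

print(whole_words, anywhere)
print(headers)
print(len(backslashes))
print(any_middle, specific)
```

The last two patterns show why specificity matters: A.G also accepts "A-G", which is probably not an intended match, while A[ACGT]G accepts only nucleotides in the middle position.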

Repeated Matches
  • ? indicates that the preceding character or set should be treated as optional in this position.

  • * indicates that the preceding character or set should appear 0 or more times in this position.

  • + indicates that the preceding character or set should appear 1 or more times in this position.

  • {2,4} indicates that the preceding character or set should appear at least twice but no more than four times in this position.
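
The four repetition operators can be compared side by side. This is a small sketch assuming Python's re module; the candidate strings and the helper name matching are hypothetical and chosen only to show where each operator draws the line.

```python
import re

# Hypothetical strings containing 0, 1, 2, and 5 copies of T between A and G.
candidates = ["AG", "ATG", "ATTG", "ATTTTTG"]

def matching(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

optional = matching(r"AT?G")         # T may appear zero times or once
zero_or_more = matching(r"AT*G")     # T may appear any number of times
one_or_more = matching(r"AT+G")      # T must appear at least once
two_to_four = matching(r"AT{2,4}G")  # T must appear 2, 3, or 4 times

print(optional)
print(zero_or_more)
print(one_or_more)
print(two_to_four)
```

re.fullmatch is used here so that the whole string has to fit the pattern; with a plain search, a pattern such as AT*G would also be found inside longer strings.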

Capture Groups and References
  • Capture groups are defined within () in a regular expression.

  • The left-most capture group in a regular expression is referred to with \1 in the replacement string, the next with \2, and so on.
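
A brief sketch of capture groups in a replacement, assuming Python's re.sub, which accepts \1 and \2 in the replacement string. Some other tools write these references as $1 and $2 instead. The species name is just an example input.

```python
import re

name = "Homo sapiens"

# Two capture groups: (\w+) captures the genus, the second (\w+) the species.
# \2 \1 in the replacement swaps their order.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", name)

# Capture only the first letter of the genus, then the whole species epithet.
# The lowercase rest of the genus is matched but not captured, so it is dropped.
abbreviated = re.sub(r"([A-Z])[a-z]+ ([a-z]+)", r"\1. \2", name)

print(swapped)
print(abbreviated)
```

Groups are numbered by the position of their opening parenthesis, counting from the left.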

Alternative Matches
  • Alternative strings to match can be combined with |.
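
Alternation could be sketched as follows, assuming Python's re module. The codon list and the spelling example are illustrative only.

```python
import re

# TAA|TAG|TGA matches any one of the three alternatives.
stop_codon = re.compile(r"TAA|TAG|TGA")

codons = ["ATG", "TAA", "GGC", "TGA"]
stops = [c for c in codons if stop_codon.fullmatch(c)]

# Parentheses limit how far the alternation extends. (?:...) is Python's
# syntax for a group that is used for grouping only and is not captured.
spellings = re.findall(r"gr(?:a|e)y", "gray grey groy")

print(stops)
print(spellings)
```

Without the parentheses, gra|ey would mean "gra" or "ey" rather than "gray" or "grey", because | separates everything to its left from everything to its right.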

Glossary

FIXME