This lesson is still being designed and assembled (Pre-Alpha version)

The Bioconductor project: Glossary

Key Points

Introduction and setup
  • Participants can install the versions of Bioconductor packages described in this lesson, and reproduce their exact outputs, only if they use the matching version of R.

  • The files used in this lesson should be downloaded to a local path that is easily accessible from an R session.

Introduction to Bioconductor
  • R packages are but one aspect of the Bioconductor project.

  • The Bioconductor project extends and complements the CRAN repository.

  • Different types of packages provide not only software, but also annotations and experimental data, and demonstrate the use of multiple packages in integrated workflows.

  • Interoperability between Bioconductor packages facilitates the writing of integrated workflows and minimizes the cognitive burden on users.

  • Educational materials from courses and conferences are archived and accessible on the Bioconductor website and YouTube channel.

  • Different channels of communication enable community members to converse and help each other, both as users and package developers.

  • The Bioconductor project is governed by scientific, technical, and advisory boards, as well as a Code of Conduct committee.

Installing Bioconductor packages
  • The BiocManager package is available from the CRAN repository.

  • BiocManager::install() is used to install and update Bioconductor packages (as well as packages from CRAN and GitHub).

  • BiocManager::valid() is used to check for available package updates.

  • BiocManager::version() reports the version of Bioconductor currently installed.

  • BiocManager::install() can also be used to update an entire R library to a specific version of Bioconductor.
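
The key points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the package name "Biostrings" and the release number "3.19" are placeholders chosen for the example, not values prescribed by this lesson, and every call that would modify the R library is left commented out.

```r
# Install BiocManager from CRAN if it is not already available.
# (Commented out so that running this sketch does not modify the R library.)
# install.packages("BiocManager")

has_biocmanager <- requireNamespace("BiocManager", quietly = TRUE)

if (has_biocmanager) {
  # Report the version of Bioconductor currently in use.
  # tryCatch() guards against failures when no internet connection is available.
  bioc_version <- tryCatch(BiocManager::version(), error = function(e) NA)
  print(bioc_version)

  # Install (or update) a package; "Biostrings" is only an example name.
  # BiocManager::install("Biostrings")

  # Check whether installed packages are out-of-date or too new.
  # BiocManager::valid()

  # Move the whole library to a given Bioconductor release.
  # "3.19" is a placeholder; the release must be compatible with the R version in use.
  # BiocManager::install(version = "3.19")
} else {
  message("BiocManager is not installed; see the commented install.packages() call.")
}
```

Uncomment the individual calls one at a time in an interactive session, since each of them may download and install packages.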

Getting help
  • The browseVignettes() function is recommended to access the vignette(s) installed with each package.

  • Vignettes can also be accessed on the Bioconductor website, but beware of differences between package versions!

  • The Bioconductor main website contains general information, package documentation, and course materials.

  • The Bioconductor support site is the recommended place to contact developers and ask questions.
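
As a small sketch of the first point, the vignettes installed with a package can be listed from the R console before opening them in a browser. The package "grid" is used here only because it ships with R; it is an assumption made for the example, and any installed Bioconductor package name can be substituted.

```r
# List the vignettes installed with a package, without opening a web browser.
# "grid" is an example package that is distributed with R itself.
installed_vignettes <- vignette(package = "grid")
print(installed_vignettes$results[, c("Item", "Title"), drop = FALSE])

# In an interactive session, open the vignette index in a web browser.
if (interactive()) {
  browseVignettes(package = "grid")
}
```

Because these vignettes come from the local installation, they match the installed package version, which is not guaranteed for the copies on the Bioconductor website.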

S4 classes in Bioconductor
  • S4 classes store information in slots, and check the validity of the information every time an object is updated.

  • To ensure the continued integrity of S4 objects, users should not access slots directly, but instead use dedicated accessor functions.

  • S4 generics invoke different implementations of the method depending on the class of the object that they are given.

  • The S4 class DataFrame extends the functionality of the base data.frame, for instance with the capacity to hold information about each column in metadata columns.

  • The S4 class Rle extends the functionality of the base vector, for instance with the capacity to encode repetitive vectors in a memory-efficient format.
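
The ideas of slots, validity checks, accessor functions, and method dispatch can be illustrated with a minimal sketch that uses only the methods package distributed with R. The class "Gene", its slots, and the functions geneWidth() and describe() are hypothetical names invented for this example; they are not part of Bioconductor.

```r
library(methods)

# Define a class with two slots and a validity function.
# "Gene", "symbol" and "width" are hypothetical names used for illustration.
setClass(
  "Gene",
  slots = c(symbol = "character", width = "numeric"),
  validity = function(object) {
    if (length(object@width) != 1L || object@width <= 0) {
      return("'width' must be a single positive number")
    }
    TRUE
  }
)

# Accessor: users call geneWidth(x) instead of reading x@width directly.
setGeneric("geneWidth", function(x) standardGeneric("geneWidth"))
setMethod("geneWidth", "Gene", function(x) x@width)

# Setter: updates the slot, then re-checks the validity of the object.
setGeneric("geneWidth<-", function(x, value) standardGeneric("geneWidth<-"))
setMethod("geneWidth<-", "Gene", function(x, value) {
  x@width <- value
  validObject(x)
  x
})

# One generic, two methods: the class of the argument selects the implementation.
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "Gene", function(x) {
  paste("Gene", x@symbol, "of width", geneWidth(x))
})
setMethod("describe", "numeric", function(x) {
  paste("Numeric vector of length", length(x))
})

gene <- new("Gene", symbol = "TP53", width = 1000)
geneWidth(gene) <- 2000
print(describe(gene))        # dispatches to the method for "Gene"
print(describe(c(1, 2, 3)))  # dispatches to the method for "numeric"

# An invalid update is rejected by the validity check in the setter.
invalid_update <- tryCatch(
  {
    geneWidth(gene) <- -5
    "accepted"
  },
  error = function(e) "rejected"
)
print(invalid_update)
```

Routing updates through the setter is what keeps the object valid here: the setter calls validObject() explicitly, which a direct slot assignment would bypass.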

Working with biological sequences
  • The Biostrings package defines classes to represent sequences of nucleotides and amino acids.

  • The Biostrings package also defines methods to efficiently process biological sequences.

  • The BSgenome package provides genome sequences for a range of model organisms immediately available as Bioconductor objects.
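
A brief sketch of the first two points, assuming the Biostrings package is installed (the body is skipped otherwise). The nucleotide sequence is made up for the example and is not taken from this lesson.

```r
# Biostrings is a Bioconductor package; the body below only runs if it is installed.
has_biostrings <- requireNamespace("Biostrings", quietly = TRUE)

if (has_biostrings) {
  suppressPackageStartupMessages(library(Biostrings))

  # Represent a (made-up) nucleotide sequence as a DNAString object.
  dna <- DNAString("ATGGCCTTAA")
  print(dna)

  # Methods defined by Biostrings operate directly on the sequence object.
  print(reverseComplement(dna))
  print(letterFrequency(dna, letters = "GC"))
} else {
  message("Biostrings is not installed; skipping the example.")
}
```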

Working with genomic ranges
  • The GenomicRanges package defines classes to represent ranges of coordinates on a genomic scale.

  • The GenomicRanges package also defines methods to efficiently process genomic ranges.

  • The rtracklayer package provides functions to import and export genomic ranges from and to common genomic file formats.
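
A brief sketch of these points, assuming the GenomicRanges package is installed (the body is skipped otherwise). The coordinates are invented for the example, and the file name "genes.bed" in the commented rtracklayer calls is a hypothetical path.

```r
# GenomicRanges is a Bioconductor package; the body below only runs if it is installed.
has_granges <- requireNamespace("GenomicRanges", quietly = TRUE)

if (has_granges) {
  suppressPackageStartupMessages(library(GenomicRanges))

  # Three made-up ranges on two chromosomes.
  genes <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(100, 150, 300), end = c(200, 250, 400)),
    strand = c("+", "+", "-")
  )
  print(genes)

  # Merge overlapping ranges into a minimal set of non-overlapping ranges.
  print(reduce(genes))

  # Export to and import from a BED file with rtracklayer (commented out
  # because it writes to disk; "genes.bed" is a hypothetical file name).
  # rtracklayer::export(genes, "genes.bed")
  # genes_from_file <- rtracklayer::import("genes.bed")
} else {
  message("GenomicRanges is not installed; skipping the example.")
}
```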

Glossary

AnnotationData package
Type of Bioconductor package that provides databases of molecular annotations (e.g., genes, proteins, pathways).
biocViews
Directed acyclic graphs of terms from a controlled vocabulary, used to categorize R packages in the Bioconductor repository. The biocViews can be browsed on the Bioconductor website.
ExperimentData package
Type of Bioconductor package that provides experimental datasets, immediately available as standard Bioconductor objects. This type of package is often used in package vignettes, to conveniently import data used to demonstrate the functionality of other packages as well as larger workflows. Experiment data packages can be explored on the biocViews page.
S4 class
R has three object-oriented (OO) systems: S3, S4 and R5 (Reference Classes). S4 is a system that defines formal classes, using an implementation that is stricter than the S3 class system. Classes define the conceptual structure of S4 objects, while S4 objects represent practical instances of their class. See S4 object.
S4 class slot
Slots can be seen as parts, elements, properties, or attributes of S4 objects. Each slot is defined by its name and the data type that it may contain.
S4 generic
Template function for S4 methods that defines the arguments considered for S4 method dispatch.
S4 method
Instance of an S4 generic for a particular combination of classes across the arguments considered for S4 method dispatch.
S4 method dispatch
Mechanism allowing R to identify and call the implementation of an S4 generic function that matches the class of the object(s) given as argument(s).
S4 object
S4 objects are instances of S4 classes, in the same way that an actual car is an instance of the definition of a car that one would find in a dictionary.
Software package
Type of Bioconductor package that provides implementations of methodologies for processing experimental data.
Vignette
Document(s) in PDF or HTML format, distributed and installed alongside package code, providing long-form documentation that demonstrates the use of the package functionality in the context of an example workflow. Vignettes typically use standard datasets obtained from an ExperimentData package or the ExperimentHub package.
Workflow package
Type of Bioconductor package that exclusively provides vignettes used to demonstrate the use of multiple Bioconductor packages in the context of a large workflow.

Web resources

Bioconductor website
The official Bioconductor website.