Testing lab

Background

In a project that I’m working on, I need to generate sets of sequences. The rules are that every value is either 0 or 1 and that, at any position \(i\), the sequences cannot all have the same value. I have a small function that does this and I have written some tests.

#' @param n_snps The length of each sequence (the number of positions)
#' @param n_true_seqs The number of sequences
#' @return A matrix of 0's and 1's with number of rows equal to n_true_seqs and number of columns equal to n_snps
make_true_seqs = function(n_snps, n_true_seqs) {
    true_seqs = matrix(NA, nrow = n_true_seqs, ncol = n_snps)
    for(i in 1:n_snps) {
        snp_vals = sample(0:1, size = n_true_seqs, replace = TRUE)
        if(all(snp_vals == 0)) {
            snp_vals[sample(n_true_seqs, 1)] = 1
        } else if(all(snp_vals == 1)) {
            snp_vals[sample(n_true_seqs, 1)] = 0
        }
        true_seqs[,i] = snp_vals
    }
    return(true_seqs)
}

context("Check simulations")
source("simulate_reads.R")
test_that("true seq creation function works", {
    expect_true(all(colSums(make_true_seqs(10, 5)) > 0))
    expect_true(all(colSums(make_true_seqs(20, 3)) < 3))
    expect_true(all(make_true_seqs(15, 4) %in% c(0,1)))
})

Part 1

Make an R file called simulate_reads.R that contains the simulation function and a test file called test_simulations.R that contains the testing code. In an R session whose working directory is the folder containing the two R files, run library(testthat) and then test_dir('.'). The output should indicate that all the tests pass.

Check that you understand the function and what the tests do. It is probably a good idea to annotate the expect_true lines and/or get rid of the magic numbers.

Try adding additional tests (in the form of additional expect_*() lines inside the test_that call in the test file) and check whether they pass.
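As one possible sketch of the suggestions above, the test below names the sizes instead of using magic numbers, annotates each expectation, and adds a check on the dimensions of the result. The sizes 12 and 4 are arbitrary choices, and the function is copied in only so that the snippet runs on its own; in test_simulations.R you would keep source("simulate_reads.R") instead.

```r
library(testthat)

# Copy of make_true_seqs from the Background section, included only so
# that this snippet is self-contained.
make_true_seqs = function(n_snps, n_true_seqs) {
    true_seqs = matrix(NA, nrow = n_true_seqs, ncol = n_snps)
    for(i in 1:n_snps) {
        snp_vals = sample(0:1, size = n_true_seqs, replace = TRUE)
        if(all(snp_vals == 0)) {
            snp_vals[sample(n_true_seqs, 1)] = 1
        } else if(all(snp_vals == 1)) {
            snp_vals[sample(n_true_seqs, 1)] = 0
        }
        true_seqs[,i] = snp_vals
    }
    return(true_seqs)
}

test_that("true seq creation function works", {
    # Named sizes instead of magic numbers (the values are arbitrary)
    n_snps = 12
    n_true_seqs = 4
    seqs = make_true_seqs(n_snps, n_true_seqs)

    # One row per sequence, one column per position
    expect_equal(dim(seqs), c(n_true_seqs, n_snps))
    # Every entry is 0 or 1
    expect_true(all(seqs %in% c(0, 1)))
    # colSums counts the 1's at each position: at least one 1 ...
    expect_true(all(colSums(seqs) > 0))
    # ... and at least one 0
    expect_true(all(colSums(seqs) < n_true_seqs))
})
```

Generating the matrix once and reusing it means all four expectations describe the same draw, which makes a failure easier to interpret than when each line calls the function with different sizes.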

Part 2

Suppose we want to modify the make_true_seqs function so that it takes an extra argument, num_nonzero_per_site, which should be a vector of length n_snps that gives the number of 1’s that we want to have at each position along the sequence. Before, we required that the number of sequences with a 1 at any given position was strictly bigger than zero and strictly less than the number of sequences. After the modification, if we count the sequences with a 1 at position i, that count should be equal to num_nonzero_per_site[i].
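To make the target property concrete, here is a small hand-built illustration. The matrix example_seqs and the counts in it are made up for this sketch, and no version of make_true_seqs is called; it only shows what it means for the column sums to equal num_nonzero_per_site.

```r
# Hypothetical example: 4 sequences (rows) and 3 positions (columns).
n_true_seqs = 4
n_snps = 3
num_nonzero_per_site = c(1, 2, 3)

# Built by hand and filled column by column, so column i holds position i.
example_seqs = matrix(c(1, 0, 0, 0,
                        1, 1, 0, 0,
                        1, 1, 1, 0),
                      nrow = n_true_seqs, ncol = n_snps)

# Count the 1's at each position and compare with the requested counts.
site_counts = colSums(example_seqs)
all(site_counts == num_nonzero_per_site)
```

Here the last line evaluates to TRUE because the columns contain one, two, and three 1's respectively. A matrix returned by the modified function should satisfy the same comparison for whatever num_nonzero_per_site it was given.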

Write a test (or a set of tests) that you would want the modified function to pass. Assume that the modified function will have three arguments, n_snps, n_true_seqs, and num_nonzero_per_site.

Part 3

Modify make_true_seqs so that it works the way we described in Part 2. Try running your tests on the new version.