In a project that I’m working on, I need to generate sets of sequences. The rules are that every value is either 0 or 1 and that, at any position \(i\), the sequences cannot all have the same value. I have a small function that does this, and I have written some tests for it.
#' @param n_snps The length of the sequence
#' @param n_true_seqs The number of sequences
#' @return A matrix of 0's and 1's with number of rows equal to n_true_seqs and number of columns equal to n_snps
make_true_seqs = function(n_snps, n_true_seqs) {
  true_seqs = matrix(NA, nrow = n_true_seqs, ncol = n_snps)
  for(i in 1:n_snps) {
    snp_vals = sample(0:1, size = n_true_seqs, replace = TRUE)
    if(all(snp_vals == 0)) {
      snp_vals[sample(n_true_seqs, 1)] = 1
    } else if(all(snp_vals == 1)) {
      snp_vals[sample(n_true_seqs, 1)] = 0
    }
    true_seqs[,i] = snp_vals
  }
  return(true_seqs)
}
context("Check simulations")
source("simulate_reads.R")
test_that("true seq creation function works", {
  expect_true(all(colSums(make_true_seqs(10, 5)) > 0))
  expect_true(all(colSums(make_true_seqs(20, 3)) < 3))
  expect_true(all(make_true_seqs(15, 4) %in% c(0,1)))
})
Make an R file called simulate_reads.R that contains the simulation function and a test file called test_simulations.R that contains the testing code. In an R session whose working directory is the folder containing the two R files, run library(testthat) and test_dir('.'). The output should indicate that all the tests pass.
Check that you understand the function and what the tests do. It is probably a good idea to annotate the expect_true lines and/or get rid of the magic numbers.
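One way the annotated test might look is sketched below. The particular values of n_snps and n_true_seqs are arbitrary choices for illustration, and the function is copied in only so that the snippet runs on its own; in the actual test file, source("simulate_reads.R") would supply it.

```r
library(testthat)

# Copied from the post so this sketch is self-contained.
make_true_seqs = function(n_snps, n_true_seqs) {
  true_seqs = matrix(NA, nrow = n_true_seqs, ncol = n_snps)
  for(i in 1:n_snps) {
    snp_vals = sample(0:1, size = n_true_seqs, replace = TRUE)
    if(all(snp_vals == 0)) {
      snp_vals[sample(n_true_seqs, 1)] = 1
    } else if(all(snp_vals == 1)) {
      snp_vals[sample(n_true_seqs, 1)] = 0
    }
    true_seqs[,i] = snp_vals
  }
  return(true_seqs)
}

test_that("true seq creation function works", {
  # Named values instead of magic numbers (arbitrary, for illustration).
  n_snps = 10
  n_true_seqs = 5
  seqs = make_true_seqs(n_snps, n_true_seqs)
  # Each column is one position; its sum counts the sequences with a 1 there.
  ones_per_site = colSums(seqs)
  expect_true(all(ones_per_site > 0))            # at least one 1 per position
  expect_true(all(ones_per_site < n_true_seqs))  # at least one 0 per position
  expect_true(all(seqs %in% c(0, 1)))            # only 0's and 1's appear
})
```

Generating the matrix once and naming the intermediate quantity makes it clearer that the upper bound in the second check is the number of sequences, not an unrelated constant.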
Try adding additional tests (in the form of additional expect_*() lines inside the test_that call in the test file) and check whether they pass.
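As one possible illustration, the sketch below adds a check on the shape of the returned matrix and a check on the smallest case in which the rule can still be satisfied (two sequences). These particular checks are suggestions, not part of the original tests, and the function is again copied in only to keep the snippet runnable by itself.

```r
library(testthat)

# Copied from the post so this sketch is self-contained.
make_true_seqs = function(n_snps, n_true_seqs) {
  true_seqs = matrix(NA, nrow = n_true_seqs, ncol = n_snps)
  for(i in 1:n_snps) {
    snp_vals = sample(0:1, size = n_true_seqs, replace = TRUE)
    if(all(snp_vals == 0)) {
      snp_vals[sample(n_true_seqs, 1)] = 1
    } else if(all(snp_vals == 1)) {
      snp_vals[sample(n_true_seqs, 1)] = 0
    }
    true_seqs[,i] = snp_vals
  }
  return(true_seqs)
}

test_that("true seqs have the expected shape", {
  # One row per sequence, one column per position.
  expect_equal(dim(make_true_seqs(10, 5)), c(5, 10))
})

test_that("two sequences always differ at every position", {
  # With two sequences, each position must hold exactly one 0 and one 1.
  expect_true(all(colSums(make_true_seqs(20, 2)) == 1))
})
```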
Suppose we want to modify the make_true_seqs function so that it takes an extra argument, num_nonzero_per_site, which should be a vector of length n_snps that gives the number of 1’s that we want to have at each position along the sequence. Before, we required that the number of sequences with a 1 at any given position was strictly bigger than zero and strictly less than the number of sequences. The modification of the function should make it so that if we sum the number of sequences with a 1 at position i, that value should be equal to num_nonzero_per_site[i].
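To make the new requirement concrete, here is a small sketch for a single position. The variable k stands in for num_nonzero_per_site[i], and both values are made up for illustration.

```r
n_true_seqs = 5
k = 2  # hypothetical value of num_nonzero_per_site[i]

# Build a column with exactly k ones and n_true_seqs - k zeros,
# then shuffle it so the ones land on random sequences.
snp_vals = sample(rep(c(1, 0), times = c(k, n_true_seqs - k)))

# However the shuffle comes out, the number of 1's is exactly k.
sum(snp_vals) == k
```

Under the old rule any count from 1 to n_true_seqs - 1 was acceptable at a position; under the new rule only the one requested count is.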
Write a test (or a set of tests) that you would want the modified function to pass. Assume that the modified function will have three arguments, n_snps, n_true_seqs, and num_nonzero_per_site.
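Because the modified function does not exist yet, one option is to write the checks as a small helper and first try it on a hand-built matrix. The helper name expect_valid_true_seqs is hypothetical, introduced here only for illustration; once the new make_true_seqs is written, its output could be passed to the same helper.

```r
library(testthat)

# Hypothetical helper: the properties we want the modified function's output to have.
expect_valid_true_seqs = function(seqs, n_snps, n_true_seqs, num_nonzero_per_site) {
  expect_equal(dim(seqs), c(n_true_seqs, n_snps))    # rows = sequences, columns = positions
  expect_true(all(seqs %in% c(0, 1)))                # only 0's and 1's
  expect_equal(as.numeric(colSums(seqs)),
               as.numeric(num_nonzero_per_site))     # requested number of 1's per position
}

test_that("helper accepts a hand-built valid matrix", {
  # matrix() fills column by column: position 1 has one 1, position 2 has two.
  seqs = matrix(c(1, 0, 0,
                  1, 1, 0), nrow = 3, ncol = 2)
  expect_valid_true_seqs(seqs, n_snps = 2, n_true_seqs = 3,
                         num_nonzero_per_site = c(1, 2))
})
```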
Modify make_true_seqs so that it works the way we described in Part 2. Try running your tests on the new version.
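A minimal sketch of one possible modification follows. It is only one of several reasonable approaches: it starts from a matrix of zeros and, for each position, picks at random which sequences receive a 1. The input checks at the top are an added assumption about how invalid arguments should be handled.

```r
make_true_seqs = function(n_snps, n_true_seqs, num_nonzero_per_site) {
  # Assumed input checks: one count per position, each between 0 and n_true_seqs.
  stopifnot(length(num_nonzero_per_site) == n_snps,
            all(num_nonzero_per_site >= 0),
            all(num_nonzero_per_site <= n_true_seqs))
  true_seqs = matrix(0, nrow = n_true_seqs, ncol = n_snps)
  for(i in seq_len(n_snps)) {
    # Choose, without replacement, which sequences carry a 1 at position i.
    ones = sample.int(n_true_seqs, size = num_nonzero_per_site[i])
    true_seqs[ones, i] = 1
  }
  return(true_seqs)
}

# Example: 4 positions, 5 sequences, with 1, 2, 3, and 4 ones respectively.
seqs = make_true_seqs(4, 5, c(1, 2, 3, 4))
colSums(seqs)  # 1 2 3 4
```

Note that the old tests requiring strictly more than zero and strictly fewer than n_true_seqs ones per position would only keep passing if every entry of num_nonzero_per_site lies in that range.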