I attempted to run diffxpy with the number of detected genes as a covariate. This has been proposed to improve performance on single cell data by Soneson et al., NMeth 2018.
Using that model, diffxpy fails reporting that the design matrix was not full rank. However, the same design matrix runs flawlessly using edgeR.
Reproducible example:
d <- DGEList(counts)
d <- suppressWarnings(edgeR::calcNormFactors(d))
design <- model.matrix(~ 1 + group + n_genes, data=obs)
d <- estimateDisp(d, design)
fit <- glmFit(d,design)
lrt <- glmLRT(fit, coef = 2)
pval <- lrt$table$PValue
padj <- p.adjust(pval, "BH")
# result
head(data.frame(pval = pval, padj = padj))
## pval padj
## 1 0.8937844 0.9994355
## 2 0.2291069 0.9994355
## 3 0.8452571 0.9994355
## 4 0.9977154 0.9994950
## 5 0.8893899 0.9994355
## 6 0.3032224 0.9994355
try:
test = de.test.wald(
data=counts.values.T,
formula_loc="~ 1 + group + n_genes",
factor_loc_totest=["group"],
coef_to_test="group",
sample_description=obs,
batch_size=100,
training_strategy="DEFAULT",
dtype="float64",
gene_names=counts.index
)
pd.DataFrame().assign(
pval=test.pval,
padj=test.qval
)
except Exception as e:
import traceback
traceback.print_exc()
## Traceback (most recent call last):
## File "<string>", line 11, in <module>
## File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/diffxpy/testing/tests.py", line 566, in wald
## return_type="patsy"
## File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/diffxpy/testing/utils.py", line 272, in constraint_system_from_star
## return_type=return_type
## File "/home/sturm/projects/2019/zinbwaveZinger/work/conda/diffxpy_test-bf5ddff8c39a3368b22c8ee2c4730f1c/lib/python3.7/site-packages/batchglm/data.py", line 252, in constraint_system_from_star
## (np.linalg.matrix_rank(dmat), dmat.shape[1])
## ValueError: constrained design matrix is not full rank: 80 81