Make names of gene-sets unique by namespace, and member genes of gene-sets unique

uniqGenesetsByNamespace(gmtList)

Arguments

gmtList

A GmtList object, typically created by readGmt. Namespaces must already be defined with setNamespace.

The function makes sure that

  • gene-set names are unique within each namespace, by merging gene-sets that share a name (their genes are combined)

  • genes within each gene-set are unique, by removing duplicated genes

Gene-sets with duplicated names are merged even if their desc values differ: the desc values are made unique and, if more than one remains, concatenated with | as the separator.
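The merging rules above can be illustrated with a minimal base-R sketch. It does not use the GmtList class and is not the package's implementation; it assumes each gene-set is a plain list with name, desc, genes and namespace elements, and uniqByNamespaceSketch is a hypothetical helper name chosen for illustration.

```r
# Hypothetical sketch of the merging rules in base R; NOT the package implementation.
# Assumes each gene-set is a plain list with elements name, desc, genes, namespace.
uniqByNamespaceSketch <- function(genesets) {
  # One key per (namespace, name) pair; a tab cannot occur in GMT names
  keys <- vapply(genesets, function(x) paste(x$namespace, x$name, sep = "\t"),
                 character(1))
  # unique() keeps keys in order of first appearance (an assumption about ordering)
  uniqKeys <- unique(keys)
  lapply(uniqKeys, function(k) {
    group <- genesets[keys == k]
    # Unique desc values, collapsed with "|" when more than one remains
    descs <- unique(vapply(group, function(x) x$desc, character(1)))
    list(name = group[[1]]$name,
         desc = paste(descs, collapse = "|"),
         # Combine the genes of all merged sets and drop duplicates
         genes = unique(unlist(lapply(group, function(x) x$genes))),
         namespace = group[[1]]$namespace)
  })
}

sets <- list(
  list(name = "GeneSet1", desc = "Namespace1", genes = LETTERS[1:3], namespace = "Namespace1"),
  list(name = "GeneSet2", desc = "Namespace1", genes = rep(LETTERS[4:6], 2), namespace = "Namespace1"),
  list(name = "GeneSet1", desc = "Namespace1", genes = LETTERS[4:6], namespace = "Namespace1"),
  list(name = "GeneSet3", desc = "Namespace2", genes = LETTERS[1:5], namespace = "Namespace2"))
res <- uniqByNamespaceSketch(sets)
vapply(res, function(x) length(x$genes), integer(1))  # 6 3 5
```

The gene counts match the example below (GeneSet1 with 6 genes, GeneSet2 with 3, GeneSet3 with 5). Keying on namespace and name together is what keeps identically named gene-sets in different namespaces separate.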

Value

A GmtList object with unique gene-sets and unique gene lists. If not already present, an item namespace is appended to each element of the GmtList object, recording the namespace used to make gene-sets unique. The order of the returned object follows the unique gene-set names of the input object.

Examples

myGmtList <- GmtList(list(
  list(name="GeneSet1", desc="Namespace1", genes=LETTERS[1:3]),
  list(name="GeneSet2", desc="Namespace1", genes=rep(LETTERS[4:6], 2)),
  list(name="GeneSet1", desc="Namespace1", genes=LETTERS[4:6]),
  list(name="GeneSet3", desc="Namespace2", genes=LETTERS[1:5])))
print(myGmtList)
#> A gene-set list in GMT format with 4 genesets
#> Gene-sets:
#> GeneSet1 (Namespace1,n=3): A,B,C
#> GeneSet2 (Namespace1,n=6): D,E,F,...
#> GeneSet1 (Namespace1,n=3): D,E,F
#> GeneSet3 (Namespace2,n=5): A,B,C,...
myGmtList <- setNamespace(myGmtList, namespace=function(x) x$desc)
myUniqGmtList <- uniqGenesetsByNamespace(myGmtList)
print(myUniqGmtList)
#> A gene-set list in GMT format with 3 genesets
#> Namespaces:
#> [1] Namespace1 (n=2)
#> [2] Namespace2 (n=1)
#> Gene-sets:
#> GeneSet1 (Namespace1,n=6): A,B,C,...
#> GeneSet2 (Namespace1,n=3): D,E,F
#> GeneSet3 (Namespace2,n=5): A,B,C,...