Good afternoon! I recently ran into a discrepancy in the learned patterns between runs with sparseOptimization set to TRUE versus FALSE. The code I ran and its output are below. With sparseOptimization set to TRUE, the reported ChiSq value was -nan, and during the equilibration phase the log reported 0 atoms for the P matrix. With sparseOptimization set to FALSE there seemed to be no problems. However, the number of patterns learned differed between the two settings: sparseOptimization = TRUE gave 5 patterns while sparseOptimization = FALSE gave 6 patterns. This held for the whole range of pattern counts that I ran (5-50).
SPARSE OPTIMIZATION ENABLED
params <- CogapsParams(nPatterns=5, nIterations=30000, seed=42,
sparseOptimization=TRUE,
distributed="genome-wide")
params <- setDistributedParams(params, nSets=6)
Hoxd10_matnp5 <- CoGAPS(Hoxd10_mat, params)
This is CoGAPS version 3.19.1
Running genome-wide CoGAPS on Hoxd10_mat (30407 genes and 380 samples) with parameters:
-- Standard Parameters --
nPatterns 5
nIterations 30000
seed 42
sparseOptimization TRUE
distributed genome-wide
-- Sparsity Parameters --
alpha 0.01
maxGibbsMass 100
-- Distributed CoGAPS Parameters --
nSets 6
cut 5
minNS 3
maxNS 9
Creating subsets...
set sizes (min, mean, max): (5067, 5067.833, 5072)
Running Across Subsets...
Data Model: Sparse, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
worker 1 is starting!
worker 2 is starting!
worker 4 is starting!
worker 6 is starting!
worker 3 is starting!
worker 5 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 13376(A), 1242(P), ChiSq: -nan, Time: 00:00:45 / 01:16:13
...
30000 of 30000, Atoms: 20636(A), 1461(P), ChiSq: -nan, Time: 00:35:40 / 01:16:38
-- Sampling Phase --
1000 of 30000, Atoms: 20671(A), 1460(P), ChiSq: -nan, Time: 00:36:54 / 01:16:28
...
29000 of 30000, Atoms: 20645(A), 1469(P), ChiSq: -nan, Time: 01:12:07 / 01:13:27
worker 2 is finished! Time: 01:12:22
30000 of 30000, Atoms: 20670(A), 1484(P), ChiSq: -nan, Time: 01:13:21 / 01:13:21
worker 1 is finished! Time: 01:13:21
worker 3 is finished! Time: 01:13:24
worker 5 is finished! Time: 01:15:26
worker 4 is finished! Time: 01:15:26
worker 6 is finished! Time: 01:19:08
Matching Patterns Across Subsets...
Running Final Stage...
Data Model: Sparse, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
worker 1 is starting!
worker 2 is starting!
worker 6 is starting!
worker 4 is starting!
worker 3 is starting!
worker 5 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 10022(A), 0(P), ChiSq: -nan, Time: 00:00:27 / 00:45:43
...
30000 of 30000, Atoms: 15174(A), 0(P), ChiSq: -nan, Time: 00:47:13 / 00:47:13
worker 1 is finished! Time: 00:47:13
worker 2 is finished! Time: 00:47:28
worker 5 is finished! Time: 00:47:34
Warning message:
In checkInputs(data, uncertainty, allParams) :
running distributed cogaps without mtx/tsv/csv/gct data
SPARSE OPTIMIZATION DISABLED
params <- CogapsParams(nPatterns=5, nIterations=30000, seed=42,
distributed="genome-wide")
params <- setDistributedParams(params, nSets=6)
Hoxd10_matnp5 <- CoGAPS(Hoxd10_mat, params)
This is CoGAPS version 3.19.1
Running genome-wide CoGAPS on Hoxd10_mat (30407 genes and 380 samples) with parameters:
-- Standard Parameters --
nPatterns 5
nIterations 30000
seed 42
sparseOptimization FALSE
distributed genome-wide
-- Sparsity Parameters --
alpha 0.01
maxGibbsMass 100
-- Distributed CoGAPS Parameters --
nSets 6
cut 5
minNS 3
maxNS 9
Creating subsets...
set sizes (min, mean, max): (5067, 5067.833, 5072)
Running Across Subsets...
worker 2 is starting!
worker 3 is starting!
Data Model: Dense, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
worker 1 is starting!
worker 4 is starting!
worker 5 is starting!
worker 6 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 4665(A), 966(P), ChiSq: 5137063, Time: 00:01:16 / 02:08:43
...
30000 of 30000, Atoms: 9933(A), 2460(P), ChiSq: 4886798, Time: 00:49:52 / 01:47:09
-- Sampling Phase --
1000 of 30000, Atoms: 10033(A), 2514(P), ChiSq: 4886740, Time: 00:51:31 / 01:46:45
...
30000 of 30000, Atoms: 9953(A), 2489(P), ChiSq: 4886819, Time: 01:34:05 / 01:34:05
worker 1 is finished! Time: 01:34:05
worker 5 is finished! Time: 01:44:52
worker 4 is finished! Time: 01:54:06
worker 2 is finished! Time: 01:54:29
worker 6 is finished! Time: 01:54:31
worker 3 is finished! Time: 01:54:38
Matching Patterns Across Subsets...
Running Final Stage...
worker 5 is starting!
worker 4 is starting!
worker 3 is starting!
worker 2 is starting!
worker 6 is starting!
Data Model: Dense, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
worker 1 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 5928(A), 0(P), ChiSq: 14908930, Time: 00:00:10 / 00:16:56
...
30000 of 30000, Atoms: 10469(A), 0(P), ChiSq: 14908930, Time: 00:08:47 / 00:18:52
-- Sampling Phase --
1000 of 30000, Atoms: 10403(A), 0(P), ChiSq: 14908930, Time: 00:09:00 / 00:18:39
...
30000 of 30000, Atoms: 10379(A), 0(P), ChiSq: 14908930, Time: 00:15:17 / 00:15:17
worker 1 is finished! Time: 00:15:17
worker 5 is finished! Time: 00:16:29
worker 3 is finished! Time: 00:19:47
worker 2 is finished! Time: 00:20:37
worker 4 is finished! Time: 00:20:38
worker 6 is finished! Time: 00:20:45
Warning message:
In checkInputs(data, uncertainty, allParams) :
running distributed cogaps without mtx/tsv/csv/gct data
After obtaining the patterns, I ran patternMarkers on the patterns learned with sparseOptimization = TRUE. With threshold = "all", I got this error:
test <- patternMarkers_all(Hoxd10_matnp5, threshold = "all")
Error in colnames(markerScores)[apply(markerScores, 1, which.min)] :
invalid subscript type 'list'
This error did not occur when threshold was set to "cut".
patternMarkers worked normally when run on patterns learned without sparseOptimization.
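For what it's worth, the error message itself can be reproduced in base R without CoGAPS. The sketch below is only an illustration under an assumption I have not confirmed: that some rows of markerScores contain no finite values (for example NaN carried over from the sparse run). The matrix values and the gene names are made up; only the indexing expression is taken from the error above.

```r
# Toy score matrix (hypothetical values): the second row is all NaN,
# standing in for what NaN output from the factorization might produce.
markerScores <- matrix(
  c(0.1, 0.5,
    NaN, NaN),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("Pattern_1", "Pattern_2"))
)

# which.min() returns integer(0) for an all-NaN row, so the per-row results
# have different lengths and apply() returns a list instead of a vector.
idx <- apply(markerScores, 1, which.min)
print(is.list(idx))

# Indexing the column names with that list raises the same kind of error;
# tryCatch() captures the message so the script keeps running.
msg <- tryCatch(
  colnames(markerScores)[idx],
  error = function(e) conditionMessage(e)
)
print(msg)

# Rows without any finite score can be flagged before calling which.min().
badRows <- rownames(markerScores)[
  apply(markerScores, 1, function(r) !any(is.finite(r)))
]
print(badRows)
```

If this is what happens inside patternMarkers, the threshold = "all" error would be a downstream symptom of the -nan values rather than a separate problem, but that is a guess on my part.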
UPDATE @dimalvovs - delete rows for readability