
SparseOptimization pattern discrepancy  #77

@rpalaganas

Good afternoon! I recently ran into an issue where runs with sparseOptimization set to TRUE and to FALSE produce different numbers of patterns. The code I ran and its output are below. With sparseOptimization = TRUE, the ChiSq value was -nan throughout, and during the equilibration phase of the final stage the P matrix showed 0 atoms. With sparseOptimization = FALSE there seemed to be no problems. However, the number of learned patterns differed between the two settings: sparseOptimization = TRUE gave 5 patterns, while sparseOptimization = FALSE gave 6. I saw this discrepancy across the range of nPatterns values I ran (5 to 50).

SPARSE OPTIMIZATION ENABLED

```r
params <- CogapsParams(nPatterns=5, nIterations=30000, seed=42,
                       sparseOptimization=TRUE,
                       distributed="genome-wide")

params <- setDistributedParams(params, nSets=6)

Hoxd10_matnp5 <- CoGAPS(Hoxd10_mat, params)
```

This is CoGAPS version 3.19.1 
Running genome-wide CoGAPS on Hoxd10_mat (30407 genes and 380 samples) with parameters:

-- Standard Parameters --
nPatterns            5 
nIterations          30000 
seed                 42 
sparseOptimization   TRUE 
distributed          genome-wide 

-- Sparsity Parameters --
alpha          0.01 
maxGibbsMass   100 

-- Distributed CoGAPS Parameters -- 
nSets          6 
cut            5 
minNS          3 
maxNS          9 

Creating subsets...
set sizes (min, mean, max): (5067, 5067.833, 5072)
Running Across Subsets...

Data Model: Sparse, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
    worker 1 is starting!
    worker 2 is starting!
    worker 4 is starting!
    worker 6 is starting!
    worker 3 is starting!
    worker 5 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 13376(A), 1242(P), ChiSq: -nan, Time: 00:00:45 / 01:16:13
...
30000 of 30000, Atoms: 20636(A), 1461(P), ChiSq: -nan, Time: 00:35:40 / 01:16:38
-- Sampling Phase --
1000 of 30000, Atoms: 20671(A), 1460(P), ChiSq: -nan, Time: 00:36:54 / 01:16:28
...
29000 of 30000, Atoms: 20645(A), 1469(P), ChiSq: -nan, Time: 01:12:07 / 01:13:27
    worker 2 is finished! Time: 01:12:22
30000 of 30000, Atoms: 20670(A), 1484(P), ChiSq: -nan, Time: 01:13:21 / 01:13:21
    worker 1 is finished! Time: 01:13:21
    worker 3 is finished! Time: 01:13:24
    worker 5 is finished! Time: 01:15:26
    worker 4 is finished! Time: 01:15:26
    worker 6 is finished! Time: 01:19:08

Matching Patterns Across Subsets...
Running Final Stage...

Data Model: Sparse, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
    worker 1 is starting!
    worker 2 is starting!
    worker 6 is starting!
    worker 4 is starting!
    worker 3 is starting!
    worker 5 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 10022(A), 0(P), ChiSq: -nan, Time: 00:00:27 / 00:45:43
...
30000 of 30000, Atoms: 15174(A), 0(P), ChiSq: -nan, Time: 00:47:13 / 00:47:13
    worker 1 is finished! Time: 00:47:13
    worker 2 is finished! Time: 00:47:28
    worker 5 is finished! Time: 00:47:34
Warning message:
In checkInputs(data, uncertainty, allParams) :
  running distributed cogaps without mtx/tsv/csv/gct data

SPARSE OPTIMIZATION DISABLED

```r
params <- CogapsParams(nPatterns=5, nIterations=30000, seed=42,
                       distributed="genome-wide")

params <- setDistributedParams(params, nSets=6)

Hoxd10_matnp5 <- CoGAPS(Hoxd10_mat, params)
```

This is CoGAPS version 3.19.1 
Running genome-wide CoGAPS on Hoxd10_mat (30407 genes and 380 samples) with parameters:

-- Standard Parameters --
nPatterns            5 
nIterations          30000 
seed                 42 
sparseOptimization   FALSE 
distributed          genome-wide 

-- Sparsity Parameters --
alpha          0.01 
maxGibbsMass   100 

-- Distributed CoGAPS Parameters -- 
nSets          6 
cut            5 
minNS          3 
maxNS          9 

Creating subsets...
set sizes (min, mean, max): (5067, 5067.833, 5072)
Running Across Subsets...

    worker 2 is starting!
    worker 3 is starting!
Data Model: Dense, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
    worker 1 is starting!
    worker 4 is starting!
    worker 5 is starting!
    worker 6 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 4665(A), 966(P), ChiSq: 5137063, Time: 00:01:16 / 02:08:43
...
30000 of 30000, Atoms: 9933(A), 2460(P), ChiSq: 4886798, Time: 00:49:52 / 01:47:09
-- Sampling Phase --
1000 of 30000, Atoms: 10033(A), 2514(P), ChiSq: 4886740, Time: 00:51:31 / 01:46:45
...
30000 of 30000, Atoms: 9953(A), 2489(P), ChiSq: 4886819, Time: 01:34:05 / 01:34:05
    worker 1 is finished! Time: 01:34:05
    worker 5 is finished! Time: 01:44:52
    worker 4 is finished! Time: 01:54:06
    worker 2 is finished! Time: 01:54:29
    worker 6 is finished! Time: 01:54:31
    worker 3 is finished! Time: 01:54:38

Matching Patterns Across Subsets...
Running Final Stage...

    worker 5 is starting!
    worker 4 is starting!
    worker 3 is starting!
    worker 2 is starting!
    worker 6 is starting!
Data Model: Dense, Normal
Sampler Type: Sequential
Loading Data...Done! (00:00:00)
    worker 1 is starting!
-- Equilibration Phase --
1000 of 30000, Atoms: 5928(A), 0(P), ChiSq: 14908930, Time: 00:00:10 / 00:16:56
...
30000 of 30000, Atoms: 10469(A), 0(P), ChiSq: 14908930, Time: 00:08:47 / 00:18:52
-- Sampling Phase --
1000 of 30000, Atoms: 10403(A), 0(P), ChiSq: 14908930, Time: 00:09:00 / 00:18:39
...
30000 of 30000, Atoms: 10379(A), 0(P), ChiSq: 14908930, Time: 00:15:17 / 00:15:17
    worker 1 is finished! Time: 00:15:17
    worker 5 is finished! Time: 00:16:29
    worker 3 is finished! Time: 00:19:47
    worker 2 is finished! Time: 00:20:37
    worker 4 is finished! Time: 00:20:38
    worker 6 is finished! Time: 00:20:45
Warning message:
In checkInputs(data, uncertainty, allParams) :
  running distributed cogaps without mtx/tsv/csv/gct data

After obtaining the patterns, I ran patternMarkers on the patterns learned with sparseOptimization = TRUE. With threshold = "all", I got this error:

```r
test <- patternMarkers_all(Hoxd10_matnp5, threshold = "all")
```

```
Error in colnames(markerScores)[apply(markerScores, 1, which.min)] : 
  invalid subscript type 'list'
```
The error did not occur with threshold = "cut", and patternMarkers worked normally on the patterns learned without sparseOptimization.
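One plausible mechanism for the error, assuming the sparse run leaves rows of markerScores with no finite values (which would be consistent with the -nan ChiSq above): base R's which.min() discards NA/NaN, so an all-NaN row returns integer(0). When apply() gets results of different lengths it cannot simplify and returns a list, and indexing colnames() with a list raises exactly "invalid subscript type 'list'". The toy matrix below is hypothetical (not real CoGAPS output), and safe_which_min is a hypothetical helper for illustration, not part of CoGAPS.

```r
# Hypothetical toy marker-score matrix (not real CoGAPS output):
# row "geneB" is all NaN, mimicking what a -nan ChiSq run might produce.
markerScores <- matrix(
  c(0.2, 0.8,
    NaN, NaN,
    0.9, 0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("Pattern_1", "Pattern_2"))
)

# which.min() drops NA/NaN, so the all-NaN row yields integer(0).
# Result lengths (1, 0, 1) differ, so apply() returns a list instead of a vector.
idx <- apply(markerScores, 1, which.min)
print(class(idx))

# Indexing colnames() with that list reproduces the reported error message.
res <- tryCatch(colnames(markerScores)[idx],
                error = function(e) conditionMessage(e))
print(res)

# Hypothetical guard: return NA_integer_ for rows with no finite score,
# so the result is always an integer vector of length nrow(scores).
safe_which_min <- function(scores) {
  vapply(seq_len(nrow(scores)), function(i) {
    w <- which.min(scores[i, ])
    if (length(w) == 0L) NA_integer_ else as.integer(w)
  }, integer(1))
}

best <- colnames(markerScores)[safe_which_min(markerScores)]
print(best)
```

If this is what happens, the guard only hides the symptom (those genes get no pattern assignment); the underlying NaN values from the sparseOptimization run would still need to be fixed.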

UPDATE (@dimalvovs): deleted log rows (shown as "...") for readability.
