CoDiNA

Co-Expression Differential Network Analysis

By Deisy Morselli Gysi in r packages, co-expression networks, network comparison

July 15, 2020

Co-expression differential network analysis is increasingly used in the biological and medical sciences to study complex systems and diseases. We have developed a method that can compare as many networks as desired by characterizing the links and nodes that are common, different or specific to each network. More information can be found at https://doi.org/10.1371/journal.pone.0240523.

You can download the package from CRAN using:

install.packages('CoDiNA')

Input data

The input data for CoDiNA is a list of data.frames, each containing Node.1, Node.2 and a value. Note that the methodology should be applied only to undirected graphs. The value is the strength of the link between Node.1 and Node.2 and must be a real number between -1 and 1. The package can re-normalize this value through the argument stretch = TRUE (by default the values are normalized).
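As a minimal sketch of that input format, the toy networks below are invented for illustration (the gene names, weights, and the helper is_valid_network() are not part of CoDiNA). The helper only checks the constraints described above: two node columns, a numeric third column, and weights within [-1, 1].

```r
# Two toy undirected networks; names and weights are made up for illustration.
net_A <- data.frame(Node.1 = c("G1", "G1", "G2"),
                    Node.2 = c("G2", "G3", "G3"),
                    value  = c(0.8, -0.4, 0.6),
                    stringsAsFactors = FALSE)
net_B <- data.frame(Node.1 = c("G1", "G2"),
                    Node.2 = c("G2", "G3"),
                    value  = c(-0.7, 0.5),
                    stringsAsFactors = FALSE)

# The input is a plain list with one data.frame per network.
networks <- list(net_A, net_B)

# Hypothetical helper (not a CoDiNA function): checks the documented constraints.
is_valid_network <- function(net) {
  all(c("Node.1", "Node.2") %in% names(net)) &&
    ncol(net) >= 3 &&
    is.numeric(net[[3]]) &&
    all(abs(net[[3]]) <= 1)
}

valid <- vapply(networks, is_valid_network, logical(1))
```

Running a check like this before building the differential network can catch weights that fall outside the expected range.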

As an example, the CoDiNA package contains 4 datasets from a cancer study, GSE4290 (Sun et al. (2006)). Each of these datasets was previously normalized, quality control was performed on the genes, and the networks were calculated using the wTO package (Morselli Gysi et al. (2017)). Each dataset consists of the gene names and the weights of the significant interactions only, filtered at an absolute wTO value of 0.3.

Using the wTO output for CoDiNA

The output from the wTO package can be easily used as input for CoDiNA.

require(wTO)
## Loading required package: wTO
require(CoDiNA)
## Loading required package: CoDiNA
require(magrittr)
## Loading required package: magrittr
wTO_out = wTO.fast(Data = Microarray_Expression1, 
                   n = 100)
## There are 268 overlapping nodes, 268 total nodes and 18 individuals.
## This function might take a long time to run. Don't turn off the computer.
##  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99  100 Done!
wTO_filtered = subset(wTO_out, 
                      p.adjust(wTO_out$pval) < 0.05, 
                      select = c('Node.1', 'Node.2', 'wTO'))

Creating the Differential Network

To generate the differential network one can use the MakeDiffNet() function.

This function returns the Φ and Φ̃ classification for each link. Links assigned to α (a) are in agreement across all networks: every network contains that link with the same sign. Links classified as β (b) also exist in all networks, but at least one network contains the link with a different sign. The category γ (g) contains links that do not exist in all networks, meaning that they are specific to at least one network.
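The three categories can be illustrated with a small base R sketch. This is a simplified reading of the definitions above, not the package's implementation: the function classify_phi() and the toy weights are hypothetical, and a weight of 0 is taken to mean that the link is absent from that network.

```r
# Hypothetical helper: classify each link (row) from its weights across networks.
classify_phi <- function(weights) {
  signs <- sign(as.matrix(weights))
  # The link is present in every network when no weight is zero.
  present_everywhere <- rowSums(signs == 0) == 0
  # All signs agree when the absolute sum of signs equals the number of networks.
  same_sign <- abs(rowSums(signs)) == ncol(signs)
  unname(ifelse(!present_everywhere, "g", ifelse(same_sign, "a", "b")))
}

# Toy weights for three links in three networks (invented values).
links <- data.frame(CTR = c(0.8,  0.7, -0.9),
                    OLI = c(0.6, -0.5, -0.8),
                    AST = c(0.9,  0.4,  0.0))

phi <- classify_phi(links)
```

In this toy example the first link has the same sign everywhere (a), the second changes sign in one network (b), and the third is missing from one network (g).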

This function also assigns each link to a sub-category. The sub-category matters mainly for β and γ links, where it helps to understand how a link differs between networks or to which networks it is specific. Note that the first network is considered to be the reference for β links.

The output from this function is a data.frame containing the nodes, the original (or normalized) weights, the Phi and Phi_tilde categories, a Group, which describes the sign or absence of the link in each network, and four scores: Score_center (raw score), Score_Phi (score normalized by Φ), Score_Phi_tilde (score normalized by Φ̃) and Score_internal (score of the link relative to its theoretical category). The first three scores should be close to 1, while for the last one, the closer to 0 the better.

DiffNet = MakeDiffNet(Data = list(CTR, OLI, AST),
                      Code = c('CTR', 'OLI', 'AST'))
## Starting now.
## CTR contains 17471 edges and 1022 nodes.
## OLI contains 64791 edges and 1697 nodes.
## AST contains 3384 edges and 1002 nodes.
## Total of nodes: 442
## Total of edges: 82558
## Coding correlations.
## Total of edges (inside the cutoff): 15950
## Starting Phi categorization.
## Coding the groups.
## Recode is done!
DiffNet
## Nodes 441 
## Links 15950
print(DiffNet) %>% 
  head()
## Nodes 441 
## Links 15950
##   Node.1 Node.2        CTR        OLI AST Phi Phi_tilde            Group
## 1   CTCF NKX6-3 -0.8861789 -0.7756813   0   g g.CTR.OLI  -CTR,-OLI,NoAST
## 2   IRF3 NKX6-3 -0.8520325  0.0000000   0   g     g.CTR -CTR,NoOLI,NoAST
## 3 NKX6-3    TDG -0.9040650 -0.8385744   0   g g.CTR.OLI  -CTR,-OLI,NoAST
## 4  BUD31 NKX6-3 -0.8016260 -0.7484277   0   g g.CTR.OLI  -CTR,-OLI,NoAST
## 5  HMGN3 NKX6-3 -0.8878049 -0.8364780   0   g g.CTR.OLI  -CTR,-OLI,NoAST
## 6 NKX6-3  PUF60 -0.9479675  0.0000000   0   g     g.CTR -CTR,NoOLI,NoAST
##   Score_center Score_Phi Score_Phi_tilde Score_internal Score_ratio
## 1    0.8327648 0.5467547       0.5235928     0.17786809    2.943714
## 2    0.8520325 0.5989745       0.5937500     0.14796748    4.012706
## 3    0.8719348 0.6529143       0.6723478     0.13278127    5.063574
## 4    0.7754832 0.3915083       0.3060555     0.22654014    1.350999
## 5    0.8625233 0.6274069       0.6366059     0.14022695    4.539826
## 6    0.9479675 0.8589800       0.8571429     0.05203252   16.473214
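For the γ links printed above, the Phi_tilde label appears to list the networks in which the link is present. The sketch below reproduces the labels of rows 1 and 2 from their weights. The helper gamma_label() is hypothetical, and the rule is inferred from this output rather than taken from the package source.

```r
# Hypothetical helper: build a label from the networks with a non-zero weight.
gamma_label <- function(weights) {
  present <- names(weights)[weights != 0]
  paste(c("g", present), collapse = ".")
}

# Weights of rows 1 and 2 from the output above.
row1 <- c(CTR = -0.8861789, OLI = -0.7756813, AST = 0)
row2 <- c(CTR = -0.8520325, OLI = 0, AST = 0)

label1 <- gamma_label(row1)
label2 <- gamma_label(row2)
```

Here label1 matches the g.CTR.OLI category shown for the first link and label2 matches g.CTR for the second.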

Clustering the nodes into Φ and Φ̃ categories

Because information about the links alone is not enough to define a network, the nodes must also be assigned to Φ and Φ̃ categories. To do so, the function ClusterNodes() can be used. Its input is DiffNet, the output from MakeDiffNet(), together with the external and internal cutoffs. The external cutoff is applied to the normalized Φ̃ score, while the internal cutoff is applied to the internal score.

The suggested values for the internal and external cutoffs are the median, or the first and third quartiles, of the internal and Φ̃ scores, depending on how conservative the network should be.
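To make the role of the two cutoffs concrete, here is a small sketch on invented scores. It assumes that a link is kept when its Φ̃ score is above the external cutoff and its internal score is below the internal cutoff, which follows from the statement that the former should be close to 1 and the latter close to 0. The exact comparison used inside ClusterNodes() may differ.

```r
# Toy scores for six links (values invented for illustration).
scores <- data.frame(
  Score_Phi_tilde = c(0.52, 0.59, 0.67, 0.31, 0.64, 0.86),
  Score_internal  = c(0.18, 0.15, 0.13, 0.23, 0.14, 0.05)
)

# Median-based cutoffs, the less conservative of the two suggestions.
int_cut <- quantile(scores$Score_internal, 0.5)
ext_cut <- quantile(scores$Score_Phi_tilde, 0.5)

# Assumed filtering rule: high external score AND low internal score.
kept <- subset(scores,
               Score_Phi_tilde > ext_cut & Score_internal < int_cut)
```

Moving the probabilities to 0.25 for the internal score and 0.75 for the external score tightens both conditions, so fewer links survive.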

Using the median:

int_C = quantile(DiffNet$Score_internal, 0.5)
ext_C = quantile(DiffNet$Score_Phi, 0.5)

Nodes_Groups = ClusterNodes(DiffNet = DiffNet, 
                            cutoff.external = ext_C, 
                            cutoff.internal = int_C)
table(Nodes_Groups$Phi_tilde)
## 
##     g.AST     g.CTR g.CTR.OLI     g.OLI g.OLI.AST         U 
##        11       213         2       125         1        66

Using the first and third quartiles:

int_C = quantile(DiffNet$Score_internal, 0.25)
ext_C = quantile(DiffNet$Score_Phi, 0.75)

Nodes_Groups = ClusterNodes(DiffNet = DiffNet, 
                            cutoff.external = ext_C, 
                            cutoff.internal = int_C)
table(Nodes_Groups$Phi_tilde)
## 
## g.AST g.CTR g.OLI     U 
##     8   188    64    80

Plotting the network

The final network can be visualized quickly with plot. The layout can be chosen from the variety implemented in the igraph package, and the Make_Cluster argument allows the nodes to be clustered with any of the clustering algorithms implemented in igraph. The final graph can be exported as HTML or as PNG. The path argument saves the network to the given path.

The plot function returns the nodes and their information.

int_C = quantile(DiffNet$Score_internal, 0.25)
ext_C = quantile(DiffNet$Score_Phi, 0.75)

Graph = plot(DiffNet, 
             cutoff.external = ext_C, 
             cutoff.internal = int_C, 
             layout = 'layout_components', 
             path = 'Vis.html')
## Vis.html

The graph can also be exported as an igraph object, which can then be plotted further.

g = as.igraph(Graph) 

require(igraph)
## Loading required package: igraph
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:CoDiNA':
## 
##     as.igraph, normalize
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
plot(g, 
     layout = layout_with_fr(g), 
     vertex.label = NA)

Session Info

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] igraph_1.2.6   magrittr_2.0.1 CoDiNA_1.1.2   wTO_1.6.3     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.6        knitr_1.33        som_0.3-5.1       R6_2.5.0         
##  [5] rlang_0.4.11      highr_0.9         stringr_1.4.0     plyr_1.8.6       
##  [9] visNetwork_2.0.9  tools_4.1.0       parallel_4.1.0    data.table_1.14.0
## [13] xfun_0.23         jquerylib_0.1.4   htmltools_0.5.1.1 yaml_2.2.1       
## [17] digest_0.6.27     bookdown_0.22     reshape2_1.4.4    htmlwidgets_1.5.3
## [21] sass_0.4.0        evaluate_0.14     rmarkdown_2.8.5   blogdown_1.3     
## [25] stringi_1.6.2     compiler_4.1.0    bslib_0.2.5.1     jsonlite_1.7.2   
## [29] pkgconfig_2.0.3

References

Morselli Gysi, Deisy, Andre Voigt, Tiago de Miranda Fragoso, Eivind Almaas, and Katja Nowick. 2017. wTO: Computing Weighted Topological Overlaps (wTO) & Consensus wTO Network. https://CRAN.R-project.org/package=wTO.

Gysi, D. M., Fragoso, T. M., Zebardast, F., Beroli, W., Busskamp, V., Almaas, E., Nowick, K. (2018). Whole transcriptomic network analysis using Co-expression Differential Network Analysis (CoDiNA). PLOS ONE. https://doi.org/10.1371/journal.pone.0240523

Sun, Lixin, Ai-Min Hui, Qin Su, Alexander Vortmeyer, Yuri Kotliarov, Sandra Pastorino, Antonino Passaniti, et al. 2006. “Neuronal and Glioma-Derived Stem Cell Factor Induces Angiogenesis Within the Brain.” Cancer Cell 9 (4). Elsevier: 287–300.
