wTO: Computing Weighted Topological Overlaps (wTO) & Consensus wTO Network

Computes the Weighted Topological Overlap with positive and negative signs (wTO) networks (Nowick et al. (2009) doi:10.1073/pnas.0911376106) given a data frame containing the mRNA count/ expression/ abundance per sample, and a vector containing the interested nodes of interaction (a subset of the elements of the full data frame).

It also computes the cut-off threshold or p-value based on the individuals bootstrap or the values reshuffle per individual (Gysi et al. 2017 https://arxiv.org/abs/1711.04702). It also allows the construction of a consensus network, based on multiple wTO networks. The package includes a visualization tool for the networks.

You can download the package from CRAN using:

install.packages('wTO')

Input data

The wTO package, can be used on any kind of count data, but we highly recommend to use normalized and quality controlled data according to the data type such as RMA, MD5 for microarray, RPKM, TPM or PKM for RNA-seq, sample normalized data for metagenomics.

As an example, the package contains three data sets, two from microarray chips (Microarray_Expression1 and Microarray_Expression2), and one from abundance in metagenomics (metagenomics_abundance).

wTO

The wTO method is a method for building networks based on pairwise correlations normalized and corrected by all shared correlations. For this reason, the user can choose a set of factors of interest, called here Overlaps, those are the nodes that will be corrected and normalized by all other factors in the dataset. Those factors can be Transcription Factor, long non coding RNAs, a set of species of interest etc.

Genomic data

The wTO package contains 2 data sets that were obtained using expression arrays (Microarray_Expression1 and Microarray_Expression2), they were previously normalized and the quality control was done. We will use it to build the wTO network using the different methods implemented in the package.

First we are going to inspect those data sets.

require(wTO)
#> Loading required package: wTO
require(magrittr)
#> Loading required package: magrittr
data("ExampleGRF")
data("Microarray_Expression1")
data("Microarray_Expression2")

dim(Microarray_Expression1)
#> [1] 268  18
dim(Microarray_Expression2)
#> [1] 268  18

Microarray_Expression1[1:5,1:5]
#>               ID1      ID2      ID3      ID4      ID5
#> FAM122B  5.325653 5.039814 5.099828 5.053185 5.213816
#> DEFB108B 2.038747 1.965599 1.925807 1.977435 2.079381
#> CCSER2   4.973347 4.865783 4.818910 5.024392 4.697314
#> GPD2     5.453287 5.595471 5.223886 5.130226 5.370672
#> HECW1    4.350837 4.279759 4.218375 4.472152 4.408025
Microarray_Expression2[1:5,1:5]
#>               ID1      ID2      ID3      ID4      ID5
#> FAM122B  5.532142 6.395654 5.159082 5.806040 5.339848
#> DEFB108B 2.456210 1.993044 2.251673 2.440018 2.493610
#> CCSER2   5.164945 4.923511 4.979691 5.080116 5.014569
#> GPD2     5.742455 5.649180 5.430411 6.007418 5.662126
#> HECW1    4.595407 5.243644 4.802716 4.957706 4.738554

head(ExampleGRF)
#>          x
#> 1    ACAD8
#> 2   ANAPC2
#> 3  ANKRD22
#> 4   ANKRD2
#> 5 ARHGAP35
#> 6    ASH1L

Please, note that the individuals are in the columns and the gene expressions are in the rows. Moreover, the row.names() are the names of the genes. The list of genes that will be used for measuring the interactions are in ExampleGRF. There should always be more than 2 of them contained in the expression set. If there are no common nodes to be measured, the method will retun an error.

sum(ExampleGRF$x %in% row.names(Microarray_Expression1))
#> [1] 168

Running the wTO

We can run the wTO package with 3 modes. The first one is running the wTO without resampling. For that we can use the wTO() . The second one, wTO.Complete(), gives you the whole diagnosis plot, hard-threshold on the ωi, j, the ωi, j, |ωi, j| values and p-values. The last mode, wTO.fast(), just returns the ωi, j values and p-value.

Using the wTO() function:

To use the wTO() function, the first step is to compute the correlation among the nodes of interest using CorrelationOverlap() and then use it as input for the wTO(). In the first function the user is allowed to choose the method for correlation between Pearson (‘p’) or Spearman (‘s’). The second function allows the choice between absolute values (‘abs’) or signed values (‘sign’). Please, keep in mind that the result of the wTO() function is a matrix, and it can be easily converted to an edge list using the function wTO.in.line().

wTO_p_abs = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p') %>% wTO(., sign = 'abs')

wTO_p_abs[1:5,1:5]
#>         ZNF333 ZNF28 ANKRD22   ZFR TRIM33
#> ZNF333   0.352 0.237   0.269 0.242  0.241
#> ZNF28    0.237 0.287   0.209 0.206  0.239
#> ANKRD22  0.269 0.209   0.299 0.199  0.252
#> ZFR      0.242 0.206   0.199 0.328  0.258
#> TRIM33   0.241 0.239   0.252 0.258  0.361
wTO_p_abs %<>% wTO.in.line()
head(wTO_p_abs)
#>     Node.1  Node.2   wTO
#> 1:  ZNF333   ZNF28 0.237
#> 2:  ZNF333 ANKRD22 0.269
#> 3:   ZNF28 ANKRD22 0.209
#> 4:  ZNF333     ZFR 0.242
#> 5:   ZNF28     ZFR 0.206
#> 6: ANKRD22     ZFR 0.199

wTO_s_abs = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's') %>% wTO(., sign = 'abs') %>% wTO.in.line()
head(wTO_s_abs)
#>     Node.1  Node.2   wTO
#> 1:  ZNF333   ZNF28 0.236
#> 2:  ZNF333 ANKRD22 0.258
#> 3:   ZNF28 ANKRD22 0.215
#> 4:  ZNF333     ZFR 0.264
#> 5:   ZNF28     ZFR 0.187
#> 6: ANKRD22     ZFR 0.193

wTO_p_sign = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p') %>% wTO(., sign = 'sign') %>% wTO.in.line()
head(wTO_p_sign)
#>     Node.1  Node.2    wTO
#> 1:  ZNF333   ZNF28 -0.099
#> 2:  ZNF333 ANKRD22 -0.185
#> 3:   ZNF28 ANKRD22  0.076
#> 4:  ZNF333     ZFR -0.117
#> 5:   ZNF28     ZFR -0.077
#> 6: ANKRD22     ZFR -0.036

wTO_s_sign = CorrelationOverlap(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's') %>% wTO(., sign = 'sign') %>% wTO.in.line()
head(wTO_s_sign)
#>     Node.1  Node.2    wTO
#> 1:  ZNF333   ZNF28 -0.064
#> 2:  ZNF333 ANKRD22 -0.143
#> 3:   ZNF28 ANKRD22  0.029
#> 4:  ZNF333     ZFR -0.164
#> 5:   ZNF28     ZFR -0.011
#> 6: ANKRD22     ZFR  0.024
Using the wTO.Complete() function:

The usage of the function wTO.Complete() is straight-forward. No plug-in-functions() are necessary. The arguments parsed to the wTO.Complete() functions are the number k of threads that should be used for computing the ωi, j, the amount of replications, n, the expression matrix, Data, the Overlapping nodes, the correlation method (Pearson or Spearman) for the method_resampling that should be Bootstrap, BlockBootstrap or Reshuffle, the p-value correction method, pvalmethod (any from the p.adjust.methods), if the correlation should be saved, the δ is the expected difference, expected.diff, between the resampled values and the ωi, j and also if the diagnosis plot should be plotted.

wTO_s_sign_complete = wTO.Complete(k = 5, n = 250, Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 'p', method_resampling = 'Bootstrap', pvalmethod = 'BH', savecor = TRUE, expected.diff = 0.2, plot = TRUE)
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!

The diagnosis plot shows the quality of the resampling (first two plots). The closer the purple line to the black line, the better. The ωi, j vs |ωi, j| shows the amount of ωi, j being affected by cancellations on the heuristics of the method, the more similar to a smile plot, the better. The last two plots show the relashionship between p-values and the ωi, j. It is expected that higher ω’s presents lower p-values.

The resulting object from the wTO.Complete() function is a list containing: * wTO an edge list of informations such as the signed and unsigned ωi, j values its raw and adjusted p-values. * Correlation values, also as an edge list * Quantiles, the quantiles from the empirical distribution and the calculated ω’s from the original data, for both signed and unsigned networks.

wTO_s_sign_complete
#> $wTO
#>        Node.1  Node.2 wTO_sign wTO_abs pval_sig pval_abs  Padj_sig
#>     1: ZNF333   ZNF28   -0.099   0.237    0.168    0.004 0.3607366
#>     2: ZNF333 ANKRD22   -0.185   0.269    0.188    0.016 0.3607366
#>     3: ZNF333     ZFR   -0.117   0.242    0.180    0.012 0.3607366
#>     4: ZNF333  TRIM33    0.007   0.241    0.136    0.008 0.3607366
#>     5: ZNF333   RIMS3   -0.325   0.409    0.144    0.000 0.3607366
#>    ---                                                            
#> 14024: ANAPC2   SBNO2   -0.147   0.298    0.156    0.000 0.3607366
#> 14025: ANAPC2  ZNF528   -0.142   0.222    0.152    0.016 0.3607366
#> 14026:  TIGD7   SBNO2   -0.297   0.354    0.128    0.004 0.3607366
#> 14027:  TIGD7  ZNF528   -0.099   0.219    0.212    0.032 0.3607366
#> 14028:  SBNO2  ZNF528    0.141   0.311    0.368    0.024 0.4030531
#>          Padj_abs
#>     1: 0.01167541
#>     2: 0.02395390
#>     3: 0.02083624
#>     4: 0.01712559
#>     5: 0.00000000
#>    ---           
#> 14024: 0.00000000
#> 14025: 0.02395390
#> 14026: 0.01167541
#> 14027: 0.03670450
#> 14028: 0.03016774
#> 
#> $Correlation
#>          Node.1   Node.2          Cor
#>     1:  FAM122B DEFB108B  0.366857931
#>     2:  FAM122B   CCSER2  0.278870911
#>     3: DEFB108B   CCSER2 -0.252482453
#>     4:  FAM122B     GPD2 -0.005649124
#>     5: DEFB108B     GPD2 -0.107064848
#>    ---                               
#> 35774:   TRIM23   ZNF528  0.054249174
#> 35775:   ZNF559   ZNF528 -0.218309729
#> 35776:   ANAPC2   ZNF528 -0.013821370
#> 35777:    TIGD7   ZNF528  0.011807143
#> 35778:    SBNO2   ZNF528  0.092317502
#> 
#> $Quantiles
#>                         0.1%  2.5%   10%  90% 97.5% 99.9%
#> Empirical.Quantile     -0.56 -0.46 -0.34 0.37  0.48  0.56
#> Quantile               -0.50 -0.40 -0.28 0.32  0.43  0.52
#> Empirical.Quantile.abs  0.21  0.24  0.27 0.47  0.53  0.57
#> Quantile.abs            0.17  0.19  0.21 0.41  0.47  0.53
#> 
#> attr(,"class")
#> [1] "wTO"  "list"
Using the wTO.fast() function:

The wTO.fast() function is a simplified verion of the wTO.Complete() function, that doesn’t return diagnosis, correlation, nor the quantiles, but allows the user to choose the method for correlation, the sign of the ω to be calculated and the resampling method should be one of the two Bootrastap or BlockBootstrap. The p-values are the raw p-values and if the user desires to calculate its correction it can be easily done as shown above.

fast_example = wTO.fast(Data = Microarray_Expression1, Overlap = ExampleGRF$x, method = 's', sign = 'sign', delta = 0.2, n = 250, method_resampling = 'Bootstrap')
#> There are 168 overlapping nodes, 268 total nodes and 18 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!

head(fast_example)
#>     Node.1  Node.2    wTO  pval
#> 1:  ZNF333   ZNF28 -0.064 0.264
#> 2:  ZNF333 ANKRD22 -0.143 0.236
#> 3:   ZNF28 ANKRD22  0.029 0.188
#> 4:  ZNF333     ZFR -0.164 0.232
#> 5:   ZNF28     ZFR -0.011 0.256
#> 6: ANKRD22     ZFR  0.024 0.264

fast_example$adj.pval = p.adjust(fast_example$pval)

Metagenomic data

Along with the expression data, the wTO package also includes a metagenomics dataset that is the abundance of some OTU’s in bacterias collected since 1997. More about this data can be found at \[<https://www.ebi.ac.uk/metagenomics/projects/ERP013549>\].

The OTU (Operational Taxonomic Units) contains the taxonomy of the particular OTU and from Week1 to Week98, the abundance of that particular OTU in that week.

data("metagenomics_abundance")
metagenomics_abundance[2:10, 1:10]
#>                                                                                                            OTU
#> 2                            Root;k__Archaea;p__Euryarchaeota;c__Thermoplasmata;o__E2;f__MarinegroupII;g__;s__
#> 3                   Root;k__Bacteria;p__Actinobacteria;c__Acidimicrobiia;o__Acidimicrobiales;f__OCS155;g__;s__
#> 4         Root;k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Microbacteriaceae;g__;s__
#> 5                  Root;k__Bacteria;p__Bacteroidetes;c__Cytophagia;o__Cytophagales;f__Flammeovirgaceae;g__;s__
#> 6            Root;k__Bacteria;p__Bacteroidetes;c__Cytophagia;o__Cytophagales;f__Flammeovirgaceae;g__JTB248;s__
#> 7                          Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__;g__;s__
#> 8            Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Cryomorphaceae;g__;s__
#> 9  Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Cryomorphaceae;g__Fluviicola;s__
#> 10        Root;k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Flavobacteriaceae;g__;s__
#>    Week1 Week2 Week3 Week4 Week5 Week6 Week7 Week8 Week9
#> 2      1     6     0     0     0     1     0     1     0
#> 3      0     0     0     0     0     0     0     5     0
#> 4      0     0     0     0     0     0     0     0     0
#> 5      0     1     0     0     0     0     0     0     0
#> 6      0     0     0     0     0     0     0     0     0
#> 7      0     1     0     0     0     0     0     1     0
#> 8      0     0     0     0     0     0     0     1     0
#> 9      0     0     0     0     0     0     0     1     0
#> 10     0     1     0     0     0     0     0     7     0

Before we are able to define the network, we have first to understand the patterns of autocorrelation of each species, and then define the lag, that will be used for the BlockBootstrap resampling in the wTO.Complete() or fast.wTO() functions. To define the lag, we use autocorrelation function acf().

row.names(metagenomics_abundance) = metagenomics_abundance$OTU
metagenomics_abundance = metagenomics_abundance[,-1]
par(mfrow = c(3,3))
for ( i in 1:nrow(metagenomics_abundance)){
  acf(t(metagenomics_abundance[i,]))
}

Because most of them have only a high autocorrelation with itself or maximum 2 weeks, we will use a lag of 2 for the blocks used in the bootstrap.

The functions wTO.fast() and wTO.Complete() are able to accomodate the lag parameter, therefore, they will be used here.

Meta_fast = wTO.fast(Data = metagenomics_abundance, Overlap = row.names(metagenomics_abundance), method = 'p', sign = 'sign', n = 250, method_resampling = 'BlockBootstrap', lag = 2)
#> There are 67 overlapping nodes, 67 total nodes and 98 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> ..........................................................................................................................................................................................................................................................Done!

Meta_Complete = wTO.Complete(k = 1, n = 250, Data = metagenomics_abundance, Overlap = row.names(metagenomics_abundance), method = 's' , method_resampling = 'BlockBootstrap', lag = 2 )
#> There are 67 overlapping nodes, 67 total nodes and 98 individuals.
#> This function might take a long time to run. Don't turn off the computer.
#> Simulations are done.
#> Computing p-values
#> Computing cutoffs
#> Done!