NIA Array Analysis Tool

Normalization of Input File

Our software can normalize data from 1-color arrays (e.g., arrays with a radioactive label). Although the same normalization method can be applied to 2-color arrays, we have never tested it with them. Most companies that produce 2-color arrays also provide software for image analysis and data normalization, and for 2-color arrays we recommend using that commercial software rather than our tool.

Our normalization tool implements a non-parametric method that equalizes multiple quantiles of the probability distribution of gene expression across all columns:
Step 1: Log-transform all the data.
Step 2: In each column, estimate 15 quantiles corresponding to the probabilities 1/30, 3/30, ..., 29/30.
Step 3: Compute 15 target quantiles by averaging each quantile across all columns.
Step 4: Transform the data in each column using a piecewise-linear function that maps the column's actual quantiles onto the target quantiles. Data above the highest quantile are transformed by extending the linear segment between the two highest quantiles.
Step 5: Back-transform the data with the exponential function (the inverse of the log transform in Step 1).
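The five steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the tool's own code: Step 1 is assumed to use the natural logarithm (so all intensities must be positive), quantiles are estimated with `statistics.quantiles(..., method="inclusive")` because the tool's estimator is not documented here, and values below the lowest quantile are simply extrapolated with the lowest segment (the lower-end reliability rule is not applied). The names `column_quantiles`, `piecewise_linear` and `quantile_normalize` are hypothetical.

```python
import bisect
import math
import statistics

# Quantile probabilities 1/30, 3/30, ..., 29/30 (15 values), as in Step 2.
PROBS = [k / 30 for k in range(1, 30, 2)]


def column_quantiles(values):
    """Step 2: the 15 quantiles of one log-transformed column.

    statistics.quantiles(n=30) returns cut points at 1/30, 2/30, ..., 29/30;
    every other one (indices 0, 2, ..., 28) gives 1/30, 3/30, ..., 29/30.
    The 'inclusive' estimator is an assumption.
    """
    cuts = statistics.quantiles(values, n=30, method="inclusive")
    return cuts[0::2]


def piecewise_linear(x, src, dst):
    """Step 4: map x through the segments joining src[k] -> dst[k].

    Values beyond either end are extrapolated with the outermost segment.
    """
    # Segment index k such that src[k] <= x < src[k + 1], clamped to the ends
    k = bisect.bisect_right(src, x) - 1
    k = min(max(k, 0), len(src) - 2)
    x0, x1, y0, y1 = src[k], src[k + 1], dst[k], dst[k + 1]
    if x1 == x0:  # tied quantiles: no slope can be estimated
        return y0
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def quantile_normalize(columns):
    """Steps 1-5 for a list of columns (each a list of positive intensities)."""
    logged = [[math.log(v) for v in col] for col in columns]        # Step 1
    quants = [column_quantiles(col) for col in logged]               # Step 2
    target = [statistics.fmean(q[k] for q in quants)                 # Step 3
              for k in range(len(PROBS))]
    return [[math.exp(piecewise_linear(v, q, target)) for v in col]  # Steps 4-5
            for col, q in zip(logged, quants)]
```

Because every column is mapped onto the same target quantiles, two columns that differ only by a constant scale factor (e.g., one column equal to twice the other) come out identical after this normalization.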

In some cases the original data are distributed non-uniformly at the lower end, so the estimated low quantiles may be unreliable. Therefore, for each column we determine the lowest reliable quantile using the following condition: a quantile is reliable if the difference between it and the next higher quantile is neither more than 2 times greater nor more than 2 times smaller than the difference between the corresponding target quantiles. Data below the lowest reliable quantile are then transformed using the linear function of the segment between this quantile and the next higher one.
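The lower-end rule can be sketched as follows. It works on the log-scale quantiles `q` of one column and the target quantiles `t`, and accepts a gap ratio between 1/2 and 2 inclusive. Two points are assumptions rather than documented behavior: "the lowest reliable quantile" is taken to be the first quantile, scanning upward from the bottom, that passes the test, and if no quantile passes, only the top segment is used. The function names are hypothetical.

```python
import bisect


def lowest_reliable_index(q, t, factor=2.0):
    """Return the index of the lowest reliable quantile of one column.

    q: the column's quantiles (log scale, ascending); t: the target quantiles.
    Quantile k is reliable if its gap to quantile k+1 is at most `factor`
    times larger and at most `factor` times smaller than t[k+1] - t[k].
    Assumption: scan upward from the bottom and take the first that passes.
    """
    for k in range(len(q) - 1):
        gap, target_gap = q[k + 1] - q[k], t[k + 1] - t[k]
        if target_gap > 0 and target_gap / factor <= gap <= target_gap * factor:
            return k
    # Assumption: if no quantile passes, use only the top segment
    return len(q) - 2


def map_log_value(x, q, t):
    """Step 4 with the lower-end rule: map log-value x from q onto t.

    Quantiles below the lowest reliable one are dropped, so values below it
    follow the segment between it and the next higher quantile, extended
    downward; values above the top quantile extend the top segment.
    """
    lo = lowest_reliable_index(q, t)
    src, dst = q[lo:], t[lo:]
    # Segment index k such that src[k] <= x < src[k + 1], clamped to the ends
    k = min(max(bisect.bisect_right(src, x) - 1, 0), len(src) - 2)
    x0, x1, y0, y1 = src[k], src[k + 1], dst[k], dst[k + 1]
    if x1 == x0:  # tied quantiles: no slope can be estimated
        return y0
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
```

For instance, with column quantiles `[0.0, 0.1, 0.2, 2.0, 3.0, 4.0]` and targets `[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]`, the two lowest gaps (0.1 against a target gap of 1.0) fail the test, so index 2 is the lowest reliable quantile, and a value of -1.6 is mapped to about 1.0 by extending the segment from 0.2 to 2.0.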

Split-Normalization of Input File

Split-normalization means that normalization is performed separately for each subset of genes. The main reason to use split-normalization is that not all genes fit on one array. In this case the full set of genes may be represented by 2 or more arrays that require independent normalization. For split-normalization, the input file must contain array indexes in the second column, as shown below:

Example of an input file ready for split-normalization

Geneid  Array index  Control rep1  Control rep2  Control rep3  Treatment1 rep1  Treatment1 rep2  Treatment1 rep3  Treatment2 rep1  Treatment2 rep2  Treatment2 rep3
1       1            180           314           296           433              182              311              397              566              361
2       1            6780          17085         9223          18468            8623             15019            17588            24026            21732
3       1            16592         15161         10476         16790            9752             10316            13885            19448            14564
4       1            2896          239           101           59               29               53               185              198              124
5       2            5496          4283          6635          4912             4459             3175             8050             5973             6297
6       2            9708          18958         7701          19469            5279             13767            9876             23492            15672
7       2            5342          1548          2494          1222             1490             891              2566             1438             3155
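Split-normalization amounts to grouping the rows by array index and normalizing each group on its own. The sketch below assumes a tab-delimited text file with one header line, laid out like the example above (gene ID, array index, then intensities). `normalize` is a hypothetical callable standing for the quantile normalization of Steps 1-5; it should take a list of columns (lists of floats) and return normalized columns of the same shape.

```python
import csv
from collections import defaultdict


def split_normalize(lines, normalize):
    """Normalize each array (rows sharing an array index) separately.

    lines: an iterable of text lines, e.g. an open file.
    Assumptions: tab-delimited, one header row, gene ID in column 1,
    array index in column 2, intensities in the remaining columns.
    Returns (header, {gene id: normalized values}).
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    groups = defaultdict(list)  # array index -> list of (gene id, values)
    for row in reader:
        if not row:  # skip blank lines
            continue
        gene_id, array_index, *values = row
        groups[array_index].append((gene_id, [float(v) for v in values]))

    result = {}
    for rows in groups.values():
        # Transpose the group's rows into sample columns before normalizing
        columns = [list(col) for col in zip(*(vals for _, vals in rows))]
        normalized = normalize(columns)
        # Transpose back so each gene gets its normalized row
        for i, (gene_id, _) in enumerate(rows):
            result[gene_id] = [col[i] for col in normalized]
    return header, result
```

Because each group is normalized independently, genes on array 1 have no influence on the quantiles used for array 2, and vice versa.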

If you suspect a background gradient on one array (e.g., from left to right), you can try splitting the array into several portions (e.g., vertical sections), giving each portion its own array index, and applying split-normalization. However, if the background is non-uniform in general, we suggest using other software for normalization.