FILES

For each experiment, one or more tab-delimited files are produced to represent the results of the study. Each file has a tab-delimited header line. The columns are formatted as follows:

localID \t sequence \t geneID (or transcriptID) \t affinity_in_sample1 \t affinity_in_sample2 etc.

The file name is chosen as follows: Exp_RBP.txt

Each experiment has an associated README file that contains more information about its data file(s). The README file contains the following information:

- the source of the data (table 1, supplementary figure 1, GEO XXX etc.)
- a description of the columns, and sometimes some background information about the study
- how sequences corresponding to the given IDs are retrieved (which database and which ID are used, etc.)

SEQUENCE RETRIEVAL

GEO datasets

For experiments with Gene Expression Omnibus (GEO) accession numbers, the corresponding data set is downloaded from the GEO website (http://www.ncbi.nlm.nih.gov/geo/) in series matrix format. The given gene (or transcript) IDs are used to retrieve the sequences corresponding to those genes. Signal intensities (or log ratios, p-values etc.) and sequences are then combined in a tab-delimited text file, which compactly represents the experimental results.

ArrayExpress datasets

Similar to the procedure above, the array design and intensities are downloaded from the associated website. Gene IDs (GenBank or RefSeq) are used to retrieve the corresponding sequences. If there is any quantitative data on these genes, it is included in the tab-delimited text file. If these genes are known to be enriched in a condition relative to the control, but no other quantitative information is provided, then a default value of 1 is reported as the affinity. If there are control genes, those are used as the background and a value of 0 is reported as their affinity.

Papers that give a set of genes

Gene IDs are used to retrieve the corresponding sequences.
If the IDs are GenBank accession numbers, RefSeq IDs, or GI numbers, BioPerl is used to download the sequences from NCBI.

Final Step

For human and mouse experiments, all human and mouse transcripts are downloaded from NCBI. The previously retrieved sequences for all the datasets are aligned with BLAST against this reference set of NCBI transcripts, and for each sequence the transcript with the largest match is retrieved.

For more information about a specific experiment, please refer to its README file.
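
As an illustration of the file layout described under FILES, the following Python sketch reads one results file into records. It is not part of the original pipeline (which uses BioPerl for sequence retrieval). The names Record and parse_results, the sample column names, and the example rows are all made up for illustration. It assumes every column after the third holds a numeric affinity for one sample.

```python
import csv
import io
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    """One data row of a results file (hypothetical helper type)."""
    local_id: str
    sequence: str
    gene_id: str                  # gene or transcript ID
    affinities: Dict[str, float]  # sample name -> affinity


def parse_results(handle) -> List[Record]:
    """Parse a tab-delimited results file: one header line, then one row per sequence."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.strip() for name in next(reader)]
    samples = header[3:]  # assumption: columns after the third are per-sample affinities
    records = []
    for row in reader:
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        fields = [cell.strip() for cell in row]
        values = [float(v) for v in fields[3:3 + len(samples)]]
        records.append(
            Record(fields[0], fields[1], fields[2], dict(zip(samples, values)))
        )
    return records


# Made-up example content; a real file would be opened with open("Exp_RBP.txt").
example = io.StringIO(
    "localID\tsequence\tgeneID\tsample1\tsample2\n"
    "1\tACGUACGU\tGENE_A\t1\t0\n"
    "2\tGGCAUUCA\tGENE_B\t0.5\t2.25\n"
)
records = parse_results(example)
for rec in records:
    print(rec.local_id, rec.gene_id, rec.affinities)
```

Files that report p-values or log ratios instead of intensities have the same shape, so the same reader would apply. The README of each experiment remains the authority on what each column means.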