Thursday, October 6, 2011

How to extract promoter sequences using UCSC Genome Browser

Go to http://genome.ucsc.edu/

Click Genomes
Under Position or search term, type the name of the gene of interest, e.g. ATP13A2
Click Submit

Under RefSeq (or UCSC) genes, select the RNA isoform. Isoforms usually share the same promoter, in which case it doesn't matter which one you choose. However, some isoforms have alternative transcription start sites and therefore alternative promoters, and in that case it is important which one you choose.

For example, click on ATP13A2 at chr1:17312453-17338423
The RefSeq genes track shows 3 RNA isoforms for this gene. Arrows on the transcripts mark the direction of transcription. You can scroll through the genome either with the arrows at the top or with the mouse (click and drag), and you can zoom in and out. This gene lies on the minus strand (see strand=- in the FASTA header of the output), so in the default left-to-right view the isoforms start on the right side of the UCSC browser screen, at the higher coordinate. All three isoforms have the same start site and promoter region.

Click on one of the isoforms. The isoform that was chosen previously is marked in blue at the side.
Click Tools/Table Browser
Set Output format to Sequence
Click Get Output 
Click Genomic

Select Promoter/Upstream by 1000 bases. By default the region is 1 kb long.
Click Get Sequence
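
As a sanity check on what the browser returns, the upstream window can be sketched in a few lines of Python. This is only an illustration, not the browser's own code: the function name upstream_window is made up here, and it assumes 1-based inclusive coordinates as displayed in the browser. For a minus-strand gene the upstream region lies after the higher coordinate.

```python
def upstream_window(tx_start, tx_end, strand, length=1000):
    """Return the 1-based inclusive (start, end) of the region upstream of a transcript.

    tx_start and tx_end are the transcript coordinates as displayed in the
    browser (1-based, inclusive). This helper is hypothetical and only
    mirrors the arithmetic behind the "Promoter/Upstream by N bases" option.
    """
    if strand == "+":
        # Upstream of a plus-strand gene is just before its lower coordinate.
        return max(1, tx_start - length), tx_start - 1
    if strand == "-":
        # Upstream of a minus-strand gene is just after its higher coordinate.
        return tx_end + 1, tx_end + length
    raise ValueError("strand must be '+' or '-'")


# ATP13A2 at chr1:17312453-17338423 on the minus strand
start, end = upstream_window(17312453, 17338423, "-")
print(f"chr1:{start}-{end}")  # chr1:17338424-17339423
```

The printed range agrees with the range= field in the FASTA header of the output.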

This is the resulting output in FASTA format:
>hg19_refGene_NM_022089 range=chr1:17338424-17339423 5'pad=0 3'pad=0 strand=- repeatMasking=none
GGGCATCTCTGTTTTGTTCACTCTCACGTTCCTGGCTCCCACAGTGGTGC
TGGCACTCCATAAATTCATACTGGACAAATGACCAGTGATCCTGCCCACG
TCTATCCTTGGCCCTAACGTGAACCTTCCCTTGTTTGCCCTAGGATCTCA
CAGATCCACTCTCCCCTTTGGAGCTCCTCTGTCCAGAGGTCCTGGAGACA
GGGAACACTATGCCTGTGCCGGTCACTATGGGGTAGCAGGATAAATGCTG
CACGCAGGTAAGACATCTCTGGTGCCTTTCAGGGGTCTTCATGAATCCCC
CCAGTGCAGGTGTGTCTGCAGGTCACGCTGTGAGGCTTCTCTTTCTCTGG
CATTTCAAGGCCTCCAGTGCATCATGGCAGACTCTCCCTGGGTCTCTGTG
GGCCCACATCTGACCCTGTGAATCTCTGGGCAGGTGTGTCTTCCTTCTTT
GCATTCATCAAGCAGCGCCAGGCACTGTCTCAAGTGCTTGATACGCATTT
TCTTTTTTTTTTTTTTTTGAGACAGAGTTTCGCTCTTTTTGCCCATGCTG
GAGTGCAATGGCATGATCCCGGCTCACAGCAACCTCCGCCTCCTGGGTTC
AAGCGATTCTCCTGGCTTAGTCTCCGAGTAGCTGGGATTACAGGCATGCG
CCACCACGCCCAGTTGACAAGCATTTTCTTGTGTGATCCTCGCAGCAGTT
CTCTGTGAAGCAGGCATTGCTATCCTACAGGTTGGGAAAAGAGGGCCAGA
GAGGATCAGTGACTTGCTCACGGTCGCGCGGCCTGGAAGACATGGAGAGC
TGGACCAGCACGCACAGTCCCTAACCACTGGGACGTGCTGGCGGGGGCTA
CCTCCGTGAGGTGTGTGTCTCCGGTCGCCCCGCCCCCGGTGTGTGCGGAG
GAGCAGGCGGGGACTACAAGTCCCGGCAGCCCCGGCGCGGGCGCTGCGAG
GGCCGCAGAGGGCCGGGCGGGGCTTGCGGCGCGCACGGAGGGACTGCGGC
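
If the output is saved to a file, a header like the one above can be read back with a short script. The following is a minimal sketch that assumes a single record with key=value fields in the header, as in this output. The names parse_ucsc_fasta and reverse_complement are hypothetical helpers, not part of any UCSC tool. Because the record is reported with strand=-, the sequence is read along the minus strand, and reverse_complement gives the corresponding plus-strand sequence.

```python
import re


def parse_ucsc_fasta(text):
    """Parse one FASTA record into (metadata dict, sequence string)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header, seq_lines = lines[0], lines[1:]
    if not header.startswith(">"):
        raise ValueError("not a FASTA record")
    name, *fields = header[1:].split()
    meta = {"name": name}
    for field in fields:
        # Fields look like range=chr1:17338424-17339423 or strand=-
        key, _, value = field.partition("=")
        meta[key] = value
    match = re.fullmatch(r"(\w+):(\d+)-(\d+)", meta["range"])
    meta["chrom"] = match.group(1)
    meta["start"] = int(match.group(2))
    meta["end"] = int(match.group(3))
    return meta, "".join(seq_lines).upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


# Header from the output above, with only the first sequence line kept
record = """>hg19_refGene_NM_022089 range=chr1:17338424-17339423 5'pad=0 3'pad=0 strand=- repeatMasking=none
GGGCATCTCTGTTTTGTTCACTCTCACGTTCCTGGCTCCCACAGTGGTGC
"""
meta, seq = parse_ucsc_fasta(record)
print(meta["chrom"], meta["strand"], meta["end"] - meta["start"] + 1)  # chr1 - 1000
print(reverse_complement(seq[:6]))  # ATGCCC
```

For the full record, comparing len(seq) with the length implied by the range is a quick check that the file was not truncated.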
