Genome Transcriptome Proteome Metabolome Interactome Phenome Functome

    A crisis in postgenomic nomenclature. Fields and Johnston (2002)

    genome size of human and important model systems

    Smallest genome for an organism that can live on its own: about 1300 genes, in Pelagibacter ubique, which also has the largest biomass.
    Other organisms have smaller genomes (e.g. Mycoplasma has about 500 genes) but need other organisms to survive.

    Genomics – initially, acquisition of sequence
    • where (and what) are the genes?
    Post-genomics – functional analysis
    • got genes, what do they do?
    • how do they work?
    • how do you generate an integrated whole?
    I use GENOMICS to encompass all of this

    Differences between organisms could lie in the number of genes, the nature of the genes, or their spatial organization, regulation, etc. The current thinking is that the answer is regulation (when genes are turned on and off) and organization.
    Arabidopsis has slightly more genes than we do (about 25 000).

    From this we know how many genes are necessary: how many are required for basic survival, how many are needed for free-living survival, etc.
    To understand a gene we need (at least):
    •Sequence (Regulatory regions, Coding region) •Time/cell type of transcription/processing/translation •Stability of message and gene product •post-translational processing •3D structure •Catalytic/structural function •Cellular location (including movement) •Biochemical pathway(s) and regulation •Physical interactions •Genetic interactions

    microarray = track transcriptomes
    Publications (through Sept. 2002)
    [Figure: publication counts by search term: Genomics, Microarray, Whole genome, Transcriptome, Proteomics, Proteome]

    Through 2012: Genomics 83477, Proteomics 37261, Microarray 52353

    Source: NCBI PubMed, searching titles, abstracts and keywords

    Growth of primary sequence database
    Currently >126 billion bp traditional + >191 billion bp in whole genome

    How is it generated?

    Budding yeast chromosome I, Bussey et al., 1995

    Currently >155 billion bp traditional + >418 billion bp in whole genome. Whole-genome sequence is growing faster because it is now easier to do than targeted sequencing.

    Sub-cloning fragments again and again, and then putting the genome back together. Budding yeast is the best understood organism genetically; even so, sequencing later revealed twice as many genes as initially thought.

    Became proof of principle for shot-gun sequencing
    Fleischmann et al., Science 269:496-512, 1995

    Dideoxy sequencing gel

    Shot gun sequencing for individual molecule

    Haemophilus influenzae – shotgun sequenced 1995 (1.8 Mbp)

    Single molecule sequencing: Helicos, single molecule fluorescent; single molecule, single well; proton detection. $1000 exome.
    Exome = protein-coding regions. Now $1000 for the whole genome.

    Nothing was sub-cloned. Instead, the DNA was isolated and chopped up, and robots sequenced the pieces; it did not matter where a fragment came from, it was just sequenced. With enough reads the fragments overlap, and with enough coverage the overlaps can be used to reconstruct the whole sequence. This is how it is done now, which is why it is so cheap. The cloning step is no longer used at all.
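The overlap idea in the note above can be sketched as a toy greedy assembler. This is only an illustration under simplifying assumptions (error-free reads, a single strand, no repeats); real assemblers are far more elaborate. The genome string, the read positions, and the function names are all made up for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical 21 bp "genome" and three overlapping reads, given out of order.
genome = "ATGGCGTACGTTAGCCTAGGA"
reads = [genome[12:21], genome[0:10], genome[6:16]]
contigs = greedy_assemble(reads)
print(contigs)
```

With too few reads, or overlaps shorter than `min_len`, the function returns several contigs instead of one, which is the toy version of "going back for the bit you missed".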

    The run itself only takes about an hour; computers then put the reads together, and you go back to fill in anything that was missed.
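How much gets "missed" depends on coverage. A rough estimate, assuming reads land uniformly at random along the genome (an idealization that real libraries only approximate): mean coverage is c = N x L / G, and the expected fraction of bases hit by no read is about e^(-c). In the sketch below the genome size is the 1.8 Mbp Haemophilus influenzae figure from these notes; the read count and read length are invented illustration values.

```python
import math


def mean_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average number of reads covering each base: c = N * L / G."""
    return n_reads * read_len / genome_len


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson estimate of the fraction of bases covered by no read: e^(-c)."""
    return math.exp(-coverage)


GENOME_LEN = 1_800_000  # H. influenzae, from the notes
N_READS = 18_000        # hypothetical
READ_LEN = 500          # hypothetical

coverage = mean_coverage(N_READS, READ_LEN, GENOME_LEN)
missed_bp = expected_uncovered_fraction(coverage) * GENOME_LEN
print(f"coverage = {coverage:.1f}x, expected bases missed = {missed_bp:.0f}")
```

Because the uncovered fraction falls exponentially with coverage, doubling the number of reads shrinks the gaps far more than twofold.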

    Sequencing the first human genome took $ billion? dollars.

    First individual sequences (Craig Venter and James Watson) were released a few years ago. Many are available now. Genomes can be sequenced for a thousand dollars, some say for as little as $60. To some extent it depends on the accuracy required. All genomes will be available in the future – and cheap.

    You never get 100% of it, because every time a gamete is created it carries 40–60 unique mistakes.

    Can we identify all genes by computer analysis of the raw DNA sequence? NO! • We cannot unambiguously identify all genes, some are always missed (same for splice variants, even with lots of ESTs) • Even if sure that it is a gene, we often do not know its function. • Even for small fully sequenced genomes we have large numbers of genes with no substantial associated phenotype/function.
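To see why raw sequence alone is not enough, here is a deliberately naive gene finder: it scans the three forward reading frames for an ATG followed in frame by a stop codon. It is a sketch, not a real annotation tool. It ignores the reverse strand, introns, and regulatory signals, and it will report open reading frames that are not genes. The function name, the `min_codons` cutoff, and the test sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) of forward-strand ORFs; end is exclusive and includes the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical sequence: a 4-codon ORF plus stop, padded on both sides.
example = "CC" + "ATGAAACCCGGGTAA" + "GG"
orfs = find_orfs(example)
print(orfs)
```

Splitting the same coding sequence with an intron would hide it from this scan entirely, which is one reason some genes and splice variants are always missed.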

    Human diversity SNP = single nucleotide polymorphism

    Two people differ at about one in every 1500 base pairs. Average of 40.6 overlapping reads per position (coverage) vs. 7.5 for earlier genomes. Total relative to the reference: 3.0-3.8 million SNPs per genome. About 2.7-3.8 million of these were already known. A few hundred thousand novel SNPs (new mutations) are seen per genome; this number is dropping significantly over time.

    Gene ontology – formal method for annotating genes
    by Biological process, Function and Cell component. This allows for easier cross comparisons of species/databases (data from 2008)

    For genomes in hand 1. Smallest, Carsonella ruddii 160 kb, 182 genes

    total genes Unknown process Unknown function

    S. cerevisiae S. pombe ~6000 ~5000 1793 2409 1064 2084 2151

    Mycoplasma genitalium 580 kb, 517 genes
    300 genes are essential ones

    2. Pelagibacter smallest free living 1300 genes

    Unknown cell component 1102 GO – gene ontology consortium

    2. One of largest, human (some plants larger) 3.2 x 10^9 bp, 23000 genes

    Protein Domain Architecture: complex creatures differ from less complex ones by having more functional domains, more regulatory binding sites, and proteins with more than one catalytic activity.

    Differ by:
    Used to think it was just junk

    How do we figure out what a gene does?
    Low throughput: One gene = Many PhDs and a lot of money
    e.g. PTEN, discovered in 1997: 7162 papers on it and associated proteins as of 2012

    •Intergenic space (remnants of transposons?) •+/- Introns •Gene numbers •Gene families •Alternative splicing •Protein domain architecture •Regulation
    It is now thought that almost 100% of genes are alternatively spliced

    General strategies
    High throughput: automated testing and whole genome analysis
    High throughput analysis must always be followed by extensive low throughput follow-up
    High throughput provides focus and leads
    Sequence analysis – hints of function
    • homologs (paralogs and orthologs)
    • conserved domain analysis (catalysis, interactors, regulators)
    Many species share the same genes, so a gene can often be assumed to have the same function in different species. Orthologs = the same gene in different species. Paralogs = related copies of a gene within the same species, arising by duplication.

    Phenotypic analysis • individual, pedigree or population need a mutant • classical genetics • single or whole genome deletion sets • population analysis, SNP hunting
    Find a mutation; the mutation defines the gene. Restricted to systems with good genetics.

    Expression analysis • Microarrays – RNA, RNA-seq • Proteomics – expressed and modified proteins
    Microarrays = look at the transcripts being produced; put probes on a chip and hybridize the cDNA of the system. RNA-seq: get the sequence, the frequency of occurrence and all the alternative splice forms within the cell; unlike with microarrays, nothing needs to be known in advance.
    Proteomics: where, when and what; for many proteins we currently do not know.

    If you can find which proteins associate or bind with your protein of interest, they are likely to be modifiers, regulators, or part of the same pathway. This is not always true, but in many instances it helps.
    Build-a-mutant,

    Guilt by association
    Which proteins are in which complexes

    6000 gene deletions (bar coded to detect on microarray), lots of hands
    Subject them to any phenotypic test you can devise. The technique is sensitive to relatively small contributions to fitness.
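One way the bar-coded pool could be scored, as a minimal sketch: compare each strain's share of the barcode signal before and after the phenotypic test, and report a log2 ratio. The notes do not say how the original study computed fitness, so the scoring rule, the pseudocount, the strain names, and the signal values below are all assumptions for illustration.

```python
import math


def barcode_log_ratios(before: dict[str, float],
                       after: dict[str, float],
                       pseudocount: float = 1.0) -> dict[str, float]:
    """log2 change in each strain's fraction of total barcode signal."""
    n = len(before)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.get(s, 0.0) for s in before) + pseudocount * n
    scores = {}
    for strain, signal in before.items():
        frac_before = (signal + pseudocount) / total_before
        frac_after = (after.get(strain, 0.0) + pseudocount) / total_after
        scores[strain] = math.log2(frac_after / frac_before)
    return scores


# Hypothetical barcode intensities for three deletion strains.
before = {"del_A": 1000.0, "del_B": 1000.0, "del_C": 1000.0}
after = {"del_A": 1000.0, "del_B": 1000.0, "del_C": 100.0}
scores = barcode_log_ratios(before, after)
for strain, score in scores.items():
    print(strain, round(score, 2))
```

A strongly negative score marks a deletion that hurts fitness under the test condition. Because the scores are relative to the pool, strains that merely held steady come out slightly positive when another strain drops out.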

    Genes and gene products with correlated expression, physical interaction, or colocalization may work in similar pathways •coIP, affinity strategies, often coupled to mass spectrometry •two hybrid •synthetic lethal screening •localization

    - not going to look at specific data here

    Synthetic lethal screening: if two mutations are each viable alone but lethal in combination, the two genes (and their pathways) are probably connected

    Screens often pick up only the mutations that are lethal or extreme; minor mutations are often overlooked next to these big ones.

    Microarrays allow us to look at the expression of all genes at once. Cluster analysis lets us extract information from the data set

    Transcriptional profiling: basic tenet, pattern of gene expression at specific moment in time, reflects true physiological state of cell

    RNA-seq, which provides information on expression level, splice variants and SNPs, is a strongly competing technology at present

    Figure 2. Hierarchical cluster. A portion of a hierarchical cluster, which can be easily navigated, is shown. Red indicates up-regulation in the experimental sample, and green indicates down-regulation in the experimental sample, with respect to the control. The intensity of the color indicates the magnitude of up- or down-regulation. Sherlock et al., 2001
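A hierarchical cluster like the one in the figure can be built by repeatedly merging the two most similar groups of genes. The sketch below uses average linkage with 1 minus the Pearson correlation as the distance, which is a common choice for expression data but is an assumption here, since the notes do not say which metric the figure used. Gene names and log-ratio values are invented.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hierarchical_cluster(profiles: dict[str, list[float]]):
    """Average-linkage clustering; returns merges as (group_a, group_b, distance)."""
    def linkage(pair):
        a, b = pair
        total = sum(1.0 - pearson(profiles[g], profiles[h]) for g in a for h in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical log ratios (experimental vs. control) over four time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.5, 2.0],
}
merges = hierarchical_cluster(profiles)
for a, b, d in merges:
    print(a, "+", b, f"distance={d:.3f}")
```

Genes that rise together end up in one branch and genes that fall together in another, which is the "guilt by association" logic applied to expression data.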

    Directly tackling the proteome

    Tag, isolate complexes of associated proteins and identify them by MS. Argument: if they are stuck together they probably work together.

    Separation coupled with automated MS for ID. Ho et al., 2002
    DNA damage path: Blue – known, Red – previously unknown

    There is probably a complete map for yeast cells.

    How do we figure out what a gene does?

    Take the high throughput data, analyse it extensively (data mining) – systems biology
    Try to predict something; testable hypothesis

    Then, choose a target: One gene = Many PhDs and a lot of money
    Tong et al., 2001 synthetic lethality

    This asks about pairwise interactions between mutations in different pathways. It says nothing about the physical relationship; it simply shows that whatever pathway this gene is in interacts with those other genes. The readout is fitness (how well the cell works).
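The fitness readout can be turned into a number per gene pair. A minimal sketch, assuming the widely used multiplicative model: if two mutations are independent, the double mutant's fitness should be close to the product of the single-mutant fitnesses, and a large negative deviation indicates a synthetic sick or lethal interaction. The notes do not state which model Tong et al. used, and the thresholds and fitness values below are hypothetical.

```python
def interaction_score(f_a: float, f_b: float, f_ab: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation."""
    return f_ab - f_a * f_b


def is_synthetic_lethal(f_a: float, f_b: float, f_ab: float,
                        viable: float = 0.5, dead: float = 0.1) -> bool:
    """Both single mutants viable, double mutant (nearly) dead. Thresholds are assumptions."""
    return f_a >= viable and f_b >= viable and f_ab <= dead


# Hypothetical fitness values relative to wild type (1.0).
independent = interaction_score(0.9, 0.8, 0.72)
lethal_pair = interaction_score(0.9, 0.8, 0.0)
print(round(independent, 3), round(lethal_pair, 3))
print(is_synthetic_lethal(0.9, 0.8, 0.0))
```

A score near zero means the two mutations behave independently; the screen is looking for pairs where the score is strongly negative even though each single mutant grows well.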

    •cheap sequence for all (or any we care about) organisms •an absolute phylogeny • linkage disequilibrium for all human (or other) genes making a substantial contribution to any given phenotype • full knowledge of biochemical pathways and protein interactions and modifications • prediction of cell and organism function from sequence (given the environmental variables) • things we cannot yet predict


