1. General Information

Details of the methods used to calculate the presented data can be found in the corresponding publication by Müller et al. 2012.
This page is intended to give easy access to the proteome data and, in particular, the opportunity to visualize them in the UCSC archaeal genome browser. Hence, all available files follow in principle the UCSC bed and gff formats. In addition, we provide large data files such as the databases used for the peptide search and original output formats such as the rcd file of RNAcode.

2. UCSC Data Integration

There are three ways to load the data into the UCSC browser:
  1. UCSC Track Hub
    • go to the UCSC trackhub page
    • open the "My Hubs" tab and paste
      "http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/12-023/pyloriHub/hub.txt" into the URL field
    • click the Add Hub button.
    • if everything worked out you should see a new track hub which you can now load using the button "Load Selected Hubs"
  2. Direct Link
    • Click here to directly load the RNAcode and proteome UCSC tracks into the UCSC browser.
  3. Manual bed File Upload
We suggest switching the "Base Position" track to full in order to show a translation of the genome sequence on top of the main UCSC graphic. This can be done in the "Track controls" below the main graphic, which displays a genomic region and the selected tracks.

Please note that the direct data upload (possibilities 2 and 3) may take a while due to the large amount of data. Hence, we suggest using possibility 1, the UCSC track hub integration.

3. Data Download

3.1 Proteome Data

3.1.1 Database Construction

In order to generate a comprehensive database for the Mascot analysis, the Helicobacter pylori genome was translated in all six reading frames. For each frame, nucleotide triplets are translated into the corresponding amino acid. If a triplet contains non-canonical nucleotides, i.e. anything other than A, C, G and T, it is translated into X, which does not stand for any specific amino acid. The amino acid chain is terminated if a triplet encodes a canonical stop codon. All chains shorter than six amino acids are rejected. The database contained this six-frame translation, all NCBI annotated amino acid sequences and a set of decoy sequences. The decoy set was generated by reversing the annotated sequences.
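The translation rules described above can be sketched in Python as follows. This is a minimal illustration, not the code used for the publication: the function names are hypothetical, frames are labelled +1 to +3 and -1 to -3 by assumption, and a chain still open at the end of the sequence is kept if it is long enough (the text does not say how this case was handled). The codon table is the standard genetic code, whose codon-to-amino-acid assignments are shared by the bacterial translation table.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement; non-canonical symbols are left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_frame(seq, min_len=6):
    """Translate one frame, split at stop codons, drop chains shorter than min_len."""
    chains, current = [], []
    for i in range(0, len(seq) - 2, 3):
        # Triplets with non-canonical nucleotides are not in the table and become X.
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            if len(current) >= min_len:
                chains.append("".join(current))
            current = []
        else:
            current.append(aa)
    # Assumption: a chain running to the end of the sequence is kept if long enough.
    if len(current) >= min_len:
        chains.append("".join(current))
    return chains


def six_frame_translation(genome, min_len=6):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to lists of chains."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate_frame(seq[offset:], min_len)
    return frames


def make_decoy(protein):
    """Decoy sequence obtained by reversing an annotated protein sequence."""
    return protein[::-1]
```

Calling `six_frame_translation` on a genome string yields the stop-to-stop chains per frame; writing them out as a multi fasta file, with whatever header scheme is preferred, is left out of this sketch.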

Data Set                 File
Six-frame translation:   multi fasta file
Complete database:       multi fasta file
Extended database:       multi fasta file

3.1.2 Identification tables

Data Set                              Excel File   CSV Files
Protein identification list:          xlsx file
Peptide identification list:          xlsx file    tar.gz file
Signal peptide identification list:   xlsx file    tar.gz file

Supporting material for novel protein annotations and corrections, including validation by MS/MS spectra, is summarized in the supplemental PDF.

3.1.3 Peptide mapping

The experimentally determined peptide fragments (PFs) were mapped to the H. pylori genome with tblastn. Only perfect, full-length sequence matches were used for subsequent analysis.
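As an illustration of the "perfect and full length" criterion, the following sketch filters tblastn hits and converts the survivors to BED-style records. It is not the pipeline used for the publication. It assumes the hits are available in the 12-column tabular layout of BLAST+ (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), that peptide lengths are supplied separately, and that minus-strand hits are reported with sstart greater than send. The function name and the BED score of 0 are hypothetical choices.

```python
def perfect_full_length_hits(blast_lines, peptide_lengths):
    """Yield BED6-style tuples for hits that cover the whole peptide without
    mismatches or gaps.

    blast_lines: iterable of tab-separated tblastn hits (assumed 12-column layout).
    peptide_lengths: dict mapping peptide identifier to its length in amino acids.
    """
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        mismatch, gapopen = int(fields[4]), int(fields[5])
        qstart, qend, sstart, send = (int(x) for x in fields[6:10])

        full_length = qstart == 1 and qend == peptide_lengths[qseqid]
        perfect = pident == 100.0 and mismatch == 0 and gapopen == 0
        if not (full_length and perfect):
            continue

        # Subject coordinates are 1-based nucleotide positions; BED uses
        # 0-based starts and exclusive ends.
        strand = "+" if sstart <= send else "-"
        start, end = min(sstart, send) - 1, max(sstart, send)
        yield (sseqid, start, end, qseqid, 0, strand)
```

Each yielded tuple can be written as one tab-separated line of a bed file; a six-residue peptide should then span exactly 18 nucleotides, which is a useful sanity check on the coordinates.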

Data Set                                         Sequence File         Mapping File
Proteome data:                                   multi sequence file   UCSC bed file
Extended database search:                        multi sequence file   UCSC bed file
Signal peptide cleavage data:                    multi sequence file   UCSC bed file
Jungblut et al. proteome data:                   multi sequence file   UCSC bed file
Jungblut et al. signal peptide cleavage data:    multi sequence file   UCSC bed file

3.2 RNAcode Data

System call: RNAcode -o OUTPUT.rcd --stop-early -p 0.05 INPUT
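For scripted or repeated runs, the system call above can be assembled programmatically. The sketch below only builds the argument list from the flags quoted above; the function name is hypothetical, and actually running the command assumes that an RNAcode executable is available on the search path.

```python
import shlex


def rnacode_command(input_path, output_path, p_cutoff=0.05):
    """Build the argument list for the RNAcode call quoted above."""
    return [
        "RNAcode",
        "-o", str(output_path),
        "--stop-early",
        "-p", str(p_cutoff),
        str(input_path),
    ]


cmd = rnacode_command("INPUT", "OUTPUT.rcd")
print(shlex.join(cmd))  # shows the command as it would be typed in a shell
```

The list can be passed to `subprocess.run(cmd, check=True)` to execute it; passing a list rather than a single string avoids shell quoting problems with unusual file names.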

Data Set                                          File
Genome wide RNAcode predictions full data set:    rcd file
Genome wide RNAcode predictions full data set:    UCSC bed file
Short ORF candidates based on RNAcode predictions: UCSC bed file

How to cite

If you use data from this website, please cite:
Stephan A. Müller et al., Identification of new protein coding sequences and signal peptidase cleavage sites of Helicobacter pylori strain 26695 by proteogenomics, Journal of Proteomics, accepted.

Last modified: 2013-07-16 10:29 sven