DATABASES |
|
NetPath (http://www.netpath.org): |
NetPath is currently one of the
largest open-source repositories of human signaling pathways and is
set to become
a community standard to meet the challenges in functional genomics
and systems biology. Signaling networks are the key to deciphering
many of the complex networks that govern the machinery inside the
cell. Several signaling molecules play an important role in disease
processes that are a direct result of their altered functioning and
are now recognized as potential therapeutic targets. Understanding
how to restore the proper functioning of pathways that have
become deregulated in disease is needed to accelerate biomedical
research. This resource aims to demystify biological pathways
and to highlight the key relationships and connections between them.
Apart from this, pathways provide a way of reducing the dimensionality
of high-throughput data by grouping thousands of genes, proteins
and metabolites at the functional level into a few hundred
pathways for an experiment. Identifying the active pathways that
differ between two conditions can have more explanatory power than
just a simple list of differentially expressed genes and proteins.
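As a rough illustration of why pathway-level analysis can say more than a gene list, the sketch below runs an over-representation test: for each pathway it asks whether more differentially expressed genes fall inside it than chance alone would predict. The hypergeometric test is a common choice for this and is an assumption here, not a method described for NetPath. The pathway names, gene identifiers and function names are all hypothetical.

```python
from math import comb


def hypergeom_tail(universe_size, pathway_size, n_selected, overlap):
    """P(X >= overlap) when n_selected genes are drawn without replacement."""
    upper = min(pathway_size, n_selected)
    total = comb(universe_size, n_selected)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, n_selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrich(pathways, selected_genes, universe):
    """Return (pathway, overlap, p_value) tuples sorted by ascending p-value."""
    universe = set(universe)
    selected = set(selected_genes) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(members & selected)
        p = hypergeom_tail(len(universe), len(members), len(selected), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical): 20 genes and two pathways of five genes each.
universe = [f"G{i}" for i in range(20)]
pathways = {
    "pathway_A": ["G0", "G1", "G2", "G3", "G4"],
    "pathway_B": ["G10", "G11", "G12", "G13", "G14"],
}
selected_genes = ["G0", "G1", "G2", "G3", "G10"]

results = enrich(pathways, selected_genes, universe)
for name, overlap, p in results:
    print(f"{name}: overlap={overlap}, p={p:.4f}")
```

With real data, the p-values would also need correction for multiple testing across all pathways examined.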
A thorough data-mining of scientific literature was carried out to catalog
all the significant molecular interactions in single ligand-stimulated,
receptor-mediated signaling pathways. Apart from protein-protein interactions,
enzyme-substrate reactions, protein translocation events and gene regulation
were also cataloged. Each pathway reaction is linked to its
experimental evidence in the form of PubMed IDs of the articles
from which it was mined. Taking into account the heterogeneity
inherent in experimental validation of different pathway reactions and
the diversity of publicly available data, a set of very stringent criteria
was applied for curation and for the generation of the pathway maps.
These pathways are freely available for download in various formats such
as BioPAX, PSI-MI and SBML. The availability of data in different formats
allows interoperability between various pathway analysis software tools
such as Cytoscape and VISIBIOweb. To provide a better visual
interface to the molecular reactions in NetPath, pathway maps were generated
using PathVisio, an improved visualization tool that incorporates
features of GenMAPP. These pathway maps are available through another
resource called NetSlim (http://www.netpath.org/netslim) that was also
developed at the Institute. The NetSlim versions of various pathways
can be downloaded in .gpml, .GenMAPP, .png and .pdf formats.
|
|
Human Protein Reference Database (http://www.hprd.org/): |
The Human Protein Reference Database
(HPRD) represents a centralized platform to visually depict and integrate
information pertaining to each protein in the human proteome. It
contains manually curated scientific information pertaining to the
biology of most human proteins. The HPRD is a result of an international
collaborative effort between the Institute of Bioinformatics and
the Pandey lab at Johns Hopkins University in Baltimore, USA. The
National Center for Biotechnology Information provides links to HPRD
through its human gene and protein databases (e.g. Entrez Gene,
RefSeq protein).
All the information in HPRD has been manually
curated from the published literature by expert
biologists who read, interpret and analyze the published data.
This resource depicts information on human protein functions including
protein–protein interactions, post-translational modifications,
enzyme-substrate relationships and disease associations. The protein–protein
interaction and subcellular localization data from HPRD have been
used to develop a human protein interaction network. Information
regarding proteins involved in human diseases is also annotated
and linked to the Online Mendelian Inheritance in Man (OMIM) database.
HPRD was created using an object-oriented database
in Zope, an open-source web application server that provides versatility
in query functions and allows data to be displayed dynamically.
As HPRD continues to evolve with newer entries, the number of unannotated
genes and proteins is rapidly decreasing, allowing us
to expand the scope of our curation. The data from HPRD can
be freely accessed and used by academic users while commercial
entities are required to obtain a license for use.
|
|
Goals: |
• |
The main goal in creating HPRD was to curate the world's
literature on known and well-characterized proteins, which will in turn
create a centralized knowledgebase of protein data. |
• |
Create a more robust curation system. Curation systems need to be continually
updated to include current research. |
• |
Enable future discoveries and empower scientists in their work. As
we move into next-generation sequencing technologies, the world is less
focused on individual genes and more focused on high-throughput
studies involving thousands of genes at a time. |
• |
Study systems biology approaches and aid in biomarker discovery. |
• |
Perform complex queries involving multiple features of proteins. |
|
|
Highlights of HPRD are as follows: |
• |
From 10,000 protein–protein interactions
(PPIs) annotated for 3,000 proteins in 2003, HPRD has grown to over
39,194 unique PPIs annotated
for 30,047 proteins including more than 6,360 isoforms by the end
of 2012. |
• |
More than 50% of molecules annotated in HPRD have at least one PPI
and 10% have more than 10 PPIs. |
• |
Experiments for PPIs are broadly grouped into three categories, namely
in vitro, in vivo and yeast two-hybrid (Y2H). Sixty percent of PPIs
annotated in HPRD are supported by a single experiment, whereas 26%
are annotated with two of the three experimental methods. |
• |
HPRD contains 18,000 manually curated post-translational modification
(PTM) entries belonging to 26 different types of modification. |
• |
All the phosphorylation-based motifs for any protein of interest
can be analyzed using PhosphoMotifFinder in HPRD. This tool connects
the proteomic data in HPRD to over 320 experimentally proven phosphorylation-based
motifs curated from literature. Phosphorylation is the most common
type of protein modification, accounting for 63% of the PTM data annotated
in HPRD. |
• |
HPRD data are available for download in tab-delimited and XML file
formats. |
• |
HPRD also integrates data from Human Proteinpedia, a community portal
for integrating human protein data. |
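As a sketch of how a tab-delimited interaction export might be summarized to produce figures like the "at least one PPI" and "more than 10 PPIs" percentages above, the example below counts unique interaction partners per protein. The column layout (two interactor identifiers followed by an experiment type), the sample rows and the function names are assumptions for illustration only. The actual HPRD download files may be organized differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a tab-delimited PPI file:
# interactor A, interactor B, experiment type.
SAMPLE_TSV = (
    "A\tB\tin vivo\n"
    "A\tC\tyeast two-hybrid\n"
    "B\tA\tin vitro\n"  # same pair as the first row, reversed
    "D\tD\tin vitro\n"  # self-interaction
)


def load_interactions(handle, col_a=0, col_b=1):
    """Read unordered, de-duplicated interaction pairs from a TSV handle."""
    pairs = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) <= max(col_a, col_b):
            continue  # skip short or malformed rows
        a, b = row[col_a].strip(), row[col_b].strip()
        if a and b:
            pairs.add(frozenset((a, b)))
    return pairs


def degree_summary(pairs, all_proteins, threshold=10):
    """Fraction of proteins with any partner and with more than `threshold`."""
    partners = defaultdict(set)
    for pair in pairs:
        members = tuple(pair)
        a, b = members[0], members[-1]  # a self-interaction has one member
        partners[a].add(b)
        partners[b].add(a)
    total = len(all_proteins)
    with_any = sum(1 for p in all_proteins if len(partners[p]) >= 1)
    above = sum(1 for p in all_proteins if len(partners[p]) > threshold)
    return {"with_any": with_any / total, "above_threshold": above / total}


pairs = load_interactions(io.StringIO(SAMPLE_TSV))
# Protein "E" is annotated but has no interactions in the sample.
summary = degree_summary(pairs, ["A", "B", "C", "D", "E"], threshold=1)
print(len(pairs), summary)
```

Storing each pair as a `frozenset` makes A-B and B-A count as one interaction, which matches the idea of "unique PPIs".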
|
Milestones achieved and comparison with other
publicly available databases: |
• |
HPRD is currently one of the richest sources of various aspects of
PPI data as compared to other publicly available databases as shown
in a comparative study. |
• |
This is the only completely manually curated database that assimilates
PPIs, PTMs, subcellular localization, tissue expression, biological
motifs and domains derived from a variety of experimental platforms. |
• |
HPRD receives nearly 148,000 hits a year and about 400 visitors
per day. |
• |
To date, it has been cited nearly 1,827 times by the scientific community
in literature. |
• |
To the best of our knowledge, data from HPRD, Human Proteinpedia
and RAPID databases are the only datasets from India that have been
incorporated into NCBI databases such as Entrez Gene and RefSeq. |
|
|
Human Proteinpedia (http://www.humanproteinpedia.org/): |
Human Proteinpedia was developed as a community
portal for sharing and integrating human proteomic data over the
World Wide Web. Through this portal, research labs all over the world
can contribute and upload their experimental data. This initiative
is an effort to bring together the entire biomedical community and
will enable dissemination of valuable proteomic data. This will empower
scientists to take advantage of information that is at present
confined to particular research labs. Such a concerted effort will
help enrich this database and minimize redundancy inherent in most
other publicly available databases.
Data pertaining to post-translational modifications,
protein-protein interactions, tissue expression, expression in cell
lines, subcellular localization and enzyme substrate relationships
can be submitted to Human Proteinpedia. It even allows proteomic
investigators to share unpublished data and provides an effective
means of sharing such data.
Human Proteinpedia currently contains over 4.8 million
MS/MS spectra and ~2 million peptides and is an important resource
for cataloging proteotypic peptides (which serve as a unique identifier
of a given protein or isoform in tandem MS experiments) that can
be used for biomarker analysis using MRM (Multiple Reaction Monitoring).
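To make the idea of a peptide that uniquely identifies a protein concrete, the sketch below performs an in-silico tryptic digest of a few toy sequences and keeps only the peptides that occur in a single protein. This is a deliberate simplification and not the Human Proteinpedia procedure: it considers sequence uniqueness only, ignores missed cleavages and how readily a peptide is detected by mass spectrometry, and uses hypothetical protein names and sequences.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_length=5):
    """Cleave after K or R, except before P (requires Python 3.7+)."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_length}


def unique_peptides(proteins, min_length=5):
    """Map each protein to the sorted peptides found in no other protein."""
    owners = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, min_length):
            owners[peptide].add(name)
    unique = defaultdict(list)
    for peptide, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].append(peptide)
    return {name: sorted(peps) for name, peps in unique.items()}


# Toy sequences (hypothetical); both proteins share the peptide AAAAK.
proteins = {
    "P1": "AAAAKCCCCRDDDDK",
    "P2": "AAAAKEEEERPFFFK",  # no cleavage between R and P
}

candidates = unique_peptides(proteins)
for name in sorted(candidates):
    print(name, candidates[name])
```

The shared peptide is dropped because it cannot distinguish the two proteins, while the remaining peptides are candidates for targeted assays such as MRM.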
Human Proteinpedia also provides a list of phosphopeptides
identified in mass spectrometry-based phosphoproteomic studies, and
the phosphorylation and dephosphorylation data curated from literature
have been mapped to the corresponding sites and residues of sequences in
HPRD. This is useful to investigators in the development of phospho-specific
antibodies and peptide arrays.
Protein annotations present in Human Proteinpedia
are derived from a number of technology platforms such as co-immunoprecipitation,
fluorescence-based, western blotting or mass spectrometry-based
experiments, immunohistochemical analysis, yeast two-hybrid, and protein
and peptide microarrays.
|
|
Statistics to date: |
Annotations | HPRD | Human Proteinpedia |
Protein entries | 30,047 | 15,231 |
Mode of data entry | Manual curation by experts from literature | Experimentally verified data over the web |
Number of contributing labs | 2 to 6 | 75 |
Protein-protein interactions | 39,194 | 34,624 |
PTMs | 93,710 | 17,410 |
Protein expression | 112,158 | 150,368 |
Subcellular localization | 22,490 | 2,906 |
Domains | 470 | NA |
PubMed links | 453,521 | NA |
MS/MS spectra | NA | 4,855,122 |
Number of experiments | NA | 2,710 |
|
|
Plasma Proteome Database (http://www.plasmaproteomedatabase.org/): |
The Plasma Proteome Database (PPD),
the first of its kind, is a comprehensive resource for all human
plasma proteins. The database includes information pertaining to
isoform-specific expression, disease, localization, post-translational
modification and single nucleotide polymorphism. The information
in this database comes from manual annotation based on exhaustive mining
of published literature.
|
|
Statistics to date |
Unique Genes | 9,297 |
Proteins & Isoforms | 15,747 |
PTMs | 40,997 |
PubMed Links | 24,838 |
|
|
Other Databases Developed at IOB: |
Resource of Asian Primary Immunodeficiency Diseases (http://rapid.rcai.riken.jp/RAPID): |
Resource of Asian Primary Immunodeficiency Diseases (RAPID) is a web-based
compendium of molecular alterations in primary immunodeficiency diseases.
Detailed information about genes and proteins that are affected in
primary immunodeficiency diseases is presented along with other pertinent
information about protein-protein interactions, microarray gene expression
profiles in various organs and cells of the immune system and mouse
studies. RAPID also hosts a tool, the mutation viewer, to predict deleterious
and novel mutations and to visualize the mutation positions on
the DNA sequence, protein sequence and three-dimensional structure
for PID genes. The information in this database should be useful to
researchers as well as clinicians.
RAPID is the result of a collaboration between the Institute of Bioinformatics
and the Immunogenomics research group at the RIKEN Research Center for Allergy
and Immunology in Yokohama, Japan.
|
|
India Cancer Research Database (http://www.incredb.org/): |
India Cancer Research Database (ICRD) provides details of scientists
and physicians involved in cancer research in India along with the
information about their areas of expertise, research publications
and funded grants. The main goal of the database was to foster collaborations
among researchers and to provide a snapshot of ongoing research initiatives
and activities in India.
|
|
TBnet (http://tbnetindia.ibioinformatics.org/): |
TBNet India was developed by IOB as an initiative of the Department of
Biotechnology, Government of India, with active collaboration from
13 institutions across India. This resource places special focus
on Indian contributions to research and issues related to tuberculosis.
M. tuberculosis is a weakly gram-positive, acid-fast bacterium that causes tuberculosis,
the leading cause of infectious disease mortality. The M. tuberculosis
genome was sequenced in 1998. About 1.5 million people die from tuberculosis
each year, and it is thought that as many as 2 billion people (one
third of the human population) may be infected with M. tuberculosis.
It is estimated that 80% of the Asian and African population test
positive in tuberculin tests while only 5-10% of the United States
population test positive. People with compromised immunity, largely
due to high rates of HIV infection, have a higher chance of developing
the disease. This problem is compounded by the appearance of drug-resistant
TB strains, including strains with multiple drug resistance (MDR)
and, more recently, strains with extensive drug resistance (XDR),
which are much more difficult to treat and pose a significant public
health threat. Tuberculosis has an estimated mortality rate of ~49
per 100,000 people per year in India. TBNet India endeavors to gather
clinical, epidemiological and molecular data and make it available
to the biomedical community.
|
|
|