The NIMH-NCI Protein-Disease Database
* * * * Working DRAFT [under construction] * * * *
C. Merril(1), M. Goldstein(2), J. Creed(1), J. Myrick(3), P.F. Lemkin(4)
(1)LBG/NIMH, (2)Monoclonetics Inc., (3)CDC/Atlanta, (4)IPS/LMMB/NCI/FCRDC
The examination of body fluids for diagnostic markers in disease
states dates back to antiquity. However, the study of protein
alterations in specific body fluids such as serum and plasma was begun
in earnest shortly after the turn of this century, with most of the
observations occurring in the last three decades.
For example, the acute phase proteins (APPs) are generally defined as
those whose concentrations or activities increase or decrease by 25%
or more in response to inflammation [ManA93]. Most clinicians currently evaluate
only a few specific APPs in body fluids such as plasma or serum when
they suspect a disease state. This approach has two major flaws.
First, if an error is made in measuring the specific protein for which
an assay was requested, the physician may reach an erroneous
conclusion. Second, if bias in the diagnostic process leads the
physician to request a protein that is not affected by the actual
disease state, the results will be of little or no use in diagnosing
the underlying illness.
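The 25% criterion amounts to a relative-change test against a baseline value. A minimal Python sketch, assuming a baseline (for example, a normal reference value) measured with the same assay; the function names and example values are hypothetical:

```python
def relative_change(baseline: float, observed: float) -> float:
    """Fractional change from baseline; 0.25 means a 25% increase."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (observed - baseline) / baseline


def meets_app_criterion(baseline: float, observed: float, threshold: float = 0.25) -> bool:
    """True if the concentration or activity rose or fell by at least `threshold`."""
    return abs(relative_change(baseline, observed)) >= threshold


# Hypothetical values in arbitrary units.
print(meets_app_criterion(100.0, 130.0))  # True  (30% increase)
print(meets_app_criterion(100.0, 80.0))   # False (20% decrease)
print(meets_app_criterion(100.0, 75.0))   # True  (25% decrease)
```

Taking the absolute value lets one threshold cover both the positive APPs (which rise) and the negative APPs (which fall).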
The primary purpose of this relational database is to facilitate
quantitative and qualitative comparisons of proteins in human body
fluids in normal and disease states. For decades researchers and
clinicians have been studying proteins in body fluids such as serum,
plasma, cerebrospinal fluid and urine. Currently, most clinicians
evaluate only a few specific proteins in a body fluid such as plasma
when they suspect that a patient has a disease. Now, however,
high-resolution two-dimensional protein electrophoresis allows the
simultaneous evaluation of 1,500 to 3,000 proteins in complex
solutions such as body fluids. This and other high-resolution
methods have encouraged us to collect the clinical data for body
fluid proteins into an easily accessible database. In addition, this
database will provide a linkage between the disease-associated protein
alterations and images of the appropriate proteins on high-resolution
electrophoretic gels of the body fluids. This effort requires the
normalization of data to account for variations in methods of
measurement.
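The linkage described above can be pictured as a small relational schema in which each literature-reported alteration points to its protein, its disease, its source reference and, where identified, the protein's spot on a gel image. The following Python/SQLite sketch is illustrative only: the table names, columns and placeholder rows are assumptions, not the PDD's actual design. The query at the end asks which changes are reported for a given disease, where the proteins appear on the gel, and which reference reported each change.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the PDD's actual design.
SCHEMA = """
CREATE TABLE protein    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE disease    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE literature (id INTEGER PRIMARY KEY, citation TEXT NOT NULL);
-- One row per literature-reported alteration, already normalized to a fold change.
CREATE TABLE alteration (
    protein_id    INTEGER REFERENCES protein(id),
    disease_id    INTEGER REFERENCES disease(id),
    fluid         TEXT NOT NULL,   -- e.g. 'plasma', 'CSF'
    fold_change   REAL NOT NULL,   -- >1 increase, <1 decrease
    literature_id INTEGER REFERENCES literature(id)
);
-- Location of a protein's spot on a 2D gel image.
CREATE TABLE gel_spot (
    protein_id INTEGER REFERENCES protein(id),
    gel_image  TEXT NOT NULL,
    x REAL,
    y REAL
);
"""


def changes_for_disease(conn, disease_name):
    """Return (protein, fluid, fold_change, citation, gel_image, x, y) rows."""
    return conn.execute(
        """
        SELECT p.name, a.fluid, a.fold_change, r.citation, g.gel_image, g.x, g.y
        FROM alteration a
        JOIN protein p    ON p.id = a.protein_id
        JOIN disease d    ON d.id = a.disease_id
        JOIN literature r ON r.id = a.literature_id
        LEFT JOIN gel_spot g ON g.protein_id = p.id
        WHERE d.name = ?
        ORDER BY p.name
        """,
        (disease_name,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder rows, not real data.
conn.execute("INSERT INTO protein VALUES (1, 'protein_A')")
conn.execute("INSERT INTO disease VALUES (1, 'disease_X')")
conn.execute("INSERT INTO literature VALUES (1, 'ref_1')")
conn.execute("INSERT INTO alteration VALUES (1, 1, 'plasma', 2.0, 1)")
conn.execute("INSERT INTO gel_spot VALUES (1, 'plasma_gel_1', 120.0, 340.0)")
print(changes_for_disease(conn, "disease_X"))
# [('protein_A', 'plasma', 2.0, 'ref_1', 'plasma_gel_1', 120.0, 340.0)]
```

Attaching the reference to each alteration row, rather than to the protein, is what lets every data point carry its own source; the LEFT JOIN keeps alterations for proteins not yet linked to a gel spot.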
Initial efforts in establishing this database have been
concentrated on alterations in the acute-phase proteins in individuals
with acute and chronic diseases. Even at this early stage of its
development, the database has proven useful: we have found that there
appear to be several acute-phase protein alterations common to the
plasma and cerebrospinal fluid of patients with Alzheimer's disease,
schizophrenia and major depression. Our goal is
to provide access to the database so that systematic correlations and
relationships between disease states can be examined and extended.
For example (3), one approach is first to find the protein pattern
changes for each of the two types of meningitis, and for (4), to find
the patterns for infections and tumors. The resulting sets of
(protein, fold-change) pairs can then be compared, both qualitatively
and quantitatively, to see which pattern most closely matches that of
the patient in question.
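One way to make this comparison concrete is to represent each pattern as a mapping from protein to fold change and rank the diseases by distance on a log2 scale, so that a two-fold increase and a two-fold decrease are equally far from "no change". The Python sketch below is an assumption, not the PDD's method: the distance measure, the treatment of missing proteins and the placeholder names are all hypothetical.

```python
from __future__ import annotations

import math


def log2_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Mean absolute difference of log2 fold changes over the union of proteins.

    Assumption: a protein missing from one pattern is treated as unchanged
    (fold change 1.0), which is a simplification.
    """
    proteins = a.keys() | b.keys()
    if not proteins:
        return 0.0
    total = sum(
        abs(math.log2(a.get(p, 1.0)) - math.log2(b.get(p, 1.0))) for p in proteins
    )
    return total / len(proteins)


def rank_diseases(patient: dict[str, float], patterns: dict[str, dict[str, float]]):
    """Return (disease, distance) pairs, closest pattern first."""
    scores = [(name, log2_distance(patient, pat)) for name, pat in patterns.items()]
    return sorted(scores, key=lambda s: s[1])


# Placeholder proteins (P1..P3) and diseases; the values are not real data.
patterns = {
    "disease_X": {"P1": 3.0, "P2": 0.5},
    "disease_Y": {"P1": 1.0, "P3": 2.0},
}
patient = {"P1": 2.5, "P2": 0.5}
ranked = rank_diseases(patient, patterns)
print(ranked[0][0])  # disease_X
```

Returning the full ranking, rather than only the best match, leaves room for a clinician to consider close runners-up.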
The PDD can find protein patterns for different diseases and then
compute and report several metrics and plots of the differences
between the patterns of a set of diseases.
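The specific metrics are not listed here. As an illustration only, the Python sketch below compares every pair of disease patterns on the proteins they share, reporting one qualitative measure (the fraction of shared proteins that change in the same direction) and one quantitative measure (the mean absolute difference of log2 fold changes). Both metrics and the placeholder data are assumptions, not necessarily what the PDD computes.

```python
import math
from itertools import combinations


def direction(fc: float) -> int:
    """+1 for an increase, -1 for a decrease, 0 for no change."""
    return (fc > 1.0) - (fc < 1.0)


def compare_patterns(a, b):
    """Return (shared, direction_agreement, mean_abs_log2_diff) over shared proteins."""
    shared = sorted(a.keys() & b.keys())
    if not shared:
        return 0, None, None
    agree = sum(direction(a[p]) == direction(b[p]) for p in shared) / len(shared)
    diff = sum(abs(math.log2(a[p]) - math.log2(b[p])) for p in shared) / len(shared)
    return len(shared), agree, diff


def pairwise_report(patterns):
    """Compare every pair of disease patterns (names taken in sorted order)."""
    return {
        (d1, d2): compare_patterns(patterns[d1], patterns[d2])
        for d1, d2 in combinations(sorted(patterns), 2)
    }


# Placeholder proteins and diseases; the values are not real data.
patterns = {
    "disease_X": {"P1": 2.0, "P2": 0.5},
    "disease_Y": {"P1": 4.0, "P2": 2.0},
    "disease_Z": {"P3": 3.0},
}
for pair, (n, agree, diff) in pairwise_report(patterns).items():
    print(pair, n, agree, diff)
# ('disease_X', 'disease_Y') 2 0.5 1.5
# ('disease_X', 'disease_Z') 0 None None
# ('disease_Y', 'disease_Z') 0 None None
```

Reporting the number of shared proteins alongside each score guards against reading much into a comparison that rests on only one or two proteins.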
Although many of the protein changes listed in our data were assayed
by non-electrophoretic techniques such as immunoassays or, in some
cases, merely by interaction with specific substances and by
sedimentation rates, this information can still be used in our database.
Difficulties do arise when proteins were assayed in enzyme or other
units that measure protein activity rather than concentration. This
problem can be overcome by converting these test-specific units to
relative increases or decreases (e.g. a three-fold increase or a
two-fold decrease), thereby normalizing the variations among the
different proteins. Another difficulty is that, at present, not all of
the APPs of interest have been linked to the gel map images.
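The conversion amounts to dividing each measurement by a normal reference value obtained with the same assay, so that the assay-specific units cancel. A minimal Python sketch, assuming such a reference value is available for each test; the function names and example values are hypothetical:

```python
def fold_change(measured: float, normal: float) -> float:
    """Ratio of a patient value to the normal reference value in the same units.

    Because both values come from the same assay, the test-specific units cancel.
    """
    if measured <= 0 or normal <= 0:
        raise ValueError("values must be positive")
    return measured / normal


def describe(ratio: float) -> str:
    """Express a ratio as an 'N-fold increase/decrease' phrase."""
    if ratio == 1.0:
        return "no change"
    if ratio > 1.0:
        return f"{ratio:g}-fold increase"
    return f"{1.0 / ratio:g}-fold decrease"


# Hypothetical activities (arbitrary units) from two unrelated assays.
print(describe(fold_change(90.0, 30.0)))  # 3-fold increase
print(describe(fold_change(12.0, 24.0)))  # 2-fold decrease
```

Expressing a decrease as the reciprocal ratio (0.5 becomes "2-fold decrease") matches the phrasing used in the text.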
[PutF84] Putnam FW (Ed), The Plasma Proteins: Structure, Function,
and Genetic Control, Vol 4, Academic Press, NY (1984).
[GolD80] Goldman D, Merril C, Ebert M, Two-dimensional
electrophoresis of cerebrospinal fluid proteins, Clin. Chem.,
26: 1317-1322 (1980).
[HocD93] Hochstrasser D, Tissot J, Clinical applications of
two-dimensional gel electrophoresis, in Advances in Electrophoresis,
Vol 6, Chrambach A, Dunn MJ, Radola BJ (Eds), VCH Pub., NY,
pp 267-375 (1993).
[CelJ84] Celis J (Ed), Electrophoresis in Cancer Research,
Electrophoresis, 15: 305-556 (1994).
Introduction
This document introduces the NIMH-NCI Protein-Disease Database (PDD).
We first discuss the rationale for the database and then discuss its
initial implementation and future expansion.
The Protein-Disease Database (PDD) System
These observations stimulated the development of a protein-disease
relational database system which will allow the correlation of APP
changes in disease states with alterations in APP patterns observed in
high-resolution electrophoretic gels. The system is being developed as
a joint collaboration between groups at the NIMH and the NCI. The
software for the PDD (a WWW-browser graphical user interface and
relational database servers) is being developed by the Image
Processing Section of the Laboratory of Experimental and Computational
Biology of the NCI at FCRDC. The acquisition of the literature-based
data, data entry and spot identification in 2D gels are being done by
the Laboratory of Biochemical Genetics of the NIMH.
Primary types of Queries
The most critical types of query addressed by this database involve
quantitative changes, as in these examples:
Other types of Queries
In addition to the above critical queries, there are other types of
questions. For example, you should be able to answer questions of the
type:
2D gel protein databases
A number of individuals have initiated databases concerned with
proteins in body fluids. Among the pioneers in this effort are the
Andersons with their serum and plasma databases [PutF84], and Goldman and Merril with their
cerebrospinal fluid database [GolD80]. These
databases, however, are primarily concerned with identifying the
spots, and no serious effort has been made to correlate
changes in the APPs with disease states. This task of relating the APP
database to disease states is made easier by the fact that a large
body of information on plasma and serum protein changes
in disease states has been compiled over the past two decades. To
take advantage of this, we have begun to establish our own plasma and
serum literature library from which we are extracting data for the
database. A recent review by Hochstrasser and Tissot [HocD93] shows the
current status of a number of tissue and body fluid 2D gel databases
with many identified proteins; much of these data are accessible over
the network via the WWW ExPASy server.
Literature-based protein-disease data
One feature of the database is that it will allow the clinician or
researcher to quickly determine where the information in the
database came from, by making the references for each data
point readily available. For example, a "spot" may be selected from
the gel image, and physical data on that specific protein will be
immediately at hand. This and other queries are, however, secondary
to the primary mission of the database: to relate quantitative changes
in APP patterns to disease states. Again, consider the model query: "If I
suspect a patient has lupus erythematosus, what quantitative
changes in the APPs should I observe?" Besides providing
diagnostic support, this type of query may reveal relationships between
seemingly unrelated diseases that might not have been found
otherwise. For example, the serum concentration of the APP haptoglobin
has been observed to increase in both schizophrenia and Alzheimer's
disease. It is interesting that, despite all the clinical tests performed
in hospitals, little effort has been made to correlate
changes in the APPs with disease states, even though
such an approach would offer more data to aid in the diagnosis of
disease states.
The Acute Phase Proteins: a pilot database for the PDD
For our feasibility study, we have limited the initial database
primarily to the acute phase proteins (and a few others). As we
work out the bugs in the PDD methods, the database will be expanded to
include other proteins. The database is expected to grow at a rate
that will allow the PDD methods presented here to be scalable. Details
of its implementation are discussed in other documents.
Cancer-associated proteins found in body fluids
Cancer-associated protein markers found in body fluids are also being
entered into the PDD. Several 2D gel databases relating to cancer
research are described in a special issue of Electrophoresis [CelJ84].
Links to other 2D gel and literature databases
We are also taking advantage of many existing,
well-maintained protein databases that are currently accessible over
the Internet. These include ExPASy's SWISS-PROT, SWISS-2DPAGE and
SWISS-3DIMAGE, as well as NCBI-DNA, NCBI-mRNA, PIR and GDB. We are
developing links to other databases through the Internet and the World
Wide Web; ExPASy and the National Library of Medicine's NCBI network
version of ENTREZ will be accessible from the PDD. Links returned by a
PDD search for specific proteins, literature references, etc. can then
be followed in your browser.
References
[ManA93] Mackiewicz A, Kushner I, Baumann MH, (Eds),
Acute Phase Proteins - Molecular Biology, Biochemistry, and
Clinical Applications, CRC Press, Boca Raton, FL, pp 4-5
(1993).
Date: 1997/05/19 18:19:54
pdd@ncifcrf.gov