The NIMH-NCI Protein-Disease Database

   * * * * Working  D R A F T  [under construction] * * * *

   C. Merril(1), M. Goldstein(2), J. Creed(1), J. Myrick(3),
   P.F. Lemkin(4)

  (1)LBG/NIMH, (2)Monoclonetics Inc., (3)CDC/Atlanta,
  (4)IPS/LMMB/NCI/FCRDC

Introduction

This document introduces the NIMH-NCI Protein-Disease Database (PDD). We first discuss the rationale for the database and then its initial implementation and future expansion.

The examination of body fluids for diagnostic markers in disease states dates back to antiquity. However, the study of protein alterations in specific body fluids such as serum and plasma was begun in earnest shortly after the turn of this century, with most of the observations occurring in the last three decades.

For example, the acute phase proteins (APPs) are generally defined as those whose concentrations or activities increase or decrease by 25% or more in response to inflammation [ManA93]. Most clinicians currently evaluate only a few specific APPs in body fluids such as plasma or serum when they suspect a disease state. This approach has two major flaws. First, if an error is made in measuring the quantity of the specific protein for which an assay was requested, the physician may reach an erroneous conclusion. Second, if bias in the diagnostic process leads the physician to request the examination of a protein not affected by the actual disease state, the results obtained will be of little or no use in diagnosing the underlying illness.

The primary purpose of this relational database is to facilitate quantitative and qualitative comparisons of proteins in human body fluids in normal and disease states. For decades researchers and clinicians have been studying proteins in body fluids such as serum, plasma, cerebrospinal fluid and urine. Currently, most clinicians evaluate only a few specific proteins in a body fluid such as plasma when they suspect that a patient has a disease. Now, however, high resolution two-dimensional protein electrophoresis allows the simultaneous evaluation of 1,500 to 3,000 proteins in complex solutions, such as the body fluids. This and other high resolution methods have encouraged us to collect the clinical data for the body fluid proteins into an easily accessed database. In addition, this database will provide a linkage between the disease-associated protein alterations and images of the appropriate proteins on high-resolution electrophoretic gels of the body fluids. This effort requires the normalization of data to account for variations in methods of measurement.

Initial efforts in the establishment of this database have concentrated on alterations in the acute-phase proteins in individuals with acute and chronic diseases. Even at this early stage of development, the database has proven useful: we have found that there appear to be several common acute-phase protein alterations in the plasma and cerebrospinal fluid of patients with Alzheimer's disease, schizophrenia and major depression. Our goal is to provide access to the database so that systematic correlations and relationships between disease states can be examined and extended.

The Protein-Disease Database (PDD) System

These observations stimulated the development of a protein-disease relational database system which will allow the correlation of APP changes in disease states with alterations in APP patterns observed in high-resolution electrophoretic gels. The system is being developed as a collaboration between groups at the NIMH and the NCI. The software for the PDD (Browser-WWW graphical user interface and relational database servers) is being developed by the Image Processing Section of the Laboratory of Experimental and Computational Biology of the NCI at FCRDC. The acquisition of the literature-based data, data entry and spot identification in 2D gels are being done by the Laboratory of Biochemical Genetics of the NIMH.

Primary types of Queries

The most critical types of queries addressed by this database involve quantitative changes, as in these examples:

  1. If proteins A, B, C & D are increased, and proteins H & K are decreased, what disease entities best correlate to this pattern of change?
  2. For a given disease, specifically what pattern changes might be expected for proteins A, D & H?
  3. I know the patient has meningitis, but is it viral or bacterial? What differences would I see in the APP patterns in each case?
  4. I know the patient has a lesion in the lung, but is it an infection or a tumor?

For example (3), one possibility is to first find the protein pattern changes for each of the two types of meningitis, and for example (4) to find the patterns for infections and tumors. Then compare the sets of (protein, fold-change) pairs both qualitatively and quantitatively to see which pattern most closely matches that of the patient in question.

The PDD has the capability of finding protein patterns for different diseases and then computing and reporting several metrics and plots of differences of patterns between the set of diseases.
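
The pattern comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PDD implementation: the disease names, protein labels and fold-change values are invented, and the two metrics (root-mean-square difference of log2 fold-changes, and fraction of proteins changing in the same direction) are simply one plausible choice of quantitative and qualitative measures.

```python
import math

# Hypothetical fold-change patterns: protein -> fold-change relative to normal.
# Values > 1 are increases, values < 1 are decreases. Invented for illustration.
DISEASE_PATTERNS = {
    "disease_X": {"A": 3.0, "B": 2.0, "H": 0.5},
    "disease_Y": {"A": 1.2, "B": 0.8, "H": 2.0},
}

def pattern_distance(patient, reference):
    """Quantitative metric: RMS difference of log2 fold-changes over shared proteins."""
    shared = sorted(set(patient) & set(reference))
    if not shared:
        return math.inf
    squared = [(math.log2(patient[p]) - math.log2(reference[p])) ** 2 for p in shared]
    return math.sqrt(sum(squared) / len(shared))

def direction_agreement(patient, reference):
    """Qualitative metric: fraction of shared proteins changing in the same direction."""
    shared = set(patient) & set(reference)
    if not shared:
        return 0.0
    same = sum(1 for p in shared if (patient[p] > 1.0) == (reference[p] > 1.0))
    return same / len(shared)

def rank_diseases(patient, patterns):
    """Return disease names ordered from closest to farthest pattern."""
    return sorted(patterns, key=lambda d: pattern_distance(patient, patterns[d]))

# Hypothetical patient measurements, expressed as fold-changes.
patient = {"A": 2.5, "B": 1.8, "H": 0.6}
ranking = rank_diseases(patient, DISEASE_PATTERNS)
print(ranking)
```

Working in log2 units treats a two-fold increase and a two-fold decrease as changes of equal size, which is why the sketch does not compare raw ratios directly.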

Other types of Queries

In addition to the above critical queries, there are other types of questions. For example, you should be able to answer questions of the type:

Diseases
  • What diseases show fold-changes in protein K? Fold-changes of > 5X?
  • What diseases have fold-increases > 2.0X for proteins A & B, and fold-decreases < 2.5X for proteins C & D?
  • Which diseases have a 50% fold-decrease (0.5X) in this protein? In "all" the proteins in the database?
  • Compare protein patterns which change for diseases A & B
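
As a rough sketch of how disease queries like these might map onto a relational table, the following uses Python's built-in sqlite3 module. The table name `protein_change`, its columns and all of the rows are assumptions made for illustration; the actual PDD schema is described in other documents and may differ substantially.

```python
import sqlite3

# Hypothetical single-table layout: one row per (disease, protein, fluid) observation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein_change ("
    " disease TEXT, protein TEXT, fluid TEXT, fold_change REAL)"
)
rows = [  # invented example data
    ("disease_X", "A", "plasma", 3.0),
    ("disease_X", "B", "plasma", 2.4),
    ("disease_X", "C", "plasma", 0.3),
    ("disease_Y", "A", "plasma", 2.2),
    ("disease_Y", "B", "plasma", 1.1),
    ("disease_Z", "K", "urine", 6.0),
]
conn.executemany("INSERT INTO protein_change VALUES (?, ?, ?, ?)", rows)

def diseases_with_fold_change(conn, protein, min_fold):
    """Diseases in which one protein changes by more than min_fold."""
    cur = conn.execute(
        "SELECT DISTINCT disease FROM protein_change"
        " WHERE protein = ? AND fold_change > ? ORDER BY disease",
        (protein, min_fold),
    )
    return [r[0] for r in cur.fetchall()]

def diseases_increased_in_all(conn, proteins, min_fold):
    """Diseases in which every listed protein increases by more than min_fold."""
    placeholders = ", ".join("?" for _ in proteins)
    cur = conn.execute(
        "SELECT disease FROM protein_change"
        f" WHERE protein IN ({placeholders}) AND fold_change > ?"
        " GROUP BY disease HAVING COUNT(DISTINCT protein) = ?"
        " ORDER BY disease",
        (*proteins, min_fold, len(proteins)),
    )
    return [r[0] for r in cur.fetchall()]

high_k = diseases_with_fold_change(conn, "K", 5.0)
both_up = diseases_increased_in_all(conn, ["A", "B"], 2.0)
print(high_k, both_up)
```

The GROUP BY / HAVING form is one common way to ask for diseases that satisfy a condition on every protein in a list; a query mixing increases and decreases could combine two such conditions.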

Proteins
  • What proteins have fold-changes associated with this disease? How much do they change? How much do they change in plasma, and in urine?
  • Which proteins show more than a 50% increase (1.5X) in this disease?
  • Where are these proteins in the serum (urine, etc) reference 2D gel map?
  • What is the name of this protein I am pointing to in the 2D gel map? What diseases is it involved in?
  • Show 2D gel map of proteins which increase in this disease in red and those which decrease in blue.

Literature references
  • What literature references discuss these proteins in the context of these diseases?
  • What literature references for disease Q had more than N patients in the study?
  • Who has worked on this protein?

2D gel protein databases

A number of individuals have initiated databases concerned with proteins in body fluids. Among the pioneers in this effort are the Andersons with their serum and plasma databases [PutF84], and Goldman and Merril with their cerebrospinal fluid database [GolD80]. These databases are primarily concerned with the identification of the spots, however, and no serious effort has been made to correlate changes in the APPs to disease states. The task of relating the APP database to disease states is made easier by the fact that a fairly vast body of information concerning plasma and serum protein changes in disease states has been compiled over the past two decades. To take advantage of this, we have begun to establish our own plasma and serum literature library from which we are extracting data for the database. A recent review by Hochstrasser et al. shows the current status of a number of tissue and body fluid 2D gel databases with many proteins identified [HocD93]. Much of this data is network accessible via the WWW ExPASy server.

Although many of the protein changes listed in our data were assayed by non-electrophoretic techniques such as immunoassays, or in some cases merely by interaction with specific substances and sedimentation rates, this information can still be used in our database. Difficulties do arise when the proteins were assayed in specific enzyme or other units which measure protein activity rather than concentration. This problem can be overcome by converting these test-specific units to relative increases or decreases (e.g. three-fold increase, two-fold decrease), thereby normalizing the variations for the different proteins. Another difficulty is that, at present, not all of the interesting proteins in the APPs have been linked to the gel map images.
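
The conversion to relative changes can be illustrated with a minimal Python sketch. The function names and the numeric values are hypothetical; the only assumption is that each study reports a disease value and a normal (control) value in the same units, so that the units cancel in the ratio.

```python
def fold_change(disease_value, normal_value):
    """Unit-free ratio of a disease measurement to its normal (control) value.

    Both values must be in the same test-specific units (activity,
    concentration, etc.); the units cancel in the ratio.
    """
    if disease_value <= 0 or normal_value <= 0:
        raise ValueError("measurements must be positive")
    return disease_value / normal_value

def describe(fold):
    """Express a ratio as an n-fold increase or n-fold decrease."""
    if fold == 1.0:
        return "no change"
    if fold > 1.0:
        return f"{fold:.1f}-fold increase"
    return f"{1.0 / fold:.1f}-fold decrease"

# Hypothetical measurements from two different kinds of assay.
activity_ratio = fold_change(450.0, 150.0)    # enzyme activity units
concentration_ratio = fold_change(0.6, 1.2)   # concentration units

print(describe(activity_ratio))
print(describe(concentration_ratio))
```

Once both results are expressed as ratios, an activity assay and a concentration assay can be stored and compared on the same scale.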

Literature based protein-disease data

One feature of the database is that it will allow the clinician or researcher to quickly determine where the information in the database came from, by making the references for each of the data points readily available. For example, a "spot" may be selected from the gel image and physical data on that specific protein will be immediately at hand. This and other queries are secondary, however, to the primary mission of the database: to relate quantitative changes in APP patterns to disease states. Again, the model query is: "If I suspect a patient to have lupus erythematosus, what quantitative changes in the APPs should I observe?" Along with providing diagnostic support, this type of query may show a relationship between diseases which are seemingly unrelated and which might not have been found otherwise. For example, the serum concentration of the APP haptoglobin has been seen to increase in both schizophrenia and Alzheimer's disease. It is interesting that, despite all the clinical tests performed in hospitals, little effort has been made to measure the correlation of changes in the APPs to disease states, even though such an approach would offer more data to aid in the diagnosis of disease states.

The Acute Phase Proteins - as a pilot database for the PDD

For our feasibility study, we have limited the initial database to address primarily the acute phase proteins (and a few others). As we work out the bugs in the PDD methods, the database will be expanded to include other proteins. The database is expected to grow at a rate that will allow the PDD methods presented here to be scalable. Details of its implementation are discussed in other documents.

The Cancer associated proteins found in body fluids

Cancer associated protein markers found in body fluids are also being entered into the PDD. Several 2D gel databases relating to cancer research are given in a special issue of Electrophoresis [CelJ84].

Links to other 2D gel and literature databases

We are also taking advantage of many of the existing and well-maintained protein databases that are currently accessible from the Internet. These include the ExPASy SWISS-PROT, SWISS-2DPAGE and SWISS-3DIMAGE databases, as well as NCBI-DNA, NCBI-mRNA, PIR and GDB. We are developing links into other databases through the Internet and World Wide Web; links to ExPASy and to the National Library of Medicine's NCBI network version of ENTREZ will be accessible from the PDD. Links resulting from a PDD search, for specific proteins, literature references, etc., can then be visited using your browser.


References

[ManA93] Mackiewicz A, Kushner I, Baumann MH (Eds), Acute Phase Proteins - Molecular Biology, Biochemistry, and Clinical Applications, CRC Press, Boca Raton, FL, pp 4-5 (1993).

[PutF84] Putnam FW (Ed), The Plasma Proteins - structure, function, and genetic control, Vol 4, Academic Press, NY (1984).

[GolD80] Goldman D, Merril C, Ebert M, Two-dimensional electrophoresis of cerebrospinal fluid proteins, Clin. Chem., 26: 1317-1322 (1980).

[HocD93] Hochstrasser D, Tissot J, Clinical applications of two-dimensional gel electrophoresis, in Advances in Electrophoresis, Vol 6, Chrambach A, Dunn MJ, Radola BJ (Eds), VCH Pub., NY, pp 267-375 (1993).

[CelJ84] Celis J (Ed), Electrophoresis in Cancer Research, Electrophoresis, 15: 305-556 (1994).


$Date: 1997/05/19 18:19:54 $ / pdd@ncifcrf.gov