Overview

The research is not intended to affect the participants in any way and is designed to be of minimal intervention and discomfort. The study is observational and should have no impact on the care received by the patient.

The key goal of the research is the derivation of diagnostic and prognostic signatures that can be used to assist health care professionals in diagnosis and treatment plan selection. The intent is to commercialise the signatures after validation.

Purpose and Design

The primary goal of the study is to derive molecular signatures that are diagnostic or prognostic of disease state or outcome. The technology used is generic and can reliably measure thousands of analytes simultaneously. The largest proportion of these analytes will be proteins.

Biosignatures has an analysis platform (patent applied for) that can take these thousands of measurements and derive combinations of analytes that, when used together, can produce high performance signatures of biological status.

Most diagnostic tests available today (e.g. prostate specific antigen (PSA) used in prostate cancer) rely on the absolute concentration of a single analyte, or of very few analytes, and as such can have reduced sensitivity due to biological variability (such as haemodilution in obese males). Single markers are usually termed 'biomarkers'. In a recent review of the literature, Anderson et al. observed that there are nine FDA approved blood plasma based biomarkers (e.g. troponin, CA-125) and 1251 biomarkers reported in the literature that should be observable in plasma but that have not been clinically validated [1][2]. Further research has shown that longitudinal use of these analytes can be considerably more sensitive than relying on absolute concentration thresholds. Another possibility is to measure one analyte relative to another. Such a 'signature' can greatly reduce the inter-individual variability and can provide a far more stable measure. A well known example of such a 'relationship' measure is body mass index, where two basic physical measures are combined via a defined mathematical relationship and the derived measure used as a clinical indicator of disease risk.
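As an illustration only, the idea of a 'relationship' measure can be sketched in a few lines of Python. The body mass index formula is the standard one (weight in kilograms divided by height in metres squared). The analyte values and the dilution factor are hypothetical numbers chosen for the example, and the simple ratio is an assumed stand-in rather than the Biosignatures method.

```python
from math import isclose


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Combine two basic physical measures into one derived indicator."""
    return weight_kg / height_m ** 2


def analyte_ratio(analyte_a: float, analyte_b: float) -> float:
    """Express one analyte relative to another (a two-analyte 'signature')."""
    return analyte_a / analyte_b


# Hypothetical concentrations for one individual (arbitrary units).
undiluted = (4.0, 2.0)

# Hypothetical haemodilution: both analytes are diluted by the same factor,
# so each absolute concentration drops but the ratio is unchanged.
dilution = 0.8
diluted = (undiluted[0] * dilution, undiluted[1] * dilution)

ratio_before = analyte_ratio(*undiluted)
ratio_after = analyte_ratio(*diluted)
ratio_is_stable = isclose(ratio_before, ratio_after)
```

The ratio only cancels effects that scale both analytes equally. Variability that affects one analyte and not the other would still appear in the derived measure.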

Blood is an obvious and attractive candidate for diagnosis and prognosis and as such a great deal of international research effort has been devoted to creating blood based diagnostics. A large international collaboration, the Plasma Proteome Project, was instigated by the Human Proteome Organisation (HUPO PPP) [3]. In overview they found that, given the technologies used, 3020 distinct proteins could be reliably observed. Over 15,000 proteins were reported by the researchers involved but only 3020 were accepted once more stringent criteria were applied (stricter identification criteria and observation by more than one research group). Researchers who now work at Biosignatures have been heavily involved, alongside leaders in the field, in the efforts to improve the basic technologies, and during the course of this collaborative work developed insights into what may be hindering the successful production of 'biomarkers' of disease [4].

The main techniques used to measure analytes (generally protein expression, but some smaller peptides and hormones may also be measured) within blood plasma are mass spectrometry (coupled with some form of prefractionation technology to spread out and simplify the sample), gel electrophoresis and protein arrays [5]. The largest problem with all of these technologies is the large dynamic range of abundance observed in plasma (e.g. in a healthy individual the ratio of albumin to lower abundance proteins is around 10,000,000,000,000:1, equivalent to trying to find a bumble bee between here and the sun) and the difficulty of measuring large biological sample groups reproducibly across this range. In plasma, around 12 protein species account for over 95% of the protein abundance.

Large studies (notably by the US National Cancer Institute) have attempted to create signatures (e.g. Petricoin and Liotta using SELDI analysis of ovarian cancer samples [6]). Proteomics did not live up to the media hype and several researchers criticised design issues in the study. Several years of reflection and analysis have shown that the major issues in these studies were technical study design, a lack of strict sample collection and processing protocols, and inherent problems with the reproducibility of the analysis technologies.

A great deal of research has gone into sample handling (for biological fluids) and studies have established protocols that are stable and suitable for large comparative studies (e.g. UK Biobank [7]).

A large amount of work has also gone into sample analysis, the main problem being that the technologies used had not been shown to be reproducible across laboratories (and sometimes not across multiple runs within the same laboratory). The researchers at Biosignatures worked closely with researchers at AstraZeneca UK (AZ) on a large blood based study of 'typical cohorts' [8]. This study took 60 volunteers from the AZ site and, under strict protocols, took two samples, one week apart, from each participant. The participants were male and female and age matched to fall approximately within the post-menopausal age range. The study found that, using analysis protocols and technology similar to those that will be used in the present study, 1200 analytes could be simultaneously measured at expected univariate power levels of 80% (p<0.05, 2 fold difference) with 20 biological replicates, and that 1700 analytes reached similar power levels if a longitudinal design was employed. The study used univariate testing of each analyte and did not explore correlation issues. The analysis strategies used in the present study are expected to be more powerful than those reported.
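To make the power figures above more concrete, the following Python sketch shows one common way of approximating univariate power for a single analyte. It is not the calculation used in the cited study. It assumes a two-group comparison on log2-transformed abundances with a normal approximation, reads '20 biological replicates' as 20 per group, and treats a 2 fold difference as a log2 difference of 1.0. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def approx_power(log2_fold_change, sd_log2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Normal approximation; the negligible contribution of the opposite
    tail is ignored. All inputs are on the log2 scale.
    """
    standard_error = sd_log2 * sqrt(2.0 / n_per_group)
    z_critical = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(abs(log2_fold_change) / standard_error - z_critical)


def max_sd_for_power(log2_fold_change, n_per_group, power=0.80, alpha=0.05):
    """Largest log2 standard deviation an analyte can have and still be
    detected at the requested power under the same approximation."""
    z_critical = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power)
    return abs(log2_fold_change) / ((z_critical + z_power) * sqrt(2.0 / n_per_group))


# A 2 fold difference is 1.0 on the log2 scale; 20 replicates per group assumed.
sd_limit = max_sd_for_power(log2_fold_change=1.0, n_per_group=20)
power_at_limit = approx_power(1.0, sd_limit, n_per_group=20)
```

Under these assumptions an analyte counts as measurable at 80% power when its observed log2 standard deviation is at or below `sd_limit`. Counting the analytes that satisfy such a condition is one way a figure like '1200 analytes' could be arrived at, although the source does not state the exact procedure.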

A second study involved working with Novartis Institutes for Biomedical Research (NIBR) along with another four of the best proteomics research facilities in the world (and supported by HUPO). It looked at reproducibility of analysis results across facilities. This study (manuscript in preparation) shows the level of reproducibility expected across research facilities and also shows that strict adherence to protocols can produce raw data level reproducibility of thousands of simultaneous analyte measurements.

A third study with NIBR applied Biosignatures' proprietary analysis technology to a complex biological system. The study exposed H. flu (Haemophilus influenzae) to each of 72 anti-bacterial compounds, grouped into 18 different treatment classes. Signatures were created to predict the mode of action of the treatments and performed perfectly in a double blind test [9].

It is the combination of all of this research and technology that Biosignatures now believes can produce blood derived signatures of disease in an applied clinical setting [10].

In essence, the analysis technology is classed as supervised learning. The system is given multiple examples of group classes (disease cases) and from this derives a signature pattern that allows the classes to be discriminated. It is usual to test the signature against data that was not used to derive the signature itself.
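The general shape of supervised learning with a held-out test can be sketched as follows. This is a deliberately simple nearest-centroid classifier written for illustration, not the proprietary Biosignatures analysis. The class labels and the two-analyte measurements are hypothetical.

```python
from math import dist


def train_centroids(samples, labels):
    """Derive a 'signature' as the mean analyte profile of each class."""
    sums, counts = {}, {}
    for vector, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, vector):
    """Assign a sample to the class whose mean profile is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Hypothetical training examples: two analyte measurements per sample.
training_samples = [[1.0, 5.0], [1.2, 4.8], [3.0, 1.0], [3.2, 1.1]]
training_labels = ["control", "control", "disease", "disease"]

# Held-out samples that played no part in deriving the signature.
held_out_samples = [[0.9, 5.1], [2.9, 0.9]]
held_out_labels = ["control", "disease"]

centroids = train_centroids(training_samples, training_labels)
predictions = [predict(centroids, sample) for sample in held_out_samples]
n_correct = sum(p == a for p, a in zip(predictions, held_out_labels))
```

Keeping the held-out samples separate from the training samples is what allows the reported performance to reflect how the signature behaves on data it has not seen.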

This research proposal has been discussed with leading international researchers who have given us their support and are following our progress with interest.

There are multiple goals of the research and this has complicated the experiment design. The research designs can broadly be split into three categories:

  1. iterative cross-sectional
  2. longitudinal across intervention and follow up
  3. designed validations

All of the research relies on obtaining a sample and being able to classify it (i.e. we have a diagnosis). Some of the data collection is prospective, i.e. we may collect a sample from a patient who has been diagnosed with one condition but is later diagnosed with another. We can use this information to attempt to build new signatures that may provide earlier or presymptomatic diagnosis.

We intend to collect an average of around 25 samples per week from a cross section of patients. The intent is to sample from 'patients who attend a urology clinic'. This is mainly to compensate for another problem with previous studies, which compared disease cases with healthy controls. It has subsequently become apparent that a person's plasma proteome changes markedly when they are sick. Early studies ended up finding 'markers of generally being sick' and had very little specificity for the target condition. The medical profession has found 'urology' a meaningful and systemically useful designation for a whole range of conditions. The 'anyone on the ward or in the clinic' policy is intended to improve disease selectivity by specifically attempting to separate markers pertaining to 'being sick with a urological complaint' from markers specific to, for example, prostate cancer. This also allows us to obtain controls that are not only similar in disease state but that have also been under similar conditions for a period of time (i.e. have been admitted to the ward, with low activity and a similar diet).

In overview, the iterative cross sectional design will have the following work-flow:

  1. samples collected by research nurse (informed consent etc.). One lithium heparin prepared vacutainer of venous blood (8.5 mL draw volume) and where possible around 150 mL of mid stream urine. (wherever possible sample collection will be combined with routine assessment to minimise the impact on the patient)
  2. research nurse will centrifuge the blood, extract the plasma supernatant and aim to freeze it at -80 °C within an hour.
  3. samples will be anonymised and medical details collated
  4. fully anonymised and blinded samples will be sent to analysis facility by courier (in batches each week)
  5. samples will be analysed 'blind' and attempts made to predict the diagnosis of the patient.
  6. the predictions will be sent to the research team in the clinic, who will compare the blind predictions to the diagnoses. The sensitivity and specificity performance will be logged and the samples' classes sent to the research team unblinded (fully anonymised).
  7. this procedure will iterate throughout the study and it is expected that at some point the signature performance will converge. It is this convergence that signals that we have sampled a large enough proportion of the target population. The univariate studies on healthy patients showed that for 1200 analytes 20 biological replicates should be sufficient. The proposed study plans to sample in excess of 100 different patients with each condition. This is to cover the possibilities that the analytes of interest may be low abundance, patients may have more than one condition, may be on multiple drug regimens etc. Essentially we expect the real world variance to be far higher than predicted in our previous study. This is also to allow us to take far more of the 'real world variance' into account and remove the need for carefully matched cohort populations.
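The comparison in step 6 and the convergence check in step 7 can be sketched in Python as below. The sensitivity and specificity definitions are the standard ones. The convergence rule (recent batch results staying within a small tolerance of each other) is an assumption made for the example, since no specific criterion is defined above. The diagnoses and the per-batch figures are hypothetical.

```python
def sensitivity_specificity(predicted, actual, positive):
    """Compare blind predictions with clinical diagnoses for one condition."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if a == positive and p == positive)
    false_neg = sum(1 for p, a in pairs if a == positive and p != positive)
    true_neg = sum(1 for p, a in pairs if a != positive and p != positive)
    false_pos = sum(1 for p, a in pairs if a != positive and p == positive)
    has_pos = (true_pos + false_neg) > 0
    has_neg = (true_neg + false_pos) > 0
    sensitivity = true_pos / (true_pos + false_neg) if has_pos else float("nan")
    specificity = true_neg / (true_neg + false_pos) if has_neg else float("nan")
    return sensitivity, specificity


def has_converged(history, window=3, tolerance=0.02):
    """Assumed rule: the last `window` results differ by at most `tolerance`."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) <= tolerance


# Hypothetical weekly batch: blind predictions and the unblinded diagnoses.
predicted = ["cancer", "cancer", "other", "other", "cancer"]
actual = ["cancer", "other", "other", "other", "cancer"]
sensitivity, specificity = sensitivity_specificity(predicted, actual, "cancer")

# Hypothetical sensitivity logged after each successive batch.
sensitivity_history = [0.60, 0.70, 0.80, 0.81, 0.80]
converged = has_converged(sensitivity_history)
```

In practice the window and tolerance would need to be chosen with the batch size in mind, since small weekly batches give noisy estimates.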

Concerns about sensitivity and time scales bring us to the longitudinal across intervention and follow up design. In this case a patient will be asked if they are happy to provide samples pre and post surgical intervention and at scheduled follow up (i.e. the standard clinical follow up recommended by the health care team). This design has many potential benefits. The first of these is sensitivity: not only can we directly compensate for inter-individual differences via multiple samples, but we also have a clear 'with disease' and 'without disease' case from which we can train models. This model data can be used to help select analytes for use in the cross sectional study. This design also has the potential to provide valuable candidate signatures for prognosis and monitoring that, after further validation, could be used to assist clinicians in post surgical follow up. To be clear, it is not proposed that any patient treatment be altered as part of this research study. Any use of a discovered signature panel in this way will be covered by a separate research study and ethical approval.

The final design is a conventionally designed validation study of the signature (i.e. a standard cohort trial) involving multiple clinical sites and multiple analysis sites. It is intended that once a given signature has converged we will have enough information to design a validation experiment that allows us to confirm the biological nature of the signature by utilising a second clinical site following standard procedures defined by the first site. This project is solely concerned with validating signatures that are robust to the clinical realities. This final confirmation study is intended to address any remaining doubts about the source of the differences. Second site confirmation should allow us to control for remaining variables and assess how portable the signature is. The study will be double blinded with an independent party validating correct diagnosis prediction.

Recruitment

Recruitment will be driven by the suggestions of the clinical team, based on patients attending clinic and on the ward. To increase the number of biological replicates per disease, sampling will be biased towards specific disease classes. It will be made clear that patients will receive no material benefit (therapeutic or otherwise) from participation and that participation is entirely voluntary.

Inclusion/exclusion

Criteria are purely based on clinical recommendation of suitability, ability to provide informed consent and disease diagnosis.

Consent

Only people usually classed as being capable of giving their informed consent will be considered in this study. An information leaflet will be provided outlining the aims of the study and detailing what it would involve and that it may lead to commercialisation. The research nurse will discuss any issues with the patient and then ask them to sign informed commercial consent to use of the samples 'as a gift' and to give permission for their medical history to be anonymously tracked. It will also be made clear that they are free to withdraw their consent at any point now or in the future.

Risks, burdens and benefits

It is expected that the study will have minimal risk and burden upon the patient and it will be made clear that any benefits would be generic and are not expected to have any direct impact on the patient as a result of their participation in this study.

There is a possibility that the patients recruited may request the results but it is explained in the information sheet that the study is looking at common patterns across hundreds of samples and specific results for any individual will have no clinical impact.

Confidentiality

Anonymous medical information is required to derive the signatures. In order to derive presymptomatic and/or prognostic signatures we must have access to the records for a reasonable period of time. No person-identifiable information will leave the clinical environment.

If there is strong evidence that a patient may have been mis-classified, this information will be supplied to and discussed with the clinical team (by anonymous identifier), who can decide what action, if any, is in the best interest of the patient.

Conflict of interest

No conflict of interest is expected.

What will happen at the end of the study?

The intent is to continue with follow on studies that focus on new disease areas, using previous patient analyses as further case controls in studies of ever increasing size. Further REC applications will be made to cover these studies. There is expected to be no patient impact at the end of the study. Tissue samples are expected to be stored after the study ends and informed consent will specifically cover this eventuality. The samples will be stored in an HTA licensed facility. The samples will only be used for research purposes with ethical approval.

Funding

Funding for this project will be provided by Biosignatures Ltd. Financial estimates have been discussed with NUTH Trust R&D. The project is initially costed to cover the full time employment of a grade 6 research nurse and one session a week of consultant and research registrar time. Biosignatures has discussed resourcing with the clinical staff and there is scope to adjust the research nurse grade to 7 if an ideal candidate is available. There is also scope for resourcing further support staffing if this is deemed necessary after we have observed the study running. All analysis and transport costs have also been costed and will be supported by Biosignatures Ltd.

 

[1] Anderson, N.L. and Anderson, N.G. The human plasma proteome: History, character, and diagnostic prospects. Molecular and Cellular Proteomics, 1.11, 845-867 (2002)
[2] Anderson, N.L. et al. The Human Plasma Proteome: A Non-Redundant List Developed by Combination of Four Separate Sources, Mol Cell Proteomics 3: 311-326 (2004).
[3] Omenn, G.S. et al, Overview of the HUPO Plasma Proteome Project: Results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database, Proteomics 5, Issue 13, 3226-3245 (2005)
[4] HUPO cross lab reproducibility study. Manuscript in preparation. Work mentioned in Nature editorial: http://www.nature.com/nature/journal/v452/n7190/full/452913b.html
[5] For an excellent overview see the multiple papers in the special edition of Proteomics 13'05 "Exploring The Human Plasma Proteome : The Human Plasma Proteome Project", (2005)
[6] Petricoin et al, Use of proteomic patterns in serum to identify ovarian cancer, The Lancet vol 359 (2002)
[7] See the special edition of the International Journal of Epidemiology: issue 37 (2008) e.g. Elliott and Peakman "The UK Biobank sample handling and storage protocol for the collection, processing and archiving of human blood and urine"
[8] In press, Proteomics: Clinical Applications (electronic galley proof supplied): Jackson et al, "Considerations for Powering a Clinical Proteomics Study: Normal Variability in the Human Plasma Proteome"
[9] Preliminary results presented at Association of Biomolecular Resource Facilities international meeting (2007) electronic version available at http://www.nonlinear.com/docs/pdf/ABRF2007_Predictive.pdf
[10] Further examples of research uses of proteomics and similar technologies can be found here http://www.nonlinear.com/products/progenesis/posters.asp and http://www.nonlinear.com/products/progenesis/publications.asp