WP01 Diagnostic Genechip
Steffen Hennig, PhD
CRM - Coastal Research & Management GbR
- To develop a diagnostic tool to identify known and new mutations in both NCL genes and functional candidate genes, the specific sub-aims being:
- To establish a sequence database with all relevant mutations in promoter, coding and intronic regions of both NCL genes and functional candidate genes. Public databases holding the relevant information, e.g. splice sites, promoter regions, etc., will be used to build a profile of relevant mutation loci.
- To design a microarray covering promoter-, exon- and intronic regions of all known NCL genes and NCL candidate genes.
- To test and optimise the developed microarray as a tool for diagnosing known and new mutations in NCL genes and candidate genes.
- To update and complete the diagnostic gene chip as novel NCL-associated mutations and as potential novel NCL genes are identified, to allow the usage of the chip for large-scale diagnostic studies. Extensive hybridisation experiments will be performed to optimise and test the prototype array.
- If applicable, to extend the chip-based diagnostic tool to next-generation sequencing technology.
The aim of WP01 was to develop a new diagnostic tool for the genetic diagnosis of NCL diseases. This tool should be innovative, reliable, and time- and cost-effective in order to significantly contribute to an early diagnosis of NCLs.
According to the above listed objectives, several steps had to be undertaken during this development process:
Establishment of sequence database
In preparation for the diagnostic chip design, it was necessary to establish a sequence database with all available information about known or putative NCL-relevant mutations. Since the DEM-CHILD grant application in 2011, the number of known NCL genes had increased from eight to at least thirteen (Warrier V et al. BBA 2013, Schulz A et al. BBA 2013). As a consequence, the identification of new NCL genes had broadened the number of putative NCL candidate genes from nine to more than twenty. During several WP01 workshops and phone conferences, all geneticists involved in the DEM-CHILD project developed and agreed on an up-to-date list of thirteen NCL genes and twenty-five NCL candidate genes. Together with project partner 03 (Sara Mole, UCL), who runs the NCL resource database, and partners 01, 02, and 04, the list of relevant mutations was agreed, and the respective sources of information were identified and imported to our local server.
For the project partners, an excerpt of the database was provided as a flat file in dynamic Excel format, which allowed direct linking-out to external information systems such as the UCSC genome browser. To date, the database contains 38 gene loci, which cover about 1.5 Mb of sequence (repetitive elements not counted). Apart from the 13 genes carrying known NCL-related mutations, we added 25 more genes that are of potential disease relevance because they cause NCL-resembling disease in animals or NCL-like phenotypes in patients, or represent good functional candidates for the disease. In total, these are 21 more genes than originally planned for the chip, reflecting the rapidly growing field of NCL diseases.
Fig. 1: The NCL candidates sequence mutation database, version 1, which is the basis for the first generation of the diagnostic chip. The links give access to all information needed for the array design, plus useful information concerning gene loci and annotations.
Design and update of a microarray covering promoter-, exon- and intronic regions of all NCL genes and NCL candidate genes
The first test version of the microarray-based diagnostic chip was produced by Agilent and covered all thirteen known NCL genes, including exons, introns and about 3 kb of promoter region. The original plan was to use Nimblegen arrays, but after Roche announced in early 2012 that it would close down Nimblegen array production, we had to employ an alternative system. Agilent offered the same capacities for a very comparable price, and its arrays could likewise be processed by partner 06 (imaGenes in Berlin). As this version was used as a primary test version for positive control samples, NCL candidate genes were not yet included for cost reasons.
For the second version of the chip, apart from the 13 genes carrying known NCL-related mutations, we added 25 more NCL candidate genes that are of potential disease relevance because they cause NCL-resembling disease in animals or NCL-like phenotypes in patients, e.g. genes associated with progressive myoclonic epilepsies (PME). These were added as a result of the recent finding that KCTD7, a gene causative for PME type 3, might also be responsible for an infantile NCL phenotype, termed CLN14 (Staropoli J et al. Am J Hum Genet 2012).
Testing of the developed microarray as a tool for diagnosing known and new mutations in NCL genes and candidate genes
A total of 32 positive control samples of NCL patients with known NCL mutations were provided by partners 01, 02, 03, and 04. Samples were selected to cover almost all known NCL genes and to provide a mixture of compound heterozygous and homozygous mutations as well as a mixture of point mutations, insertions and deletions. All samples were blinded at the coordinator’s site, received a unique identifier code, and were entered into the sample database. After being sent to partner 06, they were prepared in parallel and used for hybridisations on the first test version of the diagnostic chip.
The testing procedure included several steps in the microarray laboratory, with modest technical changes as compared to the standard Agilent CGH protocol, i.e. hybridisation times were increased by 4 hrs (= 16% of standard time). No significant changes were observed in the final intensities per oligo and in the final analysis results.
Testing of the microarray-based diagnostic chip revealed two major issues with this technique:
(i) While the diagnostic chip was able to pinpoint the exonic regions where the disease-associated mutation is known to reside in approximately 50% of the samples, for the remaining approximately 50% of samples the interval containing the known mutations was not clearly identified by the diagnostic chip. Possible explanations included poor DNA quality/quantity, hybridisation problems, or the need for optimisation of the chip design.
(ii) In addition to potential known NCL mutations, the chip apparently identified a significant number of gene regions that differed between patient samples and the reference (commercially available Promega male genomic DNA). The majority of these differences were not known before, as in most cases previous studies had been limited to one or a few NCL genes.
Both issues required further investigation and had significant influence on the task to test and optimize a new time- and cost-effective method for genetic diagnosis. In the meantime, since our initial working plan, technology had developed further and made customized, targeted deep sequencing of genes by next-generation sequencing more affordable and practicable in neurogenetic laboratories. In particular, it now seemed cost-effective to combine the speed of PCR with the sensitivity of hybridization, providing a robust solution for targeting smaller capture regions, as is the case in the analysis of a limited set of genes.
Change of core technology from microarrays to Next-Generation-Sequencing (NGS)
After discussion between the project coordinator and partners 1, 2, 4, 6, 11, and 12, there was a clear decision to go on with the NGS approach. As expected, analysis of the first batch of 20 positive control samples from patients with known NCL mutations by NGS showed that the sequencing information was more precise and allowed deeper analysis than the microarray-based data. Moreover, the total costs per sample of the NGS approach could be reliably estimated to be almost equivalent to the array costs as of September 2013. The conclusion was to stop spending budget and consumables on further tests and improvement of the diagnostic chip and instead to continue exclusively with the development and set-up of the diagnostic sequencing tool, taking the following steps:
a) Target enrichment system development
Partners 11 and 12 designed a novel tool for target enrichment and carried out a series of tests to evaluate the potential of the NGS-based diagnostic tool.
b) Test sequencing and quality analysis of the HaloPlex system
Sixteen positive control samples from partners 1, 2, 3, and 4, most of them already used for the microarray tests, were taken for the initial tests of the target enrichment step. All sample preparations, processing of the customized Agilent HaloPlex kit and the sequencing runs on an Illumina MiSeq machine were done by partner 12. Partner 11 developed a tailored bioinformatics pipeline to map the millions of reads to the 39 gene loci and to perform all statistical and mutation analysis steps. Since the goal was for the NGS diagnostic tool to become a routine application in NCL diagnostics, it was of major importance that the target enrichment delivers reproducible results. This could be shown by comparing several tests of the same samples. In conclusion, the target enrichment system provided sufficient quality for routine applications in NGS-based diagnostics of NCL.
c) Development of Bioinformatics package for an NGS-based diagnostic tool
The change of strategy from a microarray-based analysis system to an NGS-based approach required development of a completely new software package. This was carried out by partner 11 in the second half of DEM-CHILD. By the end of the project, the software package was completed, and it can now be applied in routine analysis projects.
The package was designed to run fast and efficiently. Even large numbers of samples can be processed on standard servers in a reasonable time frame.
All annotated variations published by the 1000 Genomes Project and other large genome projects were downloaded and are used to flag mutation candidates accordingly. This feature allows likely polymorphic variants to be distinguished from NCL-specific mutations.
In the final step, the software was extended by modules that compare single-sample analyses against each other. This is very useful, since it allows ranking of mutation candidates across replicate analysis runs, which is the optimal mode of NGS-based analysis. Partners 11 and 12 are open to offering the complete NCL diagnostic tool as a routine service to partners and researchers in the NCL field.
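The flagging step can be pictured with a minimal Python sketch. It is not the software developed by partner 11; the `Variant` record, the function names and the toy data are hypothetical, and a variant is simply treated as a likely polymorphism if the identical change is listed in a set of known population variants.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not the project's data model)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def flag_variants(called, known_polymorphisms):
    """Pair every called variant with a flag saying whether the identical
    change is listed among the known population variants."""
    known = set(known_polymorphisms)
    return [(variant, variant in known) for variant in called]


def candidate_mutations(called, known_polymorphisms):
    """Keep only the variants that are not flagged as known polymorphisms."""
    return [variant
            for variant, is_known in flag_variants(called, known_polymorphisms)
            if not is_known]


# Toy data, invented for illustration only.
known = [Variant("chr1", 1000, "A", "G")]
called = [Variant("chr1", 1000, "A", "G"), Variant("chr1", 2000, "C", "T")]
candidates = candidate_mutations(called, known)
```

Flagging rather than deleting keeps the flagged variants available for review, which matters because a disease-relevant variant can also be present in a population catalogue.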
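As an illustration of ranking candidates across replicate runs, the following Python sketch counts in how many runs each candidate appears and sorts by that support. The function name, the string keys used for variants and the example data are assumptions made for this sketch; the report does not specify the ranking criteria of the actual package.

```python
from collections import Counter


def rank_by_replicate_support(runs):
    """Rank mutation candidates by the number of replicate runs that report them.

    `runs` is a list with one collection of variant keys per replicate run.
    Returns (variant_key, run_count) pairs, best supported first; ties are
    ordered alphabetically by key.
    """
    support = Counter()
    for run in runs:
        # A set ensures each run contributes at most one count per variant.
        support.update(set(run))
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


# Toy data, invented for illustration only.
run_a = ["chr1:2000C>T", "chr2:500G>A"]
run_b = ["chr1:2000C>T"]
ranking = rank_by_replicate_support([run_a, run_b])
```

In this toy case the candidate seen in both runs is ranked ahead of the one seen in a single run.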
d) Final evaluation of NGS diagnostic tool in large numbers of samples
In the last stage of the DEM-CHILD project, partners 1, 2, 3, 4, and 5 collected DNA from altogether 209 NCL samples, which were to a large extent run in duplicate. The quality of the mutations found was extensively evaluated and adjusted in 16 reference samples in which NCL-related mutations had already been mapped by PCR screening or classical Sanger sequencing. Almost all mutations were correctly mapped. In one case, the NGS-based diagnostic tool even improved on the previous analysis: a mutation denoted as ‘homozygous’ was corrected to heterozygous by the new tool.
WP01 has successfully developed a novel NGS-based tool for the diagnosis of all known and novel NCL mutations, including mutations in NCL candidate genes. This tool significantly reduces the time and costs needed for genetic diagnosis of all NCL forms and can now be offered to the public as a diagnostic service.
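The report does not describe how the tool distinguishes homozygous from heterozygous calls. One common approach in sequencing data is to look at the fraction of reads carrying the variant allele, and the Python sketch below illustrates that idea only. The function name and both thresholds (`het_min`, `hom_min`) are hypothetical values chosen for the example, not parameters of the project's pipeline.

```python
def call_zygosity(ref_reads, alt_reads, het_min=0.25, hom_min=0.85):
    """Classify a variant site from read counts.

    The thresholds are illustrative assumptions: an alternate-allele fraction
    of at least `hom_min` is called homozygous, at least `het_min`
    heterozygous, and anything lower is treated as reference.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return "no coverage"
    alt_fraction = alt_reads / total
    if alt_fraction >= hom_min:
        return "homozygous"
    if alt_fraction >= het_min:
        return "heterozygous"
    return "reference"


# Toy read counts, invented for illustration only.
balanced_site = call_zygosity(ref_reads=48, alt_reads=52)
alt_only_site = call_zygosity(ref_reads=2, alt_reads=98)
```

With read counts available for both alleles, a site with roughly balanced support is reported as heterozygous.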