Using Define-XML for faster, better-quality and more efficient studies
Since its inception in 1997, the Clinical Data Interchange Standards Consortium (CDISC) has developed and supported globally adopted data standards to improve clinical trial efficiency. Clinical data standards are now recognized as playing a vital role in the entire end-to-end clinical trial process. Standardization allows for faster, better-quality and less costly drug discovery.
One of the most widely used standards today is Define-XML. The latest version of Define-XML is v2.1, which went live in May 2019. Get release information on the CDISC website.
What is Define-XML?
According to CDISC: “Define-XML is required by the United States Food and Drug Administration (FDA) and the Japanese Pharmaceuticals and Medical Devices Agency (PMDA) for every study in each electronic submission to inform the regulators which datasets, variables, controlled terms, and other specified metadata were used.”
The FDA’s Technical Conformance Guide explains that Define-XML is “arguably the most important part of the electronic dataset submission for regulatory review” because it helps reviewers gain familiarity with study data, its origins and derivations.
The standard itself is known as “Define-XML”. The file that’s submitted to the FDA upon completion is the data definition file, known simply as “define.xml”.
Define-XML as a dataset descriptor
It is commonly thought that Define-XML is simply a dataset descriptor: a way to document what datasets look like, including the names and labels of datasets and variables, the terminology used, and so on. This is essentially what Define-XML was created for.
But by instead thinking of Define-XML as a tool to create better quality, more efficient clinical studies, users can unlock the true potential of the standard.
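To make the descriptor role concrete, here is a minimal sketch of reading dataset and variable metadata out of a define.xml file using only the Python standard library. It assumes a Define-XML v2.1 file named define.xml in the working directory; the element names (MetaDataVersion, ItemGroupDef, ItemRef, ItemDef) and namespace URIs are those published for ODM 1.3 and Define-XML 2.1.

```python
# A minimal sketch: list the datasets and variables described by a
# define.xml file (assumed to be Define-XML v2.1).
import xml.etree.ElementTree as ET

NS = {
    "odm": "http://www.cdisc.org/ns/odm/v1.3",   # ODM namespace
    "def": "http://www.cdisc.org/ns/def/v2.1",   # Define-XML 2.1 namespace
}

tree = ET.parse("define.xml")
mdv = tree.getroot().find(".//odm:MetaDataVersion", NS)

# Index every ItemDef (variable definition) by its OID.
item_defs = {i.get("OID"): i for i in mdv.findall("odm:ItemDef", NS)}

# Each ItemGroupDef describes one dataset; its ItemRefs point at variables.
for igd in mdv.findall("odm:ItemGroupDef", NS):
    print(f"Dataset {igd.get('Name')} (domain: {igd.get('Domain')})")
    for ref in igd.findall("odm:ItemRef", NS):
        item = item_defs.get(ref.get("ItemOID"))
        if item is not None:
            print(f"  {item.get('Name')}: {item.get('DataType')}")
```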
Progressive uses of Define-XML
You can use Define-XML to help you optimize the end-to-end clinical trial process in the following ways:
1) Use Define-XML in your CRF design process
Many organizations treat Define-XML as an afterthought: only when case report forms (CRFs) have been designed, data has been collected and the study is complete do they think about creating the define.xml file for FDA submission.
But this approach can lead to incomplete data, protocol amendments, complex mapping, and increased quality control effort. How do you know when designing the CRF that you’re collecting all the relevant SDTM data? And once the data has been collected, how can you verify the submission is what you intended when you have no study definition to compare it with? Making sure all data has been collected in the right format takes valuable time and resources, and can ultimately elongate the study process.
A more efficient approach is to use Define-XML to define your study, end-to-end, right at the start. This includes defining SDTM, SEND, and ADaM datasets upfront.
Using Define-XML and SDTM to design submission datasets at the start of a study makes it easier to set up your study and create your CRFs. By setting out what information should ultimately appear in your submission datasets before you collect any patient data, you can create CRFs with confidence, knowing that you’re collecting all the required information in the right format.
For example, the SDTM standard defines the “Identifier”, “Topic”, “Qualifier”, and “Timing” roles for the variables required in your submission datasets. If you know upfront which variables to use, you can create your CRFs accordingly.
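As an illustration, the sketch below groups a handful of variables from the SDTM Vital Signs (VS) domain by the role SDTM assigns them; the selection of variables is ours, but a checklist like this can drive CRF design:

```python
# Illustrative checklist: a few SDTM Vital Signs (VS) variables grouped
# by their SDTM role. Knowing the roles upfront tells you which fields
# the CRF must capture.
VS_VARIABLES_BY_ROLE = {
    "Identifier": ["STUDYID", "DOMAIN", "USUBJID", "VSSEQ"],
    "Topic":      ["VSTESTCD"],                     # e.g. SYSBP, DIABP
    "Qualifier":  ["VSORRES", "VSORRESU", "VSPOS"],
    "Timing":     ["VISITNUM", "VSDTC"],
}

for role, variables in VS_VARIABLES_BY_ROLE.items():
    print(f"{role}: {', '.join(variables)}")
```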
You can also annotate your CRFs with SDTM variables upfront. This helps ensure all your collected data has a place in SDTM, and has the additional benefit of providing basic mapping between the forms and the datasets. CDISC also provides a permissible mechanism to extend Define-XML, allowing the storage of additional metadata such as complex dataset mappings (e.g. how data from two sources may be merged into a single dataset).
In this way, using Define-XML upfront, rather than retrospectively, can help you ensure your study is a success. Read more about using Define-XML for dataset design.
2) Use Define-XML in EDC data conversions
Define-XML is not limited to just describing CDISC SDTM and ADaM dataset structures. From an electronic data capture (EDC) system, you can export proprietary dataset formats which can be described using the Define-XML model. With the right tools, you can automatically generate a Define-XML that describes the EDC export datasets using the CRFs/eCRFs themselves. This can then be displayed in a friendly HTML or PDF format allowing early visibility of the datasets that will be delivered by the EDC system.
This source (proprietary) dataset specification enables upfront mapping of EDC datasets to SDTM datasets. The mappings can be described, and made machine-executable, using extensions to Define-XML, and human-readable SDTM mapping specifications can be produced automatically, aiding review and approval of the mappings.
In addition, the Define-XML mapping extensions provide a machine-executable format that can be processed by data transformation code to enable the automatic conversion of datasets in commercially available tools.
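As a toy illustration of the idea, and not the Define-XML extension format itself, the sketch below executes a simple EDC-to-SDTM mapping in pandas. The EDC column names, study identifier, and mapping rules are all hypothetical:

```python
# Illustrative only: apply a simple EDC-to-SDTM column mapping. In
# practice the rules would be read from the machine-executable mapping
# spec stored in the Define-XML extensions.
import pandas as pd

# "As exported" EDC dataset with proprietary column names (hypothetical).
edc = pd.DataFrame({
    "SUBJID": ["001", "002"],
    "SYS_BP": [120, 135],
    "VISDATE": ["2024-01-10", "2024-01-11"],
})

# One rule per target SDTM variable: how to derive it from the source.
rules = {
    "USUBJID":  lambda df: "STUDY01-" + df["SUBJID"],
    "VSTESTCD": lambda df: pd.Series(["SYSBP"] * len(df)),
    "VSORRES":  lambda df: df["SYS_BP"].astype(str),
    "VSDTC":    lambda df: df["VISDATE"],
}

vs = pd.DataFrame({target: rule(edc) for target, rule in rules.items()})
print(vs)
```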
The diagram below shows the flow of data from data capture through to CDISC datasets, and the part CDISC metadata plays: CDISC ODM is used in designing data capture forms, and Define-XML in designing destination datasets. All of this vendor-neutral metadata can form the basis of form and dataset libraries that can be re-used from study to study.
3) Create and re-use dataset libraries
Define-XML is the perfect tool to help you create libraries of datasets (EDC, SDTM, ADaM), mappings, page links to CRF variables, and so on for re-use from one study to the next.
A metadata-driven approach using Define-XML can optimize a single study from set-up to submission. But creating libraries of reusable metadata will make future studies even more efficient.
If you have a library of data acquisition forms, proprietary EDC datasets, SDTM datasets, ADaM datasets, and dataset mappings that are approved internally and ready to use, you’ll only have to create new content where there is a specific requirement for it. All other approved metadata is already there in your library.
4) Automate dataset validation
Another major advantage of defining datasets upfront is that validation can also be done upfront. By creating a prospective definition of the intended datasets at the start of the study, you can machine-validate study dataset designs for conformance to external standards, and later validate that the populated datasets match the original specifications. This way, data quality and submission compliance are built in upfront, with less reliance on downstream validation.
We go into a little more detail on validation possibilities below:
* Compare study dataset designs, including controlled terminology, to external and internal standards
When designing SDTM datasets and creating controlled terms, it is imperative that these comply with the latest and/or chosen version of National Cancer Institute Controlled Terminology (NCI CT). During the dataset design phase, automatic comparisons and compliance checks should be made with the appropriate version of NCI CT.
Companies may also develop their own domains that comply with CDISC SDTM but include content falling outside the standard Implementation Guide domains. For example, specialist findings domains may be required for a particular therapeutic area. In this scenario, companies should compare study dataset designs against their own data standards to check for differences, and either accept or reject them accordingly.
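A minimal sketch of such a conformance check, whether against NCI CT or an internal standard, is shown below; the codelist here is a small illustrative subset of NCI CT terms for VSTESTCD, not the full published terminology:

```python
# Check that the VSTESTCD values used in a dataset design appear in the
# chosen controlled terminology codelist. Both sets are illustrative.
NCI_CT_VSTESTCD = {"SYSBP", "DIABP", "PULSE", "TEMP", "HEIGHT", "WEIGHT"}
design_terms = {"SYSBP", "DIABP", "BODYFAT"}

unknown = design_terms - NCI_CT_VSTESTCD
if unknown:
    # BODYFAT is flagged for review: extend the codelist (if the
    # codelist is extensible) or correct the term.
    print(f"Terms not in controlled terminology: {sorted(unknown)}")
```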
* Compare “As specified” study dataset specifications against “As delivered” study dataset designs
Increasingly, studies are outsourced to contract research organizations (CROs), which increases the burden on sponsors in two areas: (a) upfront specification of deliverables and (b) downstream validation of those deliverables.
When datasets are defined upfront, a human-readable target SDTM specification (in HTML, PDF, Word or Excel) can be given to a CRO to describe what is expected in the delivered datasets: an “As specified” study dataset specification.
When CROs return the datasets, they should also provide “As delivered” study dataset metadata. With both “As specified” and “As delivered” metadata available, it is easy to compare the two and verify that the “As delivered” datasets actually match what was specified.
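A sketch of that comparison is below, assuming the variable metadata from each define.xml has already been extracted into plain dictionaries (for instance with the parsing approach sketched earlier):

```python
# Compare "As specified" vs "As delivered" variable metadata, shaped as
# {dataset: {variable: datatype}}. The example values are illustrative.
def compare(specified, delivered):
    for dataset, spec_vars in specified.items():
        deliv_vars = delivered.get(dataset, {})
        for var, datatype in spec_vars.items():
            if var not in deliv_vars:
                print(f"{dataset}.{var}: specified but not delivered")
            elif deliv_vars[var] != datatype:
                print(f"{dataset}.{var}: delivered as {deliv_vars[var]!r}, "
                      f"specified as {datatype!r}")
        for var in deliv_vars.keys() - spec_vars.keys():
            print(f"{dataset}.{var}: delivered but never specified")

specified = {"VS": {"USUBJID": "text", "VSORRES": "text"}}
delivered = {"VS": {"USUBJID": "text", "VSORRES": "float"}}
compare(specified, delivered)  # flags the VSORRES datatype mismatch
```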
* Compare dataset data to dataset metadata and SDTM or ADaM
Having a target SDTM Define-XML available upfront allows automated comparison of delivered datasets against study dataset metadata, either “As specified” or “As delivered”. Comparing data to the “As specified” Define-XML verifies that the data matches what was originally intended; comparing data to the “As delivered” Define-XML ensures that the data matches the dataset definition. This matters because it is ultimately the “As delivered” Define-XML that is submitted to the FDA.
You can also compare with CDISC standards such as SDTM and ADaM using the CDISC Open Rules Engine (CORE). CORE is a validation engine that helps you validate against the standardized conformance rules published by CDISC. In April 2023, we released Formedix CORE, a free desktop application for validating datasets using the CORE engine. It validates against both SDTM/ADaM and the Define-XML metadata. CDISC also plans to add the ability to validate against regulatory rules (FDA, PMDA etc.) in the future.
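As a simple illustration of a data-versus-metadata check (separate from the CORE engine itself), the sketch below confirms that delivered values respect the Length recorded in the define.xml; the data and the defined length are illustrative:

```python
# Verify that every value in a delivered column fits the Length declared
# for it in the define.xml (here, the ItemDef Length for VSTESTCD).
import pandas as pd

vs = pd.DataFrame({"VSTESTCD": ["SYSBP", "DIABP", "RESPIRATIONRATE"]})
defined_lengths = {"VSTESTCD": 8}  # illustrative, from the define.xml

for column, max_len in defined_lengths.items():
    too_long = vs[vs[column].str.len() > max_len]
    if not too_long.empty:
        # RESPIRATIONRATE (15 characters) exceeds the declared length of 8.
        print(f"{column}: {len(too_long)} value(s) exceed length {max_len}")
```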
Define.xml file submission
As we’ve shown, there are many benefits to using Define-XML, not only as a dataset descriptor but as a means to streamline the clinical study process.
Define-XML should not be thought of as simply a submission deliverable, but as a CDISC model that helps optimize the end-to-end clinical trial process. It can be used to establish dataset libraries that promote study-to-study re-use, as well as to drive efficiencies through expedited study set-up and streamlined dataset conversions.
Learn how you can create Define-XML in one click with our Visual Define.XML editor, or check out our blog on what’s new in Define-XML 2.0, including Define-XML 2.0 examples.
The 6 dos and don’ts of Define-XML
Ready to find out more about Define-XML creation?
Download our free guide to the 6 dos and don’ts of Define-XML to help you implement the standard and get your study submission ready.
Author’s note: this blog post was originally published in May 2014 and has been updated for accuracy and comprehensiveness.
Streamline your clinical trials with automated metadata management
When it comes to efficient clinical study build, content is king. Most importantly: metadata content. Metadata provides the building blocks of your study. It’s the forms, terminologies and datasets you need to structure your study and start collecting data.
Managing clinical trial metadata can be challenging. You have to find the right content, make sure it’s compliant with CDISC standards, and get it approved before you start collecting data.
That's where automated metadata management comes in. Automation can help streamline your clinical trial process, meaning more efficient processes, easier study build, and faster submission.
In this blog, we'll explore the role of metadata in clinical trials, the benefits of automating metadata management, and how to implement standardization to help you build trials more quickly and efficiently.
Let’s dive in!
What is metadata and why is it important for clinical trials?
The number of global clinical trials is increasing every year, and there are more regulations than ever before. This means trials are becoming increasingly complex to manage. There’s a vast amount of data to be collected, organized and analyzed. Study teams must consider not only the study data, but the metadata too.
Metadata is often described as “data about data”. In other words, it gives detail and context to data elements. By doing so, it helps to organize and optimize data, and makes it easier to use. Some examples of clinical metadata are case report forms (CRFs), datasets, controlled terminologies, mappings, and edit checks.
Metadata is a crucial part of any clinical trial. The FDA has made it mandatory for submissions to comply with CDISC standards: SDTM, ADaM, and Define-XML. Not only must the study submission comply with these standards, it must also include information describing the submitted data. Study metadata lets regulatory reviewers understand and interpret clinical trial data. And the higher the quality of the metadata, the quicker it is to review.
Common challenges in clinical metadata management
Many organizations use manual processes, such as Excel spreadsheets and Word documents, to manage metadata. Here are some of the challenges of doing it this way:
* Inconsistency and errors: Manual metadata management means time-consuming and fiddly work. For example, tracking changes to standards within a Word document or Excel spreadsheet can easily get confusing. And changes happen all the time in clinical studies! A manual process is error-prone, leading to inaccurate and incomplete metadata.
* Lack of visibility and control over change: It’s not easy to see the impact of changes to your metadata when it’s stored manually in disparate places. It’s difficult to make decisions when you can’t see how other assets, standards, and the relationships between them are affected, or what may need to be changed as a result. If there’s no proper governance in place, metadata can become inconsistent, out of date, and inaccurate.
* Lack of standardization: Without standardization, metadata can be difficult to use and maintain. Staying up to date with changes to CDISC standards and knowing all the different versions is very tricky and time-consuming to do manually.
* Difficulty finding metadata: Metadata that is managed manually tends to be stored in different folders, on different systems, owned by different stakeholders, in different formats. Tracking it down wastes a lot of time and delays study build.
* Creating content from scratch for each study: Many organizations need to create metadata content from scratch every time, or dust off old content and review it before they can use it. This means the time to set up the study is longer than it needs to be.
* Difficult to manage and scale: Manual metadata management can be difficult to scale, especially when managing large volumes of data. For example, if you need to use different electronic data capture (EDC) systems and therefore different forms for different stages of your study, you need to be able to effectively manage the different versions and changes to them, to ensure accuracy and consistency.
Implementing automated metadata management in your organization
To stay ahead of the competition, clinical trial organizations are increasingly embracing technology solutions to help them effectively manage and automate metadata processes.
Implementing automated metadata management for your clinical trial requires careful planning and execution. The process can broadly be broken down into two phases, which we’ve explored below. Note that there isn’t necessarily a right or wrong order for implementing automated metadata management, and you might run both phases in tandem. It all depends on what works best for your organization.
Phase 1: Implement a clinical metadata repository (CMDR)
Metadata management is done best using a CMDR: a centralized repository for storing and governing clinical metadata. It has a user-friendly interface that allows planning, communication, and collaborative working between different teams. All your metadata is kept in one place, so it’s available for all stakeholders to easily find, edit, and share. Standards and automated processes are built into its framework to streamline clinical studies from inception to completion. A CMDR also lets organizations connect the disparate metadata silos often present in clinical trial design, where different teams use different metadata stored in various places, often resulting in inconsistencies and poor version control.
Typically, this phase might involve the following activities:
* Choosing a clinical metadata repository that meets your organization's needs.
* Uploading your metadata into the CMDR or, with some systems, designing metadata within the platform.
* Developing a metadata management and review strategy outlining how metadata will be maintained, including how to handle changes to metadata, retiring metadata and creating new metadata.
* Training employees on how to use the platform and follow the metadata management strategy. CMDRs often come with training, and are typically straightforward to use.
Phase 2: Standardize clinical metadata
This means creating metadata (forms, standards, annotations etc.) that meets the requirements of the organization, and getting it approved internally by all stakeholders. Once a metadata asset has passed quality checks, it is stored within the CMDR for use and reuse across studies.
Standardizing metadata reduces the resources, effort, and time spent setting up studies. Metadata is more consistent and of higher quality, and the likelihood of a successful submission is increased.
Standardizing content also saves time that you’d otherwise spend searching for the right forms, terminologies and annotations to use.
Typically, this phase might involve the following activities:
* Identifying the metadata commonly used throughout your clinical trials (e.g. case report forms).
* Standardizing the metadata in accordance with organizational needs, and sponsor and regulatory requirements (e.g. aligning with controlled terminology).
* Getting all metadata reviewed and approved by relevant internal stakeholders, to make sure all department needs are met.
Don’t be put off by the steps above or the effort required to create, implement and maintain a CMDR. Remember: the time you put into learning the new platform will pay dividends in the future, because your clinical trials will become much faster and easier to build with automated metadata management in place.
How automated metadata management actually works
A CMDR and automation go hand-in-hand:
Automatic study build
A study can be built automatically from existing standards; there’s no need to manually look for content. You’ll only need to create new content when the specific study requires it.
Automatic compliance
Because content is standardized in alignment with regulatory standards, as well as sponsor-driven standards, studies are automatically compliant with CDISC. For example, controlled terminology (a form of metadata) is used to ensure consistency across trials. This means regulatory bodies can quickly locate, view, and analyze data, making the review process much more straightforward.
Automatic change control
Changes are automatically entered into a thorough approvals cycle and tracked before being accepted. This means there’s no need for a complex manual cycle, with long email trails detailing changes, or potential version confusion.
Automatic impact analysis
The upstream and downstream impact of changes to standards is automatically identified, so there’s no need to track it manually. You can review metadata adjustments (for example, standards updates) and perform an impact analysis on all parts of the trial and connected studies before changes are approved and implemented.
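Conceptually, impact analysis is a walk over the dependency graph between metadata assets. Here is a toy sketch with hypothetical asset names; a CMDR maintains these relationships, and performs this analysis, for you:

```python
# Toy impact analysis: given which assets depend on which, list
# everything downstream of a change. Asset names are hypothetical.
from collections import deque

depends_on_me = {
    "NCI CT 2024Q1": ["VS codelist"],
    "VS codelist": ["VS form", "VS SDTM dataset"],
    "VS form": ["Study ABC-101 EDC build"],
    "VS SDTM dataset": ["Study ABC-101 define.xml"],
}

def impacted_by(changed_asset):
    """Breadth-first walk of everything downstream of a changed asset."""
    seen, queue = set(), deque([changed_asset])
    while queue:
        for dependent in depends_on_me.get(queue.popleft(), []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

print(sorted(impacted_by("NCI CT 2024Q1")))
```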
Automatic quality control
Quality control is enforced through a customizable metadata lifecycle, meaning all stakeholders can see exactly which state a standard or study is in (i.e. draft, in progress, in review or approved) and only perform the actions allowed at that stage. Read our blog about improving clinical trial data quality with a metadata repository.
CMDRs facilitate automated metadata management, bringing clinical trial efficiencies and lasting improvements to your study build process. In fact, we believe a CMDR could be one of the best investments a sponsor organization can make. In the next section, we take a look at some of the key benefits of automated metadata management.
The benefits of end-to-end automated metadata management
Automated metadata management offers many benefits over manual management.
* It saves time and reduces errors. Manual metadata management means human involvement in fiddly, time-consuming tasks. This can take up a lot of resources in the study build team, especially when dealing with large or multiple studies. By automating processes, organizations can save time and reduce the risk of human error.
* Metadata can be reused again and again. Metadata is only useful if it is accurate and consistent. Automated metadata management ensures that studies are built on the latest approved standards, so metadata can be reused with confidence. This in turn reduces the chance of errors and achieves consistency, while also saving time and effort.
* Stronger, more consistent, end-to-end processes. Each stage in a CMDR is built with the next stage in mind. Standards such as SDTM, ADaM, and Define-XML are integrated into the framework. This allows for seamless automation and inheritance between the different stages: for example, generating database specifications from form designs, or creating the define.xml as you go rather than waiting for the study to complete.
* Improved searchability and discoverability. Metadata is created, edited, approved and retired within your CMDR. Everything is in one place, so you have peace of mind that you’ll find what you need, when you need it.
* Visualize up front. Some CMDRs also let you design and preview eCRFs before you build your EDC study. The eCRF can be fully visualized, interrogated and refined before your EDC build is created. This ensures that it works just the way it should before you’ve even started collecting study data. This right-first-time approach removes the need for time-consuming, repetitive approvals cycles and changes down the line.
The future of clinical metadata management
Building an effective clinical trial starts with metadata. With quality, standardized and approved metadata in place, trials can start faster and with less manual work. Insights can be gained earlier, and the trial can proceed more quickly.
Implementing a CMDR with automated metadata management can deliver many long-term benefits. As CDISC standards continue to be updated, and more and more regulations come into play, CMDRs will play a central role in organizing and standardizing this data to speed up the clinical trial process and make life easier for organizations. With clinical metadata standardized, organized and automated processes in place, organizations can focus on getting treatments to market more quickly.
Want to read more about CMDRs and automation? Why not download our free whitepaper: 'The importance of a clinical metadata repository in clinical trials'?
Author’s note: this blog post was originally published in February 2021 and has been updated for accuracy and comprehensiveness.