Data flows, Data Lifecycle, and ALCOA+
The points discussed in this article, and summarized in Table 1, provide data1 integrity compliance by design2. Data Governance is the strategic document that implements and enforces the tactical elements discussed in this article; it is discussed in Reference 3.
Table 1, Data flows, Data Lifecycle, ALCOA, and Key Expectations, provides, for each typical data/records flow, the associated data/records attribute(s) and the typical data/records integrity controls. The "Key Expectations" in Table 1 come from "A Computer Data Integrity Compliance Model"2.
Data lineage is the process of understanding, recording, and visualizing data as it flows from data sources to consumption. It includes all transformations the data underwent: how the data was transformed, what changed, and why. A data flow map is a valuable tool for visualizing data flow and the risks and vulnerabilities of electronic systems, particularly interfaced systems. It shows the complete data flow from start to finish.
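To make the lineage idea concrete, the following Python sketch records each hop a data item takes, together with how it was transformed, what changed, and why, and then walks back from a consumer to the original source. The class names, field names, and system names are hypothetical and chosen for illustration; they do not come from any specific lineage tool. The sketch also assumes each system has at most one upstream source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LineageStep:
    """One hop of a data item between two systems (hypothetical structure)."""
    source: str          # system the data came from
    target: str          # system the data went to
    transformation: str  # how the data was transformed
    what_changed: str    # what changed in the data
    why: str             # reason for the transformation


@dataclass
class LineageLog:
    steps: List[LineageStep] = field(default_factory=list)

    def record(self, step: LineageStep) -> None:
        self.steps.append(step)

    def trace_back(self, consumer: str) -> List[LineageStep]:
        """Walk from a consuming system back to the original data source."""
        path: List[LineageStep] = []
        current = consumer
        visited = set()  # guards against cycles in a malformed log
        while current not in visited:
            visited.add(current)
            step = next((s for s in self.steps if s.target == current), None)
            if step is None:
                break  # no upstream hop: this system is the origin
            path.append(step)
            current = step.source
        return list(reversed(path))


log = LineageLog()
log.record(LineageStep("Instrument", "PLC", "4-20 mA signal scaled",
                       "current converted to degrees C", "make readings usable"))
log.record(LineageStep("PLC", "Historian", "time-stamped and saved",
                       "nothing; stored as received", "create the original e-record"))
log.record(LineageStep("Historian", "BI report", "hourly averages computed",
                       "readings aggregated", "trend reporting"))

for step in log.trace_back("BI report"):
    print(f"{step.source} -> {step.target}: {step.what_changed} ({step.why})")
```

Keeping "what changed" and "why" as explicit fields, rather than free text in a log file, is what lets the trace answer the lineage questions directly.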
A record is considered to have integrity if it is complete and protected from unauthorized addition, deletion, alteration, use, and concealment (36 CFR Part 1236.10).
Specifically, understanding the data flows among the components of an electronic system helps in correctly implementing the appropriate integrity controls. The definition of data integrity in NIST SP 800-57P1 (Reference 4) identifies four specific data flows associated with the appropriate data/records integrity controls: data when created, while in transit, during processing, and when stored.
These four typical data flows can be correlated with the data lifecycle5. Table 1 provides this correlation.
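A minimal Python sketch of that correlation as a lookup table follows. The enum and function names are hypothetical, and the lifecycle labels are abbreviated from the Table 1 cells; migration and archiving are grouped under "while in transit," as Table 1 indicates for the processing flow.

```python
from enum import Enum
from typing import Dict, List


class DataFlow(Enum):
    """Basic data flows named in the NIST SP 800-57P1 integrity definition."""
    ENTRY_COLLECTION = "created (entry/collection)"
    IN_TRANSIT = "while in transit"
    PROCESSING = "during processing"
    STORAGE = "stored"


# Lifecycle activities per data flow, abbreviated from Table 1.
LIFECYCLE_BY_FLOW: Dict[DataFlow, List[str]] = {
    DataFlow.ENTRY_COLLECTION: ["Data capture"],
    DataFlow.STORAGE: ["Maintain and use - Access, use, and reuse"],
    DataFlow.IN_TRANSIT: [
        "Maintain and use - Transmission",
        "Maintain and use - Migration",
        "Maintain and use - Archiving",
        "Maintain and use - Retrieving",
    ],
    DataFlow.PROCESSING: ["Maintain and use - Use and reuse"],
}


def flows_for(activity: str) -> List[DataFlow]:
    """Return the data flow(s) in which a lifecycle activity takes place."""
    return [flow for flow, activities in LIFECYCLE_BY_FLOW.items()
            if activity in activities]


print(flows_for("Maintain and use - Migration"))
```

Looking up a lifecycle activity this way points directly to the data flow, and therefore to the integrity controls, that apply to it.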
Essential management measures and technical means should be adopted for data/records collected by, processed with, transmitted to, and stored by an electronic system6.
A typical manufacturing system (Figure 1) is used as an example to explain the relationships between Data flows, Data Lifecycle, ALCOA, and the data integrity controls per typical data flow as described in Table 1.
Fig. 1: Typical Manufacturing Data flows
Many published articles, regulatory authority guidelines, and global industry group guidelines provide information about data integrity, primarily illustrating ALCOA7 and ALCOA+8. All regulators emphasize the ALCOA principles when outlining regulatory expectations. ALCOA comprises principles that ensure the integrity of data throughout its lifecycle.
In the data/records framework applicable to data flows9, ALCOA and all its variants indicate reliability10, but they cannot be used to determine the appropriate data/records integrity controls. Not all ALCOA principles are applicable in all data flows.
There may be some inconsistency among software developers and/or data/e-records11 risk evaluators about the situations in which a given ALCOA principle applies.
Another item rarely discussed in articles about data integrity is the set of controls needed to preserve a data element in each typical data flow through the electronic system12. At a high level, the data flows involved are data "created, while in transit, during processing or stored."4 A record that has integrity in all of these data flows is complete and unaltered. Only personnel or electronic systems with appropriate access levels can modify the records. When data/records are altered, they remain traceable to the original data/records.
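The access and traceability controls described above can be sketched in Python as a record store that refuses changes from users without modify rights and appends an audit trail entry for every creation and change, so that the original value can always be recovered. The class names, the editor set, and the in-memory storage are assumptions for illustration; a GxP system would persist the audit trail securely and protect it from modification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Set


@dataclass(frozen=True)
class AuditEntry:
    record_id: str
    user: str
    timestamp: datetime
    old_value: Any
    new_value: Any
    reason: str


@dataclass
class RecordStore:
    authorized_editors: Set[str]  # users allowed to modify existing records
    records: Dict[str, Any] = field(default_factory=dict)
    audit_trail: List[AuditEntry] = field(default_factory=list)

    def _log(self, record_id: str, user: str, old: Any, new: Any, reason: str) -> None:
        self.audit_trail.append(
            AuditEntry(record_id, user, datetime.now(timezone.utc), old, new, reason))

    def create(self, record_id: str, value: Any, user: str) -> None:
        if record_id in self.records:
            raise ValueError(f"{record_id} already exists")
        self.records[record_id] = value
        self._log(record_id, user, None, value, "created")

    def modify(self, record_id: str, new_value: Any, user: str, reason: str) -> None:
        # Only users with the appropriate access level may alter a record.
        if user not in self.authorized_editors:
            raise PermissionError(f"{user} may not modify {record_id}")
        if not reason.strip():
            raise ValueError("a reason for the change must be documented")
        old_value = self.records[record_id]
        self.records[record_id] = new_value
        self._log(record_id, user, old_value, new_value, reason)

    def original_value(self, record_id: str) -> Any:
        """Trace the record back to the value captured at creation."""
        first = next(e for e in self.audit_trail if e.record_id == record_id)
        return first.new_value


store = RecordStore(authorized_editors={"qa_reviewer"})
store.create("batch-42/temperature", 37.1, user="operator1")
store.modify("batch-42/temperature", 37.4, user="qa_reviewer",
             reason="transcription error corrected")
try:
    store.modify("batch-42/temperature", 99.0, user="operator1", reason="retest")
except PermissionError as err:
    print("blocked:", err)

print(store.records["batch-42/temperature"], store.original_value("batch-42/temperature"))
```

Because the audit trail stores both the old and the new value with the user, time, and reason, the current record stays traceable to the original without keeping a separate copy.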
Understanding the data flows between the components of an electronic system enables implementation of the correct integrity controls for each specific process.
The validation process applicable to workflows ensures that the intended steps, requirements, design, and qualification/verification results are accurate, adequately performed, and documented. The maintenance and operation phases ensure that any modification maintains the validated state of each workflow.
This paper describes the relationships among data flows, the data lifecycle, ALCOA, and key data integrity controls per typical data/records flow.
Even though many of the references in this article are related to manufacturing, this article applies to all GxP regulations13.
Understanding Data Integrity
As per NIST SP 800-57P1, data integrity refers to the "property that data has not been altered14 in an unauthorized manner since it was created, while in transit, during processing or stored." This definition is consistent with ISO 15489, INFOSEC, 44 USC, Sec. 3542, 36 CFR Part 1236.10, and ANSI/IEEE, and it embraces the view that e-records integrity is a measure of the validity and fidelity of a data object15.
During the data/records lifecycle16, 17, data/records are in one of the data flows mentioned above. Each data flow can be correlated with the key expectations for the controls that maintain data/record integrity (refer to Table 1). Recognizing the data/records lifecycle elements (Figure 2) is vital to ensuring that the proper controls are in place.
E-records, as informational objects, have a life cycle that begins with data capture, continues through maintenance and use, and ends with disposal18.
As depicted in Figure 2, as part of the data utilization, the data can be analyzed, archived, extracted, loaded, migrated, processed, reported, retained, retrieved, transformed, transmitted, and so on.
Work in progress corresponds to data generation and is considered sensor data. The e-records handling function starts after sensor data is captured. Sensor or transient data becomes an e-record after it is captured and saved to a repository for e-records19. During the maintain-and-use phase, e-records are kept and used for their intended purpose. The e-records must be authentic, trustworthy, and reliable so that the individuals who depend on them can fulfil their job functions correctly.
In some situations, the processing capabilities are retained throughout the active phase because the tools required to use the records are the same tools used to create them.
Fig. 2: Typical Manufacturing Data flows
E-records that are no longer active are usually moved to separate e-records storage for long-term retention. These inactive records must be kept to meet retention schedule requirements and are often maintained with the read/view attribute.
Disposal marks the end of retention and consists of the destruction of the e-records, including their content, attributes, and audit trails.
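The lifecycle stages just described can be expressed as a small state machine. This Python sketch assumes four hypothetical states (sensor, active, inactive, disposed) and allows updates only while a record is active: sensor data cannot be used or maintained, and inactive records keep only read access. The state names and transition rules are an illustrative reading of the text, not a prescribed model.

```python
from enum import Enum, auto
from typing import Any, Dict, Set


class State(Enum):
    SENSOR = auto()    # transient data, not yet saved to an e-records repository
    ACTIVE = auto()    # e-record in the maintain-and-use phase
    INACTIVE = auto()  # moved to long-term retention, read/view only
    DISPOSED = auto()  # destroyed: content, attributes, and audit trail


# Transitions permitted by this sketch.
ALLOWED: Dict[State, Set[State]] = {
    State.SENSOR: {State.ACTIVE},                     # saved to a repository
    State.ACTIVE: {State.INACTIVE, State.DISPOSED},
    State.INACTIVE: {State.DISPOSED},                 # end of retention
    State.DISPOSED: set(),
}


class LifecycleRecord:
    def __init__(self, value: Any) -> None:
        self.value = value
        self.state = State.SENSOR

    def transition(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        if new_state is State.DISPOSED:
            self.value = None  # content destroyed at disposal

    def read(self) -> Any:
        if self.state in (State.SENSOR, State.DISPOSED):
            raise PermissionError(f"no e-record to read in state {self.state.name}")
        return self.value

    def update(self, new_value: Any) -> None:
        if self.state is not State.ACTIVE:
            raise PermissionError(f"updates not allowed in state {self.state.name}")
        self.value = new_value


rec = LifecycleRecord(75.0)
rec.transition(State.ACTIVE)    # captured and saved: the e-record lifecycle begins
rec.update(75.2)
rec.transition(State.INACTIVE)  # moved to long-term retention
print(rec.state.name, rec.read())
```

Making illegal transitions raise an error, rather than silently ignoring them, mirrors the idea that each lifecycle stage carries its own permitted handling.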
Note that the business requirements that give rise to the e-records handling requirements drive the selection of appropriate supporting technologies. These requirements raise questions about ongoing internal and external secondary access to records, support the selection of appropriate technologies, and identify important system migration issues20.
Table 1 provides, per typical data/records flow, the data/records attribute(s) associated with the data flow and the associated typical data/records integrity controls. The "Key Expectations" in Table 1 came from "A Computer Data Integrity Compliance Model"2.
The integrity of GMP data/records should be safeguarded throughout the retention period during entry or collection, storage, transmission, and processing.
Applicability of the Concept from Figure 1
Figure 1 represents a typical programmable logic controller (PLC) manufacturing data flow. In this figure, the PLC is connected to an instrument/equipment on the left-hand side and to a Historian or supervisory control and data acquisition (SCADA) system at the other end. In addition to these two connections, the PLC is connected to a supervisory system.
The information received from the instrument/equipment, up to the point immediately before the SCADA repository, is considered transient or sensor21 data because it has not been saved to a repository for e-records.
Sensor data may use virtual memory as transient data storage. Data written to disk as virtual memory is not a true e-record, since it serves only as temporary data storage for the instrument or equipment application.
Sensor data in temporary data storage can be neither used nor maintained.
The associated sensor data integrity controls contained in Table 1 are:
- The interface between equipment and computer should be qualified and checked periodically to ensure data/record accuracy and reliability.
- The accuracy and reliability of the raw data depend not only on properly calibrated instruments and equipment but also on the integrity of the raw data produced by the recording action.
- Sensor data captured by the system should be saved into memory in a format that is not vulnerable to manipulation, loss or change.
- Built-in checks (EU Annex 11 p5) should verify the data after transformation.
The ALCOA elements associated with sensor data are accurate, complete, and consistent.
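As an illustration of a built-in check applied after transformation, the Python sketch below scales a 4-20 mA loop signal to engineering units and then verifies the result against the three ALCOA elements above: a missing value (complete), a signal or value out of range (accurate), and a converted value that does not invert back to the raw signal (consistent). The 4-20 mA convention, the 0-150 °C range, and the function names are assumptions; EU Annex 11 calls for appropriate built-in checks but does not prescribe these particular ones.

```python
import math
from typing import List


def scale_4_20ma(current_ma: float, lo: float, hi: float) -> float:
    """Linear transformation of a 4-20 mA signal to engineering units."""
    return lo + (current_ma - 4.0) * (hi - lo) / 16.0


def built_in_check(current_ma: float, value: float, lo: float, hi: float,
                   tol: float = 1e-9) -> List[str]:
    """Verify a transformed value; an empty list means every check passed."""
    if math.isnan(current_ma) or math.isnan(value):
        return ["missing value"]                                    # complete
    problems = []
    if not 4.0 <= current_ma <= 20.0:
        problems.append(f"signal {current_ma} mA outside 4-20 mA")  # accurate
    if not lo <= value <= hi:
        problems.append(f"value {value} outside {lo}-{hi}")          # accurate
    # Invert the transformation and compare with the raw signal.
    back = 4.0 + (value - lo) * 16.0 / (hi - lo)
    if abs(back - current_ma) > tol:
        problems.append("converted value inconsistent with raw signal")  # consistent
    return problems


raw = 12.0
temp_c = scale_4_20ma(raw, 0.0, 150.0)
print(temp_c, built_in_check(raw, temp_c, 0.0, 150.0))
```

Returning a list of findings rather than a single pass/fail flag lets the system record why a value was rejected.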
The equipment connected to the I/O card may be an electronic subsystem with limited manually adjustable input data; the GxP data22 it generates is not stored but is sent via an interface to the PLC23.
The Supervisory system has the same characteristics as the subsystem above.
The PLC is connected to the SCADA / Historian subsystem, where GxP data are permanently stored, and the GxP data can be maintained and used by the user to generate results.
An e-record is generated after the sensor data (a.k.a. transient data) is recorded in a repository for e-records (e.g., SCADA, Historian, data server, and so on). After the sensor data24 is recorded, the e-records lifecycle begins. The e-records recorded in the SCADA / Historian are called original records25.
The overall e-records integrity controls applicable to all e-records in storage are:
- Qualify the repositories of e-records for their intended use.
- Retain records in their original format.
- User access controls to the records repository shall be configured and enforced to prevent unauthorized access to data/records and unauthorized changes to or deletion of them.
- For disaster recovery, data/records, metadata, and system configuration settings in storage must be backed up. The backup copy must be a true copy of the backed-up records.
- Original data/records or a true copy are subject to periodic review by qualified personnel. Data/e-records should be periodically checked for accessibility, readability, and integrity.
- Data/records should be traceable to the original data/records by documenting changes to the original data/records. An audit trail must be created for any modification to and deletion of data/records.
- Verification of data/records retrievability is required before modifying the electronic system application and/or infrastructure.
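The backup and periodic-review controls above can be partly automated. The Python sketch below copies a record file, confirms with a SHA-256 digest that the copy is bit-for-bit identical, and later re-checks that a stored record is still accessible, readable, and unchanged. The file and function names are hypothetical. A matching digest shows only that the content is identical; whether a copy qualifies as a true copy in the regulatory sense also depends on preserving metadata and context, which this sketch does not verify.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Digest of the file content, read in chunks to handle large files."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy a record to backup_dir and confirm the copy is bit-for-bit identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also tries to preserve file timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} does not match the original")
    return target


def periodic_check(path: Path, expected_digest: str) -> bool:
    """True if the record is still accessible, readable, and unchanged."""
    try:
        return sha256_of(path) == expected_digest
    except OSError:  # missing or unreadable file
        return False


with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "historian_export.csv"
    record.write_text("timestamp,temp_C\n2024-01-01T00:00:00Z,75.0\n")
    baseline = sha256_of(record)  # captured when the record is stored
    copy = backup_and_verify(record, Path(tmp) / "backup")
    copy_ok = periodic_check(copy, baseline)
    missing_ok = periodic_check(Path(tmp) / "missing.csv", baseline)

print(copy_ok, missing_ok)
```

Storing the baseline digest at the time the record is saved is what makes later periodic checks meaningful; a digest computed only at review time cannot reveal earlier changes.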
The ALCOA elements associated with e-records in storage are Attributable, Legible, Contemporaneous, Original, Accurate, Complete, Consistent, Enduring, and Available.
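The point that not all ALCOA principles apply in every data flow can be made explicit with a lookup. This Python sketch encodes the two flows whose ALCOA elements this section lists, sensor data and e-records in storage; the data structure and function name are illustrative, and the remaining flows would be added from Table 1 in the same way.

```python
from typing import Dict, List, Set

ALCOA_PLUS: List[str] = [
    "Attributable", "Legible", "Contemporaneous", "Original", "Accurate",
    "Complete", "Consistent", "Enduring", "Available",
]

# Principles associated with each data flow in this section.
APPLICABLE: Dict[str, Set[str]] = {
    "sensor data": {"Accurate", "Complete", "Consistent"},
    "e-records in storage": set(ALCOA_PLUS),
}


def not_applicable(flow: str) -> List[str]:
    """ALCOA+ principles not associated with the given data flow."""
    return [p for p in ALCOA_PLUS if p not in APPLICABLE[flow]]


for flow in APPLICABLE:
    print(f"{flow}: not applicable -> {not_applicable(flow) or 'none'}")
```

Listing the principles that do not apply is as useful in a risk evaluation as listing those that do, because it documents why no control is expected for them.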
In addition to the brief description above of the data lifecycle, ALCOA attributes, and key expectation integrity controls for the sensor data and stored e-records data flows, Table 1 contains the data lifecycle, ALCOA attributes, and overall e-records integrity controls for e-records during capture, while in transit, and during processing.
As noted earlier, understanding the data flow between the components of an electronic system permits straightforward implementation of the appropriate integrity controls for each specific process.
From start to finish, the complete data flow is the process of understanding, recording, and visualizing data as it flows from data source systems to data consumption (e.g., business intelligence). Data flows must include everything the data underwent along the way: how the data was transformed, what changed, and why26.
A data lineage tool or data flow map27 is helpful in understanding the risks and vulnerabilities of electronic systems, particularly interfaced systems.
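A data flow map can also be kept in machine-readable form. The following Python sketch encodes the Figure 1 connections, as read from the description of that figure, as a directed graph and flags any originating system whose data never reaches a repository for e-records, that is, data that would remain transient. The node names are illustrative, and the "Balance" to "Local PC" connection in the usage example is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (source system, target system)

# Connections from Figure 1; data flows toward the SCADA/Historian.
FIGURE_1: List[Edge] = [
    ("Instrument/equipment", "PLC"),
    ("Supervisory system", "PLC"),
    ("PLC", "SCADA/Historian"),
]
REPOSITORIES: Set[str] = {"SCADA/Historian"}  # where e-records are created


def build_graph(edges: List[Edge]) -> Dict[str, List[str]]:
    graph: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    return graph


def path_to_repository(graph: Dict[str, List[str]], start: str,
                       repositories: Set[str]) -> Optional[List[str]]:
    """Depth-first search for a path from start to any e-records repository."""
    stack = [[start]]
    seen: Set[str] = set()
    while stack:
        path = stack.pop()
        node = path[-1]
        if node in repositories:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, []):
            stack.append(path + [nxt])
    return None


def unrecorded_sources(edges: List[Edge], repositories: Set[str]) -> List[str]:
    """Originating systems whose data never reaches a repository for e-records."""
    graph = build_graph(edges)
    origins = {s for s, _ in edges} - {d for _, d in edges}
    return sorted(o for o in origins
                  if path_to_repository(graph, o, repositories) is None)


graph = build_graph(FIGURE_1)
print(path_to_repository(graph, "Instrument/equipment", REPOSITORIES))
print(unrecorded_sources(FIGURE_1 + [("Balance", "Local PC")], REPOSITORIES))
```

Each hop on a returned path is an interface where in-transit controls apply, and the final node is where the e-records lifecycle begins.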
As per NIST SP 800-57P1, there are four basic flows of information: data entry/collection, transit between destinations, processing, and storage.
In each of these basic data flows, the objective of data integrity controls is to prevent and detect data integrity issues.
Prevention is built into the design of data handling applications to ensure that data is stored, archived, or disposed of safely and securely during and after the retirement of the electronic system.
The design of the data handling application function(s) covers the identification of records, their associated flows, and the source of the original records.
Data integrity issues are detected through periodic reviews, internal and external audits, regulatory inspections, and the investigation of incidents involving unintended behavior of the electronic system.
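On the detection side, part of a periodic review can be scripted. The Python sketch below scans audit trail entries and flags modifications or deletions made by users outside an authorized set, or made without a documented reason. The entry fields, action names, and review rules are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class AuditEntry:
    record_id: str
    user: str
    action: str  # "create", "modify", or "delete"
    reason: str


def review_audit_trail(entries: List[AuditEntry], authorized: Set[str]) -> List[str]:
    """Return findings that a reviewer should investigate."""
    findings = []
    for i, entry in enumerate(entries):
        if entry.action not in ("modify", "delete"):
            continue  # creation entries are not checked in this sketch
        if entry.user not in authorized:
            findings.append(f"entry {i}: {entry.action} of {entry.record_id} "
                            f"by unauthorized user {entry.user}")
        if not entry.reason.strip():
            findings.append(f"entry {i}: {entry.action} of {entry.record_id} "
                            "without documented reason")
    return findings


trail = [
    AuditEntry("batch-42/temperature", "operator1", "create", ""),
    AuditEntry("batch-42/temperature", "qa_reviewer", "modify", "typo corrected"),
    AuditEntry("batch-42/pressure", "operator1", "delete", ""),
]
for finding in review_audit_trail(trail, authorized={"qa_reviewer"}):
    print(finding)
```

Such a script only narrows the review; each finding still needs investigation by qualified personnel.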
|NIST SP 800-57P1 (Basic data flows)|Data Lifecycle|ALCOA|Key Expectation|
|---|---|---|---|
|Note: not considered data/record. It is considered transient|Data generation and capture| | |
|Data / records entry / collection29|Data Capture (Data save (data| | |
|Storage32|Maintain and use (Access, use, and reuse)|Attributable|Access limitation of user's interactions with data and the handling system is established by following a logical and physical procedural control, including segregation of duties. (EU Annex 11 p 12; Part 11.10(d) and (g)).|
|While in transit34|Maintain and use - Transmission|Attributable| |
| |Maintain and use - Migration| | |
| |Maintain and use - Archiving /| | |
| |Maintain and use - Retrieving/| | |
|Processing35 (Part 11.10(f))|Maintain and use - Use and reuse| | |
| |Maintain and use - Migration| |See migration as part of data/records on "While in transit."|
| |Maintain and use - Archiving /| |See archiving as part of data/records on "While in transit."|
| |Disposal (Data / records retirement37)| | |
| |Maintain and use (Data Presentation (e.g., business intelligence38))| | |
Table 1 – Data flows, Data Lifecycle, ALCOA, and Key Expectations.
About the Author:
Orlando López is a seasoned expert with worldwide experience in pharmaceutical and medical device e-compliance.
1 Data is defined as the contents of a record. It is the basic unit of information with a unique meaning and can be transmitted. (ISO/IEC 17025).
2 López, O., “A Computer Data Integrity Compliance Model”, Pharmaceutical Engineering, Vol. 35 No. 2, March/April 2015.
3 López, O., “Electronic Records Governance,” in Data Integrity in Pharmaceutical and Medical Devices Regulation Operations, O. López, Ed. (Taylor & Francis Group, Boca Raton, FL, 1st ed., 2017), pp. 133-141.
4 NIST, “SP 800-57 Part 1 Rev. 5 - Recommendation for Key Management: Part 1 – General,” May 2020.
5 NARA, “Universal Electronic Records Management Requirements,” Version 2.03, June 2020.
6 CFDA, “Data Record and Management,” December 2020.
7 ALCOA - Attributable, Legible, Contemporaneous, Original, Accurate.
8 ALCOA+ – ALCOA and Complete, Consistent, Enduring, Available.
9 Data flow - A graphical representation of the "flow" of data through an information system. (PIC/S).
10 López, O., “A Computer Data Integrity Compliance Model”, Pharmaceutical Engineering, Vol. 35 No. 2, March/April 2015.
11 Record is defined as the collection of related data treated as a unit (ISPE/PDA, “Technical Report: Good Electronic Records Management (GERM),” July 2002)
12 Electronic system - Electronic system means systems, including hardware and software, that produce e-records.
13 GxP - The underlying international life science requirements such as those outlined in the US FD&C Act, US PHS Act, FDA regulations, EU Directives, Japanese MHLW regulations, Australia TGA, or other applicable national legislation or regulations under which a company operates. (GAMP Good Practice Guide, IT Infrastructure Control and Compliance, ISPE 2005)
14 Alteration of e-records includes the insertion, deletion, and substitution of data within the record.
15 López, O., “Maxims of Electronic Records Integrity,” Pharmaceutical Technology, May 2019.
16 CEFIC, “Practical risk-based guide for managing data integrity,” March 2019 (Version 1)
17 ECA, “GMP Data Governance and Data Integrity,” Section 9.3.15, Rev 2, January 2018.
18 MHRA, “MHRA GMP Data Integrity Definitions and Guidance for Industry”, March 2018.
19 Repository for e-records - A direct access device on which the electronic records and metadata are stored.
20 Center for Technology in Government University at Albany, SUNY, “Practical Tools for Electronic Records Management and Preservation,” July 1999.
21 Sensor (a.k.a. transient) data is the output of a device that detects and responds to some input from the physical environment.
22 GxP data - Data generated to satisfy a GxP regulation requirement.
23 CEFIC, “Practical risk-based guide for managing data integrity,” March 2019 (Version 1).
24 Data - The contents of the record; the basic unit of information that has a unique meaning and can be transmitted. (ISO/IEC 17025)
25 Original record - Data as the file or format in which it was initially generated, preserving the integrity (accuracy, completeness, content and meaning) of the record.
27 A data flow map is a way of representing a flow of data through a process or a system (usually an information system). The data flow map also provides information about the outputs and inputs of each entity and the process itself.
28 Pre-recording – During the generation and transformation stage, the actions that involve sensor data collections are performed. The transformation includes those actions that scale and convert data to digital.
29 Data entry or collection - The process of placing an object under records management control for disposition and access purposes (López, O., “A Computer Data Integrity Compliance Model,” Pharmaceutical Engineering 35, No 2 (March/April 2015); 79-87).
30 Data accuracy refers to whether the data values stored for an object are the correct values.
31 Critical Data - data with high risk to product quality or patient safety. (ISPE GAMP COP Annex 11 – Interpretation, July/August 2011).
32 Storage - Data storage is the recording (storing) of information (data) in a storage medium.
33 True copy - An exact copy of an original record, which may be retained in the same or different format in which it was initially generated, e.g., a paper copy of a paper record, an electronic scan of a paper record, or a paper record of electronically generated data. (MHRA)
34 Data Transmission - Data transfer is transferring data between different data storage types, formats, or computer systems (MHRA, "GxP Data Integrity Guidance and Definitions," March 2018). This data flow requires retrieving data from the source repository and loading the data to the recipient repository.
35 Data Processing – (1) A sequence of operations performed on data to extract, present, or obtain information in a defined format (López, O., “A Computer Data Integrity Compliance Model,” Pharmaceutical Engineering 35, No 2 (March/April 2015); 79-87). (2) All system transactions as defined by MHRA (GxP Data Integrity Guidance and Definitions, Mar 2018), Section 6.12.
36 Data transformation - Processing data that can result in the creation of additional data. Data transformations are widespread in Big Data environments.
37 Data/records retirement - Company records/data meeting the approved retention time are tagged for physical deletion from the associated repository per approved procedure.
38 Business intelligence (BI) refers to the procedural and technical infrastructure that collects, stores, and analyzes the data produced by a company's activities. BI is a broad term that encompasses data mining, process analysis, performance benchmarking, and descriptive analytics.