1Department of Applied Statistics, Yonsei University College of Commerce and Economics, Seoul, Korea
2Department of Statistics and Data Science, Yonsei University College of Commerce and Economics, Seoul, Korea
3Institute for Health and Society, Hanyang University, Seoul, Korea
4Department of Preventive Medicine, Hanyang University College of Medicine, Seoul, Korea
5Division of Infectious Disease, Department of Internal Medicine, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
6Department of Internal Medicine and AIDS Research Institute, Yonsei University College of Medicine, Seoul, Korea
7Department of Internal Medicine, Kyungpook National University School of Medicine, Daegu, Korea
8Division of Infectious Diseases, Department of Internal Medicine, Korea University College of Medicine, Seoul, Korea
9Division of Infectious Disease, Department of Internal Medicine, Incheon St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Incheon, Korea
10Division of Viral Disease Research, Center for Infectious Disease Research, Korea National Institute of Health, Cheongju, Korea
©2020, Korean Society of Epidemiology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
CONFLICT OF INTEREST
The authors have no conflicts of interest to declare for this study.
FUNDING
This study was supported by a grant for the Chronic Infectious Disease Cohort Study (Korea HIV/AIDS Cohort Study) from the Korea Centers for Disease Control and Prevention (2016-ER5103-02).
AUTHOR CONTRIBUTIONS
Conceptualization: SMK, YC, BYC. Data curation: SIK, JYC, SWK, JYS, YJK, Korea HIV/AIDS Cohort Study. Formal analysis: SMK, YC, MK. Funding acquisition: MKK, MY, JGL. Methodology: SMK, YC. Project administration: SIK. Visualization: SMK, YC. Writing – original draft: SMK, YC. Writing – review & editing: SMK, YC, BYC, MK, SIK, JYC, SWK, JYS, YJK, MKK, MY, JGL, BYP.
Variables | 10th HIV/AIDS Cohort Study | 14th HIV/AIDS Cohort Study |
---|---|---|
Data collection duration (visit) | Dec 2006-Dec 2014 | Dec 2006-Jul 2019 |
Data cleaning duration (mo) | May 2015-Nov 2015 | Jul 2019-Nov 2019 |
Organization | 16 hospitals | 15 hospitals |
Expected errors to be handled<sup>1</sup> | 3,914 | 1,803 |
Corrected errors<sup>2</sup> | 2,051 | 1,274 |
Non-errors<sup>3</sup> | 629 | 397 |
Unconfirmed<sup>4</sup> | 19 | 132 |
Rate of data cleaning (%)<sup>5</sup> | 68.5 | 92.7 |
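
The reported rates of data cleaning can be reproduced from the counts in the table. The sketch below assumes that the rate is the share of expected errors that were resolved, that is, corrected errors plus non-errors divided by expected errors. This formula is an assumption (the footnote defining the rate is not included here), and the function name `cleaning_rate` is hypothetical, but it matches both reported values.

```python
def cleaning_rate(expected, corrected, non_errors):
    """Percentage of expected errors resolved.

    Assumed formula: (corrected + non_errors) / expected * 100.
    """
    return round(100 * (corrected + non_errors) / expected, 1)


# Counts taken from the table above.
print(cleaning_rate(3914, 2051, 629))   # 10th cohort study
print(cleaning_rate(1803, 1274, 397))   # 14th cohort study
```

Under this assumption, the two calls give 68.5 and 92.7, in line with the table.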
Step | Strategy | Type | Management period | Method | Possible error when unexecuted |
---|---|---|---|---|---|
Pre-phase of the data collection | Development and revision of CRF | QA | Annually (if necessary) | Opinions on issues that arise are collected occasionally during the survey process, then developed into revisions and applied the following year | Changing trends in AIDS dynamics are not reflected, and continued data collection with an incorrect survey limits the research topics |
| Unifying the code values of each variable | QA | Annually (if necessary) | Code values are defined so that the developed CRF, the DB, and the downloaded data use matching code values | Simple errors are difficult to detect, and use of the data is limited |
| Establishing standardized investigation guidelines | QA | Annually (if necessary) | Investigation guidelines for each questionnaire are discussed with clinical experts | Data collected by multiple researchers at multiple institutions may not be combinable under the same definition |
| Development of logic for detecting errors | QC | Occasionally | Error-detection logic is developed based on clinical and epidemiological facts, covering values that differ from the developed guidelines or fall outside the code values | Relationships that are inconsistent with clinical and epidemiological evidence remain in the data and limit the study |
| Education and distribution of standardized investigation guidelines | QA | Annually | Standardization training is conducted, with distribution of the standardized investigation guidelines, annually at the start of the study (regular), when new researchers are hired (occasional), and when the survey changes during the study or the expected error value is high (supplementary) | Investigators understand the survey guidelines (documents) differently, which can lead to errors |
Phase of the data collection | DB monitoring | QA | Occasionally | Data collected in the DB are reviewed in real time, and feedback is given to the investigator | Errors made at the time of data entry cannot be corrected immediately, so additional work is required |
| Management of repeated survey rates | QC | Quarterly | Tracking and loss to follow-up rates are confirmed and distributed | Participant dropout is not managed |
| Management of DB logic | QC | Occasionally | DB logic is developed to prevent errors at DB input that were identified through real-time monitoring | Monitoring and data cleaning become difficult when the DB cannot block errors that occur at the time of data entry |
Post-phase of the data collection | Raw data cleaning | QC | Annually | Errors within or between sequences, including simple errors, are refined for all collected data | Various errors may cause errors in the analysis phase or limit the study because subjects are excluded |
| Standardization of the descriptive questions | QC | Every 2 yr | Descriptive response values that are difficult to identify and to count are standardized and defined as a new variable; the existing variables can be retained for comparison | The same narrative response value is recognized differently, or research results differ depending on each researcher's operational definition |
| Re-survey | QC | If necessary | When the survey questionnaire changes or the error level is high, historical data are re-confirmed through medical records, a direct review of the participants, or a doctor's review | Incorrect data may affect research results |
| Substitution using the external data | QC | Annually | The Centers for Disease Control and Prevention used HIV epidemiological reporting data to replace missing values in some questions | Use of the data for research is limited by missing items in the main questions |
| Revision of the logic for detecting errors | QC | Annually | The error-detection logic for data cleaning is revised when the expected error value decreases after DB logic is applied | Expected errors are detected inaccurately because of incorrect logic |
| Cleaned raw data usage guidelines | QA | Annually | Guidelines on how to use the refined data, including CRFs and defined code values, are developed | The data are difficult to understand accurately, which affects the results |
| Development of the DMP | QA | Annually | The DMP is developed and revised based on the data quality control strategy and progress | Data quality cannot be managed systematically, and procedures may be missed |
| Annual statistical report | - | Annually | Descriptive statistics are calculated annually using refined data | Characteristics of study participants are difficult to identify |
| Evaluation of the research feasibility and statistical support (internal researchers only) | - | Occasionally | The feasibility of a study is assessed by reviewing the number of subjects and event cases for the research topic; specialists provide consultation and support for epidemiological and statistical design and analysis | Research outcomes may be delayed by data limitations |
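
The "logic for detecting errors" strategy in the table can be illustrated with a minimal sketch. It combines the two kinds of checks the table names: values outside the defined code values (simple errors) and relationships that contradict clinical or epidemiological facts (logical errors). All variable names, code values, and the single logical rule below are hypothetical and are not taken from the cohort's CRF or DB.

```python
from datetime import date

# Hypothetical code book: allowed code values per variable (illustrative only).
CODE_VALUES = {
    "sex": {1, 2},
    "smoking": {0, 1, 2, 9},
}


def detect_errors(record):
    """Return a list of expected-error messages for one record (a dict)."""
    errors = []
    # Simple errors: values outside the defined code values.
    for variable, allowed in CODE_VALUES.items():
        value = record.get(variable)
        if value is not None and value not in allowed:
            errors.append(f"{variable}: {value!r} is outside the code values")
    # Logical error (illustrative rule): diagnosis cannot precede birth.
    birth = record.get("birth_date")
    diagnosis = record.get("diagnosis_date")
    if birth and diagnosis and diagnosis < birth:
        errors.append("diagnosis_date precedes birth_date")
    return errors


record = {
    "sex": 3,
    "smoking": 1,
    "birth_date": date(1980, 5, 1),
    "diagnosis_date": date(1975, 1, 1),
}
for message in detect_errors(record):
    print(message)
```

Flagged items of this kind would correspond to "expected errors", which are then reviewed and classified as corrected errors, non-errors, or unconfirmed.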
Response rate (%) within | From diagnosis date: CD4 | From diagnosis date: HIV RNA | From initial ART: CD4 | From initial ART: HIV RNA |
---|---|---|---|---|
30 d | 38.8 | 35.1 | 64.5 | 60.1 |
90 d | 59.2 | 55.7 | 74.4 | 70.5 |
180 d | 66.6 | 64.6 | 75.7 | 72.3 |
270 d | 70.9 | 68.7 | 76.0 | 72.6 |
1 yr | 73.6 | 71.0 | 76.1 | 72.9 |
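
As a rough illustration of how window-based response rates like those above could be computed, the sketch below counts, for each window, the participants whose first test result falls within that many days after a reference date (diagnosis or initial ART). This definition is an assumption, as is the choice of 365 days for "1 yr"; the function name `response_rates` and the toy records are hypothetical and are not study data.

```python
from datetime import date

# Window labels from the table; 365 days for "1 yr" is an assumption.
WINDOWS = {"30 d": 30, "90 d": 90, "180 d": 180, "270 d": 270, "1 yr": 365}


def response_rates(records):
    """records: list of (reference_date, first_test_date or None) pairs.

    Returns the percentage of participants with a test result within
    each window after the reference date (assumed definition).
    """
    total = len(records)
    rates = {}
    for label, days in WINDOWS.items():
        hits = sum(
            1
            for reference, test in records
            if test is not None and 0 <= (test - reference).days <= days
        )
        rates[label] = round(100 * hits / total, 1)
    return rates


# Toy data: tests at 10, 100, and 400 days, and one participant with no test.
toy = [
    (date(2010, 1, 1), date(2010, 1, 11)),
    (date(2010, 1, 1), date(2010, 4, 11)),
    (date(2010, 1, 1), date(2011, 2, 5)),
    (date(2010, 1, 1), None),
]
print(response_rates(toy))
```

Because the windows are cumulative, the rate can only stay the same or increase as the window lengthens, which matches the pattern in the table.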
QA, quality assurance; QC, quality control; CRF, case report form; DMP, data management plan; AIDS, acquired immune deficiency syndrome; DB, database; HIV, human immunodeficiency virus.
ART, antiretroviral therapy; HIV, human immunodeficiency virus.
<sup>1</sup> Values flagged by the error-detection logic during the data cleaning procedure.
<sup>2</sup> Values identified as errors and corrected.
<sup>3</sup> Values flagged as errors that were confirmed to be actual observed values.
<sup>4</sup> Flagged values that have not yet been verified.