Vicky Leona Chung
Exploring synthetic data and machine learning for data-intensive health research
- What are the issues around sensitive health data in the Canadian research context?
- Can synthetic data and machine learning (a) alleviate privacy concerns that surround the use of sensitive data, and (b) support data reusability for health research?
- What are potential models for multidisciplinary and/or interdisciplinary research collaboration, including data management and data stewardship, for synthetic health data? What are the potential implications for medical librarians and informationists?
Methods
This research project is a narrative review of both scholarly literature and grey literature.
- Scholarly literature: Structured searches of medical, library and information science, computer science, and interdisciplinary databases for the years 2014 to 2024.
- Grey literature: A combination of structured searching (preprint servers, Canadian policy literature) and serendipitous discovery (institutional presentations, email alerts for new preprints).
Preliminary Findings
Creating, using, and sharing synthetic data is a complex process that requires collaboration across multidisciplinary teams, in which librarians and informationists can play a pivotal role. Effective research data management and data stewardship practices will need to (a) address the specific nuances and ethics of synthetic data, (b) adapt to rapid technical developments, and (c) include mechanisms for assessing the fidelity and sensitivity of synthetic data.
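As a rough illustration of what an assessment mechanism for point (c) might look like, the sketch below compares a toy real dataset with a toy synthetic one. It is not drawn from the reviewed literature. The records are made up, the function names `mean_gap` and `exact_match_rate` are hypothetical, and reading "sensitivity" as the risk that a synthetic record copies a real one is an assumption made for this example.

```python
from statistics import mean


def mean_gap(real, synth, column):
    """Fidelity proxy: absolute difference between the real and synthetic means of one column."""
    return abs(mean(r[column] for r in real) - mean(s[column] for s in synth))


def exact_match_rate(real, synth):
    """Sensitivity proxy (assumed meaning): share of synthetic records that copy a real record exactly."""
    real_rows = {tuple(sorted(r.items())) for r in real}
    copies = sum(tuple(sorted(s.items())) in real_rows for s in synth)
    return copies / len(synth)


# Toy, made-up records for illustration only (not real health data).
real = [{"age": 40, "sbp": 120}, {"age": 60, "sbp": 140}]
synth = [{"age": 42, "sbp": 122}, {"age": 60, "sbp": 140}]

fidelity_gap = mean_gap(real, synth, "age")   # smaller gap suggests higher fidelity
copy_rate = exact_match_rate(real, synth)     # higher rate suggests higher disclosure risk
```

A stewardship workflow would likely need richer measures than these two, but even simple checks like this show the tension the review points to: a synthetic record that matches the real data perfectly scores well on fidelity and poorly on privacy.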
Next Steps
Develop scenarios as a research methodology to address project questions related to policy and data management implementation.
Selected References
Choi BCK, Pak AWP. Multidisciplinarity, interdisciplinarity and transdisciplinarity in health research, services, education and policy: 1. Definitions, objectives, and evidence of effectiveness. Clin Invest Med. 2006 Dec;29(6):351–64. Available from: https://pubmed.ncbi.nlm.nih.gov/17330451
Douglas CMW, Panagiotoglou D, Dragojlovic N, Lynd L. Methodology for constructing scenarios for health policy research: The case of coverage decision-making for drugs for rare diseases in Canada. Technological Forecasting and Social Change. 2021 Oct 1;171:120960. DOI: https://doi.org/10.1016/j.techfore.2021.120960
Kim J. Scenarios in information seeking and information retrieval research: A methodological application and discussion. Library & Information Science Research. 2012 Oct 1;34(4):300–7. DOI: https://doi.org/10.1016/j.lisr.2012.04.002
Ramirez R, Mukherjee M, Vezzoli S, Kramer AM. Scenarios as a scholarly methodology to produce “interesting research.” Futures. 2015 Aug 1;71:70–87. DOI: http://dx.doi.org/10.1016/j.futures.2015.06.006
Tsao SF, Sharma K, Noor H, Forster A, Chen H. Health Synthetic Data to Enable Health Learning System and Innovation: A Scoping Review. In: Hägglund M, Blusi M, Bonacina S, Nilsson L, Cort Madsen I, Pelayo S, et al., editors. Studies in Health Technology and Informatics [Internet]. IOS Press; 2023 [cited 2023 Nov 15]. Available from: https://ebooks.iospress.nl/doi/10.3233/SHTI230063
Yan C, Zhang Z, Nyemba S, Li Z. Generating synthetic electronic health record data using generative adversarial networks: A tutorial (Preprint) [Internet]. JMIR AI; 2023 Sep [cited 2023 Nov 7]. Available from: http://preprints.jmir.org/preprint/52615
Yoon J, Mizrahi M, Ghalaty NF, Jarvinen T, Ravi AS, Brune P, et al. EHR-Safe: generating high-fidelity and privacy-preserving synthetic electronic health records. npj Digit Med. 2023 Aug 11;6(1):141. DOI: https://doi.org/10.1038/s41746-023-00888-7
Acknowledgements
My participation in the RTI program has been made possible, in part, by the MLA scholarship fund. Travel support has been provided by the Faculty of Information and Media Studies at Western University. My gratitude to these organizations for their generous support.
Project Progress Log
Please check back for quarterly project updates; next update August 2024.