Vicky Leona Chung






A WORK IN PROGRESS RESEARCH PROJECT
presented at MLA’24, Portland, Oregon

Exploring synthetic data and machine learning for data-intensive health research





Research Questions

  1. What are the issues around sensitive health data in the Canadian research context?

  2. Can synthetic data and machine learning (a) alleviate privacy concerns that surround the use of sensitive data, and (b) support data reusability for health research?

  3. What are potential models for multidisciplinary and/or interdisciplinary research collaboration, including data management and data stewardship, for synthetic health data? What are the potential implications for medical librarians and informationists?

Methods

This research project is a narrative review of both scholarly literature and grey literature.

  • Scholarly literature: Structured searches of medical, library and information science, computer science, and interdisciplinary databases for the years 2014 to 2024.

  • Grey literature: Combination of structured searching (preprint servers, Canadian policy literature) and serendipitous discovery (institutional presentations, email alerts set for new preprints).

Preliminary Findings

The creation, use, and sharing of synthetic data is a complex process that requires collaboration between multidisciplinary teams, in which librarians and informationists can play a pivotal role. Effective research data management and data stewardship practices will need to (a) speak specifically to the nuances and ethics of synthetic data, (b) adapt to rapid technical developments, and (c) include mechanisms to assess the fidelity and sensitivity of synthetic data.

Next Steps

Develop scenarios as a research methodology to address project questions related to policy and data management implementation.




Selected References

Chen H, Grossman M, Sen A, Tsao SF. Establishing a FAIR, CARE, and Efficient Synthetic Health Data Sharing Ecosystem for Canada. 2023. Available from: https://www.researchgate.net/publication/375446378

Choi BCK, Pak AWP. Multidisciplinarity, interdisciplinarity and transdisciplinarity in health research, services, education and policy: 1. Definitions, objectives, and evidence of effectiveness. Clin Invest Med. 2006 Dec;29(6):351–64. Available from: https://pubmed.ncbi.nlm.nih.gov/17330451

Douglas CMW, Panagiotoglou D, Dragojlovic N, Lynd L. Methodology for constructing scenarios for health policy research: The case of coverage decision-making for drugs for rare diseases in Canada. Technological Forecasting and Social Change. 2021 Oct 1;171:120960. DOI: https://doi.org/10.1016/j.techfore.2021.120960

Kim J. Scenarios in information seeking and information retrieval research: A methodological application and discussion. Library & Information Science Research. 2012 Oct 1;34(4):300–7. DOI: https://doi.org/10.1016/j.lisr.2012.04.002

Ramirez R, Mukherjee M, Vezzoli S, Kramer AM. Scenarios as a scholarly methodology to produce “interesting research.” Futures. 2015 Aug 1;71:70–87. DOI: http://dx.doi.org/10.1016/j.futures.2015.06.006

Tsao SF, Sharma K, Noor H, Forster A, Chen H. Health Synthetic Data to Enable Health Learning System and Innovation: A Scoping Review. In: Hägglund M, Blusi M, Bonacina S, Nilsson L, Cort Madsen I, Pelayo S, et al., editors. Studies in Health Technology and Informatics [Internet]. IOS Press; 2023 [cited 2023 Nov 15]. Available from: https://ebooks.iospress.nl/doi/10.3233/SHTI230063

Yan C, Zhang Z, Nyemba S, Li Z. Generating synthetic electronic health record data using generative adversarial networks: A tutorial (Preprint) [Internet]. JMIR AI; 2023 Sep [cited 2023 Nov 7]. Available from: http://preprints.jmir.org/preprint/52615

Yoon J, Mizrahi M, Ghalaty NF, Jarvinen T, Ravi AS, Brune P, et al. EHR-Safe: generating high-fidelity and privacy-preserving synthetic electronic health records. npj Digit Med. 2023 Aug 11;6(1):141. DOI: https://doi.org/10.1038/s41746-023-00888-7




Acknowledgements

Thank you to the MLA Research Training Institute (RTI) faculty, peer mentors, and my fellow participants from the 2023 cohort and student fellow group. Special mention to Mark MacEachern, Andy Hickner, Dr. Emily Vardell, Dr. Ana Cleveland, and Susan Lessick. Thank you to my colleagues at the University of Waterloo Library, especially Sandra Keys and Anneliese Eber.

My participation in the RTI program has been made possible, in part, by the MLA scholarship fund. Travel support has been provided by the Faculty of Information and Media Studies at Western University. My gratitude to these organizations for their generous support.





Project Progress Log

Last updated: May 20, 2024

Please check back for quarterly project updates; next update August 2024.