ru24.pro
July 2024

Flexibility of a large blindly synthetized avatar database for occupational research: Example from the CONSTANCES cohort for stroke and knee pain


by Marc Fadel, Julien Petot, Pierre-Antoine Gourraud, Alexis Descatha

Objectives

Although the rise of big data in occupational health offers new opportunities, especially for cross-cutting research, it raises issues of data privacy and security, particularly when sensitive data from insurance, occupational health or compensation claims are linked. We aimed to validate a large, blindly synthesized database derived from the CONSTANCES cohort by comparing, between the raw and synthetic data, the associations of three independently selected outcomes with various exposures.

Methods

From the CONSTANCES cohort, a large synthetic dataset was constructed using the avatar method (Octopize), which is agnostic to the primary or secondary uses of the data. Three main analyses were chosen to compare associations between the raw and avatar datasets: risk of stroke (any stroke and stroke subtypes), risk of knee pain, and limitations associated with knee pain. Logistic regression models were fitted to both datasets, and the paired odds ratios (ORs) were compared qualitatively.
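As an illustration of how one paired OR could be obtained, the sketch below fits the same logistic model to a "raw" and an "avatar" dataset and reports exp(b1) with its Wald standard error and p-value. It is a simplified sketch, not the authors' analysis code: it assumes a single binary exposure and no covariate adjustment, and the 2x2 counts and helper names (`fit_logistic_1x`, `expand`) are hypothetical. With one binary exposure the fitted OR should equal the cross-product ratio ad/bc, which makes the fit easy to check.

```python
import math
from typing import List, Tuple


def _score_info(x: List[float], y: List[int], b0: float, b1: float):
    """Score vector and observed information of logit P(y=1) = b0 + b1*x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_logistic_1x(x: List[float], y: List[int],
                    iters: int = 50, tol: float = 1e-12) -> Tuple[float, float, float]:
    """Newton-Raphson fit; returns (OR = exp(b1), Wald SE of b1, two-sided p-value)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1, h00, h01, h11 = _score_info(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(information) @ score, written out for a 2x2 matrix
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    # Recompute the information at the final estimates for the standard error
    _, _, h00, h01, h11 = _score_info(x, y, b0, b1)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    p_value = math.erfc(abs(b1 / se) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return math.exp(b1), se, p_value


def expand(a: int, b: int, c: int, d: int) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: exposed cases/non-cases (a, b), unexposed (c, d)."""
    x = [1.0] * (a + b) + [0.0] * (c + d)
    y = [1] * a + [0] * b + [1] * c + [0] * d
    return x, y


# Hypothetical counts for one outcome/exposure pair
raw = fit_logistic_1x(*expand(30, 70, 20, 80))
avatar = fit_logistic_1x(*expand(28, 72, 21, 79))
print(f"raw OR={raw[0]:.3f}, avatar OR={avatar[0]:.3f}")
```

Newton-Raphson is used here only to keep the example dependency-free; a real analysis with adjustment covariates would more likely rely on a statistics package.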

Results

Both the raw and avatar datasets included 162,434 observations and 19 relevant variables. Of the 172 paired raw/avatar ORs computed, including analyses stratified by sex, more than 77% of the comparisons had an OR difference ≤0.5 and fewer than 7% showed a discrepancy in the statistical significance of the association, with a Cohen's kappa coefficient of 0.80.
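The three agreement measures reported above can be computed from the list of paired results roughly as follows. This is a sketch under stated assumptions: the OR difference is taken as an absolute difference, significance is assumed to mean p < 0.05, kappa is computed on the significant/non-significant classification, and the four example pairs and the names `PairedOR` and `compare_pairs` are hypothetical, not data from the study.

```python
import math
from typing import List, NamedTuple, Tuple


class PairedOR(NamedTuple):
    """One raw/avatar comparison (hypothetical structure)."""
    or_raw: float
    p_raw: float
    or_avatar: float
    p_avatar: float


def compare_pairs(pairs: List[PairedOR], max_diff: float = 0.5,
                  alpha: float = 0.05) -> Tuple[float, float, float]:
    """Return (share with |OR diff| <= max_diff, share with a significance
    discrepancy, Cohen's kappa on the significant/non-significant labels)."""
    n = len(pairs)
    close = sum(1 for q in pairs if abs(q.or_raw - q.or_avatar) <= max_diff)
    sig_raw = [q.p_raw < alpha for q in pairs]
    sig_av = [q.p_avatar < alpha for q in pairs]
    # Observed agreement on the significance classification
    p_obs = sum(1 for r, a in zip(sig_raw, sig_av) if r == a) / n
    # Agreement expected by chance from each dataset's marginal proportions
    pr, pa = sum(sig_raw) / n, sum(sig_av) / n
    p_exp = pr * pa + (1 - pr) * (1 - pa)
    # Kappa is undefined when chance agreement is already perfect
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else math.nan
    return close / n, 1 - p_obs, kappa


# Hypothetical paired results
pairs = [
    PairedOR(1.71, 0.010, 1.46, 0.030),
    PairedOR(2.50, 0.001, 1.80, 0.020),
    PairedOR(1.10, 0.400, 1.05, 0.550),
    PairedOR(1.30, 0.040, 1.20, 0.090),
]
share_close, share_discordant, kappa = compare_pairs(pairs)
print(share_close, share_discordant, kappa)  # 0.75 0.25 0.5
```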

Conclusions

This study demonstrates the flexibility and multiple uses of a synthetic database created with the avatar method in the particular field of occupational health. Such a database can be shared in open access without risking re-identification or privacy breaches, and could help bring new insights into complex phenomena such as return to work.