A continuous, in-situ, near-time fluorescence sensor coupled with a machine learning model for detection of fecal contamination risk in drinking water

Published in Water Research, June 2022

We designed and validated a sensitive, continuous, in-situ, remotely reporting tryptophan-like fluorescence (TLF) sensor and coupled it with a machine learning (ML) model to predict high-risk fecal contamination in water, defined by a threshold of 10 colony forming units (CFU) of E. coli per 100 mL. We characterized the sensor’s response to multiple fluorescence interferents with benchtop analysis. The sensor’s minimum detection limit (MDL) for tryptophan dissolved in deionized water was 0.05 ppb (p < 0.01). Its MDL for E. coli in wastewater effluent, based on the correlation between TLF and E. coli, was 10 CFU/100 mL (p < 0.01). Fluorescence declined exponentially with increasing water temperature, so a temperature correction factor was calculated. Inner filter effects, which attenuate the signal at high concentrations, were shown to be negligible under operational conditions. Biofouling increased the fluorescence signal by approximately 82% under the tested conditions. Mineral scaling reduced the sensor’s sensitivity by approximately 5% after 24 hours of exposure to a scaling solution with 8 times the mineral concentration of Colorado River water. An ML model was developed, with TLF measurements as the primary feature, to predict the fecal contamination risk levels established by the World Health Organization (WHO). Training and validation data were collected by deploying four sensors in Boulder Creek, Colorado, for 88 days and enumerating E. coli in 298 grab samples by membrane filtration. Including a proxy feature for fouling (time since last cleaning) improved model performance. A binary classification model predicted high-risk fecal contamination with 83% accuracy (95% CI: 78%–87%), 80% sensitivity, and 86% specificity. A multiclass model distinguishing among all WHO risk categories achieved an overall accuracy of 64%.
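The binary model's reported accuracy, sensitivity, and specificity all derive from a confusion matrix over labeled grab samples. The sketch below is a minimal illustration, not the authors' evaluation code. It labels a sample as high risk when its E. coli count is at or above 10 CFU/100 mL (treating the threshold as inclusive is an assumption). It then computes the three metrics and attaches a Wilson score interval to accuracy. The abstract does not say which interval method produced its 95% CI, so Wilson here is simply one common choice. The function names `label_high_risk`, `wilson_interval`, and `binary_metrics` are hypothetical.

```python
import math
from typing import Sequence


def label_high_risk(counts_cfu_per_100ml: Sequence[float], threshold: float = 10.0) -> list[bool]:
    """Label each grab sample as high risk (True) or not (False).

    Assumption: a count at or above `threshold` CFU/100 mL counts as high risk.
    """
    return [count >= threshold for count in counts_cfu_per_100ml]


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives ~95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> dict[str, float]:
    """Accuracy (with a Wilson 95% CI), sensitivity, and specificity for a binary classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # high risk, predicted high risk
    tn = sum(1 for t, p in pairs if not t and not p)  # low risk, predicted low risk
    fp = sum(1 for t, p in pairs if not t and p)      # false alarm
    fn = sum(1 for t, p in pairs if t and not p)      # missed contamination
    n = len(pairs)
    ci_low, ci_high = wilson_interval(tp + tn, n)
    return {
        "accuracy": (tp + tn) / n,
        "accuracy_ci_low": ci_low,
        "accuracy_ci_high": ci_high,
        # NaN when a class is absent, rather than dividing by zero
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    # Toy data for illustration only; not measurements from the study.
    counts = [0, 3, 12, 40, 8, 150, 1, 25]
    truth = label_high_risk(counts)
    predicted = [False, False, True, False, True, True, False, True]
    m = binary_metrics(truth, predicted)
    print(m["accuracy"], m["sensitivity"], m["specificity"])  # 0.75 0.75 0.75
```

Sensitivity is computed over truly high-risk samples and specificity over the rest. This is why the two can differ from each other and from overall accuracy, as they do in the reported results.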
Integrating TLF measurements into an ML model enables anomaly detection and noise reduction. This allows contamination to be predicted even when biofilm or mineral scale forms on the sensor’s lenses. Real-time detection of high-risk fecal contamination could be a major step forward in microbial water quality monitoring for human health.
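The temperature correction and the fouling proxy (time since last cleaning) described above can be turned into model inputs along these lines. This sketch assumes the exponential form F(T) = F(T_ref)·exp(−k(T − T_ref)) for the temperature dependence, so a reading is corrected by multiplying by exp(k(T − T_ref)). The coefficient `k_per_degc`, the 20 °C reference temperature, and all function names are hypothetical placeholders, not values or code from the paper. In practice `k_per_degc` would have to be fitted from benchtop data for each sensor.

```python
import bisect
import math
from datetime import datetime
from typing import Sequence


def temperature_corrected_tlf(tlf: float, temp_c: float, k_per_degc: float,
                              ref_temp_c: float = 20.0) -> float:
    """Refer a TLF reading taken at temp_c to the reference temperature ref_temp_c.

    Assumes an exponential decline, F(T) = F(T_ref) * exp(-k * (T - T_ref)),
    so the correction multiplies by exp(k * (T - T_ref)). k_per_degc and the
    20 degC default are hypothetical placeholders to be fitted per sensor.
    """
    return tlf * math.exp(k_per_degc * (temp_c - ref_temp_c))


def hours_since_last_cleaning(t: datetime, cleanings: Sequence[datetime]) -> float:
    """Fouling proxy: hours since the most recent cleaning at or before t.

    `cleanings` must be sorted in ascending order.
    """
    i = bisect.bisect_right(cleanings, t)  # number of cleanings at or before t
    if i == 0:
        raise ValueError("no cleaning recorded at or before this timestamp")
    return (t - cleanings[i - 1]).total_seconds() / 3600.0


def build_features(timestamps: Sequence[datetime], tlf: Sequence[float],
                   temps_c: Sequence[float], cleanings: Sequence[datetime],
                   k_per_degc: float, ref_temp_c: float = 20.0) -> list[tuple[float, float]]:
    """One (temperature-corrected TLF, hours since cleaning) row per reading."""
    return [
        (temperature_corrected_tlf(f, temp, k_per_degc, ref_temp_c),
         hours_since_last_cleaning(ts, cleanings))
        for ts, f, temp in zip(timestamps, tlf, temps_c)
    ]


if __name__ == "__main__":
    # Hypothetical cleaning log and reading, for illustration only.
    cleanings = [datetime(2021, 6, 1), datetime(2021, 6, 8)]
    rows = build_features([datetime(2021, 6, 8, 6)], [5.0], [20.0], cleanings, k_per_degc=0.01)
    print(rows)  # [(5.0, 6.0)]
```

Any classifier that accepts tabular features could consume these rows. The abstract does not name the model family, so none is assumed here.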