Project name: Analytics for Hemoglobin Count Based on Image Data
 
Motivation
Medicine has been a classic domain for practical data analysis for centuries, both in diagnosis and in evaluating treatments through clinical trials. Computer vision, which has grown rapidly since the adoption of deep learning, has also been a boon for medical diagnostics. According to a 2015 review, the preceding decade alone accounted for over 90% of all articles published in Computer Vision-based Analytical Chemistry (CVAC) [1].

Conventional hemoglobin measurement is limited by its need for infrastructure that supports safe, quick and efficient phlebotomy and laboratory testing, which is difficult to provide during epidemics. Computer vision-based diagnostics hold promise here.

Problem
CVAC procedures are still at a nascent conceptual stage, and reasonable conclusions can only be drawn from controlled experiments. The project envisioned building a statistical model that predicts hemoglobin from images of the eye conjunctiva taken of livestock, both in the field and in sheep pens. Over 100 images of sheep palpebral conjunctiva were analyzed, and the RGB (red-green-blue) channels were extracted from the region of interest. Gyan Data was tasked with building a predictive model for hemoglobin count from the RGB data and a host of derived parameters, with little knowledge of the actual application or the experimental procedures. Our initial data analysis helped formulate the actual problem definition. Our research also showed that image acquisition requires a controlled environment, whereas in this project it had been uncontrolled.
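The feature extraction step described above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each image is already loaded as an H x W x 3 array in RGB order and that the region of interest is a rectangle given by hypothetical `(row_start, row_stop, col_start, col_stop)` bounds. The chromaticity ratios (each channel divided by R + G + B) are only one illustrative example of a derived parameter; the project's actual derived parameters are not specified.

```python
import numpy as np

def roi_rgb_features(image, roi):
    """Mean R, G, B over a rectangular region of interest, plus chromaticity.

    image: H x W x 3 array in RGB order (assumed).
    roi:   (row_start, row_stop, col_start, col_stop), half-open bounds
           (hypothetical format for illustration).
    """
    r0, r1, c0, c1 = roi
    patch = np.asarray(image, dtype=float)[r0:r1, c0:c1, :]
    if patch.size == 0:
        raise ValueError("empty region of interest")
    # Average each channel over all pixels in the ROI.
    mean_rgb = patch.reshape(-1, 3).mean(axis=0)
    total = mean_rgb.sum()
    # Illustrative derived features: each channel's share of R + G + B.
    chroma = mean_rgb / total if total > 0 else np.zeros(3)
    return {
        "R": mean_rgb[0], "G": mean_rgb[1], "B": mean_rgb[2],
        "r": chroma[0], "g": chroma[1], "b": chroma[2],
    }
```

For a synthetic 10 x 10 image whose rows 2 to 5 and columns 3 to 6 are set to (200, 80, 60), `roi_rgb_features(img, (2, 6, 3, 7))` returns R = 200, G = 80, B = 60 and r = 200/340.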

Solution
Initial attempts applied a range of model-building techniques to the small data set, from simple ordinary least squares (OLS) to the more advanced kernel principal component regression (kPCR). These efforts were preceded by data visualization and exploration, and together they confirmed that naive regression methods were inadequate. Although deep-learning models can outperform traditional regression methods, production-ready applications typically require large quantities of domain-specific data. This project, however, involved only a few hundred samples, which is not enough to build a reasonable deep-learning model.
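The comparison between OLS and kPCR can be sketched with NumPy alone. The source does not state the kernel, its hyperparameters or how the models were validated; this sketch assumes a Gaussian (RBF) kernel, hypothetical defaults `n_components=3` and `gamma=1.0`, and leave-one-out cross-validation, a common choice when only a small number of samples is available. Kernel PCA centers the kernel matrix in feature space and keeps the leading eigen-directions; OLS is then fitted on the resulting component scores.

```python
import numpy as np

def fit_ols(X, y):
    """OLS with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def rbf_kernel(A, B, gamma):
    # Squared Euclidean distances between every row of A and every row of B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_kpcr(X, y, n_components=3, gamma=1.0):
    """Kernel PCA (RBF kernel) followed by OLS on the leading component scores."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one    # center in feature space
    vals, vecs = np.linalg.eigh(Kc)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]   # largest eigenvalues first
    keep = vals[idx] > 1e-10                      # drop numerically null directions
    vals, vecs = vals[idx][keep], vecs[:, idx][:, keep]
    alphas = vecs / np.sqrt(vals)                 # normalized expansion coefficients
    ols = fit_ols(Kc @ alphas, y)                 # regress y on training scores

    def predict(Xn):
        Kn = rbf_kernel(Xn, X, gamma)
        one_m = np.full((len(Xn), n), 1.0 / n)
        # Center new kernel rows with the training statistics.
        Knc = Kn - one_m @ K - Kn @ one + one_m @ K @ one
        return ols(Knc @ alphas)
    return predict

def loo_rmse(fit, X, y, **kw):
    """Leave-one-out RMSE: refit on n-1 samples, predict the held-out one."""
    errs = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        model = fit(X[mask], y[mask], **kw)
        errs.append(model(X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errs))))
```

Comparing `loo_rmse(fit_ols, X, y)` with `loo_rmse(fit_kpcr, X, y, n_components=k)` for several values of `k` gives a like-for-like view of whether the nonlinear model actually generalizes better on a small data set, rather than just fitting the training samples more closely.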

A review of the published literature revealed problems with how images had been acquired during the project. Uneven lighting severely affected the R, G and B channels of the palpebral conjunctiva images. Consequently, we suggested controlled lighting and a photographic standard for subsequent experiments. Following the approach in the published literature [2], we built an OLS model on derived features: the RGB channels of the conjunctiva image scaled by a photographic standard that was present in all images. Because the approach requires uniform lighting, however, it could only be applied to a reduced data set. Finally, we recommended some best practices for future experiments.
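The idea of scaling the conjunctiva color by a photographic standard can be illustrated as below. The exact features used in [2] and in this project are not reproduced here; the sketch assumes, hypothetically, that each conjunctiva channel mean is divided by the same channel's mean over the standard in the same image, and that lighting acts as a single multiplicative gain per image. Under that assumption the gain cancels in the ratio, which also suggests why the approach needs uniform lighting: spatially uneven illumination would scale the conjunctiva and the standard differently, so it would not cancel.

```python
import numpy as np

def standardized_features(conj_rgb, ref_rgb):
    """Scale conjunctiva color by the photographic standard in the same image.

    conj_rgb, ref_rgb: (n_images, 3) arrays of mean R, G, B values for the
    conjunctiva ROI and for the standard patch (hypothetical feature scheme).
    """
    conj = np.asarray(conj_rgb, dtype=float)
    ref = np.asarray(ref_rgb, dtype=float)
    if np.any(ref <= 0):
        raise ValueError("reference channel means must be positive")
    # Channel-wise ratio: a gain shared by both patches in one image cancels out.
    return conj / ref

def fit_ols(X, y):
    """OLS with intercept; returns [intercept, coef_1, ..., coef_p]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted OLS coefficients to new feature rows."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]
```

In use, `coef = fit_ols(standardized_features(conj, ref), hb)` would fit hemoglobin values `hb` (a hypothetical name) against the ratio features, and `predict(coef, ...)` would score new images processed the same way.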

References
[1] Luis Fermín Capitán-Vallvey, Nuria López-Ruiz, Antonio Martínez-Olmos, Miguel M. Erenas, and Alberto J. Palma. Recent developments in computer vision-based analytical chemistry: A tutorial review. Analytica Chimica Acta, 899:23–56, 2015.
[2] Selim Suner, Gregory Crawford, John McMurdy, and Gregory Jay. Non-invasive determination of hemoglobin by digital photography of palpebral conjunctiva. The Journal of Emergency Medicine, 33(2):105–111, 2007.
 
About the author:
Sam Mathew is a graduate of IIT Bombay and has worked in international academic and industrial research teams over the last 15 years. His research spans nanoparticle synthesis, large-scale fluid and particle simulations, and data science. During his time at IIT Madras, he published over 5 peer-reviewed articles in international journals, including Physical Review E. Over the last 6 years at Gyan Data, he has worked on data filtering and reconciliation, mixed-integer optimization and time-series modeling, applied to domains such as the process industry, power plants and fuel supply chains. He is also a consultant for GITAA Pvt. Ltd.