Pneumonia Detection
Abstract
In this blog post, we build a convolutional neural network and three binary classifiers trained on chest X-ray images to detect pneumonia. We use convolutional layers to convert the images into latent vectors, which we then feed into three machine learning models: a transformer, an SVM, and a gradient boosting model. Comparing the models, we found that all three reached a similar accuracy of around 78% on the testing data.
Introduction
In this project, we wanted to compare three separate binary classification models to see which is best suited to an image classification task. We use the pneumonia chest X-ray dataset and train each model to detect pneumonia from chest X-ray images alone. This dataset represents a case where finding the best image classification algorithm matters a great deal, since a correct diagnosis could save a life. Our research could also inform which types of algorithms should be considered for other important image classification tasks. We found five studies that explore related work by other scholars.
In the study by M. R. Rahman, Islam, and Islam (2023), the researchers focus mainly on deep learning algorithms for this same image classification task. They found that the MobileNet CNN gave the best accuracy on two datasets, with values of 94.23% and 93.75%.
In another study, Rajpurkar et al. (2017) built their own CNN, known as CheXNet, that detects pneumonia as well as other chest-related illnesses (fibrosis, hernia, etc.), with accuracies ranging from 0.7 to 0.9. With such a large focus on CNNs for this image classification task, we wanted to determine whether other kinds of algorithms that are good at binary classification could also be useful image classifiers.
Recent research has expanded the exploration of deep learning architectures for pneumonia detection. For instance, a study by Singh, Thomas, and Jaiswal (2021) compared a custom convolutional neural network (CNN) and a multilayer perceptron (MLP) for classifying chest X-ray images. The CNN achieved an accuracy of 92.63%, outperforming the MLP’s 77.56%, highlighting the efficacy of CNNs in medical image classification tasks (Singh, Thomas, and Jaiswal (2021)).
Another study by T. Rahman et al. (2020) employed transfer learning with various pre-trained CNN models, including AlexNet, ResNet18, DenseNet201, and SqueezeNet, to classify chest X-ray images into normal, bacterial pneumonia, and viral pneumonia categories. Their approach achieved classification accuracies of 98% for normal vs. pneumonia, 95% for bacterial vs. viral pneumonia, and 93.3% for normal, bacterial, and viral pneumonia, demonstrating the potential of transfer learning in enhancing diagnostic performance.
Furthermore, a study by Mabrouk et al. (2023) proposed an ensemble learning approach combining DenseNet169, MobileNetV2, and Vision Transformer models for pneumonia detection in chest X-ray images. Their ensemble model achieved an accuracy of 93.91% and an F1-score of 93.88%, indicating that integrating multiple deep learning architectures can improve classification performance.
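To make our own comparison concrete, here is a minimal sketch of the evaluation setup described in the abstract: latent vectors go in, and each binary classifier is fit and scored on held-out data. Everything specific here is an assumption for illustration. The latent vectors are synthetic Gaussian blobs standing in for CNN embeddings, the dimension of 16 and the sample counts are made up (chosen only to mirror a 64% / 36% class split), and the scikit-learn models use default settings rather than our actual hyperparameters. The transformer is left out to keep the example short.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for CNN latent vectors (assumption: NOT real embeddings).
rng = np.random.default_rng(0)
latent_dim = 16  # hypothetical embedding size
normal = rng.normal(loc=0.0, size=(180, latent_dim))     # label 0
pneumonia = rng.normal(loc=2.0, size=(320, latent_dim))  # label 1
X = np.vstack([normal, pneumonia])
y = np.array([0] * len(normal) + [1] * len(pneumonia))

# Hold out a stratified test set so both classes keep their proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two of the three classifier families compared in this post.
models = {
    "svm": SVC(kernel="rbf"),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the same latent vectors and record test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

Because the synthetic blobs are far apart, both models score far higher here than they did on real X-ray embeddings; the sketch only shows the shape of the comparison, not our results.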
Values Statement
The potential users of our project are primary care clinicians and radiologists who must regularly diagnose chest-related illnesses from X-rays. Machine learning models trained on chest X-ray image data may help them make more informed decisions when they are specifically trying to detect pneumonia.
We believe that our work contributes to AI researchers who are studying how to optimize performance in image classification tasks, especially in medical settings. If it can inform medical researchers about which machine learning models are best at medical image classification, they and their patients can also benefit from greater accuracy in detecting chest-related illnesses.
Because our models are quite poor at correctly classifying images without pneumonia, they could falsely flag patients as having pneumonia, which may lead those patients to incur unnecessary medical expenses. This could seriously affect patients who struggle financially.
Our group was interested in and enjoyed working with each of the algorithms, and we took this project as a learning experience to expand our knowledge of which image vectorization and binary classification algorithms exist and how they differ from what we learned through our class assignments.
Based on our experiments, we believe if our project can help inform image classification tasks, especially those in the medical field, then the world can become a better place by being able to help people detect illnesses earlier and possibly save lives.
Materials
Our data comes from the Pneumonia Chest X-ray dataset on Kaggle, which was collected at the Guangzhou Women and Children’s Medical Center. Samples were collected from patients and labeled by pneumonia specialists: two specialists assigned each label, and a third corroborated the label of normal or pneumonia. Our data lacks information about the severity or duration of the pneumonia in positive cases, so the model has no clear way of knowing which X-rays should be encoded closer to or further from the normal cases. Additionally, the dataset has a 64% / 36% class split, with the majority of X-rays being positive cases of pneumonia. This imbalance happens to help reduce false negatives; however, it makes it harder for the models to recognize when an X-ray is normal.
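One common way to counteract this kind of imbalance is to weight each class inversely to its frequency during training. The sketch below is illustrative only and is not something we applied in this project. The function name `balanced_class_weights` is hypothetical, the counts are placeholder values chosen to match the 64% / 36% split rather than the real dataset sizes, and the formula is the usual "balanced" heuristic, n_samples / (n_classes * class_count).

```python
def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Placeholder counts mirroring the 64% / 36% split (assumption, not real sizes).
counts = {"pneumonia": 64, "normal": 36}
weights = balanced_class_weights(counts)

for label, w in weights.items():
    print(f"{label}: weight = {w:.3f}")
```

With these placeholder counts, errors on normal X-rays would count roughly 1.8 times as much as errors on pneumonia X-rays, which pushes a model to pay more attention to the minority class.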
Results
The models achieved much higher recall than precision. This reflects a preference for catching more of the positive pneumonia cases, since missing a pneumonia case is costlier than misclassifying a normal case. Among the models, the transformer did the best, with the highest recall and precision, at 93% and 41% respectively. Its F1 score of 57% suggests that the model was beginning to learn the differences between the classes but still had considerable difficulty. This is also visible in the 3-D PCA plot of the latent vectors, where many of the embeddings fall in an overlapping region where the two classes meet. The results suggest that the image embeddings need more fine-tuning to increase accuracy and precision.
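As a quick check on how these metrics relate, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two reported rates. The helper names below are hypothetical and written only for this illustration; the 41% and 93% inputs are the transformer's figures reported above.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


# Rates reported above for the transformer: precision 41%, recall 93%.
reported_f1 = f1_from_rates(0.41, 0.93)
print(f"F1 from reported rates: {reported_f1:.2f}")  # prints 0.57
```

Because the harmonic mean is pulled toward the smaller of the two values, a very high recall cannot compensate for low precision, which is why the F1 score sits much closer to 41% than to 93%.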
Conclusions
The project met many of the goals we set out to accomplish and fell short on others. We got a working convolutional neural network to embed the images and learn important features of those images, and we correctly identify 93% of all pneumonia cases. On the other hand, we correctly identify less than half of all normal cases.

This project demonstrates the difficulty of complex machine learning tasks without good computational resources. Running and auditing the CNN alone takes two hours per run with a GPU. Because of this constraint, we were unable to take advantage of all of the data available. The binary classification models also took 5 to 15 minutes each, depending on the model. The most apparent hurdle in this project was building a complex model that could still run in a reasonable amount of time.

Other pneumonia binary classification projects are able to reach higher accuracy by using pretrained ResNet models. These models are trained on millions of images and use residual connections to improve the performance of neural networks. If we had more time, we would do a more thorough error analysis of the misclassified normal images to understand which features the model is missing, and then improve the architecture to capture those features. We would also use more of the training data, free of run-time constraints, and try adopting a residual neural network architecture to improve performance.
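For readers unfamiliar with residual connections, the idea is that a block outputs its input plus a learned correction, y = relu(x + F(x)), so the identity path is always available. The NumPy sketch below is a deliberately simplified assumption: it uses two small dense layers for F, whereas real ResNet blocks use convolutions and batch normalization, and the shapes and weight scales are arbitrary. It is not the architecture we trained.

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Compute y = relu(x + F(x)), where F is a two-layer transform.

    The skip connection adds the input back onto the transformed signal,
    so the block only has to learn a correction to the identity.
    """
    hidden = relu(x @ w1)  # first layer of F
    fx = hidden @ w2       # second layer of F (no activation yet)
    return relu(x + fx)    # add the skip connection, then activate


# Arbitrary illustrative shapes: a batch of 4 vectors with 8 features each.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))

y = residual_block(x, w1, w2)
print(y.shape)  # (4, 8)
```

If both weight matrices are zero, F(x) vanishes and the block reduces to relu(x), which illustrates why stacking such blocks does not have to degrade the signal as depth grows.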