Optimization of fuzzy c-means (FCM) clustering in cytology image segmentation using the gray wolf algorithm

Background: Image segmentation is considered an important step in image processing. Fuzzy c-means clustering is one of the common methods of image segmentation. However, this method suffers from drawbacks, such as sensitivity to initial values, entrapment in local optima, and the inability to distinguish objects with similar color intensity. This paper proposes a hybrid of Fuzzy c-means clustering and Gray wolf optimization for image segmentation to overcome the shortcomings of Fuzzy c-means clustering. The Gray wolf optimization has a high exploration capability in finding the best solution to the problem, which prevents the entrapment of the algorithm in local optima. In this study, breast cytology images were used to validate the methods, and the results of the proposed method were compared to those of Fuzzy c-means clustering. Results: FCMGWO performed better than FCM in separating the nucleus from the other dark objects in the cell. The clustering was validated using the Vpc, Vpe, Davies-Bouldin, and Calinski-Harabasz criteria. The FCM and FCMGWO methods differ significantly with respect to the Vpc and Vpe indices. However, there is no significant difference between the performances of the two clustering methods with respect to the Calinski-Harabasz and Davies-Bouldin indices. The results indicate the better efficacy of the proposed method. Conclusions: The hybrid FCMGWO algorithm distinguishes the cells better in images with less detail than in images with high detail. However, FCM exhibits unacceptable performance in both low- and high-detail images.


Background
Image segmentation is the division of an image into discrete regions such that the pixels inside each region have the highest similarity and those across different regions have the highest contrast [1]. Image segmentation techniques include threshold-based, edge-based, region-based, matching-based, and clustering-based segmentation, as well as segmentation based on fuzzy inference and generalized principal component analysis [2]. Each of these methods has advantages and disadvantages. Consequently, none of them can be considered a comprehensive image segmentation algorithm [2]. Image segmentation can be considered a classification problem. Hence, machine-learning-based classification algorithms can be of great help in this area [3]. Unsupervised and supervised learning are two completely different areas in the spectrum of machine learning and pattern recognition methods. In supervised segmentation, a set of images with pixel-level labels is used, and the goal is to train a system that assigns known class labels to image pixels. The disadvantage of supervised learning techniques is that models are limited to learning from labeled datasets, which are often expensive, time-consuming, and sometimes difficult to produce. This issue is more acute in the medical image processing field because producing high-quality datasets requires the effort of experienced and skilled human observers. Moreover, when ground truth is used, the accuracy of the assessment depends on two important factors. First, one needs to design or have a proper ground truth, and second, one needs to choose appropriate similarity criteria for the problem being considered. A popular technique is to compare automated techniques with a group of human experts. In this context, it is assumed that human evaluators have prior knowledge of the ground truth, which is reflected in their manual tracing. Unfortunately, human evaluators may make mistakes, and considerations of accuracy and variability must be taken into account.
After creating ground truth, the main task of evaluation is to measure the similarity between automatic segmentation and reference. It is yet unclear whether a set of general measurements can be used for all segmentation problems.
Unlabeled datasets require less human effort to create and are easier to obtain. Also, in unsupervised classification, an image is divided into as many meaningful areas as possible without any prior knowledge. In the unsupervised method, there are no training images or ground truth labels of pixels beforehand. Therefore, the number of unique cluster labels must be consistent with the image content.
Fuzzy c-means clustering methods have great potential for extracting detailed features from image pixels. Fuzzy c-means (FCM) clustering is one of the important unsupervised learning algorithms. It requires initial values for some parameters, such as the number of clusters and the positions of the cluster centroids, and its performance depends on these input parameters. Some researchers have proposed various methods for estimating the number of clusters or the cluster centroids [4,5]. Moreover, FCM is sensitive to noise and prone to entrapment in local optima. Various metaheuristic methods have been used to optimize the objective function of the fuzzy algorithm in order to avoid entrapment in local optima. Also, FCM on its own fails to distinguish objects with similar color intensity in images. To overcome these issues, the Gray wolf optimization (GWO) was used for optimization in this research [6]. The combined use of FCM and GWO to find the optimal cluster centers improves the clustering performance. The main criterion for selecting the best algorithm for medical images is the accuracy of the algorithms. Reducing complexity is the next goal in medical image processing. Therefore, the present study aims to combine FCM with the GWO. Using this combination prevents entrapment in local optima and better optimizes the cluster centers obtained from FCM. In addition, the clustering will be more capable of distinguishing the nucleus from the cytoplasm and other dark-colored cell features in breast cancer cytology images.

Results
The image segmentation results can be seen in Fig. 1. In all the analyzed images, FCMGWO performed better than FCM in separating the nucleus from the other dark objects in the cell. The points corresponding to the nucleus and other dark objects, such as cytoplasm and red blood cells, have been considered as one cluster by FCM. In FCMGWO, however, these points have been designated as the nucleus, and the other objects have been distinguished as two clusters. The performances of FCMGWO and FCM in segmenting cytology images were compared. The performance of the algorithms was evaluated using the Vpc, Vpe, DB, and CH validation indices. The clustering result is acceptable when Vpc and CH are maximum and Vpe and DB are minimum (Fig. 2). A study of the indices presented in Fig. 2 reveals the superiority of FCMGWO over FCM with respect to the Vpe and Vpc criteria for all images. According to the CH index, FCMGWO is better than FCM for images 3 and 4, while FCM is better than FCMGWO for images 1 and 2.
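As an illustration of the two fuzzy validity indices, the sketch below assumes that Vpc and Vpe denote Bezdek's partition coefficient and partition entropy, which is consistent with the stated rule that Vpc should be maximal and Vpe minimal. The text does not give their formulas, so the definitions, the function names, and the `eps` guard are assumptions of this example rather than the authors' implementation.

```python
import numpy as np


def partition_coefficient(u):
    """Assumed Vpc: mean over points of the sum of squared memberships.

    u has shape (n, c); each row holds one point's memberships and sums to 1.
    The value ranges from 1/c (fully fuzzy) to 1 (crisp partition).
    """
    n = u.shape[0]
    return float((u ** 2).sum() / n)


def partition_entropy(u, eps=1e-12):
    """Assumed Vpe: mean over points of -sum(u * log(u)).

    The value ranges from 0 (crisp partition) to log(c) (fully fuzzy).
    Memberships are clipped away from 0 only inside the logarithm.
    """
    n = u.shape[0]
    return float(-(u * np.log(np.clip(u, eps, 1.0))).sum() / n)


# Two extreme membership matrices for 4 points and 3 clusters.
crisp = np.eye(3)[[0, 1, 2, 0]]      # every point fully in one cluster
uniform = np.full((4, 3), 1.0 / 3)   # every point equally in all clusters
```

Under these definitions a crisp partition gives the best possible values of both indices, and a uniform membership matrix gives the worst.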
A paired t-test was used to compare the indices of the two methods. First, the normality of the Vpc, Vpe, DB, and CH indices was examined using the Shapiro-Wilk test (p-value > 0.05). Then, the paired t-test was used to investigate the significance of the differences in the indices (Table 1). The FCM and FCMGWO methods differ significantly with respect to the Vpc and Vpe indices. However, there is no significant difference between the performances of the two clustering methods with respect to the DB and CH indices.
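The testing procedure can be sketched with SciPy as follows. The index values are synthetic placeholders generated at random, not the study's data, and the sample size of 8 is hypothetical. The sketch checks normality on the paired differences, which is the quantity the paired t-test makes an assumption about; the text states only that the normality of the indices was examined, so this is a choice of the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic placeholder values of one validity index for two methods
# evaluated on the same images (NOT data from the study).
index_fcm = rng.normal(0.70, 0.02, size=8)
index_fcmgwo = index_fcm + rng.normal(0.05, 0.01, size=8)

# Shapiro-Wilk test on the paired differences; a p-value above 0.05
# means normality is not rejected at the 5% level.
diff = index_fcmgwo - index_fcm
shapiro_stat, shapiro_p = stats.shapiro(diff)

# Paired t-test on the two related samples.
t_stat, t_p = stats.ttest_rel(index_fcmgwo, index_fcm)

print(f"Shapiro-Wilk p = {shapiro_p:.3f}, paired t-test p = {t_p:.3g}")
```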

Discussion
Changes in the structure of the nucleus are the morphological hallmark of cancer diagnosis and most of the criteria of malignancy are seen in the nuclei of the cell. Therefore, it is necessary to separate the nuclei from other parts of the image. The segmentation of images containing objects with similar color intensity is a challenge in image processing. It is difficult to distinguish the cell nucleus from other cell components, such as red blood cells and plasma, in cytology images due to color similarities. The current study proposed the FCMGWO method for this purpose. This technique was compared to the FCM and validated for breast cytology images.
The results indicate that FCM is incapable of identifying the cell nucleus. FCM considers the nucleus and other dark objects in the cell as one cluster and cannot distinguish between them. However, the combined FCMGWO method performs better than FCM in distinguishing the cell nuclei. This better discernment can be attributed to the search process of the GWO, which optimizes the cluster centers obtained from FCM clustering. This optimization can improve the performance of FCM and overcome some of its shortcomings owing to the high exploration capability of GWO and its good balance between exploration and exploitation. The improvement in Vpc and Vpe obtained with FCMGWO over FCM is statistically significant. However, no significant difference was observed between the two clustering methods in the DB and CH indices. Based on the CH index, FCMGWO performs better than FCM for images 3 and 4, but FCM is better than FCMGWO for images 1 and 2. The CH index also shows that FCMGWO is better than FCM in images with less detail. However, the DB indices of the two methods are almost identical regardless of image type. Lack of ground truth was the main limitation of this study. Therefore, it is not possible to evaluate the clustering results with indices such as sensitivity and specificity.
In future studies, the algorithm can be tested on other images with similar color intensity. Furthermore, the overall performance of the proposed method in images with more detail can be improved by a fuzzy algorithm modified via adding a more powerful objective function.

Conclusion
The results show that the FCMGWO method performs better on images with less detail than on those with more detail. The hybrid algorithm distinguishes the cells better in images with less detail than in images with high detail. However, FCM exhibits unacceptable performance in both low- and high-detail images.

Methods
The image analysis consists of preprocessing and segmentation steps, which will be discussed in detail in subsequent sections. In preprocessing, a median filter is used to reduce the noise from the camera. After the median filter, morphological closing is employed to highlight the nucleus of the cell in the images.

Image segmentation
After preprocessing, image segmentation was performed via clustering techniques. Classification of tissue as malignant or benign requires detecting the nucleus in the cytology images. This is a challenging task since the images usually contain overlapping and clustered objects. In this study, FCM and FCMGWO clustering were used for image segmentation.

FCM clustering
FCM is a powerful unsupervised method for data analysis. This technique is widely used in image segmentation [9]. FCM aims to divide the data into subspaces according to a distance criterion [5]. The objects at the boundaries between different classes do not have to belong fully to one class but are rather assigned membership degrees between 0 and 1 [9]. FCM clustering was introduced by Bezdek in 1973. The objective function of FCM is defined as follows [10]:
$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}$$
where $m$ represents the degree of fuzziness and is a real number greater than 1, $u_{ij}$ is the membership degree of the $i$th datum in the $j$th cluster, $x_i$ denotes the data points, and $c_j$ is the cluster center. Also, $\lVert \cdot \rVert$ represents the Euclidean distance, $n$ is the number of data points, and $c$ denotes the number of clusters [11].
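A minimal NumPy sketch of FCM is given below to make the roles of the membership degrees and cluster centers concrete. It uses the standard alternating updates (membership-weighted centers, then memberships proportional to an inverse power of the squared distance), which the text does not spell out. The function name `fcm`, the random initialization, the default arguments, and the synthetic gray levels are assumptions of this example and not the authors' code.

```python
import numpy as np


def fcm(x, c=3, m=1.5, n_iter=5, seed=0):
    """Minimal fuzzy c-means on data x of shape (n, d).

    Returns (centers, u, objective); u[i, j] is the membership of
    point i in cluster j and each row of u sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Random initial memberships, each row normalized to sum to 1.
    u = rng.random((n, c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Center update: membership-weighted mean of the data.
        centers = (um.T @ x) / um.sum(axis=0)[:, None]
        # Squared Euclidean distances to every center, shape (n, c).
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ij proportional to d_ij^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    # Objective: sum over i, j of u_ij^m * ||x_i - c_j||^2.
    objective = float(((u ** m) * d2).sum())
    return centers, u, objective


# Synthetic gray levels around three intensities (illustrative only).
rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(mu, 5.0, 200) for mu in (40, 120, 200)])
centers, u, J = fcm(pixels.reshape(-1, 1), c=3, m=1.5, n_iter=5)
labels = u.argmax(axis=1)  # hard label = cluster with highest membership
```

Because the result depends on the random initial memberships, different seeds can give different centers, which is the sensitivity to initial values noted for FCM.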

Actual images
Imprint touch breast cytology images were utilized to examine the segmentation methods. All the images were confirmed by a pathologist. The images were produced with a magnification of 400x.

Histological preparation and staining
Preparing a smear
The images for digital analysis were generated by an echo-LAB camera mounted atop an echoLAB microscope. They were first resized to 800 × 600 pixels to reduce the processing time. All the analyses were performed using Python 3.8 and SPSS 26.

Image preprocessing
Image noise is the random change in the brightness or color data of an image [7] and can severely deteriorate image quality [8]. In addition to denoising, preserving the edges and details of an image plays a key role in image processing [8]. In this study, a median filter of size 5 is used for denoising. The initial clustering parameters were set as follows: Number of clusters = 3; Fuzziness factor = 1.5; Number of iterations = 5.
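The preprocessing used in this study (a median filter of size 5 followed by morphological closing) could be sketched with `scipy.ndimage` as shown below. The text does not state the structuring element used for closing or whether the filters were applied to grayscale or color data, so the 3 × 3 closing window, the grayscale input, and the function name `preprocess` are assumptions of this example.

```python
import numpy as np
from scipy import ndimage


def preprocess(gray, median_size=5, closing_size=3):
    """Denoise a 2-D grayscale image, then apply grayscale closing.

    median_size=5 follows the text; closing_size is an assumed value.
    """
    # Median filter: replaces each pixel by the median of its neighborhood.
    denoised = ndimage.median_filter(gray, size=median_size)
    # Grayscale closing: dilation followed by erosion over the same window.
    closed = ndimage.grey_closing(denoised, size=(closing_size, closing_size))
    return closed


# A flat test image with one isolated bright noise pixel.
gray = np.full((20, 20), 100, dtype=np.uint8)
gray[10, 10] = 255
out = preprocess(gray)
```

In this toy image the single outlier is removed by the median filter, since it is outvoted by the 24 other pixels in every 5 × 5 window that contains it.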

Mathematical modeling of GWO

Encircling the prey
Gray wolves encircle the prey during hunting. The following equations are proposed to model the encirclement behavior [6]:
$$\vec{D} = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right|$$
$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D}$$
where $t$ is the current iteration, $\vec{A}$ and $\vec{C}$ are the coefficient vectors, $\vec{X}_p$ is the position vector of the prey, and $\vec{X}$ is the position vector of the gray wolf.
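A single encircling update can be sketched as follows. The coefficient definitions A = 2a·r1 − a and C = 2·r2, with r1 and r2 drawn uniformly from [0, 1] and a control parameter a, follow the standard GWO formulation in [6]; they are not spelled out in this section, so they are labeled here as assumptions, as is the function name `encircle_step`.

```python
import numpy as np


def encircle_step(x, x_prey, a, rng):
    """One encircling update for a single wolf at position x.

    Assumed coefficients (standard GWO): A = 2*a*r1 - a, C = 2*r2,
    with r1 and r2 uniform in [0, 1] per dimension.
    """
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    A = 2.0 * a * r1 - a           # each component lies in [-a, a]
    C = 2.0 * r2                   # each component lies in [0, 2]
    D = np.abs(C * x_prey - x)     # distance term from the encircling model
    return x_prey - A * D          # new position relative to the prey


rng = np.random.default_rng(0)
wolf = np.array([3.0, -2.0])
prey = np.array([0.5, 0.5])
moved = encircle_step(wolf, prey, a=2.0, rng=rng)     # exploratory step
collapsed = encircle_step(wolf, prey, a=0.0, rng=rng)  # A = 0: lands on prey
```

With a = 0 the coefficient A vanishes and the wolf moves exactly onto the prey position, while larger values of a allow steps that overshoot or move away from it.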

Gray wolf optimization
Optimization is a common method in machine learning for searching for the best solution or a sufficiently good solution. The GWO is a heuristic swarm intelligence optimization algorithm introduced by Mirjalili et al. in 2014 [6]. The best, second-best, and third-best responses are recorded as alpha, beta, and delta, respectively, and the rest of the wolves are considered as omega [12]. Optimization algorithms require exploration and exploitation in a search space. In GWO, exploration refers to when the wolf leaves the initial search path in a specific context and turns to a new direction [12]. Exploitation refers to when the wolf searches more accurately in the initial search path in a specific context [12]. An optimization algorithm requires a good agreement between the exploration and exploitation steps for successful implementation [13]. GWO has a high exploration capability in finding the best solution for the problem. This capability prevents the entrapment of the algorithm in local optima [6].
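The alpha, beta, and delta ranking and the shift from exploration to exploitation can be illustrated with a compact GWO loop. This sketch follows the commonly used formulation in which each wolf moves to the average of three positions guided by the three best wolves, and the control parameter a decreases linearly from 2 toward 0. The function name `gwo_minimize`, the population size, the iteration count, and the sphere test function are hypothetical choices for illustration. The text does not specify the fitness function used in FCMGWO; one plausible choice, stated here only as an assumption, is the FCM objective evaluated at candidate cluster centers.

```python
import numpy as np


def gwo_minimize(f, dim, bounds, n_wolves=10, n_iter=50, seed=0):
    """Minimize f over a box with a basic gray wolf optimizer."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))
    best_x, best_f = None, np.inf
    for t in range(n_iter):
        fitness = np.array([f(w) for w in wolves])
        order = np.argsort(fitness)
        # Keep the best solution evaluated so far.
        if fitness[order[0]] < best_f:
            best_f = float(fitness[order[0]])
            best_x = wolves[order[0]].copy()
        # Alpha, beta, delta: the three best wolves; the rest are omegas.
        leaders = wolves[order[:3]].copy()
        # Large a favors exploration, small a favors exploitation.
        a = 2.0 * (1.0 - t / n_iter)
        new = np.zeros_like(wolves)
        for leader in leaders:
            r1 = rng.random(wolves.shape)
            r2 = rng.random(wolves.shape)
            A = 2.0 * a * r1 - a
            C = 2.0 * r2
            D = np.abs(C * leader - wolves)
            new += leader - A * D
        # Each wolf moves to the mean of the three leader-guided positions.
        wolves = np.clip(new / 3.0, lo, hi)
    return best_x, best_f


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return float((x ** 2).sum())


best_x, best_f = gwo_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```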