

ORIGINAL ARTICLE 

Year : 2022 | Volume : 8 | Issue : 1 | Page : 15

Application of graph-based features in computer-aided diagnosis for histopathological image classification of gastric cancer
Haiqing Zhang^{1}, Chen Li^{1}, Shiliang Ai^{1}, Haoyuan Chen^{1}, Yuchao Zheng^{1}, Yixin Li^{1}, Xiaoyan Li^{2}, Hongzan Sun^{2}, Xinyu Huang^{3}, Marcin Grzegorzek^{3}
^{1} Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China ^{2} Liaoning Cancer Hospital and Institute, Shengjing Hospital, China Medical University, Shenyang, China ^{3} Institute of Medical Informatics, University of Luebeck, Luebeck, Germany
Date of Submission: 06-Mar-2022
Date of Decision: 30-Apr-2022
Date of Acceptance: 15-May-2022
Date of Web Publication: 07-Jul-2022
Correspondence Address: Chen Li, Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/digm.digm_7_22
Background: The gold standard for gastric cancer detection is gastric histopathological image analysis, but there are certain drawbacks in existing histopathological detection and diagnosis. Method: In this paper, based on the study of computer-aided diagnosis (CAD) systems, graph-based features are applied to gastric cancer histopathology microscopic image analysis, and a classifier is used to distinguish gastric cancer cells from benign cells. First, image segmentation is performed. After the tissue region is found, cell nuclei are extracted using the k-means method, the minimum spanning tree (MST) is drawn, and graph-based features of the MST are extracted. The graph-based features are then put into the classifier for classification. Result: Different segmentation methods are compared in the tissue segmentation stage, among which are Level-Set, Otsu thresholding, watershed, SegNet, U-Net, and TransUNet segmentation; graph-based features, Red, Green, Blue (RGB) features, Grey-Level Co-occurrence Matrix features, Histogram of Oriented Gradient features, and Local Binary Pattern features are compared in the feature extraction stage; Radial Basis Function (RBF) Support Vector Machine (SVM), Linear SVM, Artificial Neural Network, Random Forest, k-Nearest Neighbor, VGG16, and InceptionV3 are compared in the classifier stage. It is found that using U-Net to segment tissue areas, then extracting graph-based features, and finally using an RBF SVM classifier gives the optimal result, with an accuracy of 94.29%. Conclusion: This paper focuses on a graph-based-feature microscopic image analysis method for gastric cancer histopathology. The final experimental data show that our analysis method outperforms other methods in classifying histopathological images of gastric cancer.
Keywords: Gastric cancer, Graph feature, Image classification
How to cite this article: Zhang H, Li C, Ai S, Chen H, Zheng Y, Li Y, Li X, Sun H, Huang X, Grzegorzek M. Application of graph-based features in computer-aided diagnosis for histopathological image classification of gastric cancer. Digit Med 2022;8:15
How to cite this URL: Zhang H, Li C, Ai S, Chen H, Zheng Y, Li Y, Li X, Sun H, Huang X, Grzegorzek M. Application of graph-based features in computer-aided diagnosis for histopathological image classification of gastric cancer. Digit Med [serial online] 2022 [cited 2022 Dec 7];8:15. Available from: http://www.digitmedicine.com/text.asp?2022/8/1/15/350175
Introduction   
Background
Cancer is a disease caused by the uncontrolled growth of cells,^{[1]} while gastric cancer is a group of abnormal cells that gather in the stomach to form tumors. Comprehensive data from recent years show that the incidence and mortality rate of gastric cancer are the third-highest among women and the second-highest among men.
Effective diagnosis of gastric cancer relies on the examination of hematoxylin- and eosin-stained tissue sections under a microscope by pathologists. Microscopic examination of tissue sections is tedious and time-consuming, with screening procedures usually taking 5–10 min, making it difficult for pathologists to analyze more than 70 samples a day.^{[2]} Moreover, the potential for incorrect diagnosis is also high. Therefore, a histopathological diagnosis of gastric cancer is important.^{[3],[4]}
The basic concept of computer-aided diagnosis (CAD) is to use computer judgments as objective opinions to help pathologists make a diagnosis.^{[5],[6],[7],[8]} The goal of CAD is to improve the quality and efficiency of histopathological image diagnosis; moreover, by increasing the accuracy and consistency of image diagnosis, it can reduce image reading time.^{[9],[10],[11],[12],[13],[14]} In the last few decades, a lot of research has been done on the development of CAD systems that can help physicians track cancers.^{[15],[16],[17],[18]} Meanwhile, in cancerous gastric histopathological images, cells proliferate indefinitely.^{[19]} This causes cancer cells to become denser, so the graph constructed from their nuclei is more compact than that of normal cells. Therefore, graph theory is applied to classify histopathological images, with good results.
In this paper, a method of classifying histopathological images of gastric cancer using graph-based features is proposed. The workflow is shown in [Figure 1].  Figure 1: Workflow of the proposed method. RF: Random forest, KNN: k-nearest neighbor, ANN: Artificial neural network, RBF: Radial basis function, SVM: Support vector machine, RGB: Red, green, blue, HOG: Histogram of oriented gradient, GLCM: Gray-level co-occurrence matrix, LBP: Local binary pattern, VGG: Visual geometry group.
Research content
The method proposed in this paper consists of seven main parts: (a) six different image segmentation methods are compared to obtain the optimal one; (b) cell nuclei are extracted using the k-means clustering method; (c) the centers of mass of the cell nuclei are used as the points to form the graph structure, and graph-based features are extracted using the minimum spanning tree (MST) algorithm; (d) graph-based features, red, green, blue (RGB) features, gray-level co-occurrence matrix (GLCM) features, local binary pattern (LBP) features, and histogram of oriented gradient (HOG) features are extracted after segmentation for comparison; (e) based on the previous work, the extracted feature vectors are put into different classifiers: radial basis function (RBF) support vector machine (SVM), linear SVM, artificial neural network (ANN), random forest (RF), and k-nearest neighbor (KNN); (f) two deep learning comparison experiments are designed; (g) experimental results are obtained by calculating accuracy, precision, recall, specificity, and F1-score.
The main contributions of this article are as follows:
 A new framework is designed to introduce a graph-based-feature image classification method to the field of histopathological image analysis of gastric cancer.
 A large number of comparative experiments are done to demonstrate the effectiveness of our method.
 The method proposed in this study achieves good results, with a final classification accuracy of 94.29%.
Related Works   
The development of graph theory in computer-aided diagnosis
The application of graph theory in CAD can effectively exploit the topological structure of histopathological images: analyzing the structure of the graph reveals the information contained in the image. Moreover, the length and size of each edge and corner in the graph can represent the spatial ordering of different tissues, which intuitively reflects the content of histopathological images. These data can serve as a basis for the judgment of pathologists. Therefore, the application of graph-theoretic techniques to CAD analysis of histopathological images has become popular.
Graph theory is applied to extract topological and relational information from collagen frameworks through the integration of deep learning with graph theory.^{[20]} The results are consistent with the expected pathogenicity induced by collagen deposition, demonstrating the potential for clinical application in analyzing various reticular structures in wholeslide histological images.
Computer image processing-based Voronoi diagrams, Gabriel graphs, and MSTs are used to represent the spatial arrangement of blood vessels as a way to quantitatively analyze microvessels.^{[21]} Derived features of the graph structures are extracted using syntactic structural analysis, and the most discriminative features are found using a KNN classifier.
A large number of color, texture, and morphological features are extracted from stained histopathological images of cervical cancer.^{[22]} In addition, 29 features such as edge length and area are extracted from three graph structures, after which the nuclei are classified using linear discriminant analysis.
Instead of cell nuclei, skeletal nodes are used in histopathological images of cervical cancer; the work constructs graph structures using the MST, extracts several feature values such as edges and corners, and clusters them using k-means.^{[23]} In addition, four graph-theoretic methods are added as a comparison in the step of constructing the graph structure to select the optimal one. A deep learning network with histogram of oriented gradient features is used for comparison with the selected optimal graph-theoretic methods, and the best method is determined by the evaluation of doctors.
The development of computer-aided diagnosis in gastric cancer
Deep learning is used in many medical image processing tasks; for example, deep learning is used to identify COVID-19 samples in chest X-ray images.^{[24]} The continuous progress of deep learning algorithms has led to the rapid development of CAD technology in gastric cancer. Currently, the deep learning methods used in the field of gastric cancer mainly include image preprocessing, image segmentation, feature extraction, and image classification methods.
In histopathological image preprocessing of gastric cancer, one work proposes an image classification model that can alleviate the effect of badly annotated training sets.^{[25]} By fine-tuning the neural network in two stages and introducing a new intermediate dataset, the performance of the network in image classification is improved. Another work sets up an image-denoising network based on a CNN and optimizes the denoising network by using the advantages of complex numerical operations to increase the tightness of convolution.^{[26]}
In the image segmentation stage, a new polyp segmentation method based on a combination of multi-depth codec networks is proposed.^{[27]} The network can extract features over different effective receptive fields and rescales the image to multiple sizes to represent multi-level information. It can also extract effective information from the missing pixels in the training phase. A radiology-based, deeply supervised U-Net is developed for the segmentation of the prostate and prostatic lesions.^{[28]} These methods can also be applied to histopathological images of gastric cancer. A highly effective hybrid model for filter bubble partitioning is proposed.^{[29]}
In the feature extraction stage, HOG and LBP features are extracted from gastric cancer histopathological images.^{[30]} By comparison, the LBP feature is superior to the HOG feature. A new unsupervised feature selection method that calculates the dependencies between features and avoids selecting redundant features is developed.^{[31]} It can also be used on histopathological images of gastric cancer to improve operational efficiency.
In terms of classifiers, this article compares RF and ANN; through extensive experimental comparison, the ANN classifier outperforms the RF classifier. The standard InceptionV3 network framework is used.^{[32]} Its parameters are reduced by changing the depth multiplier, and after several iterations, the model with the lowest validation error is selected as the final model.
Methods   
U-Net-based image segmentation
U-Net is a Fully Convolutional Network (FCN)-based semantic segmentation network originally applied to medical cell microscopy image segmentation tasks.^{[33],[34],[35]} The end-to-end structure of this network can efficiently recover the information lost at shallow levels due to pooling operations. In addition, the training strategy of the U-Net network uses data augmentation, which makes full use of the limited labeled training data. U-Net contains two parts: the first part is the left half of the U-shaped structure, used for feature extraction, while the second part, the right half of the U-shape, is the upsampling part. A copy-and-crop skip connection layer is used before fusion to ensure that more features are fused in the final recovered feature map; it also allows the fusion of features of different sizes, thus enabling multi-scale prediction.
[Figure 2] shows the U-Net structure used in this paper. The network consists of a downsampling path (left side) and an upsampling path (right side). In the downsampling path, a convolution operation activated with ReLU is followed by a max pooling operation, after which the size of the feature map generated by the convolution operation is reduced to half of the original size. This set of operations is repeated four times in the downsampling path. In the upsampling path, each step contains three main operations: the up-convolution operation, the copy-and-crop operation, and the convolution operation activated with ReLU. These three operations are repeated a total of four times in the upsampling path. The final segmentation result is generated by a 2 × 2 sized convolution kernel (activated using Sigmoid).  Figure 2: U-Net structure. Conv: Convolution. ReLU: Rectified Linear Unit.
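The structure described above can be sketched in PyTorch. This is a minimal, hedged sketch, not the paper's exact configuration: it repeats the down/up step only twice instead of four times, uses padded convolutions so the skip connections can concatenate without cropping, and uses a 1 × 1 output convolution; all channel counts are illustrative.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, as in each U-Net stage. Padding keeps
    # sizes aligned (the original U-Net uses unpadded convs and crops).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    """Two-level U-Net sketch (the paper's network repeats each step 4 times)."""
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.down1 = double_conv(in_ch, 16)
        self.down2 = double_conv(16, 32)
        self.bottom = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)                     # halves the feature map
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.conv2 = double_conv(64, 32)                # 64 = 32 (up) + 32 (skip)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.conv1 = double_conv(32, 16)
        self.head = nn.Conv2d(16, out_ch, 1)            # 1x1 head, sigmoid mask

    def forward(self, x):
        d1 = self.down1(x)
        d2 = self.down2(self.pool(d1))
        b = self.bottom(self.pool(d2))
        u2 = self.conv2(torch.cat([self.up2(b), d2], dim=1))  # copy-and-concat skip
        u1 = self.conv1(torch.cat([self.up1(u2), d1], dim=1))
        return torch.sigmoid(self.head(u1))
```

A forward pass on a 64 × 64 single-channel input returns a probability mask of the same spatial size.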
Graph theory and graph-based features
Graph theory takes graphs as the object of study.^{[36]} A graph consists of a number of given nodes and edges, each edge connecting two nodes. Such graphs are usually used to describe a particular relationship between certain things, with the nodes representing the things and an edge connecting two nodes indicating that the corresponding two things have this relationship. A graph usually involves nodes, edges, paths, loops, and weights. An example of a graph is shown in [Figure 3].  Figure 3: An example of a graph with five points and eight weighted edges.
A graph structure can record the topological structure in an image, which can be quantified by computing features of the graph. There are various ways of constructing a graph: the MST,^{[37]} Delaunay triangulation,^{[38]} the Voronoi diagram,^{[39]} etc. Our previous work finds that the graph-formation characteristics of the MST are better in comparisons on cervical cancer histopathological images.^{[23],[40]} A comparative analysis reveals that the topological information carried by the MST graph structure in histopathology is the most complete;^{[40]} hence, the MST graph-based features obtain the optimal result. Therefore, in this work, the MST is chosen as the graph-formation method.
This paper proposes a method for histopathological image analysis of gastric cancer using graph-based features. When observing the experimental data, the cancerous tissues in the histopathological images of gastric cancer are significantly different from normal images in terms of topological structure. Therefore, we design a method to classify gastric cancer pathology using topological structure. A large amount of literature shows that the topological information in a graph can be obtained by the methods of graph theory, and the MST algorithm is chosen as the graph-forming method.
In this work, the information of edges and angles is obtained from the MST. Edge lengths and angles are extracted because a graph is composed of nodes and edges, and three connected nodes constitute a corner; edges and angles are thus the most basic elements characterizing the graph structure. Edges represent the degree of dispersion between two nodes, and angles represent structural complexity.^{[41]} The edge information is the edge lengths of the MST, and the angle information is the angle between every two adjacent edges. The MST contains all nodes of the original graph with the smallest sum of edge weights, and is the least-connected subgraph of the Delaunay triangulation. In terms of the sum of edge weights, the MST is less than or equal to every other spanning tree. As shown in [Figure 4], there are five points A, B, C, D, and E, and ∠ABC, ∠ABD, ∠CBD, and ∠BDE can be calculated. The mean, variance, kurtosis, and skewness are computed over all edges and angles of each tree, so eight feature values are output for each histopathological image.  Figure 4: Example figure of a minimum spanning tree composed of cell nuclei.
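The eight statistics above can be sketched with SciPy: build the MST over the pairwise distances between nucleus centroids, then take the mean, variance, skewness, and kurtosis of its edge lengths and angles. Taking the angle between every pair of edges incident to a shared node is our interpretation of "every two adjacent edges"; the function and its name are illustrative, not the paper's code.

```python
from itertools import combinations

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from scipy.stats import kurtosis, skew

def mst_graph_features(points):
    """Eight graph-based features: mean/variance/skewness/kurtosis of the
    MST edge lengths, then of the angles between adjacent edges.
    `points` is an (n, 2) array of nucleus centroids (assumed distinct)."""
    pts = np.asarray(points, dtype=float)
    dist = squareform(pdist(pts))                  # dense pairwise distances
    mst = minimum_spanning_tree(dist).toarray()    # upper-triangular weights
    rows, cols = np.nonzero(mst)
    lengths = mst[rows, cols]                      # n - 1 edge lengths

    # Adjacency list: which nodes each node is connected to in the tree.
    incident = {}
    for a, b in zip(rows, cols):
        incident.setdefault(a, []).append(b)
        incident.setdefault(b, []).append(a)

    # Angle between every pair of edges sharing a node.
    angles = []
    for center, nbrs in incident.items():
        for i, j in combinations(nbrs, 2):
            v1, v2 = pts[i] - pts[center], pts[j] - pts[center]
            cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
            angles.append(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
    angles = np.asarray(angles)

    def stats(x):
        return [x.mean(), x.var(), skew(x), kurtosis(x)]

    return np.array(stats(lengths) + stats(angles))
```

The returned vector of eight values is what is fed to the classifier for each image.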
Classification methods
In this article, we compare the performance of different kinds of classifiers, including RBF SVM, linear SVM, ANN, KNN, RF, and two deep learning classification models: Visual Geometry Group-16 (VGG16) and InceptionV3. SVM is a binary classifier that uses a linear decision boundary for classification,^{[42]} and it produces different classification results depending on the kernel function. The linear kernel function has the advantage of fewer parameters and fast computation on linearly separable data. ANN is a model that simulates the information-transfer mechanism of neurons,^{[43]} which has the advantages of high accuracy and parallel distributed processing capability, but requires many initial parameters and a long training time. KNN is a simple classification algorithm in machine learning^{[44]} that is suitable for multi-label problems, but its accuracy suffers when the number of samples per class is unbalanced. RF is an ensemble learning method in machine learning.^{[45]} RF is simple, easy to implement, and has low computational overhead, but is prone to overfitting.
VGG16 uses small convolutional kernels to reduce the number of parameters, and its regular network structure is suitable for parallel acceleration.^{[46]} The main idea of the Inception architecture is to find how to approximate the optimal local sparse structure with dense components. The InceptionV3 model gives more accurate feature information when dealing with a larger number of features with high diversity, and also reduces the computational effort.^{[47]}
The SVM classifier with the RBF kernel function is chosen for the experiments, for the following main reasons. SVM classification is effective: the final decision is determined by the support vectors, and the computational complexity depends mainly on the number of support vectors rather than on the number of samples, so the model has a small storage footprint and the algorithm is robust. Furthermore, SVM is a small-sample learning method that does not involve concepts such as probability measures, simplifying the usual classification and regression problems. In terms of kernel functions, the RBF kernel is more advantageous on linearly inseparable data and classifies more accurately.
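As an illustration of this choice, an RBF SVM can be trained on 8-dimensional feature vectors with scikit-learn. The data below are a hypothetical stand-in for the graph-based features (two Gaussian clusters for "normal" and "cancer"), and the hyperparameters are scikit-learn defaults, not the paper's.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for 140 normal + 140 cancer 8-D feature vectors.
X = np.vstack([rng.normal(0.0, 1.0, (140, 8)),
               rng.normal(1.5, 1.0, (140, 8))])
y = np.array([0] * 140 + [1] * 140)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Standardize, then fit an SVM with the RBF kernel.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

On real data, `C` and `gamma` would typically be tuned by cross-validation.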
Experiments and Analysis   
Image dataset and experimental setup for gastric cancer
The 2017 Brain of Things (BOT) competition provides 700 histopathological images with 2048 × 2048 resolution. Among them, 560 are histopathological images of gastric cancer and the rest are histopathological images of normal stomach. The training set of the U-Net segmentation network contains 300 images randomly selected from the 560 histopathological images of gastric cancer; 120 form the validation set, and the remaining 140 are used as the test set. Then, feature extraction and the other experimental operations are performed on these 140 histopathological images of gastric cancer and 140 histopathological images of normal stomach. In the classifier comparison section, data augmentation is performed on the existing data to improve the performance of the classifiers.
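The 300/120/140 split described above can be sketched as follows. The function name, file-name scheme, and random seed are illustrative assumptions (the paper does not state a seed); only the split sizes come from the text.

```python
import random

def split_dataset(cancer_paths, normal_paths, seed=42):
    """Split the 560 cancer images into 300 train / 120 val / 140 test
    for the segmentation network, and pair the 140 test images with
    140 normal images for the classification stage."""
    paths = list(cancer_paths)
    random.Random(seed).shuffle(paths)          # reproducible shuffle
    train, val, test = paths[:300], paths[300:420], paths[420:560]
    return train, val, test, list(normal_paths)[:140]
```

The three cancer subsets are disjoint by construction, since they are contiguous slices of one shuffled list.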
During training, only the abnormal images have Ground Truth (GT) images; the normal images do not. However, in the later tests, all images are subjected to image segmentation, and graph-based features are calculated for all of them.
Analysis of image segmentation results
In the image segmentation stage, the same two images are segmented using each of the six segmentation methods. The parameters of U-Net are set as follows: each downsampling step contains two sets of 3 × 3 convolution operations followed by a 2 × 2 max pooling operation; in the upsampling path, the up-convolution operation is 3 × 3 and is followed by two 1 × 1 convolution operations, and these operations are repeated a total of four times. The final segmentation result is produced by a convolution kernel of size 2 × 2. The results of the different segmentation methods are shown in [Figure 5].  Figure 5: Comparison of six segmentation methods on the test set for two typical examples (a and b). GT: Ground Truth.
From [Figure 5], it can be seen that the Level-Set method segments according to the edges of the image and cannot distinguish the structures of different tissues in the pathological images, while watershed segmentation only separates the whole image and is not based on tissue structure; the information collected is mixed in quality and cannot highlight the graph-based features of the cancer region. The foreground retained by the Otsu thresholding segmentation method includes not only the effective tissue but also the intercellular matrix, which seriously degrades the quality of the minimum spanning tree graph structure. In contrast, the U-Net method we use is able to segment the cancer region better, with smoother edges and clear subject regions, and retains less noise than the other methods. In addition to the visual comparison, we also compute evaluation metrics, the details of which are shown in [Table 1].
Medical images are characterized by simpler semantics, fixed structure, and smaller data volume. U-Net's U-shaped structure and skip connections achieve excellent performance on such data, which makes it outstanding in the field of medical image segmentation. From [Table 1], except for specificity and accuracy, U-Net performs better than the other methods on the remaining indicators, and it is lighter than TransUNet.
Analysis of the k-means algorithm
At this stage, the pixel grayscale values of the images are clustered, and the k-means algorithm with k = 3 is used as a benchmark for comparison. As shown in [Figure 6], when k = 3 the nuclei in the histopathological images of gastric cancer are well expressed, and essentially all the nuclei in the tissue are labeled. When k = 4, part of the stained gastric cancer tissue region is not labeled due to further clustering of the grayscale values. When k = 5, this becomes more obvious and the nucleus information is severely lost. Therefore, this paper selects k = 3 for the k-means clustering algorithm in the nucleus extraction stage.  Figure 6: Cell nucleus results with different k. (a) The upper-right quarter area of the image. (b) The nucleus image with k = 3. (c) The nucleus image with k = 4. (d) The nucleus image with k = 5.
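The nucleus extraction step above can be sketched with scikit-learn's k-means on the pixel grey levels. Keeping the darkest of the k = 3 clusters as the nucleus mask is our illustrative assumption (haematoxylin-stained nuclei are the darkest structures); the function name is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_nuclei_mask(gray, k=3, seed=0):
    """Cluster pixel grey levels with k-means (k = 3 as selected above)
    and keep the darkest cluster as the candidate nucleus mask.
    `gray` is a 2-D grayscale image."""
    vals = gray.reshape(-1, 1).astype(float)        # one sample per pixel
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(vals)
    darkest = np.argmin(km.cluster_centers_.ravel())
    return (km.labels_ == darkest).reshape(gray.shape)
```

The centroids of the connected components of this mask would then serve as the MST nodes.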
Analysis of feature extraction methods
After segmentation, the graph-based features are used for feature extraction: k-means is used to extract the cell nuclei, and the MST algorithm is used to draw the graph, as shown in [Figure 7].  Figure 7: Comparison of minimum spanning tree graph structures of six segmentation methods on the test set for two typical examples (a and b). GT: Ground Truth.
Comparing the MST graph structures after segmentation, the MST graph structure extracted from U-Net-segmented images represents the real topological structure of gastric cancer tissue most accurately, whereas the MST graph structures of the other segmentation methods are of poor quality due to their respective drawbacks.
Then, five feature extraction methods are compared in this paper, as shown in [Table 2]. First, the U-Net segmentation method is used for segmentation; after that, the MST, RGB, HOG, GLCM, and LBP features are extracted from the image and compared.
Then, the RBF SVM classifier is selected for classification in the third step. Finally, by calculating the classification accuracy, it can be seen that the graph-based features have an obvious advantage in the feature extraction stage, with classification accuracy reaching 94.29%.
Analysis of graph-based features
This study uses two features of the MST that can represent the topology of a graph: edge lengths and angles of the MST. Based on this, eight statistical features are extended, including the mean, variance, skewness, and kurtosis of the edge length and the mean, variance, skewness, and kurtosis of the angle.
As shown in [Figure 8], the first column shows the characteristic statistics of the edge length: mean, variance, skewness, and kurtosis. The second column shows the characteristic statistics of the angle: mean, variance, skewness, and kurtosis. The horizontal coordinates of the statistical plots indicate the image index, 140 in total; the first 70 are normal gastric histopathological images, while the last 70 are gastric cancer histopathological images. The vertical coordinates of the statistical plots indicate the statistical values.  Figure 8: Statistical features of the graph structure. The horizontal coordinates indicate the image index; the vertical coordinates indicate the statistical values. (a) Statistical plot of the mean of the edge length. (b) Statistical plot of the mean of the angle. (c) Statistical plot of the variance of the edge length. (d) Statistical plot of the variance of the angle. (e) Statistical plot of the skewness of the edge length. (f) Statistical plot of the skewness of the angle. (g) Statistical plot of the kurtosis of the edge length. (h) Statistical plot of the kurtosis of the angle.
Meanwhile, it can be seen that the mean edge lengths of normal and gastric cancer histopathological images are both stable, indicating that the size of the tissue structure is described accurately and does not differ greatly between histopathological images. In terms of variance, the variance of the edge length of normal images is significantly smaller than that of gastric cancer images, indicating that the edge-length structure of normal images is more uniform, while the edge lengths of gastric cancer images vary and are full of the irregular shapes of cancer. Moreover, the angles of normal images are similar to those of gastric cancer images, and their angle structures are similar. In terms of skewness, the edge lengths of the normal and gastric cancer images are also relatively close, indicating a similar degree of asymmetry relative to the mean, while the angles differ significantly and show a greater degree of asymmetry relative to the mean. In terms of kurtosis, the edge lengths and angles of normal and gastric cancer images are similar, indicating that the steepness of their distributions is similar.
Collectively, it can be seen that the topological information of histopathological images can be extracted more completely by using the MST. The classification accuracy of this method is the highest among all the feature extraction methods, reaching 94.29%, which fully illustrates the high performance of the graph-based feature extraction method on histopathological images of gastric cancer.
Analysis of red, green, blue features
In this study, in the RGB feature extraction method, histogram statistics of the R, G, and B channels of each image are computed. As shown in [Figure 9], the horizontal coordinate of the histogram is the pixel value (in the interval 0 to 255) and the vertical coordinate is the number of pixels with each value. In the statistics of each channel, the background (pixel value 0) of the U-Net-segmented images is removed; this region is the part segmented away during the U-Net segmentation process and cannot be counted in the RGB feature statistics.  Figure 9: Red, green, blue features histogram. The horizontal coordinate of this histogram is the pixel value (the interval is from 0 to 255) and the vertical coordinate is the number of pixels for each pixel value. (a) R channel. (b) G channel. (c) B channel.
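The per-channel histogram with the background excluded can be sketched as follows. Treating pixels that are zero in all three channels as background is our reading of "the background (pixel value 0) is removed"; the function name is illustrative.

```python
import numpy as np

def rgb_histograms(image, mask=None):
    """256-bin histogram per channel; pixels that are zero in every channel
    (the background produced by segmentation) are excluded, unless an
    explicit foreground mask is supplied."""
    img = np.asarray(image)
    fg = img.any(axis=-1) if mask is None else mask   # foreground pixels
    hists = [np.bincount(img[..., c][fg].ravel(), minlength=256)
             for c in range(3)]
    return np.stack(hists)                            # shape (3, 256)
```

The stacked (3, 256) array can be flattened into a 768-dimensional feature vector for the classifier.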
Analysis of histogram of oriented gradient features
In this study, in the HOG feature extraction stage, feature vectors of dimension 2,340,900 are extracted from each U-Net segmentation image and then put into the RBF SVM for classification. HOG features use grayscale gradients to describe the local shape of an object, marking the gradient orientations of the image. The advantage of HOG features is their good geometric and photometric invariance, since the histogram of gradient orientations is computed over small regions; the disadvantage is their poor noise immunity. HOG features perform well in human detection, but their feature extraction effect on histopathological images is not satisfactory: the classification accuracy is only 55%.
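HOG extraction can be sketched with scikit-image. The parameters below are skimage defaults, not the paper's; the descriptor dimension grows quadratically with image size, which is consistent with the multi-million-dimensional vectors reported above for 2048 × 2048 images.

```python
import numpy as np
from skimage.feature import hog

# HOG descriptor for a small illustrative image (128 x 128):
# 9 orientation bins, 8x8-pixel cells, 2x2-cell blocks, L2-Hys block norm.
image = np.random.default_rng(0).random((128, 128))
features = hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")
# 16x16 cells -> 15x15 overlapping blocks of 2*2*9 = 36 values each.
```

For this configuration the descriptor length is 15 × 15 × 36 = 8100.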
Considering that the feature dimensions suited to different classifiers differ, the effect of different classifiers is also compared for HOG feature extraction, as shown in [Table 3].  Table 3: The results of each metric obtained by classifying histogram of oriented gradient features using different classifiers.
From the above analysis, it can be concluded that feature extraction using HOG features is not very effective; the highest accuracy is 61.54%, obtained using the ANN.
Analysis of gray-level co-occurrence matrix features
GLCM is a matrix that represents the grayscale relationship between pixels at each location of the image, either adjacent pixels or pixels at a specified distance. One work finds that among the 14 statistics derived from the GLCM, only four (homogeneity, correlation, contrast, and energy) are uncorrelated, and these four features are easy to compute and give high classification accuracy;^{[53]} another studies six texture features in detail and concludes that contrast and entropy are the most important.^{[54]} Therefore, in the extraction of GLCM features, this article calculates four statistical attributes of the gray-level co-occurrence matrix: homogeneity, correlation, contrast, and entropy.
The GLCM features perform poorly mainly because the images used are U-Net-segmented images. After segmentation by U-Net, the pixel value of the background region is zero, which significantly distorts the image information carried by the gray-level co-occurrence matrix: its four statistical attributes no longer describe the texture features of the image faithfully. The final classification accuracy using GLCM features is 53.57%.
Analysis of local binary pattern features
In the LBP feature extraction stage, features are extracted from each U-Net segmentation image using the LBP operator, forming a grayscale image with the same resolution as the original. From this LBP image, an LBP histogram is formed over the image grayscales, as in [Figure 10]: the horizontal coordinate is the grayscale of the LBP image, and the vertical coordinate is the number of pixels at each grayscale.  Figure 10: LBP histogram. The horizontal coordinate is the grayscale of the local binary pattern image, and the vertical coordinate is the number of pixels per grayscale. LBP: Local Binary Pattern.
The advantages of LBP features are fast computation, rotation invariance, and grayscale invariance. The LBP value of each pixel reflects the texture relationship between that point and its surrounding pixels, so texture information is well preserved. The disadvantage is sensitivity to orientation information: the gradient orientation in the image has a relatively large impact on a pixel's LBP value. The purpose of this study is to separate gastric cancer tissue from normal tissue, but the LBP feature captures only the local texture of the image, and the grayscale histogram drawn from the LBP image passes down these local texture features rather than all the information of the tissue region, which ultimately loses the feature information needed for the experiment. The classification accuracy using LBP features is 65.71%.
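The two LBP steps described above, coding each pixel against its eight neighbours and then histogramming the coded image, can be sketched as follows (a basic 8-neighbour LBP; the exact operator variant used in the paper is not specified here):

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbour LBP: one bit per neighbour, set when neighbour >= centre."""
    img = img.astype(float)
    c = img[1:-1, 1:-1]  # centre pixels (the one-pixel border is dropped)
    code = np.zeros(c.shape, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(img):
    """Normalised 256-bin histogram of the LBP-coded image."""
    hist, _ = np.histogram(lbp_image(img), bins=256, range=(0, 256))
    return hist / hist.sum()

h = lbp_histogram(np.full((10, 10), 5.0))  # a flat patch codes every pixel as 255
```

The histogram discards all spatial arrangement of the codes, which is exactly the information loss discussed above.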
Analysis of classifier design
In the classifier design stage, as shown in [Table 4], this article compares five methods. The image is first segmented by U-Net, and then the MST is used to extract the graph-based features. Finally, the classification accuracies of five classifiers are compared: RBF SVM, linear SVM, ANN, RF, and KNN. In this paper, we select a six-layer ANN, an RF with 512 trees, and a KNN with k = 15.
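This comparison can be sketched with scikit-learn, using a synthetic two-class dataset in place of the graph-based feature vectors (which are not reproduced here); the hyper-parameters mirror the ones quoted above, a six-layer ANN, a 512-tree RF, and k = 15, while the hidden-layer width and the data shape are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Stand-in for the graph-based feature vectors of the two tissue classes
X, y = make_classification(n_samples=280, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)

classifiers = {
    "RBF SVM": SVC(kernel="rbf"),
    "Linear SVM": SVC(kernel="linear"),
    "ANN": MLPClassifier(hidden_layer_sizes=(64,) * 6, max_iter=2000, random_state=0),
    "RF": RandomForestClassifier(n_estimators=512, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=15),
}
scores = {name: clf.fit(Xtr, ytr).score(Xte, yte) for name, clf in classifiers.items()}
```

Holding the segmentation and feature-extraction stages fixed while swapping only the final estimator, as here, is what isolates the contribution of the classifier itself.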
The classification accuracies in [Table 4] show that the graph-based features in this article are robust and that the classifiers used in the experiment are effective. By comparison, the classifier most suited to the structural features of gastric cancer histopathological images is the RBF SVM.
Meanwhile, a comparison experiment with two deep learning classifiers is designed: VGG16 and Inception-V3. As shown in [Table 5], to compare different deep learning classification methods on the same experimental data, 70 images are randomly selected from each of the 140 gastric cancer images and 140 normal images, giving a final training set and test set of 140 images each. To better compare classifier performance, we achieve data augmentation by meshing the images into patches (256 pixels × 256 pixels). After augmentation, 8960 training images and the same number of test images are obtained.
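The patch-based augmentation can be sketched as below; the 2048 × 2048 source size is an assumption chosen for illustration (it yields 64 patches per image, consistent with 140 images producing 8960 patches):

```python
import numpy as np

def mesh_patches(img, size=256):
    """Split an image into non-overlapping size x size patches (edges cropped)."""
    h, w = img.shape[:2]
    img = img[:h - h % size, :w - w % size]
    return [img[r:r + size, c:c + size]
            for r in range(0, img.shape[0], size)
            for c in range(0, img.shape[1], size)]

patches = mesh_patches(np.zeros((2048, 2048)))  # 8 x 8 = 64 patches
```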
The advantage of VGG16 is its concise framework: the network uses the same size of convolutional kernel and max-pooling throughout, replacing one large-filter convolutional layer (5 × 5 or 7 × 7) with a combination of several small-filter (3 × 3) layers. The advantage of Inception-V3 is that it uses convolutional kernels of different sizes within a layer, which improves perceptual power, and applies batch normalization, which mitigates vanishing gradients. These two deep learning networks are chosen for the classification comparison experiments. After training, however, the classification results are poor: VGG16 achieves an accuracy of 75% and Inception-V3 of 50%. The main reason is that deep learning networks require a large amount of sample data. After augmentation, the accuracies of the VGG16 and Inception-V3 models improve to 87.50% and 62.80%, respectively, but both remain lower than our proposed method.
Discussion   
For the different feature extraction methods, this article compares the classification results of graph-based features with those of the other features and draws a separate confusion matrix for each. The main evaluation metrics consist of five components: accuracy (ACC), precision (PPV), recall (TPR), specificity (TNR), and F1-score. These metrics are calculated for each confusion matrix, and the results are shown in [Table 6].
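These five metrics follow directly from the binary confusion matrix. As a consistency check, the sketch below plugs in the counts we infer from the reported graph-based results (62 of 70 normal and all 70 gastric cancer test images classified correctly; an inference, not a figure stated explicitly in the paper) and recovers the published values:

```python
def metrics_from_confusion(tp, fn, fp, tn):
    """ACC, PPV, TPR, TNR, and F1 from confusion-matrix counts (positives = normal)."""
    acc = (tp + tn) / (tp + fn + fp + tn)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return acc, ppv, tpr, tnr, f1

acc, ppv, tpr, tnr, f1 = metrics_from_confusion(tp=62, fn=8, fp=0, tn=70)
# acc = 94.29%, ppv = 100%, tpr = 88.57%, tnr = 100%, f1 = 93.94%
```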
From the above confusion matrices and their evaluation metrics, we can see that in terms of classification accuracy, the graph-based features perform very well, with an accuracy of 94.29%. The other feature methods are less effective: RGB features and LBP features reach 69.29% and 65.57%, respectively, and the worst are HOG features and GLCM features, as low as 55% and 53.57%. In terms of precision, the graph-based features again perform best, reaching 100%, followed by the RGB features at 76.47%; the precision of the LBP features is 68.97%, while the worst are the HOG and GLCM features at 58.14% and 55.81%. In terms of recall (i.e., the classification accuracy of normal images), the graph-based features score 88.57%, with eight normal images misclassified as gastric cancer images. The RGB and LBP features score 55.71% and 57.14%, and the worst remain the HOG and GLCM features at 35.71% and 34.29%, respectively. In terms of specificity (the classification accuracy of gastric cancer images), the graph-based features work well and reach 100%, meaning that no gastric cancer image is misclassified; they are followed by the RGB features at 82.86%, the HOG features equal to the LBP features at 74.29%, and the GLCM features at only 72.86%. For the F1-score, the graph-based method is still the best performer at 93.94%, followed by the RGB and LBP features at 64.46% and 62.50%, and the worst are still the HOG and GLCM features at 44.24% and 42.48%.
Through the above analysis and discussion, it can be concluded that the graph-based feature extraction method performs best throughout the experiment, followed by the RGB and LBP features, with the HOG and GLCM features lowest. Meanwhile, comparing the classification of normal images with that of gastric cancer images shows that the graph-based method is weaker on normal images, which it occasionally misclassifies as gastric cancer, while it performs very well on gastric cancer images.
Conclusion   
Histopathological image analysis has been a popular research direction in the medical field and plays a crucial role in the future of intelligent medicine. In studying the topology of gastric cancer histopathological images, graph theory can address problems that conventional features cannot. Gastric cancer histopathological images contain a wide range of tissue structures with complex morphology, especially in the cancer nest region, and it is difficult to extract complete tissue information with conventional features to meet the experimental requirements.
In this paper, a graph-based-feature microscopic image analysis method is proposed for gastric cancer histopathology. It expands on the classical digital image processing pipeline and mainly includes the steps of image segmentation, feature extraction, and classifier design. The method exploits the fact that the topological structure of gastric cancer tissue regions differs significantly from that of normal tissue, using graph-based features to collect this information and then classify it. By comparing the classification metrics, this article again validates the advantages of graph-based features on histopathological images of gastric cancer. Furthermore, by comparing multiple image segmentation methods, multiple feature extraction methods, multiple classifiers, and deep learning experiments, the optimal pipeline can be selected: for histopathological images of gastric cancer, the image is first segmented using U-Net, features are extracted by the graph-based method, and finally the RBF SVM classifier, which is well suited to nonlinear data, performs the classification. The final experimental data show that our analysis method has a clear advantage in classifying histopathological images of gastric cancer. Moreover, the proposed graph-based features have the potential to work in other microscopic image analysis fields, such as microorganism image analysis,^{[55],[56],[57]} cytopathological image analysis,^{[58],[59],[60],[61]} and microscopic video analysis.^{[62],[63],[64],[65],[66]}
Acknowledgments
We thank Miss Zixian Li and Mr. Guoxian Li for their important discussion. We also thank B. E. Jiawei Zhang for his help in the experiments.
Financial support and sponsorship
This work is supported by National Natural Science Foundation of China (No. 61806047).
Conflicts of interest
There are no conflicts of interest.
References   
1.  Bugdayci G, Pehlivan MB, Basol M, Yis OM. Roles of the systemic inflammatory response biomarkers in the diagnosis of cancer patients with solid. Exp Biomed Res 2019;2:3743. 
2.  Elsheikh TM, Austin RM, Chhieng DF, Miller FS, Moriarty AT, Renshaw AA, et al. American Society of Cytopathology workload recommendations for automated Pap test screening: Developed by the productivity and quality assurance in the era of automated screening task force. Diagn Cytopathol 2013;41:1748. 
3.  Ai S, Li C, Li X, Wang Q, Li X. A State-of-the-Art Review for Gastric Histopathology Feature Extraction Methods, In Proceedings of ISICDM 2020, ACM; 2021. p. 648. [Doi: 10.1145/3451421.3451436]. 
4.  Ai S, Li C, Li X, Jiang T, Grzegorzek M, Sun C, et al. A state-of-the-art review for gastric histopathology image analysis approaches and future development. Biomed Res Int 2021;2021:6671417. 
5.  Li X, Li C, Rahaman MM, Sun H, Li X, Wu J, et al. A comprehensive review of computer-aided whole-slide image analysis: From datasets to feature extraction, segmentation, classification, and detection approaches. Artif Intell Rev 2022;29:60939. 
6.  Zhou X, Li C, Rahaman MM, Yao Y, Ai S, Sun C, et al. A comprehensive review for breast histopathology image analysis using classical and deep neural networks. IEEE Access 2020;8:9093156. 
7.  Li Y, Li C, Li X, Wang K, Rahaman MM, Sun C, et al. A comprehensive review of Markov random field and conditional random field approaches in pathology image analysis. Arch Comput Methods Eng 2021;2021:131. 
8.  Li C, Chen H, Li X, Xu N, Hu Z, Xue D, et al. A review for cervical histopathology image analysis using machine vision approaches. Artificial Intell Rev 2020;53:482162. 
9.  Doi K. Current status and future potential of computeraided diagnosis in medical imaging. Br J Radiol 2005;78:S319. 
10.  Chen H, Li C, Li X, Rahaman MM, Hu W, Li Y, et al. IL-MCAM: An interactive learning and multi-channel attention mechanism-based weakly supervised colorectal histopathology image classification approach. Comput Biol Med 2022;143:105265. 
11.  Li Y, Wu X, Li C, Li X, Chen H, Sun C, et al. A hierarchical conditional random field-based attention mechanism approach for gastric histopathology image classification. Appl Intell 2022:122. [Doi: 10.1007/s10489021028862]. 
12.  Hu W, Li C, Li X, Rahaman MM, Ma J, Zhang Y, et al. GasHisSDB: A new gastric histopathology image dataset for computer aided diagnosis of gastric cancer. Comput Biol Med 2022;142:105207. 
13.  Sun C, Li C, Zhang J, Rahaman MM, Ai S, Chen H, et al. Gastric histopathology image segmentation using a hierarchical conditional random field. Biocybern Biomed Eng 2020;40:153555. 
14.  Sun C, Li C, Zhang J, Kulwa F, Li X. Hierarchical conditional random field model for multiobject segmentation in gastric histopathology images. Electron Lett 2020;56:7503. 
15.  Bengtsson E, Malm P. Screening for cervical cancer using automated analysis of PAP-smears. Comput Math Methods Med 2014;2014:842037. 
16.  Li C, Chen H, Zhang L, Xu N, Xue D, Hu Z, et al. Cervical histopathology image classification using multilayer hidden conditional random fields and weakly supervised learning. IEEE Access 2019;7:9037897. 
17.  Xue D, Zhou X, Li C, Yao Y, Rahaman MM, Zhang J, et al. An application of transfer learning and ensemble learning techniques for cervical histopathology image classification. IEEE Access 2020;8:10460318. 
18.  Li Y, Wu X, Li C, Sun C, Li X, Rahaman MM, et al. Intelligent gastric histopathology image classification using hierarchical conditional random field based attention mechanism. ICMLC 2021;2021:3305. 
19.  Sun C, Li C, Xu H, Zhang J, Ai S, Zhou X, et al. A Comparison of Segmentation Methods in Gastric Histopathology Images. Proceeding of ISICDM 2020, ACM; 2021. p. 759. [Doi: 10.1145/3451421.3451438]. 
20.  Jung H, Suloway C, Miao T, Edmondson EF, Lisle C. Integration of Deep Learning and Graph Theory for Analyzing Histopathology Whole-Slide Images. 2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR); 2018. p. 15. [Doi: 10.1109/AIPR.2018.8707424]. 
21.  Keenan SJ, Diamond J, McCluggage WG, Bharucha H, Thompson D, Bartels PH, et al. An automated machine vision system for the histological grading of cervical intraepithelial neoplasia (CIN). J Pathol 2000;192:35162. 
22.  Guillaud M, Cox D, AdlerStorthz K, Malpica A, Staerkel G, Matisic J, et al. Exploratory analysis of quantitative histopathology of cervical intraepithelial neoplasia: Objectivity, reproducibility, malignancyassociated changes, and human papillomavirus. Cytometry A 2004;60:819. 
23.  Li C, Hu Z, Chen H, Ai S, Li X. Cervical Histopathology Image Clustering Using Graph Based Unsupervised Learning. Proceedings of the 11 ^{th} International Conference on Modelling, Identification and Control (ICMIC2019); 2020. p. 14152. [Doi: 10.1007/s4297902100469z]. 
24.  Rahaman MM, Li C, Yao Y, Kulwa F, Rahman MA, Wang Q, et al. Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches. J Xray Sci Technol 2020;28:82139. 
25.  Qu J, Hiruta N, Terai K, Nosato H, Murakawa M, Sakanashi H. Gastric pathology image classification using stepwise fine-tuning for deep neural networks. J Healthc Eng 2018;2018:8961781. 
26.  Quan Y, Lin P, Xu Y, Nan Y, Ji H. Nonblind image deblurring via deep learning in complex field. IEEE Trans Neural Netw Learn Syst 2021;PP:114. 
27.  Nguyen NQ, Lee SW. Robust boundary segmentation in medical images using a consecutive deep encoder-decoder network. IEEE Access 2019;7:33795808. 
28.  Hambarde P, Talbar S, Mahajan A, Chavan S, Sable N. Prostate lesion segmentation in MR images using radiomics based deeply supervised U-Net. Biocybern Biomed Eng 2020;40:142135. 
29.  Tao S, Guo Y, Zhu C, Chen H, Zhang Y, Yang J, et al. Hybrid model enabling highly efficient follicular segmentation in thyroid cytopathological whole slide image. Intell Med 2021;1:709. 
30.  Korkmaz SA, Binol H. Classification of molecular structure images by using ANN, RF, LBP, HOG, and size reduction methods for early stomach cancer detection. J Mol Struct 2018;1156:25563. 
31.  Lim H, Kim DW. Pairwise dependencebased unsupervised feature selection. Pattern Recognit 2021;111:107663. 
32.  Iizuka O, Kanavati F, Kato K, Rambeau M, Arihiro K, Tsuneki M. Deep learning models for histopathological classification of gastric and colonic epithelial tumours. Sci Rep 2020;10:1504. 
33.  Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham; 2015. p. 23441. [Doi: 10.1007/9783319245744_28]. 
34.  Zhang J, Li C, Kosov S, Grzegorzek M, Shirahama K, Jiang T, et al. LCU-Net: A novel low-cost U-Net for environmental microorganism image segmentation. Pattern Recognit 2021;115:117. 
35.  Zhang J, Li C, Kulwa F, Zhao X, Sun C, Li Z, et al. A Multiscale CNNCRF framework for environmental microorganism image segmentation. Biomed Res Int 2020;2020:4621403. 
36.  Sanfilippo A. Graph theory. In: Encyclopedia of Language & Linguistics. 2 ^{nd} ed., Vol. 311. 2006. p. 1402. [Doi: 10.1016/B0080448542/01600X]. 
37.  Li YZ, Wen J. A novel fuzzy distancebased minimum spanning tree clustering algorithm for face detection. Cognit Comput 2022:112. [Doi: 10.1007/s1255902210002w]. 
38.  Perumal L. New approaches for Delaunay triangulation and optimisation. Heliyon 2019;5:e02319. 
39.  Yan DM, Bao G, Zhang X, Wonka P. Lowresolution remeshing using the localized restricted voronoi diagram. IEEE Trans Vis Comput Graph 2014;20:141827. 
40.  Li C, Hu Z, Chen H, Ai S, Li X. A cervical histopathology image clustering approach using graph based features. SN Comput Sci 2021;2:120. [Doi: 10.1007/s4297902100469z]. 
41.  CruzRoa A, Xu J, Madabhushi A. A Note on the Stability and Discriminability of Graph Based Features for Classification Problems in Digital Pathology. Tenth International Symposium on Medical Information Processing and Analysis (SIPAIM 2014). International Society for Optics and Photonics; 2014. [Doi: 10.1117/12.2085141]. 
42.  Boser BE, Guyon IM, Vapnik VN. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory; 1992. p. 14452. [Doi: 10.1145/130385.130401]. 
43.  Wang SC. Artificial neural network. In: Interdisciplinary Computing in Java Programming. Boston, MA: Springer; 2003. p. 81100. [Doi: 10.1007/9781461503774]. 
44.  Peterson LE. K-nearest neighbor. Scholarpedia 2009;4:1883. 
45.  Breiman L. Random forests. Mach Learn 2001;45:532. 
46.  Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv: 1409.1556; 2014. [Doi: 10.48550/arXiv.1409.1556]. 
47.  Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception Architecture for Computer Vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 281826. [Doi: 10.1109/CVPR.2016.308]. 
48.  Actor JA, Fuentes DT, Rivière B. Identification of kernels in a convolutional neural network: Connections between level set equation and deep learning for image segmentation. Proc SPIE Int Soc Opt Eng 2020;11313:1131317. 
49.  Fan H, Xie F, Li Y, Jiang Z, Liu J. Automatic segmentation of dermoscopy images using saliency combined with Otsu threshold. Comput Biol Med 2017;85:7585. 
50.  Hasan SM, Ahmad M, Ahmad M. Twostep verification of brain tumor segmentation using watershedmatching algorithm. Brain Inform 2018;5:8. 
51.  Xue L, Wang X, Yang Y, Zhao G, Han Y, Fu Z, et al. Segnet network algorithmbased ultrasound images in the diagnosis of gallbladder stones complicated with gallbladder carcinoma and the relationship between P16 expression with gallbladder carcinoma. J Healthc Eng 2021;2021:2819986. 
52.  Ying S, Wang B, Zhu H, Liu W, Huang F. Caries segmentation on tooth Xray images with a deep network. J Dent 2022;119:104076. 
53.  Ulaby FT, Kouyate F, Brisco B, Lee Williams TH. Textural information in SAR images. IEEE Trans Geosci Remote Sens 1986;24:23545. 
54.  Baraldi A, Parmiggiani F. An investigation of the textural characteristics associated with gray level co-occurrence matrix statistical parameters. Geosci Remote Sens 1995;33:293304. 
55.  Li C, Shirahama K, Grzegorzek M. Application of content-based image analysis to environmental microorganism classification. Biocybern Biomed Eng 2015;35:1021. 
56.  Kosov S, Shirahama K, Li C, Grzegorzek M. Environmental microorganism classification using conditional random fields and deep convolutional neural networks. Pattern Recognit 2018;77:24861. 
57.  Zhang J, Li C, Rahaman MM, Yao Y, Ma P, Zhang J, et al. A comprehensive review of image analysis methods for microorganism counting: From classical image processing to deep learning approaches. Artif Intell Rev 2022;55:2875944. 
58.  Li C, Huang X, Jiang T, Xu N. Full-automatic computer aided system for stem cell clustering using content-based microscopic image analysis. Biocybern Biomed Eng 2017;37:54058. 
59.  Rahaman MM, Li C, Wu X, Yao Y, Hu Z, Jiang T, et al. A survey for cervical cytopathology image analysis using deep learning. IEEE Access 2020;8:61687710. 
60.  Rahaman MM, Li C, Yao Y, Kulwa F, Wu X, Li X, et al. DeepCervix: A deep learningbased framework for the classification of cervical cells using hybrid deep feature fusion techniques. Comput Biol Med 2021;136:104649. 
61.  Liu W, Li C, Rahaman MM, Jiang T, Sun H, Wu X, et al. Is the aspect ratio of cells important in deep learning? A robust comparison of deep learning methods for multiscale cytopathology cell image classification: From convolutional neural networks to visual transformers. Comput Biol Med 2022;141:105026. 
62.  Shen M, Li C, Huang W, Szyszka P, Shirahama K, Grzegorzek M, et al. Interactive tracking of insect posture. Pattern Recognit 2015;48:356071. 
63.  Chen A, Li C, Zou S, Rahaman MM, Yao Y, Chen H, et al. SVIA dataset: A new dataset of microscopic videos and images for computer-aided sperm analysis. Biocybern Biomed Eng 2022;42:20414. 
64.  Li X, Li C, Zhao W, Gu Y, Li J, Xu P. Comparison of Visual Feature Extraction Methods of Sperms in Semen Microscopic Videos. Proceedings of ISICDM 2020, ACM; 2021. p. 20612. [Doi: 10.1145/3451421.3451465]. 
65.  Zhao W, Zou S, Li C, Li J, Zhang J, Ma P, et al. A Survey of Sperm Detection Techniques in Microscopic Videos. Proceedings of ISICDM 2020, ACM; 2021. p. 21924. [Doi: 10.1145/3451421.3451467]. 
66.  Li X, Li C, Kulwa F, Rahaman MM, Zhao W, Wang X, et al. Foldover features for dynamic object behaviour description in microscopic videos. IEEE Access 2020;8:11451940. 