Digital Medicine

: 2022  |  Volume : 8  |  Issue : 1  |  Page : 11

Super-resolution reconstruction of magnetic resonance images based on multi-scale feature extraction Super-Resolution Convolution Neural Network

Rui Feng1, XiuHan Li1, Wei Wang1, JunXiao Yu1, Da Cao2, YiShuo Li1, XiaoLing Wu1
1 Department of Biomedical Engineering, Nanjing Medical University, Nanjing, Jiangsu, China
2 Department of Biomedical Engineering, Nanjing Medical University; Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu, China

Correspondence Address:
Wei Wang
Department of Biomedical Engineering, Nanjing Medical University, Nanjing 210000


Background: Low-resolution magnetic resonance imaging (MRI) offers high imaging speed, but its image details cannot meet the needs of clinical diagnosis. Neural network-based reconstruction methods have therefore attracted growing research interest, and effective super-resolution reconstruction of low-resolution images has become highly valuable in clinical applications. Methods: We introduced the Super-Resolution Convolution Neural Network (SRCNN) into the reconstruction of magnetic resonance images. The SRCNN consists of three layers: the image feature extraction layer, the nonlinear mapping layer, and the reconstruction layer. For the feature extraction layer, a multi-scale feature extraction (MFE) method was used to extract features at different scales through three different levels of views, which is superior to the original feature extraction with a fixed view size. This MFE can also be combined with residual learning to improve the performance of MRI super-resolution reconstruction. The proposed network is an end-to-end architecture, so no manual intervention or multi-stage calculation is required in practical applications. The structure of the network is extremely simple, omitting the fully connected layers and the pooling layers of a traditional Convolution Neural Network. Results and Conclusions: Comparative experiments show that the MFE SRCNN-based network greatly improves the super-resolution reconstruction of MR images. The performance is significantly improved in terms of the evaluation indexes peak signal-to-noise ratio and structural similarity index measure, and the recovery of image details is also improved.

How to cite this article:
Feng R, Li X, Wang W, Yu J, Cao D, Li Y, Wu X. Super-resolution reconstruction of magnetic resonance images based on multi-scale feature extraction Super-Resolution Convolution Neural Network.Digit Med 2022;8:11-11

How to cite this URL:
Feng R, Li X, Wang W, Yu J, Cao D, Li Y, Wu X. Super-resolution reconstruction of magnetic resonance images based on multi-scale feature extraction Super-Resolution Convolution Neural Network. Digit Med [serial online] 2022 [cited 2022 May 20 ];8:11-11

Full Text


Magnetic resonance imaging (MRI) has been widely used in various clinical applications. Despite its powerful capabilities, an MRI scan takes a relatively long time, which patients find uncomfortable. One compromise to speed up the imaging process is to acquire lower-quality images and then use image postprocessing techniques to enhance MR image quality; for example, image super-resolution reconstruction is used to improve the resolution of MR images.[1],[2],[3] In 2013, Shi et al. combined the K-Singular Value Decomposition (K-SVD) dictionary algorithm with the idea of jointly generating high- and low-resolution dictionaries to achieve high-resolution image reconstruction; the generated low-resolution dictionary is used for the sparse representation in the super-resolution reconstruction algorithm.[4] In 2014, Yang et al. designed a new neighbor embedding (NE) framework for face prior learning and depth map reconstruction. The results demonstrated the superiority of their method compared with state-of-the-art depth map super-resolution techniques on both synthetic data and real-world data from Kinect.[5] In 2016, Xin et al. proposed a novel nonlocal feature back-projection method for image super-resolution, which effectively reduces the jaggy and ringing artifacts common in the general iterative back-projection (IBP) method.[6]

In recent years, with the rapid development of the Convolution Neural Network (CNN),[7] super-resolution reconstruction algorithms based on deep learning have been successfully applied to natural images,[8],[9] and these methods have been gradually extended to the field of medical imaging.[10],[11] In 2016, Dong et al. proposed a three-layer Super-Resolution CNN (SRCNN) architecture, which consists of a feature extraction layer, a nonlinear mapping layer, and a reconstruction layer.[12],[13] In 2017, Ledig et al. presented SRGAN, a generative adversarial network for image super-resolution; it is the first framework capable of inferring photo-realistic natural images for 4× upscaling factors.[14] In 2019, Pham et al. studied three-dimensional deep convolutional neural networks for the super-resolution of brain MRI data. The results demonstrated the potential of deep neural networks for practical medical image applications.[15]

In this manuscript, we introduce SRCNN into the super-resolution reconstruction of MR images and explore the application prospects of neural networks in high-definition reconstruction of MR images.[16] We also introduce a multi-scale feature extraction method based on SRCNN. Compared with the original feature extraction in fixed-size views only, we use three different levels of views to extract MR image features at different scales. The multi-scale feature extraction (MFE) method can also be combined with residual learning to improve the performance of MR image reconstruction. This super-resolution network with MFE (MFE-SR) shows the advantages of a simple structure, fast training speed, and fewer training samples required for image reconstruction.


Super-Resolution Convolution Neural Network for super-resolution magnetic resonance imaging reconstruction

SRCNN is regarded as the first super-resolution reconstruction method that uses a convolutional neural network structure. Its model architecture can be divided into three layers: the image feature extraction layer, the nonlinear mapping layer, and the reconstruction layer. Its structure is very simple, even omitting the fully connected layers and the pooling layers of a traditional CNN.[17],[18] SRCNN is an end-to-end super-resolution method, so no manual intervention or multi-stage calculation is needed in practical application. The specific CNN structure for super-resolution MRI reconstruction is shown in [Figure 1].{Figure 1}

In the feature extraction step, a convolution operation is first used to extract multiple image patches from a low-resolution image. Each patch is represented as a multi-dimensional vector, known as a feature vector, and together these vectors form the feature matrix. This is equivalent to convolving the image with a set of filters. Then, a Rectified Linear Unit (ReLU) is applied. The procedure can be fully described by the following formula:


F1(m) = max(0, C1 * m + B1)

where C1 and B1 represent the filters and biases, and m is the downsampled MR image. The function max(0, ·) denotes the ReLU operation.
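This feature-extraction step can be sketched in a few lines of Python (a minimal single-channel illustration, not the authors' Matlab implementation; the kernel and bias values in the usage below are arbitrary):

```python
# Minimal sketch of F1(m) = max(0, C1 * m + B1): one "valid" 2D
# convolution followed by a ReLU, on plain Python lists of lists.
def conv2d_relu(image, kernel, bias):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = bias
            for u in range(kh):
                for v in range(kw):
                    s += kernel[u][v] * image[i + u][j + v]
            row.append(max(0.0, s))  # ReLU: max(0, x)
        out.append(row)
    return out
```

In SRCNN the first layer applies 64 such filters of size 9 × 9 to the upsampled low-resolution image, producing 64 feature maps.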

In the nonlinear mapping step, convolution and ReLU operations are also included. The purpose of this step is to map the n1-dimensional feature space to an n2-dimensional space:


F2(m) = max(0, C2 * F1(m) + B2)

where the size of C2 is n1 × n2, and B2 is an n2-dimensional vector. Each n2-dimensional output vector represents a high-resolution MR image patch for reconstruction. To improve the nonlinearity, more convolutional layers can be added; the undesirable side effect is a massive increase in computation and complexity, which is relatively time consuming.

In the reconstruction step, an operation similar to deconvolution is introduced, with the purpose of generating the final high-resolution image from the feature matrix. The equation is as follows:


F3(m) = C3 * F2(m) + B3

where the size of C3 is n2 × n3, and B3 is an n3-dimensional vector.

In the above description of the SRCNN network structure, the numbers of convolutional kernels n are 64, 32, and 1, respectively, and the sizes of the convolutional kernels f are 9 × 9, 1 × 1, and 5 × 5, respectively. Although the functional purposes of the three steps differ, their essence is to perform convolution operations, and together they form a convolutional neural network. In this network structure, all filter weights and biases must be optimized.
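With the layer sizes just listed (64, 32, and 1 kernels of size 9 × 9, 1 × 1, and 5 × 5), the total number of learnable weights and biases can be counted directly; the sketch below assumes a single-channel input image:

```python
# Count weights + biases for one convolutional layer: n_out filters of
# size f x f over n_in input channels, plus one bias per filter.
def layer_params(n_in, n_out, f):
    return n_out * (f * f * n_in) + n_out

def srcnn_params():
    return (layer_params(1, 64, 9)      # feature extraction: 64 filters, 9x9
            + layer_params(64, 32, 1)   # nonlinear mapping: 32 filters, 1x1
            + layer_params(32, 1, 5))   # reconstruction: 1 filter, 5x5
```

This comes to 8,129 parameters in total (5,248 + 2,080 + 801), tiny compared with VGG-scale networks, which is part of why SRCNN trains quickly on modest hardware.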

Optimization of Super-Resolution Convolution Neural Network

From the previous discussion of SRCNN, we know that the feature extraction layer of the standard SRCNN network has only one feature scale, 3 × 3, which is also the smallest. However, a 3 × 3 field of view (FOV) is not comprehensive enough for whole-image feature extraction in MRI. Therefore, it is necessary to introduce filters with larger convolutional sizes, such as 5 × 5 and 7 × 7, for multi-scale feature extraction.

In 2014, Simonyan and Zisserman found that increasing depth using an architecture with very small (3 × 3) convolution filters could achieve the same receptive field as larger convolution filters with fewer parameters.[19] Therefore, we replaced one 5 × 5 filter with two 3 × 3 filters and one 7 × 7 filter with three 3 × 3 filters. We named this procedure multi-scale feature extraction. We also added residual learning to the network to reduce network degradation during training.[20]
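This substitution preserves the receptive field while cutting parameters; a quick sketch makes the arithmetic explicit (per-layer weight counts only, biases ignored, assuming c channels throughout):

```python
# Receptive field of n stacked 3x3 convolutions (stride 1): it grows by
# 2 pixels per layer, so n layers see a (2n + 1) x (2n + 1) area.
def stacked_3x3_receptive_field(n_layers):
    return 1 + 2 * n_layers

# Weight count of a single f x f filter bank with c input and c output channels.
def single_filter_weights(f, c):
    return f * f * c * c

# Weight count of n stacked 3x3 filter banks with c channels throughout.
def stacked_3x3_weights(n_layers, c):
    return n_layers * 3 * 3 * c * c
```

Two stacked 3 × 3 layers see a 5 × 5 area with 18c² weights instead of 25c², and three see a 7 × 7 area with 27c² instead of 49c², which is the trade-off Simonyan and Zisserman exploited in VGG.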

The optimized network structure is shown in [Figure 2].{Figure 2}

Training and testing

The computer we used was equipped with 32 GB of RAM and a 4-core processor (Intel® Core™ i7-6700K CPU at 4.00 GHz). We implemented our model in Matlab and trained it on an NVIDIA GeForce GTX 1080 graphics processor.

The data for the experiments are from the fastMRI dataset, a publicly available database created by the Department of Radiology at NYU School of Medicine and the Center for Advanced Imaging Innovation and Research (CAI2R) at NYU Langone Health. We selected 91 images from a raw dataset containing 7002 brain images and resized them to the same resolution. We divided them into training, validation, and test sets in a 0.7:0.15:0.15 ratio.
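A 0.7:0.15:0.15 split can be sketched as follows; the rounding and shuffling details are our assumptions, as the paper does not specify them:

```python
import random

# Split a list of items into train/val/test by a 0.7:0.15:0.15 ratio.
# Shuffling with a fixed seed keeps the split reproducible.
def split_dataset(items, seed=0):
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(0.7 * len(items))
    n_val = int(0.15 * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For the 91 selected images this yields roughly 63 training, 13 validation, and 15 test images under this rounding choice.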

In the setting of the convolution filters, the size of the convolution kernel in the feature extraction step is 9 × 9, the size in the nonlinear mapping step is 1 × 1, and the size in the reconstruction step is 5 × 5. The batch size is set to 16 and the learning rate to 0.001, with Adam optimization. Unlike classic deep CNNs that contain a large number of parameters, such as the visual geometry group network (VGG),[19] the residual network (ResNet),[21] and the dense convolutional network (DenseNet),[22] SRCNN has only six groups of parameters (C1, C2, C3, B1, B2, and B3) that need to be learned.[23]

The loss was calculated by the mean squared error (MSE), which only requires the gap between the network output F3(m) and the ground-truth MR images. The calculation is as follows:


L = (1/n) Σi ||F3(mi) − Xi||²

where n is the number of training samples, mi is the i-th low-resolution input, and Xi is the corresponding ground-truth MR image. The MSE loss is selected to obtain a high peak signal-to-noise ratio (PSNR), a commonly used evaluation index for image super-resolution algorithms.
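MSE and the PSNR derived from it can be computed as follows (a plain-Python sketch over flattened pixel lists; `max_val` is the peak intensity, e.g. 1.0 for normalized images):

```python
import math

# Mean squared error between a reconstruction and its ground truth,
# both given as flat lists of pixel values.
def mse(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)

# PSNR in dB: higher is better; infinite for a perfect reconstruction.
def psnr(pred, truth, max_val=1.0):
    err = mse(pred, truth)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / err)
```

Minimizing MSE directly maximizes PSNR, which is why it is adopted here as the training loss.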


First, we examined the influence of the number of SRCNN training iterations on MR image reconstruction. We compared the reconstruction results of bicubic interpolation, 150k iterations, and 750k iterations; the results for the brain and abdomen images are as follows.

[Figure 3] and [Figure 4] show that the MR images reconstructed by SRCNN are better than those from bicubic interpolation in terms of overall clarity and image details. Moreover, the more training iterations, the better the reconstruction performance.{Figure 3}{Figure 4}

In addition to the brain and abdomen, we conducted comparative experiments on three other body parts: leg, shoulder, and foot. We quantitatively compared the reconstruction quality of these five parts using PSNR and the structural similarity index measure (SSIM). The results are as follows.

As shown in [Table 1], the MR images reconstructed by SRCNN are greatly improved in PSNR and SSIM compared with bicubic interpolation. In addition, it is easy to see that when the resolution of the initial input images is high, the reconstruction effect of SRCNN is better, and the improvement in PSNR can even exceed 2 dB. For example, the reconstruction of foot MR images shown in [Table 1] is significantly better than that of abdomen and brain MR images.{Table 1}

To verify the performance of SRCNN in medical image super-resolution reconstruction, we compared SRCNN with classical algorithms. After comprehensive consideration, K-SVD and NE from the field of machine learning and IBP were selected as comparison methods. In this way, a total of five methods are compared, as shown in [Table 2]; the maximum number of iterations of SRCNN was set to 750k.{Table 2}

From [Table 2], we can see that SRCNN, as a deep-learning method, has obvious advantages over the non-deep-learning methods. It achieved the best results in reconstruction quality for the brain, shoulder, and foot. Although K-SVD is better than SRCNN in reconstructing the leg and abdomen images, it performs poorly on the shoulder and foot images, so its overall effect is not as good as that of SRCNN.

In addition, SRCNN provides more than 1 dB of improvement in the evaluation index PSNR for MR image reconstruction, and even more than 2 dB for some images. The other methods only show good reconstruction performance on certain images, with less robustness. Therefore, SRCNN not only has a broader range of applications but also has more stable performance. In conclusion, SRCNN is the best performer among the compared methods and has significant advantages for super-resolution reconstruction of MR images.

Next, we compared MFE-SR (proposed) with standard SRCNN and VDSR[14] in super-resolution reconstruction of MR images. The specific MFE-SR network parameters are shown in [Figure 2]. The training parameters are the same as the previous SRCNN. The comparison results are shown in [Figure 5].{Figure 5}

To clearly compare the differences between the three methods, we also made an error-map comparison, as shown in [Figure 6].{Figure 6}

From [Figure 5], we can see that the image reconstructed by MFE-SR has better edge contour information than that of SRCNN. In [Figure 6], the error maps of MFE-SR are lighter in color, which means the reconstruction error is smaller. Both human visual perception and the evaluation index values show that our proposed MFE-SR method achieves a better reconstruction effect.

We then made a quantitative comparison of MR image reconstruction for the five parts of the human body (brain, leg, abdomen, shoulder, and foot). Our proposed MFE-SR network is clearly better than the original SRCNN in MR reconstruction, with a PSNR value 0.64 dB higher. In [Figure 5], the MR images reconstructed by our network are richer in detail.

[Table 3] shows the quantitative comparison of SRCNN, VDSR, and our proposed network MFE-SR, where VDSR is a network based on residual learning. Our network achieves better results on the evaluation metrics PSNR and SSIM, which confirms the superior performance of our proposed network in reconstructing MR image details.{Table 3}

Discussion and Conclusions

To improve the speed of MRI, this manuscript introduced SRCNN into the reconstruction of MR images. In addition, we modified the original network by adding residual learning and multi-scale feature learning, inspired by currently popular networks.

The use of smaller convolutional kernels is part of the current trend of reducing parameters while maintaining network accuracy. In VGG-16, two 3 × 3 convolutional kernels are used instead of one 5 × 5 kernel; the main purpose is to reduce the number of parameters while preserving the same receptive field.

Low-level features have higher resolution and contain more location and detail information, but they are less semantic and noisier because they have undergone fewer convolutions. High-level features have stronger semantic information but quite low resolution and poor perception of details. The U-Net network shows that convolutional neural networks have a significant advantage in image feature extraction: through its encoding-decoding structure, detailed image features can be captured well. Convolutional kernels of different sizes have different FOVs and different feature extraction abilities; therefore, the multi-scale feature extraction we adopt can better capture image features. This is the reason why we use multi-scale feature extraction.

Residual networks were introduced by He et al.[20] in 2015; with them, the efficiency of the network is significantly increased, or at least maintained. Network performance is greatly improved by creating residual blocks in the overall architecture.
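The residual idea can be illustrated in a couple of lines: instead of learning the full mapping, a block learns only the difference F(x) and adds the input back. Below is a schematic sketch in which a toy element-wise transform stands in for the convolutional layers:

```python
# A residual block computes y = F(x) + x, so the layers only need to
# learn the residual F(x); here `transform` is a toy stand-in for the
# block's convolutional layers, applied element-wise to a list.
def residual_block(x, transform):
    return [xi + fi for xi, fi in zip(x, transform(x))]

# If the learned transform is zero, the block reduces to the identity,
# which is what makes very deep networks easier to optimize.
identity_out = residual_block([1.0, 2.0], lambda x: [0.0] * len(x))
```

The skip connection also gives gradients a direct path to earlier layers, which is why we add residual learning to counter degradation during training.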

In our experiments, we also found some problems, such as the poor reconstruction performance of VDSR. We analyzed the possible reasons: first, the training set may be too small, resulting in weak generalization ability of the network; second, compared with a shallow network like SRCNN, VDSR is much deeper and its training cost is much higher. With more training iterations, the network should perform much better for MR super-resolution reconstruction.

The proposed approach, MFE-SR, learns an end-to-end mapping between low- and high-resolution images, showing the advantages of simplicity and robustness. Through a series of comparative experiments, we reached the following four conclusions:

1. The SRCNN network was successfully introduced into the super-resolution reconstruction of MR images.
2. Deep learning is remarkably effective for super-resolution reconstruction of medical images.
3. The appropriate inclusion of residual learning and multi-scale feature learning is effective in improving the network.
4. Compared with other types of super-resolution reconstruction algorithms, methods based on deep learning have better reconstruction performance and higher robustness.

Financial support and sponsorship

This study was financially supported by the National Key Research and Development Program of China (2017YFB1303203), Jiangsu Provincial Key Research and Development Program (BE2020714), and Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX21_1556).

Conflicts of interest

There are no conflicts of interest.


1Chao D, Chen CL, He K, Tang XO. Learning a Deep Convolutional Network for Image Super-Resolution . Zurich. European Conference on Computer Vision (ECCV), 2014.
2Wang X, Yu K, Wu S, Gu J, Liu Y, Dong C, et al. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. Munich. European Conference on Computer Vision, 2018.
3Wang S, Su Z, Ying L, Peng X, Zhu S, Liang F, et al. Accelerating Magnetic Resonance Imaging via Deep Learning. Prague. IEEE International Symposium on Biomedical Imaging; 2016.
4Shi J, Wang X. Image super-resolution reconstruction based on improved K-SVD dictionary-learning (In Chinese). Acta Electronica Sin 2013;41:997-1000.
5Yang S, Liu J, Fang Y, Guo Z. Joint-feature guided depth map super-resolution with face priors. IEEE Trans Cybern 2018;48:399-411.
6Zhang X, Liu Q, Li X, Zhou Y, Zhang C. Non-local feature back-projection for image super-resolution. IET Image Process 2016;10:398-408.
7Li C, Su J, Yu L, Wang L. A variational level set method image segmentation model with application to intensity inhomogene magnetic resonance imaging. Digit Med 2018;4:5.
8Wang S, Cheng H, Ying L, Xiao T, Ke Z, Zheng H, et al. DeepcomplexMRI: Exploiting Deep Residual Network for Fast Parallel MR Imaging with Complex Convolution. Magn Reson Imaging 2020;68:136-47.
9Cho SJ, Choi YJ, Chung SR, Lee JH, Baek JH. High-resolution MRI using compressed sensing-sensitivity encoding (CS-SENSE) for patients with suspected neurovascular compression syndrome: Comparison with the conventional SENSE parallel acquisition technique. Clin Radiol 2019;74:817.e9-14.
10Huang C, Lei D, Li Z. Active contour model for medical sequence image segmentation based on spatial similarity. Digit Med 2019;5:85-9.
11Cheng J, Yin H, Jiang L, Zheng J, Wei S. Local Gauss multiplicative components method for brain magnetic resonance image segmentation. Digit Med 2019;5:68.
12Chao D, Chen CL, Tang X. Accelerating the Super-Resolution Convolutional Neural Network. Amsterdam: European Conference on Computer Vision [Z]; 2016.
13Dong C, Loy CC, He K, Tang X. Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 2016;38:295-307.
14Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network: IEEE Computer Society [Z]; 2016.
15Pham CH, Fablet R, Rousseau F. Multi-Scale Brain MRI Super-Resolution Using Deep 3D Convolutional Networks; 2017.
16Vijayakumar C, Gharpure DC. Development of image-processing software for automatic segmentation of brain tumors in MR images. J Med Phys 2011;36:147-58.
17Roska T, Chua LO. The CNN Universal Machine: An Analogic Array Computer. IEEE Transactions on Circuits & Systems II Analog & Digital Signal Processing 2015;40:163-73.
18Zhao X, Zhang Y, Zhang T, Zou X. Channel splitting network for single MR image super-resolution. IEEE Trans Image Process 2019;28:5649-62.
19Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Comput Sci 2014;arXiv preprint arXiv:1409.1556.
20He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. Las Vegas: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016:770-8.
21Korfiatis P, Kline TL, Lachance DH, Parney IF, Buckner JC, Erickson BJ. Residual deep convolutional neural network predicts MGMT methylation status. J Digit Imaging 2017;30:622-8.
22Tao Y, Xu M, Lu Z, Zhong Y. DenseNet-based depth-width double reinforced deep learning neural network for high-resolution remote sensing image per-pixel classification. Remote Sens 2018;10:779.
23Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, et al. Caffe: Convolutional Architecture for Fast Feature Embedding. ACM international conference on Multimedia; 2014:675-8.