Performance Evaluation of Low-Precision Quantized LeNet and ConvNet Neural Networks

dc.contributor.authorTatar, Güner
dc.contributor.authorBayar, Salih
dc.contributor.authorÇiçek, İhsan
dc.date.accessioned2025-10-29T12:08:21Z
dc.date.issued2022
dc.departmentFaculties, Faculty of Engineering, Department of Electronics Engineering
dc.description16th International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2022 -- Biarritz -- 182947
dc.description.abstractLow-precision neural network models are crucial for reducing memory footprint and computational density. However, existing methods generally require 32-bit floating-point (FP32) arithmetic to maintain accuracy. Floating-point numbers impose heavy memory requirements in convolutional and deep neural network models. In addition, large bit-widths lead to excessive computational density in hardware architectures. Moreover, existing models must evolve into deeper network models with millions or billions of parameters to solve today's problems. The large number of model parameters increases the computational complexity and causes memory allocation problems; hence, existing hardware accelerators become insufficient to address these problems. In applications where accuracy can be traded off for reduced hardware complexity, quantization of models enables the use of limited hardware resources to implement neural networks. From a hardware design point of view, quantized models are more advantageous than FP32 models in terms of speed, memory, and power consumption. In this study, we compared the training and testing accuracy of the quantized LeNet and our own ConvNet neural network models at different epochs. We quantized the models using low-precision int-4, int-8, and int-16. In our tests, we observed that the LeNet model could only reach 63.59% test accuracy at 400 epochs with int-16. On the other hand, the ConvNet model achieved a test accuracy of 76.78% at only 40 epochs with low-precision int-8 quantization. © 2022 Elsevier B.V., All rights reserved.
dc.description.sponsorshipThe IEEE Systems, Man, and Cybernetics Society (SMC)
dc.identifier.doi10.1109/INISTA55318.2022.9894261
dc.identifier.isbn9781665498104
dc.identifier.scopus2-s2.0-85139597429
dc.identifier.scopusqualityN/A
dc.identifier.urihttps://doi.org/10.1109/INISTA55318.2022.9894261
dc.identifier.urihttps://hdl.handle.net/20.500.14854/14440
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherInstitute of Electrical and Electronics Engineers Inc.
dc.relation.publicationcategoryConference Item - International - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzKA_Scopus_20251020
dc.subjectConvNet
dc.subjectConvolutional neural networks
dc.subjectFixed point arithmetic
dc.subjectFloating point arithmetic
dc.subjectFPGA
dc.subjectHardware accelerators
dc.subjectLeNet
dc.subjectQuantized neural networks
dc.titlePerformance Evaluation of Low-Precision Quantized LeNet and ConvNet Neural Networks
dc.typeConference Object
