A Comparison of Hadoop, Spark and Storm for the Task of Large Scale Image Classification

Publisher

IEEE

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

An image-based retrieval system (IRS) finds the images in a database that are most relevant to a query. When the database exceeds the storage capacity of a single machine, conventional systems are no longer adequate. Hadoop addresses this by distributing data and processing across available commodity hardware. It works well for batch processing when low latency is not required; when online processing and low latency are needed, Spark and Storm offer solutions. In this paper, we perform two comparisons of the Hadoop, Spark and Storm frameworks. In the first, Hadoop MapReduce (M/R) and Spark are analysed for the image indexing task. The results show that Hadoop M/R performs better than Spark when there are no iterative operations on the data (e.g., indexing), since no intermediate disk writes are needed in that case. Spark, on the other hand, performs better for iterative operations (e.g., Word Count). In the second comparison, Spark and Storm are compared for the task of classification. Storm achieves lower latency than Spark, while both frameworks remain stable as the number of queries increases. This analysis could be useful for researchers and developers of distributed image processing systems.
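The Word Count workload mentioned in the abstract follows the MapReduce model: a map step emits key/value pairs, the framework groups them by key (the shuffle), and a reduce step aggregates each group. The following is a minimal single-process Python sketch of that model, offered only as an illustration. It is not the authors' code and does not use the Hadoop or Spark APIs; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical. In a real Hadoop M/R deployment, each job's output is written to disk (HDFS) before a following job can read it, whereas Spark can cache intermediate datasets in memory, which is the trade-off the comparison above turns on.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run map over every input line, then shuffle and reduce the combined pairs."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["Hadoop Spark Storm", "Spark Storm", "Storm"]
    print(word_count(docs))  # {'hadoop': 1, 'spark': 2, 'storm': 3}
```

In a distributed setting the map and reduce calls would run on different machines and the shuffle would move data over the network; here all three phases run in one process purely to show the data flow.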

Description

26th IEEE Signal Processing and Communications Applications Conference (SIU), May 2-5, 2018, Izmir, Turkey

Keywords

Big Data, Hadoop MapReduce, Spark, Storm, HBase, Image-based Retrieval System

Source

2018 26th Signal Processing and Communications Applications Conference (SIU)