NoSQL integrates with Hadoop

Apache Hadoop is an open source big data processing platform. It has its own ecosystem of products to support various needs. Several big data products and platforms now integrate Hadoop and NoSQL into one platform, which gives better performance and a single source of truth. Let us look at how NoSQL and Hadoop can work together to address big data challenges.

People often mistake Hadoop for a database because it comes with an associated storage system. But let us be clear: Apache Hadoop is not a database.

Apache Hadoop is an open source big data platform with the following major components.

HDFS: The file system, known as the Hadoop Distributed File System (HDFS).
MapReduce: A distributed programming framework known as MapReduce.
Hadoop Common: Contains the libraries and utilities that support the other Hadoop modules.
Hadoop YARN: This stands for 'Yet Another Resource Negotiator'. It is basically the resource management platform for managing computing resources and scheduling tasks.

Hadoop also has a host of other software packages that support the ecosystem components. The framework supports the processing of data-intensive distributed applications. It enables applications to work in a distributed environment consisting of thousands of nodes and petabytes of data. The nodes are independent computers, typically low-cost commodity hardware. A Hadoop cluster is a group of computational units (basically machines) running in a common environment, with the Hadoop Distributed File System (HDFS) spanning them to support scaling.

The fundamental design goal of Hadoop was to tolerate hardware failure. Hardware failures are very common, so the framework should be able to recover from them automatically. All the modules of the Hadoop ecosystem are built to achieve this goal.

The key features of the Hadoop platform are distributed storage and a distributed processing framework. The distributed storage (HDFS) splits large files into smaller blocks (the default is 64MB in Hadoop 1.x and 128MB from Hadoop 2.x onward) and distributes them across the nodes of the cluster. The distributed processing framework, also known as 'MapReduce', supports very efficient parallel processing. The key feature of MapReduce is that it ships the code to the node where the data resides, and the processing is done there. This is also called 'data locality': the data stays where it is stored and the code comes to it to do the processing. This was a real revolution in the parallel processing domain.
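The block splitting and data locality described above can be illustrated with a small, single-process sketch. It is not Hadoop code: the function names, the round-robin placement and the tiny block size (counted in words rather than megabytes) are all assumptions made for illustration, and real HDFS placement and MapReduce scheduling are considerably more involved.

```python
from collections import Counter, defaultdict


def split_into_blocks(records, block_size):
    """Cut a list of records into fixed-size blocks (the last one may be shorter)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def place_blocks(blocks, nodes):
    """Assign blocks to nodes round-robin (a simplification, not HDFS's real policy)."""
    placement = defaultdict(list)
    for index, block in enumerate(blocks):
        placement[nodes[index % len(nodes)]].append(block)
    return dict(placement)


def map_on_node(local_blocks):
    """Map task: counts words using only the blocks stored on one node."""
    counts = Counter()
    for block in local_blocks:
        counts.update(block)
    return counts


def reduce_counts(partial_counts):
    """Reduce task: merges the per-node partial counts into one result."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


words = "big data needs big storage and big compute".split()
blocks = split_into_blocks(words, block_size=3)
placement = place_blocks(blocks, nodes=["node1", "node2"])

# The "code" (map_on_node) is applied where each node's blocks live.
partials = [map_on_node(local) for local in placement.values()]
total = reduce_counts(partials)
print(total["big"])
```

Each map task only ever sees the blocks held by its own node, which is the idea behind moving the code to the data instead of moving the data to the code.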

The major components of Hadoop (HDFS and MapReduce) are derived from Google's File System (GFS) and Google's MapReduce. Apart from the above components, Hadoop includes a number of related projects such as Apache Hive, Apache HBase and Apache Pig.

On the other hand, NoSQL (interpreted as ‘Not only SQL’) refers to non-relational database management systems. They are identified by their non-adherence to the relational database model. NoSQL databases are not primarily based on tables.

NoSQL database technology provides an efficient mechanism for the storage and retrieval of data, but it does not follow the relational model. The main design goals of NoSQL databases are simple design, horizontal scaling and better availability. The name is read as ‘Not only SQL’ because some of these systems support SQL-like query languages such as HQL. NoSQL databases are mostly used in big data and analytics applications.
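Horizontal scaling in a non-relational store can be sketched as a key-value store whose keys are spread over several shards by hashing. This is a toy, in-memory illustration only: the class name `ShardedKeyValueStore`, the shard count and the hashing choice are assumptions, not the design of any particular NoSQL product.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store: each shard is a dict standing in for one server."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash the key so the same key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, document):
        self._shard_for(key)[key] = document

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


store = ShardedKeyValueStore(num_shards=3)
# Documents need not share a schema, unlike rows in a relational table.
store.put("user:1", {"name": "Ada", "languages": ["en", "fr"]})
store.put("user:2", {"name": "Lin", "last_login": "2015-01-01"})
print(store.get("user:1"))
```

Adding capacity then means adding shards rather than buying a larger machine. A production system would also need to move existing keys when the shard count changes, which this sketch leaves out.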

So, in short, we can define Hadoop and NoSQL as follows:

Hadoop: Distributed computing framework.
NoSQL: Non-relational database.

How can Hadoop and NoSQL work together?

From the above discussion, it is clear that Hadoop and NoSQL are not the same thing, but both are related to data-intensive computation. The Hadoop framework is mostly used for processing huge amounts of data (also known as big data), while NoSQL is designed for efficient storage and retrieval of large volumes of data. So there is always a chance of having NoSQL as part of a Hadoop implementation. In most cases, the processed data from the Hadoop system is stored in a NoSQL database. But each can also have independent use cases that do not need both platforms. For example, if we only need parallel processing of big data with the results stored in HDFS, then Hadoop alone may be sufficient. Similarly, if we only need to store and retrieve unstructured data, any NoSQL database and its associated query language can meet the requirement.
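The common pattern described above, where a batch job processes the data and its output is stored in a NoSQL database for fast lookups, might look roughly like the following sketch. Everything here is a stand-in: `batch_aggregate` plays the role of a Hadoop job, `InMemoryStore` plays the role of a NoSQL key-value store, and the event data and key format are invented for illustration.

```python
from collections import defaultdict


def batch_aggregate(events):
    """Stand-in for a Hadoop job: total purchase amount per customer."""
    totals = defaultdict(float)
    for customer_id, amount in events:
        totals[customer_id] += amount
    return dict(totals)


class InMemoryStore:
    """Stand-in for a NoSQL key-value store (hypothetical, dict-backed)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


def load_results(store, totals):
    """Write the batch output into the store, one document per customer."""
    for customer_id, total in totals.items():
        store.put(f"customer:{customer_id}", {"total_spent": total})


events = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5)]
store = InMemoryStore()
load_results(store, batch_aggregate(events))

# An application can now read one customer's result by key, without rerunning the job.
print(store.get("customer:c1"))
```

The split of responsibilities is the point: the batch side scans everything once, and the store side answers individual lookups quickly.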

So the integration of NoSQL with Hadoop is often the preferred environment for large-scale parallel processing and real-time data access. Several Hadoop-based products integrate Hadoop and NoSQL in one platform, and this ‘in-Hadoop’ NoSQL database provides real-time operational analytics capabilities. Hadoop products that include Apache Hadoop are a good fit for business-critical production deployments, and they do not require any additional administrative tasks for the NoSQL data. The integrated platform (NoSQL and Apache Hadoop) supports high performance, extreme scalability, high availability, snapshots, disaster recovery, integrated security and more, which makes it suitable for production-ready operational analytics.

So we can conclude that Apache Hadoop and NoSQL are not the same technology platform, but they are commonly recommended together as an integrated environment for big data solutions.


TechAlpine – All About Technology

www.techalpine.com
