The Open Data Platform (ODP) is an industry initiative focused on simplifying enterprise adoption of Apache Hadoop and enabling Big Data solutions to thrive through better ecosystem interoperability.
It builds on the strengths of the Apache Hadoop framework. Proponents of the ODP claim that it will bring significant benefits to those who embrace it, but not everyone is convinced. There also appears to be some confusion in the market, with ODP and Apache Hadoop treated as if they were entirely different technologies, when ODP is in fact built on Hadoop. It is still early days for ODP, and it will be interesting to see whether the industry embraces it.
The Game Changer: Positive side
The big forces backing the ODP initiative are major players: GE, Hortonworks, IBM, Infosys, Pivotal, SAS, AltiScale, Capgemini, CenturyLink, EMC, Teradata, Splunk, Verizon, VMware and a few more. The core objective is to leverage open source and open collaboration to accelerate Apache Hadoop and take big data to the next level.
The initiative is a potential game changer because it addresses the needs of end users as well as vendors. It is aligned with the Apache Software Foundation (ASF) process: it builds on the contributions made to the Apache projects and enhances them further. The ODP provides an open platform for engaging the diverse community as a whole.
By bringing together leading vendors, service providers and users of Apache Hadoop, the ODP aims to reduce fragmentation and accelerate development across the Apache Hadoop ecosystem.
The intent of ODP is to work directly with specific Apache projects, keeping in view the Apache Software Foundation guidelines on how to contribute ideas and code. The objective is to enhance compatibility and standardize the way the apps or tools run on any compliant system.
The other interesting aspect is how you standardize the deployment of solutions built on Hadoop or other big-data technology.
The main focus areas the ODP is working on:
- Develop an open source ecosystem for Big Data
- Act as a catalyst for Hadoop and big data adoption
- Standardize the Apache Hadoop ecosystem
- Standardize the deployment mode for applications
- Adopt the best big data and analytical software to support data-driven applications
The ODP is expected to offer the following benefits:
- Reduced R&D costs for vendors and solution providers
- Improved interoperability
- A standardized base for future Hadoop distributions
Negative Buzz in the Market: Flip Side
Other players in the market see the ODP differently. Their main objections are:
- Redundant with Apache Software Foundation
The Apache Software Foundation already stewards Hadoop, which has become the de facto standard across the industry, and applications are largely interoperable among Hadoop distributions. So the question arises: what would the ODP add?
- Lacks participation by Hadoop Leaders
Several major Hadoop players, such as MapR, Amazon Web Services and Cloudera, are not participating in this initiative.
- Interoperability and Vendor Lock-in are Not an Issue
According to a Gartner survey, only a few companies feel that interoperability and vendor lock-in are really an issue. Furthermore, project and sub-project interoperability is already guaranteed by both free and paid distributions. Critics therefore argue that this is not an area where the ODP should spend its time and effort.
- Questions on Governance
Questions have been raised about the governance model, as equal voting rights are reportedly not extended to the leading Hadoop distributions. The full governance model has not yet been disclosed.
- Not Truly Open
With Hortonworks as a partner, the ODP is establishing an open data platform on a single vendor’s packaging. This casts some doubt on the “Open” label itself.
What is the Open Data Platform?
The core components of the ODP include the Hadoop Distributed File System (HDFS), the cluster management technology YARN, and the Hadoop management console Ambari. By establishing this core as the ODP kernel, the intent is to let applications run on any ODP-based platform built on the Hadoop stack. In other words, the ODP core is a set of software components and open source tests that you can use to build solutions.
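To make the idea of a common core concrete, here is a minimal sketch of how one might check whether a distribution ships the three core components named above. It is only an illustration: the ODP does not define this function or manifest format, and the names `ODP_CORE_COMPONENTS`, `missing_core_components` and `example_manifest` are hypothetical.

```python
# Hypothetical sketch: the component names come from the article; the
# manifest format and the function name are illustrative assumptions,
# not part of any ODP specification.
ODP_CORE_COMPONENTS = frozenset({"hdfs", "yarn", "ambari"})


def missing_core_components(manifest):
    """Return the core components absent from a distribution manifest.

    `manifest` is any iterable of component names; matching ignores
    case and surrounding whitespace.
    """
    present = {name.strip().lower() for name in manifest}
    return sorted(ODP_CORE_COMPONENTS - present)


# An invented example stack that lacks the management console.
example_manifest = ["HDFS", "YARN", "Hive", "Spark"]
print(missing_core_components(example_manifest))  # ['ambari']
```

An empty result would mean the stack contains every core component listed in this article. Real compliance would also depend on versions and on passing the open source tests, which this sketch does not model.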
With the advent of the IoT/WoT, data itself is the need of the hour. More precisely, the need is to improve communication among a growing network of objects. An open data platform is key to facilitating this because it leverages the Hadoop ecosystem.
Let us look at the process flow of the Open Data Platform. The ODP supports the Open Data lifecycle process, including:
- Consuming Data
A Matter of Choice
The way forward for the ODP is its standardization model. Standardization has its own advantages, but choice is what empowers customers, and it keeps competition healthy.
So let us wait and see how the industry embraces the ODP and its standardized model. Many questions remain unanswered, such as the fee structure, the governance model and voting rights. The bigger question is whether the ODP effectively addresses customers' key concerns. Let us see how this initiative progresses and whether it benefits the community.