The data virtualization industry has been growing fast, and experts expect that growth to continue. As data becomes one of the most important assets in business, corporations are looking for ways to get the most out of it. The task, as you might have guessed, is not easy, with several hurdles along the way. Businesses have to manage growing volumes of structured and unstructured data from different sources. As if that were not difficult enough, different relational database management systems (RDBMSs) do not talk to one another. Add the growing demands of Business Intelligence (BI), analytics and reporting, and the plate of the typical IT department is already overflowing. Data virtualization promises to address these problems because it decouples data from applications and exposes it through a middleware layer. It can also provide a unified view of data from disparate sources in the format BI or business users want. But putting a data layer in the middleware is easier said than done; from the perspective of the IT department, implementing data virtualization has been a big challenge. Fortunately, firms such as Oracle, Red Hat, IBM and Microsoft have been working on high-quality data virtualization tools.
What is data virtualization?
Data has become increasingly important to good business decisions. Companies want a comprehensive, unified view of the data collected from different sources, and that requires data integration. However, the challenge of managing data has been getting bigger and more complex, mainly for the following reasons:
- The volume of data has been growing, especially after the arrival of the big data concept.
- Companies now have to deal with both structured and unstructured data, and managing unstructured data puts significant strain on company resources.
- Companies use different database management systems (DBMSs), such as Oracle and SQL Server, which were not designed to be compatible with each other.
- Companies are legally required to retain data under data-retention regulations such as the Sarbanes-Oxley Act, which has driven an unprecedented rise in the amount of data they have to store.
- Business users now need self-service analytics to make better-informed decisions and strategies. They need a unified view of all data, and bringing quality data together to offer that view is a huge technical challenge.
According to Noel Yuhanna, an IT analyst with Cambridge, Mass.-based Forrester Research Inc.: “Data integration is getting harder all the time, and we believe [one of the causes] of that is that data volumes are continuing to grow. You really need data integration because it represents value to the business, to the consumers and to the partners. They want quality data to be able to make better business decisions.”
Data virtualization addresses these problems by decoupling data from applications and exposing it through a middleware layer, which reduces dependency on individual DBMSs. Importantly, data virtualization tools do not copy the actual data into the middleware; they only store mappings to where the data actually resides. Data virtualization can also provide a unified view of the data collected from different sources, and this capability will only get stronger as vendors ship more powerful tools.
From the user's perspective, there is no need to track the technical details of the data behind the middleware, such as its format or location. The user just needs to think about the data itself.
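The mapping idea above can be sketched in a few lines of Python. This is a toy illustration, not any vendor's API: the source systems, catalog entries and field names are all hypothetical. The point is that the "middleware" holds only a catalog of where each logical field lives; the data itself stays in the sources and is fetched on demand.

```python
# Toy sketch of data virtualization: the middleware stores only
# mappings to the data's actual location, never the data itself.
# All source and field names below are hypothetical.
from typing import Callable, Dict, List

# Simulated source systems (stand-ins for, say, Oracle and SQL Server),
# each with its own native column names.
oracle_crm = {
    101: {"CUST_NM": "Acme Corp", "REGION_CD": "EMEA"},
    102: {"CUST_NM": "Globex", "REGION_CD": "APAC"},
}
sqlserver_erp = {
    101: {"open_orders": 3},
    102: {"open_orders": 7},
}

# The "middleware" catalog: logical field -> (source, native column).
catalog = {
    "customer": ("crm", "CUST_NM"),
    "region":   ("crm", "REGION_CD"),
    "orders":   ("erp", "open_orders"),
}

# Fetchers know how to read one record from one source by key.
fetchers: Dict[str, Callable[[int], dict]] = {
    "crm": lambda key: oracle_crm[key],
    "erp": lambda key: sqlserver_erp[key],
}

def unified_view(keys: List[int], fields: List[str]) -> List[dict]:
    """Build a unified row per key on demand; data stays in the sources."""
    rows = []
    for key in keys:
        row = {}
        for field in fields:
            source, native = catalog[field]
            row[field] = fetchers[source](key)[native]
        rows.append(row)
    return rows

print(unified_view([101, 102], ["customer", "orders"]))
# [{'customer': 'Acme Corp', 'orders': 3}, {'customer': 'Globex', 'orders': 7}]
```

A user of `unified_view` asks for logical fields like `customer` and `orders` without knowing which system holds them or what the columns are called natively, which is exactly the decoupling described above.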
This case study examines how data virtualization solved a business problem at Pfizer, Inc., the largest drug manufacturer in the world, which develops, manufactures and markets medicines for both humans and animals.
Pfizer's Worldwide Pharmaceutical Sciences division determines which drugs will be introduced to the market. Obviously, that is an extremely important role, yet the division was constrained by technological limitations. As part of day-to-day operations, different stakeholders would want to view data that resided in multiple applications. Such data integration requests were fulfilled by a traditional Extract, Transform and Load (ETL) process, and that is where the problems started. There were basically two problems: the ETL process was slow and inflexible, and the applications hosting the data did not talk to each other. A further problem was the inability to add new data sources or data-hosting applications. As a result, an inherently slow process struggled to deliver a unified view of data collected from different applications, leading to project slowdowns, cost escalation and wasted investment.
Pfizer selected a data virtualization tool from a vendor and, over time, reaped several benefits:
- The tool did not hit the source systems for every data integration request. Instead, it stored a view of the data in the middleware or in a cache, so data integration requests were fulfilled faster.
- Unforeseen events such as server crashes were no longer showstoppers, because users could still access the views of the data held in memory.
- The data virtualization platform supported additions of multiple, different data sources such as cloud-based CRM systems and business intelligence (BI) tools.
- Since the data was served from the middleware or from memory rather than fetched from the host systems, the platform could offer unified views of the data in the form users wanted.
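The cache-backed resilience described in the second bullet can be sketched as follows. This is a minimal illustration under assumed behavior, not the actual tool Pfizer used: the view refreshes its in-memory copy on every successful read and falls back to the last good copy when the source is unreachable.

```python
# Hedged sketch of serving cached views when a source goes down.
# The class, source and record names are hypothetical.
from typing import Callable, List

class CachingView:
    """A virtual view that falls back to its in-memory cache on source failure."""
    def __init__(self, fetch: Callable[[], List[dict]]):
        self._fetch = fetch   # queries the underlying source system
        self._cache = None    # last successfully fetched view

    def read(self) -> List[dict]:
        try:
            self._cache = self._fetch()  # refresh on every successful read
        except ConnectionError:
            if self._cache is None:
                raise                    # nothing to fall back on
            # Source is down: serve the last known view from memory.
        return self._cache

# Usage: simulate a source system that crashes after the first call.
state = {"up": True}
def source():
    if not state["up"]:
        raise ConnectionError("source host is down")
    return [{"batch": "A-17", "status": "released"}]

view = CachingView(source)
first = view.read()       # hits the source and caches the result
state["up"] = False
second = view.read()      # source down: served from the in-memory cache
assert first == second
```

The design choice here mirrors the benefit in the bullet: consumers keep getting a usable (if possibly stale) view during an outage instead of an error.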
Implications of the rise of data virtualization
Many believe that the rise of data virtualization could significantly diminish the importance of ETL processes, and adoption data lends some support to that view. For example, Novartis, The Phone House and Pfizer have already turned to data virtualization. Companies that deal in huge data volumes and have legacy data sources are investing in data virtualization in particular, because it offers clear advantages in delivering unified, real-time views of data. Companies need agile, quick fulfillment of data integration requests, which is extremely difficult with ETL.
However, another group believes it is not all gloom and doom for ETL. According to Mark Beyer, research vice president for information management at Gartner Inc., “The EDW is not going away — in fact, the enterprise data warehouse itself was always a vision and never a fact. Now the vision of the EDW is evolving to include all the information assets in the organization. It’s changing from a repository strategy into an information services platform strategy.”
It is undeniable that data virtualization is on the rise and that the glory of ETL is fading, if only slightly. However, there are still a number of hurdles in the way of seamless adoption of data virtualization platforms. IT departments find it technically difficult to map data from source systems into the middleware. Building unified, customized views from several different data sources for different consumers is also an extremely challenging task. Such challenges need to be acknowledged and planned for.