Overview: Big data refers to very large volumes of data, typically measured in terabytes and above. The data arrives in different formats from different sources and is then processed or transformed by various software packages and applications. The security of big data applications is often ignored or treated as a secondary requirement, yet it has a major impact on any application because the application is handling data.
In this article I will talk about the steps and tools used to protect big data applications.
Introduction: As big data spreads across different domains, its security is getting more and more attention. Earlier we relied on endpoint-centric security systems, but these are not sufficient to protect an application from intrusions. Big data brings its own set of security concerns, which differ considerably from those of conventional applications.
Security today is difficult to explore and navigate. Implementing a proper end-to-end security system across the whole software stack is also very expensive, and a breach is always possible no matter which policy or system you follow. Organizations taking on big data initiatives should therefore plan according to their budget and policies, and should adopt modern, up-to-date security practices.
Security risks in big data environment: The big data age has brought significant growth in data volume, data velocity and data variety. This growth is closely related to the growth of the cloud model, mobile apps and other interconnected applications. Data flows from one point to another through different systems, applications and environments. This data explosion offers meaningful insight to the business, but it also exposes business data to various systems, processes and people. As this huge volume of data is stored, processed, analyzed and shared across collaborating systems, there is always a chance of a security breach.
Big data is collected from different sources, and various business intelligence tools are used to analyze it and extract meaningful insight. This information is accessed and used by decision makers. Sometimes the data is also used for collaboration. The tools used for collaboration and processing have security limitations of their own, so there is always a chance of exposing sensitive data or content. Once the valuable elements of big data are identified, they can be accessed, updated or otherwise changed by users. This can cause serious security issues and threats for the organization.
Advanced IT security can help protect information in a collaborative environment. Big data organizations need to be more precise in balancing business requirements against data protection. The following steps help protect data in a collaborative environment.
- Break big data into small data: The idea is to split big data into smaller, more manageable data sets. This lets the system handle the volume, velocity and variety of big data. As a result, organizations are also able to make faster and more accurate business decisions.
- Identify the context of the information: It is very important to identify the context in which the data is accessed and used. Organizations need to identify the employees, partners, vendors and any other third parties involved in the collaboration, as well as the communication channels they use. This gives a detailed picture of the collaboration environment and its stakeholders.
- Deploy data controls: Data controls should be deployed at strategic locations, meaning the points where data is stored, processed or shared. This protects the data while still allowing collaboration.
- Deploy controls for cloud and mobile environments: Cloud and mobile collaboration is an essential part of most applications and their deployment. Organizations need to understand and identify how data is shared in cloud and mobile environments, and then manage this high-risk area of collaboration.
In cloud and mobile environments, data is shared outside the traditional network environment. Organizations are therefore taking steps to protect sensitive data wherever it resides, whether on-premises, in the cloud or on mobile devices. By doing this, organizations realize the real benefit of big data sharing. To reduce the risk of data sharing, stricter controls should be implemented, while keeping a balance between data control and business enablement.
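As a rough illustration of the "identify the context" and "deploy data controls" steps above, the following Python sketch decides whether a request may read a piece of data based on who is asking, through which channel, and how sensitive the data is. The party names, channel names, sensitivity labels and policy values are all assumptions made up for this example; they do not come from any particular product.

```python
from dataclasses import dataclass

# Hypothetical sensitivity labels, ordered from least to most sensitive.
SENSITIVITY = {"public": 0, "internal": 1, "confidential": 2}

# Hypothetical policy: the highest label each (party, channel) pair may read.
# Any pair that is not listed here is denied.
POLICY = {
    ("employee", "on_premises"): "confidential",
    ("employee", "mobile"): "internal",
    ("employee", "cloud"): "internal",
    ("partner", "cloud"): "internal",
    ("vendor", "cloud"): "public",
}


@dataclass(frozen=True)
class AccessRequest:
    party: str       # who is asking, e.g. "employee" or "vendor"
    channel: str     # how they connect, e.g. "mobile" or "cloud"
    data_label: str  # sensitivity label of the requested data


def is_allowed(request: AccessRequest) -> bool:
    """Deny by default; allow only if the policy ceiling covers the data label."""
    ceiling = POLICY.get((request.party, request.channel))
    if ceiling is None or request.data_label not in SENSITIVITY:
        return False
    return SENSITIVITY[request.data_label] <= SENSITIVITY[ceiling]
```

In this sketch an employee on the internal network can read confidential data, the same employee on a mobile device cannot, and an unknown party is refused outright. Denying by default is one way to keep the balance between control and business enablement explicit: every allowed combination has to be written down in the policy.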
Big data security and the tools: In earlier days, an organization typically had a single software vendor and a single database (like SAP, Oracle, and PeopleSoft) for the entire organization, so security issues were more visible and easier to manage. In the current scenario, with big data, cloud and mobile devices, the number of security holes in the system is unknown. As a result, the possibility of a security breach is much higher.
With recent developments in information security, a number of software packages and vendors are available to help enforce security practices properly. The perimeter security strategy for big data is the same as for other systems, so in this section we will discuss only the ‘inside-the-network’ tools which are required to protect big data.
- Monitoring and logging: Monitoring and logging everything is the best strategy for detecting unauthorized activities. Logging systems such as syslog (on Linux) and the event log (on Windows) can be used effectively. SNMP is also useful for capturing network events. There are also software packages that aggregate logs and store them in a central location for analysis. These are known as Security Information and Event Management (SIEM) packages.
- Analyzing and auditing: The main function of a SIEM package is to detect unauthorized activities automatically and generate warnings. However, all SIEM software requires configuration to work properly. It is therefore recommended to use pre-configured SIEM packages that are updated frequently and can identify a large share of security breaches through log analysis. SIEM packages such as LogRhythm, Q1 Labs (IBM), McAfee and Splunk offer good capabilities.
- Managing identity: Identity and Access Management (IAM) is very important for protecting big data, because the data is accessed by employees and contractors through different channels such as mobile devices, SaaS offerings and other services. It is therefore important to treat ‘identity’ as the new perimeter and to establish who is accessing the sensitive data, instead of concentrating on the physical location of the data. A collection of tools is needed to cope with the failure of the traditional perimeter.
- Masking the data: Data masking is another way of protecting data. The data can be masked by using encryption or tokenization. Some vendors also claim that their data masking tools do not rely on encryption or tokenization but perform the entire masking dynamically. In the big data context, however, either encryption or tokenization is a suitable choice.
- Application security: The final step is to ensure security within the big data applications that access sensitive information. This is critical in the age of big data, as most of the popular tools were not built with security in mind. Recently, most big data tools have been improving significantly on the security side. The two most important factors are ‘permissions at granular level’ and ‘data encryption’. The latest version of Hadoop is expected to support new security features and probably address some of the emerging issues.
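To make the monitoring and analysis steps above more concrete, here is a minimal Python sketch of the kind of rule a SIEM package applies to aggregated logs: count failed logins per user and flag anyone who reaches a threshold. The log line format, the regular expression and the threshold of 3 are assumptions chosen for illustration. Real syslog or event log entries differ by system, and real SIEM rules are far richer.

```python
import re
from collections import Counter

# Hypothetical log format: "... authentication failure; user=<name>".
# Adjust the pattern to whatever your log source actually emits.
FAILED_LOGIN = re.compile(r"authentication failure.*user=(?P<user>\S+)")


def flag_suspicious_users(log_lines, threshold=3):
    """Return a sorted list of users with at least `threshold` failed logins."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group("user")] += 1
    return sorted(user for user, count in failures.items() if count >= threshold)
```

Calling `flag_suspicious_users(lines)` on a list of collected log lines returns the user names that deserve a closer look. A threshold that is too low produces noise and one that is too high misses slow attacks, which is why the configuration effort mentioned above matters.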
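The masking step above can be sketched in a similar way. The Python example below illustrates tokenization: a sensitive value is swapped for a random token, and the mapping is kept in a vault so that authorized systems can recover the original. The `TokenVault` class, the `mask_record` helper and the `tok_` prefix are hypothetical names invented for this sketch. A production vault would need secure, persistent storage and access control, which are omitted here.

```python
import secrets


class TokenVault:
    """In-memory token vault, for illustration only."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return a random token for `value`, reusing the token if already issued."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, not derived from the value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        """Recover the original value; raises KeyError for an unknown token."""
        return self._token_to_value[token]


def mask_record(record, sensitive_fields, vault):
    """Return a copy of `record` with the sensitive fields replaced by tokens."""
    return {
        key: vault.tokenize(value) if key in sensitive_fields else value
        for key, value in record.items()
    }
```

For example, `mask_record({"name": "Alice", "card": "4111111111111111"}, {"card"}, TokenVault())` keeps the name and replaces the card number with a token. Because the token is random rather than computed from the value, analysts can join and count on the tokenized field without being able to reverse it unless they have access to the vault.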
Summary: Big data security is a major concern today. Big data systems are not like conventional single-vendor systems, so their security issues are much harder to handle. No single solution, tool or vendor can protect your big data; instead you need to combine different security tools, each effective in its own area. The practical approach is to keep using multiple effective tools over time, so that you gradually build up a good and comprehensive security system.