“Big Data” comprises the huge amounts of data collected about every person on Earth and their surroundings. This data is collected by organisations, companies and governments alike, and its volume is expected to double roughly every two years. This means that if the total data generated in 2012 was 2,500 exabytes, the total generated in 2020 will be about 40,000 exabytes! The data collected is used in many ways, for example to improve customer care services. But these huge volumes of data also present many new problems for data scientists, especially with regard to privacy.
So, the Cloud Security Alliance (CSA), a non-profit organization that promotes safe cloud computing practices, set out to identify the major security and privacy challenges that big data faces.
How do these problems arise?
The sheer volume of data is not, by itself, the cause of the privacy and security issues. Continuous streaming of data, large cloud-based data stores, large-scale migration of data from one cloud store to another, differing data formats and varied data sources all have their own loopholes and problems.
Big data collection is not new; it has been going on for decades. The major difference is that earlier only large organisations could collect data, because of the huge expense involved, whereas now nearly every organisation can collect data easily and use it for different purposes. Cheap cloud-based data collection techniques, along with powerful data processing software frameworks like Hadoop, enable them to mine and process big data easily. As a result, many security-compromising challenges have emerged with the large-scale integration of big data and cloud-based data storage.
Present-day security applications are designed to secure small to medium amounts of data, so they cannot protect such huge volumes. They are also designed for static data, so they cannot handle dynamic data either. A standard anomaly detection search would not be able to cover all the data effectively, and continuously streaming data needs to be secured throughout the stream.
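To illustrate why streaming data needs continuous checks rather than a one-off search, here is a minimal sketch of a rolling-statistics anomaly detector that flags values far from the recent baseline. The window size, threshold and sample stream are all illustrative, not from any specific product:

```python
from collections import deque
import math

def make_detector(window=50, threshold=3.0):
    """Return a checker that flags values far from the recent mean.

    `window` and `threshold` are illustrative parameters.
    """
    history = deque(maxlen=window)

    def check(value):
        if len(history) >= 10:  # need enough samples for a stable baseline
            mean = sum(history) / len(history)
            var = sum((x - mean) ** 2 for x in history) / len(history)
            std = math.sqrt(var)
            anomalous = std > 0 and abs(value - mean) > threshold * std
        else:
            anomalous = False  # not enough history yet to judge
        history.append(value)
        return anomalous

    return check

detector = make_detector()
# A steady sensor stream followed by a sudden spike:
stream = [10.0, 10.1, 9.9, 10.05, 9.95] * 8 + [500.0]
flags = [detector(v) for v in stream]
```

The point of the sketch is that the detector keeps only a bounded window of recent values, so it can run continuously over a stream instead of scanning a complete, static data set.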
The ten biggest big data security and privacy challenges
To compile the list of the top ten big data security and privacy challenges, the CSA Big Data research working group identified the following.
Securing transaction logs and data
Often, transaction logs and other such sensitive data are stored on storage media with multiple tiers, but this is not enough. Companies also have to safeguard this storage against unauthorized access and ensure that it is available at all times.
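One common way to protect stored transaction logs against silent modification is to chain a keyed MAC across entries, so that altering any entry breaks verification of the whole chain. The key and log format below are hypothetical; a real deployment would use a managed secret:

```python
import hmac
import hashlib

KEY = b"demo-secret-key"  # hypothetical key; use a managed secret in practice

def append_entry(log, message):
    """Append a log entry whose MAC also covers the previous entry's MAC."""
    prev_mac = log[-1][1] if log else b""
    mac = hmac.new(KEY, prev_mac + message.encode(), hashlib.sha256).digest()
    log.append((message, mac))

def verify_log(log):
    """Recompute the MAC chain; any edited or reordered entry fails."""
    prev_mac = b""
    for message, mac in log:
        expected = hmac.new(KEY, prev_mac + message.encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            return False
        prev_mac = mac
    return True

log = []
append_entry(log, "user=alice action=withdraw amount=100")
append_entry(log, "user=bob action=deposit amount=50")
ok_before = verify_log(log)
# Simulate unauthorized tampering with a stored entry:
log[0] = ("user=alice action=withdraw amount=9999", log[0][1])
ok_after = verify_log(log)
```

Chaining each MAC over the previous one means an attacker with access to one storage tier cannot rewrite history without the key, which addresses the unauthorized-access half of the requirement (availability still needs replication).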
Securing calculations and other processes done in distributed frameworks
This refers to the security of the computational and processing elements of a distributed framework, such as Hadoop's MapReduce function. The two main issues are securing the “mappers” that break the data down, and providing data sanitization capabilities.
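As a sketch of what sanitization inside a mapper can look like, the function below masks email addresses before emitting word-count pairs, MapReduce-style. It is a plain Python stand-in for a Hadoop mapper, with an illustrative redaction rule:

```python
import re

# Simple illustrative pattern for email addresses:
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def sanitizing_mapper(record):
    """Mask email addresses in the input before emitting (word, 1) pairs."""
    clean = EMAIL.sub("<redacted>", record)
    for word in clean.split():
        yield (word.lower(), 1)

pairs = list(sanitizing_mapper("Contact alice@example.com about the Report"))
```

Because the redaction happens before the mapper emits anything, sensitive values never reach the shuffle, reducers or downstream storage.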
Validation and filtering of end-point inputs
Endpoints are a major part of any big data collection, as they provide the input data for storage, processing and other important tasks. It is therefore necessary to ensure that only authentic endpoints are in use, and every network should be kept free from malicious endpoints.
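A minimal sketch of both checks, endpoint authenticity and input filtering, is shown below. The shared secret, device registry and sensor value range are all hypothetical; real systems would use per-device credentials:

```python
import hmac
import hashlib

SHARED_SECRET = b"endpoint-secret"  # hypothetical; use per-device secrets

def sign(device_id, payload):
    """Signature an endpoint would attach to its reading."""
    return hmac.new(SHARED_SECRET, device_id.encode() + payload.encode(),
                    hashlib.sha256).hexdigest()

def accept_reading(device_id, payload, signature, known_devices):
    """Accept input only from registered endpoints, with a valid signature
    and a plausible payload."""
    if device_id not in known_devices:
        return False  # unknown endpoint
    if not hmac.compare_digest(signature, sign(device_id, payload)):
        return False  # forged or corrupted message
    try:
        value = float(payload)
    except ValueError:
        return False  # malformed input
    return -50.0 <= value <= 150.0  # illustrative sanity range for a sensor

devices = {"sensor-1"}
good = accept_reading("sensor-1", "21.5", sign("sensor-1", "21.5"), devices)
forged = accept_reading("sensor-1", "21.5", "deadbeef", devices)
unknown = accept_reading("sensor-9", "21.5", sign("sensor-9", "21.5"), devices)
```

Validating at the collection boundary like this keeps malicious or malformed inputs out of the pipeline before they can poison downstream analytics.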
Providing security and monitoring data in real time
Ideally, all security checks and monitoring should happen in real time, or at least in near real time. Unfortunately, most traditional platforms are unable to do this because of the large amounts of data generated.
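Real-time monitoring does not have to inspect every record in depth; a cheap first layer can watch event rates in a sliding time window and alert on bursts. The class below is a minimal sketch with illustrative names and thresholds:

```python
from collections import deque

class RateMonitor:
    """Alert when more than `limit` events arrive within `window` seconds.

    A minimal sketch; names and defaults are illustrative.
    """
    def __init__(self, window=60.0, limit=100):
        self.window = window
        self.limit = limit
        self.times = deque()

    def observe(self, timestamp):
        """Record one event; return True if the window is over its limit."""
        self.times.append(timestamp)
        # Drop events that have fallen out of the sliding window:
        while self.times and timestamp - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) > self.limit

monitor = RateMonitor(window=10.0, limit=3)
alerts = [monitor.observe(t) for t in (0, 1, 2, 3, 30)]
```

Because the monitor stores only timestamps inside the current window, its cost stays constant even when the underlying data volume grows, which is what makes near-real-time checking feasible at scale.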
Securing communications and encryption of access control methods
An easy way to secure data is to secure the storage platform that holds it. However, the applications that secure the data storage platform are often quite vulnerable themselves, so the access methods need to be strongly encrypted.
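For the communications side, encrypting access paths in practice usually means enforcing modern TLS. The fragment below is a sketch using Python's standard library: the key controls are a protocol version floor plus certificate and hostname verification, which `ssl.create_default_context()` enables by default:

```python
import ssl

# Sketch of a hardened client-side TLS context (standard library only).
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocols

# create_default_context() already requires certificates and checks hostnames:
verified = (context.verify_mode == ssl.CERT_REQUIRED) and context.check_hostname
```

A context like this would then be passed to whatever client opens connections to the data store, so that access credentials never travel over an unencrypted or unauthenticated channel.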
Provenance of data
The origin of data is very important because it allows the data to be classified. The origin can be accurately determined through proper authentication, validation and access controls.
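One simple way to make provenance checkable is to wrap each data item in a record that names its source and links, by hash, to the record it was derived from. The field names below are illustrative; real lineage systems use much richer models:

```python
import hashlib
import json

def provenance_record(data, source, parent_hash=None):
    """Wrap a data item with provenance metadata and a verifiable hash.

    Field names are illustrative, not from any specific standard.
    """
    body = {"data": data, "source": source, "parent": parent_hash}
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

# A raw reading, and a derived value that records its parent:
raw = provenance_record("temp=21.5", source="sensor-1")
derived = provenance_record("temp_f=70.7", source="convert-to-fahrenheit",
                            parent_hash=raw["hash"])
```

Following the `parent` hashes back to an authenticated source is what lets a consumer classify the data by where it actually came from, rather than by where it happens to be stored.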
Granular access control
A powerful authentication method and Mandatory Access Control are the main requirements for fine-grained access to big data stores such as NoSQL databases or the Hadoop Distributed File System.
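“Granular” here means controlling access per field, not just per table or file. The sketch below shows the idea with a hypothetical role-to-fields policy applied to a NoSQL-style record:

```python
# Hypothetical policy mapping roles to the record fields they may read:
POLICY = {
    "analyst": {"region", "amount"},             # aggregate fields only
    "auditor": {"region", "amount", "user_id"},  # may also see identifiers
}

def redact(record, role):
    """Return only the fields the given role is allowed to read."""
    allowed = POLICY.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}

record = {"user_id": "u42", "region": "EU", "amount": 120}
analyst_view = redact(record, "analyst")
auditor_view = redact(record, "auditor")
```

Enforcing the policy at read time like this means a single store can serve several roles without copying the data into separately secured subsets.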
Granular auditing
Regular auditing is also necessary, along with continuous monitoring of the data. Correct analysis of the various kinds of logs created can be very beneficial, and this information can be used to detect all kinds of attacks and spying.
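As a small example of the kind of log analysis this challenge calls for, the function below scans login entries and flags source addresses with repeated failures, a basic brute-force signal. The log format is a simplified, hypothetical one:

```python
from collections import Counter

def suspicious_sources(log_lines, threshold=3):
    """Flag source IPs with at least `threshold` failed-login entries.

    The log format here is a simplified, hypothetical one.
    """
    failures = Counter()
    for line in log_lines:
        status, ip = line.split()[:2]
        if status == "LOGIN_FAIL":
            failures[ip] += 1
    return {ip for ip, count in failures.items() if count >= threshold}

logs = [
    "LOGIN_FAIL 10.0.0.5",
    "LOGIN_OK   10.0.0.7",
    "LOGIN_FAIL 10.0.0.5",
    "LOGIN_FAIL 10.0.0.5",
]
flagged = suspicious_sources(logs)
```

Run continuously over the audit stream, even a simple aggregation like this surfaces attack patterns that no single log line reveals on its own.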
Scalability and privacy of data analytics and mining
Big data analytics can be problematic in the sense that even a small data leak or platform loophole can result in a big loss of data, and analytics pipelines must scale without weakening the privacy guarantees of the data they mine.
Securing different kinds of non-relational data sources
NoSQL databases and other such data stores have many loopholes that create security issues. These include the inability to encrypt data while it is being streamed or stored, during the tagging or logging of data, or while classifying it into different groups.
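When the store itself cannot encrypt, the application can still protect sensitive fields before writing. The sketch below uses keyed pseudonymization as a stand-in for field-level encryption; the key, field list and document are all hypothetical:

```python
import hmac
import hashlib

PSEUDONYM_KEY = b"field-key"   # hypothetical key; manage it outside the store
SENSITIVE = {"email", "ssn"}   # illustrative list of fields to protect

def protect(document):
    """Replace sensitive field values with keyed pseudonyms before storage.

    Pseudonymization stands in here for field-level encryption, which many
    NoSQL stores do not provide natively.
    """
    out = {}
    for field, value in document.items():
        if field in SENSITIVE:
            out[field] = hmac.new(PSEUDONYM_KEY, str(value).encode(),
                                  hashlib.sha256).hexdigest()[:16]
        else:
            out[field] = value
    return out

doc = {"email": "alice@example.com", "city": "Oslo"}
stored = protect(doc)
```

Because the same input always yields the same pseudonym, records can still be joined and grouped on the protected field, while the raw value never reaches the unencrypted store.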
Like every advanced technology, big data has its loopholes, in this case in the form of privacy and security issues. Big data can be secured only by securing all of its components. As big data is huge in size, many powerful solutions must be introduced to secure every part of the infrastructure involved. Data stores must be secured to ensure there are no leaks, and real-time protection must be enabled during the initial collection of data. All this will help ensure that consumers' privacy is maintained.