In a previous blog, I wrote about why it’s so important to select the right partner for your data encryption needs. Now, I’d like to cover the five questions every encryption vendor must be able to answer about big data security. And if you don’t like their response, by all means, move on:
Does the solution give you full control over your keys, even as data flows from one system to another?
It’s often said that key management is the hardest part of data encryption. That’s because there’s often a lack of clarity around key management and access. When evaluating encryption vendors, be sure to ask what types of key control policies can be established to prevent unauthorized access, and always be sure the data owner, not the cloud provider or other administrator, has complete control of the encryption keys.
Does the encryption solution allow for separation of duties between authorized personnel and systems administrators?
What good is data encryption if everyone, whether they need it or not, has access to the encrypted data? Proper, policy-controlled key management enables a separation of duties that lets system and cloud administrators perform their jobs but restricts them from accessing encrypted data. The most important part of key management is ensuring the keys do not reside on the same server as the encrypted data. This is akin to locking your car and leaving the keys in the driver’s side door.
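To make the separation-of-duties idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `KeyPolicy` and `KeyManager` names, the role strings and the in-memory store are made up for illustration and do not represent any vendor's API. It only shows the shape of a policy check that lets an administrator run the system without ever being handed a key.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class KeyPolicy:
    """Hypothetical policy: the set of roles allowed to retrieve a key."""
    allowed_roles: frozenset


@dataclass
class KeyManager:
    """Toy stand-in for a key server that runs apart from the data server."""
    _keys: dict = field(default_factory=dict)
    _policies: dict = field(default_factory=dict)

    def create_key(self, key_id, allowed_roles):
        # 32 random bytes, the size of an AES-256 key.
        self._keys[key_id] = secrets.token_bytes(32)
        self._policies[key_id] = KeyPolicy(frozenset(allowed_roles))

    def get_key(self, key_id, role):
        # Release the key only if the caller's role is named in the policy.
        if role not in self._policies[key_id].allowed_roles:
            raise PermissionError(f"role {role!r} may not read key {key_id!r}")
        return self._keys[key_id]


if __name__ == "__main__":
    km = KeyManager()
    km.create_key("patient-db", allowed_roles={"data_owner"})
    km.get_key("patient-db", role="data_owner")  # the data owner gets the key
    try:
        km.get_key("patient-db", role="sysadmin")  # the administrator does not
    except PermissionError as err:
        print(err)
```

In a real deployment the key store would live on a separate, hardened system and callers would be authenticated rather than passing a role string; the sketch only illustrates where the policy decision sits.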
Does the solution work in mixed IT environments where data is stored in public and private clouds as well as in an on-premises data center?
Look for a software-based encryption solution that performs just as well in a data center as it does in the cloud. Remember that regardless of where the data is stored, it’s important that the data owner, not the hosting provider, retain possession and management of the crypto keys. If your encryption solution doesn’t allow you to manage the keys, then look elsewhere.
Has the solution been tested and/or benchmarked on the applications running in your environment?
Most large organizations utilize a variety of database applications, from more traditional ones like MySQL and PostgreSQL to newer big data apps like Cassandra, MongoDB and HBase. To ensure your encryption utility functions cross-platform and meets your performance standards, ask your provider whether they’ve tested against the databases that are most important to you.
Does the solution use NIST-approved encryption algorithms?
The National Institute of Standards and Technology Computer Security Division publishes security requirements, FIPS 140-2, for cryptographic modules. If your vendor's solution uses FIPS-validated crypto modules, you can feel confident in the strength of its cryptographic algorithms.
The bottom line is, you don't need to employ cryptographic experts to secure your data and meet HIPAA, FIPS, FERPA or PCI compliance initiatives. All it takes is a trusted security partner who can help you get where you need to go.
Rarely a day goes by that you don't hear about a data breach. Hospital records stolen. Social media accounts hacked. Education transcripts revealed. Every industry is susceptible and every company is at risk. The result can be embarrassing and expensive at best and absolutely crippling at worst, with potential fines, time-consuming lawsuits, and subsequent loss of customer trust.
The steady pace of breaches reinforces the need for encryption as a last line of defense. Recently, however, one of the oldest and most effective security tactics has been largely relegated to an afterthought in today's new cloud and big data environments.
This is the result of some common misperceptions about encryption and key management related to cost, performance and ease of use.
Today we set the record straight, breaking down the nine biggest encryption myths.
Myth 1: Encryption is only for organizations that have compliance requirements.
Certainly any company in a regulated industry that mandates data security and privacy should encrypt. That's a no-brainer. But a better way to think about encryption is this: if you've got data about your products, customers, employees or market that you believe is sensitive or competitive, then you should ALWAYS encrypt it, whether there's a legal obligation or not.
Myth 2: SSL encrypts data everywhere.
SSL only encrypts data in motion; it does not cover data at rest. As data is written to disk, whether it's stored for one minute or several years, it should be encrypted.
Myth 3: Encryption is too complicated and requires too many resources.
Data encryption can be as complicated or as easy as you want to make it. The key is to understand the type of data that needs to be encrypted, where it lives and who should have access to it. There are plenty of readily available, easy-to-use and affordable encryption tools on the market. If application performance is important, look for a transparent data encryption solution that sits beneath the application layer and does not require modifications to your operating system, application, data or storage.
Myth 4: Encryption will kill database performance.
There are a number of factors that impact database performance, and encryption is just one. Application-level encryption tends to pack the greatest performance hit, while the file-level encryption penalty is much lower. For maximum application performance, run block-level encryption on a system whose processor supports the Intel AES-NI instruction set.
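On Linux, one quick way to see whether a host exposes AES-NI is to look for the `aes` flag in `/proc/cpuinfo`. The Python sketch below assumes an x86 Linux system; the function names are made up for illustration, and a missing or unreadable file is simply treated as "not detected".

```python
from pathlib import Path


def has_aes_flag(cpuinfo_text):
    """Return True if any 'flags' line in Linux cpuinfo text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        # Match the 'flags' field exactly, then look for 'aes' as a whole word.
        if name.strip() == "flags" and "aes" in value.split():
            return True
    return False


def host_has_aes_ni(path="/proc/cpuinfo"):
    """Check the local host; returns False if the file cannot be read."""
    try:
        return has_aes_flag(Path(path).read_text())
    except OSError:
        return False


if __name__ == "__main__":
    print("AES-NI detected" if host_has_aes_ni() else "AES-NI not detected")
```

The flag only shows that the CPU offers the instructions. Whether a given encryption product actually uses them is a question for the vendor.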
Myth 5: Encryption doesn't make the cloud more secure.
On the contrary, storing encrypted data in the cloud is oftentimes more secure than keeping it on premises, where insiders may have easier access. To ensure the safekeeping of encrypted data in the cloud, make sure you, not your cloud provider, maintain control of the encryption keys. If your provider requires you to hand over your keys, find another cloud service.
Myth 6: Encrypted data is secure data.
Too many organizations fail to effectively manage their encryption keys, either storing them on the same server as the encrypted data or allowing a cloud provider to manage them. Storing the keys on the same server as your data or handing them over to your cloud provider is akin to locking your car and leaving the keys in the door. Good key management with strong policy enforcement makes all the difference.
Myth 7: Key management requires expensive, cloud-averse hardware.
While this was once true, today there are effective software-based solutions that enable organizations to deploy key management in the cloud or on premises. These solutions can typically be provisioned far faster than hardware security modules (HSMs), are very cloud friendly and meet most compliance statutes.
Myth 8: If your data is encrypted, it can't be stolen.
There is no security solution that will protect your data 100%. In fact, companies should operate with the mindset that their data can and likely will be compromised at some point in time. Data encryption can make the breach aftermath much more palatable, though, since encrypted data cannot be decrypted without the key.
Myth 9: Encryption is old school. I need a newer security technology to protect big data.
Data encryption is a proven security technique that works very well in modern NoSQL environments. As big data projects move from pilot to production, sensitive data such as protected health information (PHI), financial records, and other forms of personally identifiable information (PII) will likely be captured, processed, analyzed and stored. Encryption is just as integral to securing data in NoSQL as it is in traditional relational database systems.
Firewalls and VPNs can provide some protection against data breaches and theft, but there is no substitute for strong encryption and effective key management, especially in big data and cloud environments. Now that the biggest myths have been busted, there's no longer an excuse not to encrypt.
I'd like to address a recent blog post in CloudTweaks titled, "Cloudera Not Cutting It With Big Data Security." The author makes a number of very salient and valid points about Hadoop security… or lack thereof.
Indeed the Apache Hadoop platform, which includes HDFS and MapReduce and other projects like HBase, Mahout and Hive, was not designed for security. The Hadoop name, for better or worse, is nearly synonymous with big data because it delivers the "three V's" (velocity, variety, volume) at massive scale, enabling organizations to crunch, process, analyze and retain data like never before.
Clearly there are security and compliance implications to big data. Consider the following:
The blog suggests that Cloudera, and I presume other commercial Hadoop vendors, should do more to address the security concerns in Hadoop.
I believe Cloudera absolutely has the right approach to security.
Cloudera has some of the brightest Hadoop and Apache minds in the world. They're experts in enterprise-class systems management. That's what they do. Addressing the big data needs of customers should always remain the company's primary focus.
Cloudera has also cultivated one of the most comprehensive partner ecosystems in the big data market. This is important because it enables Cloudera to focus on its core strengths, while leveraging outside expertise in analytics, BI, cloud computing, and of course, security.
Think about it this way: Would you expect the same company that built your house to install the alarm, mow your lawn and provide Internet service?
Certainly not, so why then would you demand security from a company that specializes in Hadoop?
The right approach is to look for a company with expertise and experience in securing Hadoop platforms.
There's a false narrative that traditional relational databases inherently offer cutting-edge security, but the truth is, security was (and remains) a responsibility of the end user. Encryption, authentication, policy enforcement and other security tools are all available for CDH and Hadoop, even if they're not provided directly by Cloudera. A customer can work with Cloudera to locate the right vendor for their particular challenge and work collaboratively to build an integrated, secure Hadoop platform.
Data security - particularly big data security - is quickly becoming a hot topic as Hadoop-related projects migrate from pilot to production environments. We believe it's incumbent upon big data providers to develop an ecosystem of tightly integrated vendors to complete their security offerings. Cloudera has certainly done that.
Gazzang is proud to be a certified Cloudera partner and will continue to provide enterprise-class data security to Hadoop users.
At Gazzang, we have a mantra that borders on religious fanaticism.
“Customers First. Always.”
It’s the reason we can claim deep expertise in securing unique, enterprise-scale big data environments. It’s the reason we know cloud encryption better than anyone else. And it’s the reason no one on our customer support team owns a bed.
Customers also have a significant impact on our product development cycles. A perfect example is today’s exciting Gazzang CloudEncrypt™ announcement.
Gazzang CloudEncrypt was designed to meet specific customer use cases for securing sensitive data at every stage of the Amazon EMR process. This is a very different challenge than encrypting data on a persisted cloud platform like Amazon EC2, which can be done with readily available solutions like Gazzang zNcrypt and zTrustee.
CloudEncrypt offers encryption and key management in ephemeral, burstable Amazon EMR processes. The solution, which you can read about in great detail in this white paper, was developed at the request of a handful of Gazzang customers that had two very clear needs in common:
More detailed customer use cases are covered in the white paper, but the top three we’ve heard thus far are as follows:
Customer feedback is a part of everything we do at Gazzang. The ability to learn from and innovate in response to what we hear from the companies we serve is a badge of honor that we wear proudly.
As always, we welcome your feedback on Gazzang CloudEncrypt, your solution for securing sensitive datasets and outputs on Amazon EMR.
Gazzang is hitting the road again. This time, we're in Atlanta, home to the 1996 Summer Olympics, The Varsity and, this week, MongoDB Atlanta. The annual conference is hosted by 10gen, the company behind MongoDB and a key partner for Gazzang. 10gen is a global leader in big data with an impressive customer list that includes Disney, Intuit, foursquare and CERN.
In advance of MongoDB Atlanta, I spoke with Matt Asay, 10gen's vice president of Corporate Strategy, about use cases for MongoDB and why the Gazzang relationship is important for 10gen customers:
Gazzang: What are you and 10gen most excited about this year?
Matt Asay: Over the past few years, NoSQL went from an industry curiosity to a driving force for two of the industry's most important trends: cloud and Big Data. Along the way, MongoDB has established itself as the industry's most popular NoSQL database, with broad adoption by a range of customers. Going into 2013, we're seeing early experiments with MongoDB turn into enterprise-wide deployments for some seriously mission-critical applications. It's awesome to see.
But it's also exciting to see open source becoming such an integral part of how the enterprise builds and uses software. I'm particularly excited to see open source at the forefront of innovation now, and in 2013 I think we're going to see projects like MongoDB, Hadoop, Android, and the various open-source cloud projects drive huge value for consumers and enterprises alike. It's an exciting time to be involved with open source.
Gazzang: What are some of the unique big data challenges that 10gen helps customers solve?
Matt: While MongoDB is often used to manage large volumes of data, most enterprises actually think of "Big Data" in terms of data velocity and variety, as a recent NewVantage survey highlights. Looking at the results, a mere 28% of enterprises today see volume of data as a primary driver for their Big Data projects, falling to 25% in three years. Instead, a whopping 64% are motivated by the need to analyze streaming data, analyze new data types, or analyze data from diverse sources. That number rises to 68% in three years.
With MongoDB, enterprises:
Some of these involve huge quantities of data, but Big Data's value isn't necessarily tied to volume. It's more about intelligently using one's data to engage customers or others in ways previously difficult or impossible.
Gazzang: Why do you think the health care industry has been quick to adopt MongoDB?
Matt: Few industries generate as much data as healthcare, which makes the ability to scale cost-effectively so important, and that is something easily managed with MongoDB. We have also seen healthcare organizations keen to blend structured and unstructured data to improve care, and NoSQL databases like MongoDB are an excellent way to effectively embrace a wide array of data sources. I also think MongoDB's document data store is a great fit for how healthcare organizations want to structure their data.
Gazzang: Why is data security important to your customers?
Matt: Many of our customers are in highly regulated industries like Financial Services and Healthcare. Security for these industries is not only a nice-to-have, it's a firm requirement. As important as it is to be able to scale one's databases, and to accept an array of data sources, it's critical for firms in such industries to ensure customer or other sensitive data is secure. And while we build strong security features into MongoDB itself, we're also very happy to work with security solutions like Gazzang to offer an even higher level of security.
Gazzang: Why is the Gazzang relationship important for 10gen?
Matt: As I mentioned, we take our customers' data security very seriously, and have built in advanced security functionality like Kerberos Authentication to help security-conscious customers rest easy. But Gazzang helps us to add an even richer layer of security to MongoDB, something especially important to customers in regulated industries.
Gazzang: What can attendees expect to see and learn at MongoDB ATL?
Matt: MongoDB ATL, like all MongoDB events, is very focused on enabling developers and IT operations to get productive with MongoDB. There are no vendor infomercials, from 10gen or any of our partners. We keep the agenda information-rich as our main concern is making sure more and more companies build exceptional applications with MongoDB.
Health care organizations are moving infrastructure and data to the cloud at a fairly rapid pace. A recent study suggests the cloud computing market in health care is expected to reach $5.4 billion by 2017. Enticing as the cloud is, when dealing with highly sensitive and regulated information, it's important to proceed with caution.
The good news for pharma companies, biotech firms and research hospitals - organizations most likely to move heavy big data payloads to the cloud - is that there are some security best practices that can protect data at rest in the cloud. Check out the Infographic below, or send us an email at firstname.lastname@example.org.
In a world flush with Big Data hype, I was pleased to read the Wall Street Journal story, How Big Data Is Changing the Whole Equation for Business. It includes some fascinating, in-production big data use cases. Real companies, using real data, for real insights. As I read about Catalyst IT Systems, Zynga, Ford Motor Co. and others, I kept waiting to read about how they are actually securing big data. And by data, I'm referring to both the data input and analytical output. Where does it reside, and how is it protected?
For example, Caesars Entertainment is analyzing health insurance claim data for its 65,000 employees and their covered family members. This includes how employees use medical services, how often they visit an emergency room and whether they choose generic or brand-name drugs. Tracking this data enables Caesars to find less-expensive healthcare alternatives and save millions in the process. An interesting use case to be sure, but one that involves the handling of HIPAA data.
InterContinental Hotels Group is analyzing information about its 71 million Priority Club rewards members, including income levels and travel preferences, which it then uses to run marketing campaigns. InterContinental says the campaign has been a success, with a higher rate of customer conversions than a similar campaign run just a year ago. In this case, the data being collected isn't regulated, but it's still sensitive and personally identifiable.
Living in Austin has its perks. Sure we have our endless allergy season, and every once in a while it rains, but you really can't go wrong here, especially during SXSW. While Gazzang didn't exhibit at this year's event, we had plenty of folks roaming the trade show floor and even more representing at various after-hours networking events (a.k.a. "SXSW parties").
One of the cool things about SXSW is that you can spend all week in central Austin without spending a dime. Everywhere you look, someone or something is handing out free breakfast tacos, beer, t-shirts, even piggyback rides. Just log in with your Facebook credentials, tweet, register your phone, share your email, Instagram a photo, or send a text. In the photo to the right, consumers were asked to bump their phones. It's easy and it’s free, right?
Not so fast.
When you get something for free, more often than not YOU are the product. You are trading personal information in exchange for an item that would otherwise cost money. You might stop to ask: Why do they need this data? Where is it going? Who will have access to it? How is it being secured? Is it encrypted?
These are fair questions, and our hope is that demanding answers will get mobile app and device vendors thinking long and hard about data security.