New research finds that 21% of publicly facing cloud storage buckets contain sensitive PII.
In an effort to understand the prevalence of publicly exposed sensitive data, Laminar Labs scanned publicly facing cloud storage buckets and detected personally identifiable information (PII) in 21% of these buckets, or roughly one in five. Information uncovered included addresses, email addresses, phone numbers, driver's license numbers, names, loan details, credit scores, and more.
Our original hypothesis was that this publicly available data consisted of public datasets or public files, things that were meant to be online. But what Laminar learned was that the majority of it was actually misplaced data: data that was mistakenly placed into a publicly exposed bucket, where it became unintentionally exposed. Additionally, in some cases, the S3 bucket may have been misconfigured to be public when it should not have been. Both are prime examples of “shadow data.” Shadow data is any sensitive data that is not subject to an organization’s centralized data management framework and is not visible to data protection teams. Examples include snapshots that are no longer relevant, forgotten backups, misplaced data, and log files containing sensitive data that are not properly encrypted or stored.
Here is a summary of some of the sensitive data that Laminar found
Because this data contains highly sensitive information such as loan details, Bitcoin addresses and conversations about unemployment benefits, Laminar believes that it has the potential to put the organizations to which the information belongs at risk. Organizations cannot properly protect data they do not know is exposed. And in the shared responsibility model, keeping this data secure is the responsibility of the organization that owns the buckets in which the data resides. Fortunately, there are ways to uncover and address this risk.
PII Data Discovery & Monitoring
The first step in addressing the problem is understanding what publicly exposed sensitive data is in your environment. However, doing this in the cloud is not as simple as it may seem. S3 buckets that are not public can often contain specific files and objects that are public, leaving security teams unaware of the risks. Conversely, many buckets are supposed to be publicly exposed, for example those hosting websites, and unseen shadow data can be misplaced in these intentionally exposed buckets. These misplaced files are often hard to locate among the many legitimate files housed inside those buckets.
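To make the point about public objects inside non-public buckets concrete, here is a minimal sketch that flags objects whose access control list grants permissions to one of S3's predefined public groups. It assumes the ACLs have already been fetched and are shaped like the response of boto3's `get_object_acl`. The function names `public_permissions` and `find_public_objects` are hypothetical, chosen for illustration only.

```python
# Predefined S3 grantee groups; a grant to either one exposes the object
# beyond the owning account.
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_permissions(acl):
    """Return the sorted permissions an ACL grants to public groups.

    `acl` is assumed to be a dict with a "Grants" list, as returned by
    an object ACL lookup.
    """
    perms = set()
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUP_URIS:
            perms.add(grant.get("Permission"))
    return sorted(perms)


def find_public_objects(acls_by_key):
    """Map each object key to its public permissions, skipping private objects."""
    exposed = {}
    for key, acl in acls_by_key.items():
        perms = public_permissions(acl)
        if perms:
            exposed[key] = perms
    return exposed
```

This covers ACL-based exposure only. An object can also become public through a bucket policy, and account-level or bucket-level public access blocks can override both, so a real inventory would need to account for those settings as well.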
In other words, what is needed is a data-centric view, not an infrastructure-centric one: a way to catalog all data in a cloud environment, figure out which files and objects contain sensitive information, and make sure these objects aren’t publicly available without hindering the availability of other files that are safe.
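As a rough illustration of what a data-centric catalog might look like, the sketch below scans the text content of objects for two of the PII types mentioned earlier (email addresses and phone numbers) and records which objects contain them. The regular expressions are deliberately simple assumptions and will miss many formats. The names `PII_PATTERNS`, `classify_text`, and `catalog` are hypothetical, and this is not a description of how Laminar performs detection.

```python
import re

# Simplified, assumed patterns; production classifiers use far richer rules.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def classify_text(text):
    """Return a dict mapping each PII type found to its match count."""
    counts = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            counts[name] = len(matches)
    return counts


def catalog(objects):
    """Given {object_key: text_content}, return only the keys with PII findings."""
    findings = {}
    for key, text in objects.items():
        counts = classify_text(text)
        if counts:
            findings[key] = counts
    return findings


if __name__ == "__main__":
    sample = {
        "index.html": "<h1>Welcome</h1>",
        "export.csv": "name,contact\nJane,jane.doe@example.com\nJohn,555-123-4567",
    }
    print(catalog(sample))
```

Working per object rather than per bucket is the point here: the legitimate `index.html` can stay public while the misplaced `export.csv` is singled out for remediation.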
Third Party Data Access Control
Another necessary step is making sure that third parties that need access to your data have access only to what they need, as handing your data over to a third party introduces a whole new layer of security threats.
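One way to limit a third party to only what it needs, assuming the data lives in S3, is a bucket policy that grants read access to a single prefix and nothing else. The sketch below builds such a policy document. The helper name `third_party_read_policy` and the bucket, prefix, and account ID in the usage example are placeholders, not values from the research.

```python
import json


def third_party_read_policy(bucket, prefix, account_id):
    """Build a bucket policy allowing one AWS account to read one prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ThirdPartyReadOnlyPrefix",
                "Effect": "Allow",
                # Placeholder principal: the third party's AWS account.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                # Read objects only; no listing, writing, or deleting.
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            }
        ],
    }


if __name__ == "__main__":
    policy = third_party_read_policy("example-bucket", "shared/vendor-a", "123456789012")
    print(json.dumps(policy, indent=2))
```

Scoping the resource to a prefix keeps the rest of the bucket out of reach even if the third party's credentials are compromised.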