IoT Spotlight - EP 091 - Manage your IoT cybersecurity landscape - Gabe Gumbs, Chief Innovation Officer, Spirion

Ep. 091

Manage your IoT cybersecurity landscape

Gabe Gumbs, Chief Innovation Officer, Spirion

Tuesday, June 08, 2021

In this episode, we discuss the cybersecurity threat vector landscape and the growing security risk to data generated by IoT devices. We also explore the importance of spending as much time as possible in the problem space and of thinking from the perspective of a hacker.

Gabe is the Chief Innovation Officer at Spirion. Spirion is the critical first step toward data privacy and security. We build and deliver the most accurate data discovery and classification solutions on the planet to position our customers for unparalleled data privacy, security, and regulatory compliance. Since 2006, Spirion has empowered the data privacy, security, and compliance strategies of thousands of organizations worldwide and across multiple industries. spirion.com

Gabe's podcast: Privacy Please

Transcript.

Erik: Welcome to the Industrial IoT Spotlight, your number one spot for insight from industrial IoT thought leaders who are transforming businesses today with your host, Erik Walenza.

Welcome back to the Industrial IoT Spotlight podcast. I'm your host, Erik Walenza, CEO of IoT ONE. And our guest today is Gabe Gumbs, Chief Innovation Officer at Spirion. Spirion builds and delivers data discovery and classification solutions that eliminate data privacy breaches. In this talk, we discussed the cybersecurity threat vector landscape and the growing risk to data generated by IoT devices. We also explored the importance of spending as much time as you can in the problem space and of thinking from the perspective of a hacker.

If you find these conversations valuable, please leave us a comment and a five-star review. And if you'd like to share your company's story or recommend a speaker, please email us at team@IoTone.com. Thank you.

Gabe, thank you so much for joining us today.

Gabe: Absolutely, it's a pleasure to be here. It's nice to talk to you, Erik.

Erik: So, Gabe, we're going to be discussing here the topic of how to secure your data footprint. But before we get into a deeper dive on your company, I'm really interested in understanding how you got into this topic originally because it looks like you basically graduated from university and jumped headlong into the topic of cybersecurity. What actually led you to this topic 20 odd years ago?

Gabe: What looks like a straight linear path actually began before university. I spent my formative years in the Northeast of the United States, where a very vibrant technology and hacker community was brewing. In that environment, there were a lot of hobbyists who were exchanging information and just enjoying the craft of hacking, as it were. I use the word hacking in more of the traditional sense of manipulating the world around you for fun and/or profit, for that matter. This was in the days when phreaking, with a P that is, was still an art form.

And so my journey into security as a profession really began on the IT side of things, as it had for many people from my generation. This was around the time when my interests outside of my professional work already included many security topics, and when IT was under heavy assault from threat actors. Threat models were shifting from the hacktivists, of which there were many back then, and a lot of folks who were mischievous, although still acting illegally, toward the more professional and focused attacks that it has graduated into.

Erik: It's become quite professional. Both the criminal organizations and also the government or corporate espionage have really become an industry over the past years. When did you start to see that change toward more of a professional approach to hacking?

Gabe: It depends on where we're drawing the line on professional. If we mean the change in how attackers, that is, non-state actors, began organizing, when we saw individuals organizing in ways that brought groups of people together for a single cause, I think we started to see a lot of that in the early 2000s. By the mid-2000s, it certainly was there. Right around that same time, we also started seeing more entrants from organized crime into the hacking scene. And then you even had a number of what I would call malicious hobbyists who were able to grab a toolkit or two off of a forum and create some malware on their own for random hijinks and shenanigans. But now that's graduated even further into ransomware threats, where anyone can grab one of these toolkits and hijack someone's sensitive data in exchange for money.

Erik: Now we have the IoT topic, so this has kind of a new dimension. Certainly, it's an evolving battlefield here. You've worked with what looks like six or seven different companies around the cybersecurity topic. You joined Spirion back in 2017; you were at WhiteHat Security before that. What was it that struck you about this team that made you throw your hat in the ring with them?

Gabe: I'll give your audience the story that I tell in my personal life also. I originally wasn't that interested in joining the organization. In fact, when I first joined, the name of the company was Identity Finder. And I remember hearing that name and thinking to myself, it sounds like some identity and access management company, literally just based on the company name. I thought to myself, this doesn't sound like a problem space that I might be interested in.

But then I sat down and talked to the founder, Todd Feinman, and he was just so incredibly passionate about protecting sensitive information, and more importantly protecting the personal information of individuals, that I really started to sit up and pay attention. And as I got up from that conversation, I remember thinking to myself, oh, yeah, no, this is not just a problem space that interests me; this is certainly something that I am rather passionate about. And so that's what brought me to what is now Spirion. We rebranded several years ago from the old Identity Finder to Spirion.

Erik: The rebranding, was that attached to like an M&A activity or a shift in business, or is that just purely a branding decision?

Gabe: Purely a branding decision. I think part of that decision was based on the same bias I had at the time: you heard that name and you immediately thought that we solved a problem that we weren't actually solving, or that we were in a completely different space. So that was part of it. The irony, though, is that as you fast forward to today, 2021, as we sit here, there are some 80+ countries in the world and 30-odd states in the US that all have privacy laws, and all of a sudden the name Identity Finder seems completely apropos. But it was largely a branding decision.

Erik: I think young companies tend to try to choose descriptive names because that helps. We chose the name IoT ONE, and then you find that you end up doing a lot of things that are out of the direct scope of your name. And it actually would be useful just to have kind of an interesting name that doesn't have any particular meaning. That’s something a lot of companies get into.

So I was looking through your company overview, and it looks like the value proposition is this: a company has a lot of data stored in a lot of different places. You don't know what you have, and you don't know whether what you have might conflict with some national security law or might just be sensitive data that's unsecured. And you actually help to track down that data so people are aware of it and can then secure it, if I'm correct.

I thought that was quite interesting because normally, when I have conversations with security professionals, it almost feels implicit that your data is somewhere in a known location and it's just about securing that location. But as I was reading through the proposition of Spirion, it looks like it's more that you don't know what data you have and you don't know where it's stored, so first you actually have to map it out and understand it, and then you can secure it. Am I on the right track here, or how would you frame the value proposition?

Gabe: You are on the track, not just the right track, you are on the track. Let me take it up one level first and talk a little bit about Spirion's why. So our why, the reason why we exist, the reason why we all at Spirion get up every morning, is to protect what matters most. And what matters most is, again, that sensitive data of everyone: yours, mine, our parents, our friends, our families, our colleagues.

And when you start with that why, the notion you just mentioned, where you say something like, oh, the data is here in this location or these known locations and it's protected, does kind of gloss over a question. What if all of that data wasn't just data? What if it was the digital representation of yourself, everything that made you whole in the real world, but in its digital sense?

When you receive compensation, a paycheck, monies, whatever it is, most of us don't get paid in actual dollars, physical ones, any longer. Most of us don't even get paid in actual paychecks, physical ones, any longer. Right? There are random ones and zeros that enter your bank account. And those random ones and zeros represent data. And then there's other data about you. You might have a mortgage, and so your mortgage company has a large amount of sensitive data about you as well. They've got your history of all of the transactions you've made, and your credit scores, and everything else. And so there are all these companies that have all of this sensitive data on you.

So forget all the data that they say they know about and have secured. If you were to go to them and ask, do you know where all of my data is, you would want that answer to be yes. But the truth is that's a very difficult challenge for most organizations. And it's largely because we create a massive amount of data. Being on a show that has its roots in IoT, I'll say that IoT in particular is responsible for generating an ungodly amount of data, including sensitive data.

I think about wearables, and about the most sensitive data: knowing my physical locations and my activities, and even health information about me. That's a lot of information to create on a regular basis. And then we share that information. We take that information, and we say, I would like to be able to study this information further, and maybe even find other people like the ones that I want to sell more things to, or who become parts of groups, or whatever it is, and so I analyze that data. And then I share that data with other people, they analyze it, and so on and so forth. Before I know it, my data is now everywhere. It's no longer just in that one place. It is in multiple places.

Think about data being digital. When I send you a document, what I'm really doing is replicating it. I haven't actually sent you the document. I've sent you a digital copy of the document. There are now two copies of that document. If that document has sensitive data in it, well, now there are two copies of it. And so the challenge that we solve for is finding all of that data, because you have to know where all of it is. You cannot secure it all if you do not know where it all is. That is just a statement of fact.

And then we have so very much of it, so very much data, and again, I think the IoT world is a great example of the data explosion we've seen. I believe I've heard it quoted before that most of the data that we've created as a species was created in the last 5 or 10 years, or whatever it is. So you've got to be able to automate all of that, because otherwise there's no way you will keep up with being able to find all the data, identify it, classify it for what it is, and even make sure that you slap a label on it so that when it moves around, other people know what it is, the technologies know what it is, and so forth. And you've got to start doing that not now, but right now.

Because, again, if most of the data in mankind's history was created in the last 5-10 years, and we expect that to just keep growing exponentially, then if we don't start now, the problem doesn't just get a little bit worse, it gets exponentially worse.

Erik: And so is the process here a white hat data scraping exercise, where you send spiders into a network and you just root out the data? What would this be based on? Would this be searching metadata, or keyword searches for categories? And you said you then attach a tag to the data, so that if it's replicated you're able to track it or identify it as it moves to different systems? Can you just explain a bit the process that you would apply here to execute on this?

Gabe: Yeah, it's all of those things. So in an oversimplified manner, it is just going out and crawling all of the data and examining it. To be a bit more specific, what we do is reach into all of those structured and unstructured environments, in the cloud, on premises, and on endpoints. We reach into all of the locations where data lives. We get into those locations, and we start identifying sensitive data. And that will differ from customer to customer. What type of business you are will largely dictate what type of sensitive data you have.

Every company has some very similar data types. We all have some financial data. We definitely all have employee data. And we certainly may even have some intellectual property in there. But if you happen to be a medical device company, you may also have some clinical trial data in there. So depending on what business you're in, you may have lots of different types of data. But we go in and we find it in all those locations.

Whether you're keeping that on a NAS drive somewhere, or a laptop, or an S3 bucket in AWS, it doesn't matter to us; we're going to go find it. And there are different techniques and technologies we actually use to find it. The thing about data is it doesn't look the same in every environment, not even a little bit, in fact. It's one of those classic overfitting and underfitting problems: if we look at this problem too narrowly, we will miss some of the data, and as I mentioned before, you have to find all of the data. But if we look at it too widely, then we end up trying to analyze every single thing from a human's perspective, and then we can't automate that. And remember, the other thing I said that we have to do is automate it.

And so we use a combination of tools and techniques, including machine learning and regression models, to look for data in different ways, because different data types are even structured differently. Some data is extremely well defined and well formed in its patterns, like a credit card number. Those can be fairly easy to find just using basic rule systems. I use rule systems broadly, because they can be machine learning rule-based systems or they can be non-ML rule-based systems. But those data types are very good candidates for rule-based systems.
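As an illustration of the kind of basic rule system Gabe mentions for well-formed patterns, here is a minimal sketch in Python. It is not Spirion's method; it assumes a common approach of pairing a digit-pattern match with the Luhn checksum that payment card numbers satisfy. The names CARD_CANDIDATE, luhn_valid, and find_card_numbers are hypothetical.

```python
import re
from typing import List

# Assumption: a candidate is 13 to 19 digits, optionally separated by
# single spaces or hyphens. Real scanners use more refined patterns.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return digit strings that match the pattern and pass the checksum."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step is what keeps a rule like this from flagging every long run of digits, such as order references. Names, by contrast, have no comparable structure to check, which is why they are a much harder detection problem.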

Then you have things like names, which are not very good candidates for rule-based systems. Here in America in particular, we have a lot of different cultures that all exist natively within the country and that make their way here every day. And so names look like every name on the planet. So what does a human name look like? Sometimes it looks like just a noun. And so finding things like names is the type of very difficult data-finding challenge that we solve for. Finding credit card numbers, and finding them in that vast sea of data, that massive amount of data, is what we really excel at.

And then, yes, we put those controls around it, and we slap that tag on it and say, hey, wear this around your neck so that everyone knows exactly what you are, and make sure that you don't leave the building by accident. For example, the data loss prevention technology can stop the data from exiting. But it also allows for things like developers being able to take that data set, de-identify it, and then use it in testing out their new systems, so they're not loading production data into a test system.

Again, that gets really, really, really problematic when you're dealing with sensitive data, especially people-sensitive data. If that is my health information, and that's being tested on a new system, and that system fails, or it's attacked, etc., that's problematic. But then you also have a privacy issue. Do I really want that developer seeing my health data? She's not a medical professional. I don't really want that person to have access to that. They're not even legally allowed to here in America.
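To make the idea of de-identifying a data set before it reaches a test system concrete, here is a minimal sketch in Python. It assumes one simple technique, keyed pseudonymization with HMAC-SHA256, and is not a description of Spirion's product. The field names in SENSITIVE_FIELDS and the function names are hypothetical, and pseudonymized data can still be re-identifiable through the fields that are left untouched.

```python
import hashlib
import hmac

# Hypothetical list of fields that a discovery scan has tagged as sensitive.
SENSITIVE_FIELDS = {"name", "ssn"}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed hash so the original is not exposed."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability in test fixtures


def deidentify(record: dict, key: bytes) -> dict:
    """Return a copy of the record with tagged fields pseudonymized."""
    return {
        field: pseudonymize(str(value), key) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }
```

Because the same key always maps the same input to the same token, joins across tables still work in the test system, while the developer never sees the real name or identifier. The key itself then has to be protected like any other secret.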

So it's finding that data and remediating it in different ways, where different ways might be encrypting it, or might be de-identifying it. It might simply be moving it to a more secure corner of your infrastructure. Anyone who approaches solving for data security with any one hammer is going to find themselves breaking a lot of things, because we're not surrounded by nails. That's a somewhat cryptic analogy. We have to apply different remediations to the data itself. We can't just try to apply controls to our ingress and egress flows, because that's not how we operate in the real world.

And the stricter we make those controls, the more likely humans are to try to circumvent them if they're just trying to do their jobs. So what we aim to do is reduce that friction while protecting that sensitive data. Find it, find it everywhere, and make sure that we're automating that process to remove as much of the human element from the mix as possible, both to ensure that we can scale and to remove any of the human frailties, such as not all of us looking at the data in the same way. And then make sure we protect it by remediating it, in the different ways that might be necessary. If you need to share that data, if you need to analyze that data, if you need to simply hold on to it for some period of time because you have some law that says so, we enable and empower businesses to do those things.

Erik: When we think about data that you would want to secure, at least from my layman's perspective, there are two big clusters that jump out at me. One is the personally identifiable data, so anything that's related to a human and that, for privacy reasons, should be somehow protected. And the second is something that's related to intellectual property, whether it's research or how a manufacturing process works.

And it feels, again from a layman's perspective, that there are very different threads behind this. When you're dealing with personal data, there's a lot of regulation that's driving it, and also a lot of brand damage that could be done if you lose people's data. And if you're looking at it from the other perspective, from an intellectual property perspective, that's maybe less of a regulated environment and more of an environment where the companies just have a natural sense of wanting to protect this data because it's related to their competitive positioning.

Are you focused primarily as a company on the personal data? Or do you cover both of these? And if so, is it primarily the same set of tools that you'd apply in both cases? Or are there important differences in terms of how you would approach securing these two different buckets of data?

Gabe: If that helps you think through it, I don't think it's a bad cluster. There are, however, additional clusters that we should discuss. So, personally identifiable data is a very good label for all of the different data types that might identify you as a person. And some of the regulation out there, GDPR in particular, has gone as far as to describe that data as “directly or indirectly identifiable”, which I think is also a very good distinction. Because sometimes we don't think about something like your telephone's IMEI number being identifiable to you, because you don't use it. You don't use that number, you don't give that number to your friends, but that number is tied to you in the dataset that your telecom company has.

And so that's a directly identifiable number. But then there's indirectly identifiable data too. If I just gave you a cohort of individuals who all live in the same area and share similar but very specific attributes, like they're all Hispanic females between the ages of 22 and 23 in the same zip code, that does become very identifiable. You can start re-identifying who those persons are. So generally speaking, your buckets aren't bad. However, there are lots of other different buckets. You also mentioned intellectual property, and that it may be less governed.
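The re-identification risk Gabe describes for narrow cohorts can be checked mechanically. The sketch below, in Python, counts how many records share each combination of indirectly identifying attributes and reports the combinations shared by fewer than k records, which is the idea behind k-anonymity. The function name small_cohorts and the field names in the usage note are hypothetical, and this is an illustration rather than a method described in the conversation.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def small_cohorts(
    records: List[dict], quasi_identifiers: Sequence[str], k: int
) -> Dict[Tuple, int]:
    """Return attribute combinations that fewer than k records share.

    Records in such small groups are the ones most at risk of being
    re-identified from attributes like zip code, age band, or gender.
    """
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}
```

For example, calling small_cohorts(records, ["zip", "age_band", "gender"], k=5) on a data set would list every zip, age band, and gender combination held by fewer than five people. Those groups would need to be generalized or suppressed before the data is shared.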

Well, intellectual property, if you are a pharmaceutical organization, is going to be extremely governed, at least here in the States, and I'm sure in many other countries. If you create encryption technology, you certainly have some intellectual property that you want to protect. But again, here stateside in the US, we've got the International Traffic in Arms Regulations, ITAR, and that restricts what we can do with that data and which other countries and other individuals we can share it with.

So we happen to live in an increasingly regulated world. And I'll avoid the political commentary for the most part there. Whether you think that's good or bad, we're there, and it's increasing. GDPR, CCPA, and the other 78 privacy-driven regulations across the planet have all made it very clear that we're there. Those buckets are wide and varied. And it's almost guaranteed that if you are conducting business, there is some regulation that you are going to be held to, even if it's just at the state and local level.

And to answer your question very directly, we dabble in all of it, we specialize in all of it, we've been looking at this data for the better part of 16 plus years now. And so we understand that extremely well. And when we originally began some 16 years ago, we were a bit focused just on personally identifiable data at the time. But today, Spirion lives at that intersection of data privacy and data security. And so to do that, we examine all of that data.

Erik: And then you have a new set of data that you mentioned earlier, which is coming from devices. And of course, we've had devices producing data for many decades. But there's really been an explosion in the number of them and the volume of data they're producing. So, this produces two challenges. One is the volume, which you mentioned. And the second is that as opposed to let's say data from a bank or an insurance company that's primarily structured somehow in databases, or at least in laptop computers that have pretty good CPUs, now, the data could be in some fairly dumb device that might not be connected all the time, and maybe has firmware that hasn't been updated in three years and so forth. So I guess there's a certain set of challenges around that. Can you share some thoughts on how the proliferation of IoT devices is now shaping the landscape of data that's produced and how you secure that data?

Gabe: Well, it's certainly shaping how public perception is moving in this regard. There's almost a numbness that has come over most lay people over the last couple of decades as they hear story after story of data breaches everywhere and things of that nature. And so the public perception around IoT data and privacy data, I think, is slowly starting to shift as well, as this notion of taking back our data, taking control of it, of it belonging to us, and of having some agency over our data continues to move in that direction.

The IoT world is going to be challenged to come to the table with solutions of its own that take into account not just data security, but data privacy. If I can pick on the industry for a while, it's fair to say that data security has been nothing short of an afterthought in the IoT world. And data privacy is even further behind that. At the moment, I think the biggest impact has been and will continue to be perception. There are going to be a lot of other material impacts. You're going to have more attackers shifting their models in the IoT direction, and that's just a byproduct of economics.

Attackers are extremely efficient. And they're also very good at finding the largest return on their efforts. If that means attacking treasure troves of IoT devices for medical healthcare data, because it fetches more on the black market, to the tune of $25 per record versus $0.25 for a banking record, they will shift their models directly into that space. And again, that's going to change perception. And perception is going to be the thing that will drive a lot of transformative change.

Technologists like myself are trying to solve this problem with technology, and with a combination of techniques and technology, but public perception is really going to drive how we apply that technology to these problems.

Erik: You've mentioned that public perception is starting to shift. We see a few things that at least seem positive; maybe they're just marketing efforts without much impact. Maybe you can share some perspective here. But we see Apple, for example, making it more challenging to collect data without explicit buy-in from the user. We see some controls on the use of cookies in browsers. I think in IoT, we see a little bit more effort to actually build in security. And I mean, it's certainly still not where we'd want it to be.

But at least people seem to be making a bit of effort now, whereas that wasn't the case even a few years ago, when it was just about getting your pilot out to market as quickly as possible. Do you see us moving in a positive direction? I suppose it's probably messy, in that some things are improving and some are not. But what trajectory do you think we are on right now in terms of the infrastructure to allow people to control their data? Are we moving in a positive direction at all?

Gabe: We are. We're moving in a very positive direction. So I host a podcast of my own called Privacy Please, where we talk about security and privacy issues unsurprisingly. And on the show recently, we've had two guests from two different organizations. And both these organizations allow anyone, yourself, myself, just individuals to take financial control of their data, take more financial agency over their data, and it empowers them to control who can sell their data, who can share their data, etc, and return some of that economic agency to the individuals. That's a very positive direction. There are more of these organizations coming online every day.

The realization that Facebook shouldn't be allowed to make billions of dollars off of your data without you sharing in that wealth is one that public perception is turning towards. And to answer your question, are the tools and the avenues there, and are they developing to get there? Yes. The two organizations that I mentioned are just some of the players on the scene doing these things.

Erik: Okay, interesting. No, I was just in a meeting earlier today where there was a chemical company that also produces a lot of consumer products. And they're now building some IoT products. So things like a consumer product that would analyze your hair, and then they could customize shampoos specifically for you based on the chemical composition.

And then I raised this question, I said, okay, that's great. You have one data set from your one device, and you're producing one product for that. But maybe another company has another set of data around your hair or around something on your body, and they have another set of products. If you're able to combine that data from multiple points, you could potentially provide something much more valuable to the customer. But you get into this very challenging situation, then of deciding how do we actually share data between organizations in a way that's legal and ethical?

And on the one hand, there's a lot of potential value there: value for the business, but also potentially value for the customer. Because maybe a lot of customers would say, if you want to use my data to provide better products to me, please go ahead and do it, or better, pay me a little bit of money and then go ahead and use my data in certain ways. So there might be a lot of people who would be open to that, and for companies, obviously, there is a lot of value.

But still there's a ton of technical barriers and also probably legal barriers and just practical barriers. How do people actually sign off that you can use my data in a particular way and how technically do you scale that if you're only making a little bit of money from each person so it has to be a completely automated process? Maybe the companies that you mentioned are kind of working in this space.

But do you see efforts to solve this problem of how you monetize data? I mean, obviously, Facebook is doing a great job of that, but they control all of the data. So if you're an IoT company that has just one data stream and you want to combine that with other datasets, how do you monetize data in an ethical way, where the data usage is being properly tracked? Do you see anything interesting going on in this direction right now?

Gabe: Yeah. So let me first take you back to your example of the device analyzing hair. You opened the show talking about my background as an ethical hacker and growing up in the security scene. As you described to me the product they were building, my attacker mindset immediately went to this: what happens if an authoritarian government uses that data to analyze hair for the presence of drugs? What if they look at the molecular structure of your hair composition to see if you're using drugs? I'm going to pick on one just because it comes to mind.

Take the President of the Philippines, for example. He's notoriously known for being very anti-drug, and we can all agree drugs are bad. What if he got his hands on that data set and then just went around cleaning house? That's a very callous way to say just killing people because he thinks they shouldn't be using drugs. I use that extreme attacker's mindset example to answer the question of how I use that data in an ethical way.

The first thing you have to do is think about the dangers that data may carry. You have to formulate in your mind those different scenarios: if that data were to be in the hands of someone like the one I just described, what would be the consequences? And so you need the knowledge of what the attack surface of your data looks like. Who may find this data useful? Who is going to attack this? Who's going to come at this data? And also, who's going to just simply mishandle this data? Who's going to leave a laptop in a car unlocked? Who's just going to email something over to themselves in plain text? Who's going to do all of those things? Who's going to do all the things with that data that are going to make it problematic?

So then you say, well, maybe one way to do that, if I need to share the data is I can package it up tightly. I can encrypt it, and then send it off to someone else. But I still have the same problem once it gets to the other location. All I've done is I've taken a very fragile thing, and I put it inside of an armored car and sent it off to another fragile location. So that doesn't solve all the problems. So you still want to transport in that car.

To answer your question very directly, there's a lot of work being done in the differential privacy space. Differential privacy has been around as a field of study for 20 years, maybe more. But it's in practice in a lot of technologies right now, and it has been making a lot of gains. I think of companies like Tonic AI out in the San Francisco area that have a platform that allows you to take a data set like the one you just described, remove the elements from it that might identify the individuals, and allow those two companies to share that data so they can explore these new possibilities without compromising the integrity and privacy of the individuals that the data belongs to.
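For readers unfamiliar with the term, differential privacy in its formal sense goes beyond removing identifying fields: it adds calibrated random noise to query results so that the presence or absence of any one person has only a bounded effect on the output. The following minimal sketch in Python shows the classic Laplace mechanism for a counting query. It is an illustration only, not a description of how any vendor named here works, and the names laplace_noise and private_count are hypothetical.

```python
import random
from typing import Callable, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    values: Iterable,
    predicate: Callable[[object], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a noisy count of the values that satisfy the predicate.

    A counting query changes by at most 1 when one person's record is
    added or removed, so its sensitivity is 1 and the noise scale is
    1 / epsilon. A smaller epsilon means more noise and more privacy.
    """
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Two companies could exchange noisy aggregates like this, for example how many customers have a given hair profile, without either side learning whether a specific individual is in the other's data set. The trade-off is accuracy: each answer is only approximately right, and repeated queries consume more of the privacy budget.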

Snowflake has a similar data plane that allows customers to share those. I think Microsoft is working on a similar platform as well that allows companies to share that type of data and get those outcomes. As a consumer, as someone who buys things, as someone who exists in this material world of ours, I appreciate when an ad is well targeted.

I happen to be a customer of one particular company and one of their products. I've left their brand twice, and twice, through advertising, I found my way back. I sat there one night on Twitter years ago, and it's like, oh, look at that, that actually looks like something I want and need. So when you tailor that to me, I am grateful for that. I don't want an ad targeted to me for products I will never even consider, because then the ad is truly invasive in my space. But if it's something that might actually enrich my life, then I want that. But I'm not willing to trade my privacy for that.

And so we have to first think about the implications of the datasets. That's why, at the top of the show, I wanted to make the conversation about data very human. It's not just random, amorphous intellectual property in some cases. That data has a very real impact on you and me. The technology that you were just referencing, to examine hair follicles: that intellectual property, I'm sure, is worth billions of dollars to that company. But the data that it's going to generate, that's me, that’s you.

Erik: Yeah. And that's what makes this discussion so challenging. You're dealing with two different situations: something that can be extremely abstract, and something that's extremely concrete and meaningful for the individuals involved. They're two sides of the same coin, which makes this a challenging issue to deal with.

To make this a little more concrete for myself and our listeners in terms of what this looks like in practice, are there one, two, or three cases that you'd be able to walk us through, maybe from different perspectives, different environments, different challenges, looking very practically at what the problem was that you were facing, and how you developed the solution that allowed the company to still operate and make use of their data, but do so in a way that was also respectful of the individuals who own that data?

Gabe: I've got a lot of very good real-world examples. When I'm not waxing philosophical on the perils of dictators getting their hands on my hair composition, I spend a lot of time with real-world practitioners. That is to say, people in the business world who are going about their regular busy lives, but doing so with data and needing to protect it. So there are a couple of stories that stand out.

In one in particular, we were approached by an organization that really needed to control how data was leaving their endpoints. It wasn't that they wanted to stop it from leaving the endpoints. They very rightly, and I think maybe ahead of where some organizations are, understood that that free flow of data was very necessary. In their case, they probably recognized that because they themselves build financial products, and to do so very quickly, ahead of their competition, they must move with speed. But they needed to understand how the data was leaving their endpoints. And more importantly, what was the data that was leaving their endpoints? They knew it was there and even knew what some of it was, but they had no real control over this.

And so the very first challenge for that organization was understanding what they might have inside of their organization. So we sat down, and we mapped that out. We understood what it was. They had actually already done a lot of this work in identifying and laying out a data classification schema as well as some data handling practices. And so we took that and we set about…

Erik: So it sounds like they had a lot of this done. But if a company doesn't have this mapping already done, what type of effort is that? I guess it depends on the type of company. But is that something that a medium sized organization would be able to do in a couple of weeks, or is this a multiple months’ effort if you haven't done this previously?

Gabe: I find the larger the organization, the slower this process might be, only because it has to be a group effort. Which is to say, you have to get all of the stakeholders at the table to understand who the data owners are, who the data creators are, and who the data stewards will need to be, and then also get everyone on the same page of understanding that everyone is a data handler.

At the larger organization level, you could go about the task of creating a data classification schema in a matter of weeks if you got outside help. For example, if you contract an organization like ours, you could do that in a matter of weeks. If you did this on your own, with someone who at least had a base understanding of what a data classification schema looked like, or who did some research and came up with it, you could accomplish this in a matter of 12 weeks, like three months. That would be reasonable.

I'm, of course, isolating a lot of factors there that could impact that. Some businesses move a lot more nimbly than others, and some get mired in politics and bureaucracies. But if I did this in a vacuum, where it's just the people going about the task, it's not so much herculean. It's that the larger the business, the more stakeholders you have, the more data creators you have, the more data owners you have, the more data stewards you have, etc. And you need to make sure that you've got input from all of those to understand where their concerns are, and what data they have, transact, and work with. You need to know what the marketing team is doing.

So are they collecting customer data that includes just emails and phone numbers? Or is it more than that? You need to understand what it is. What the [inaudible 39:33] is doing, let's say that you're in the transportation business, you need to understand what information you have on customers. Do you keep very detailed logs of their travel activity, where they go, where they come, etc? That's also again, very important to know because those are different data types that will need to go fine but there are also different data types that need to be treated differently, and different data types that will need to be shared and secured differently, etc.

So I would expect that at the smallest level, you can do this pretty quickly. And at the grand scales, it'll take you a few months, but it shouldn't take you much more than that. If it takes you more than that, you might have other organizational challenges that need addressing as well.

Erik: So you do this mapping, and then what's the next step?

Gabe: The immediate next step was: now that we know what all of the data is, and we understand the sensitivity of all this data, we know that we have a development department that creates products that test human hair. They've got a bunch of trial patients, so they've got very specific data on those individuals. And we've got a marketing department, and they collect a bunch of other data. I'm continuing with the same company, not picking on anyone there.

And then we say, alright, here are the handling policies for that data. For that trial data that we collected on customers, no one's allowed to handle that outside the company. That is company confidential and private, restricted information. It's also sensitive data: it's personally identifiable information, maybe even some health information, and certainly bio information. So we're going to keep that under strict lock and key also.

But this other set of data over here, all of these press releases, etc., those are all now public data. So we're going to go ahead and classify those as being public. And for all public documents, the handling policy is that anyone's allowed to share them. Then we're going to classify some of the things as restricted, where it's maybe an internal memo: it's not sensitive data, but really no one outside the company should be seeing it, so it's just restricted data. And then we're going to dictate the handling policy for that.

For restricted data, you're not allowed to share this with anyone outside of the company, but you can share freely inside the company. So alongside that classification schema, you have to have data handling policies. How is this data allowed to be used? And again, that's why you have to have the stakeholders at the table. Because if you say things like that clinical trial data is not allowed to be shared with anyone, then there's someone else at the table who says, wait a second, we're currently working with a partner down the block to come up with this other great product, because we overheard a podcast where we got this great idea that if our two companies work together, we'd create an even better product. And so we're doing that. But now you're telling me my data handling policy won't allow for that. So that's going to cost the business another $10 million next year because we couldn't create this new product.

And that's where you have to have everyone at the table because you need that input. You need to understand, okay, you will need to be able to share that data. So here are some additional handling policies for sharing that data. You will need to de-identify that data such that it is not re-identifiable with X number of tool box. So you got to get very prescriptive at that point, once you start getting more permissive with how the data will be handled.
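Illustration (not from the conversation): the classification schema and handling policies described above can be sketched as a small lookup table. The three labels follow the ones Gabe names, but the `HandlingPolicy` fields, the `may_share` helper, and the rule that confidential data may leave the company only once de-identified are assumptions made for the example, not the actual policy of any organization.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    PUBLIC = "public"              # e.g. press releases
    RESTRICTED = "restricted"      # e.g. internal memos
    CONFIDENTIAL = "confidential"  # e.g. clinical trial data


@dataclass(frozen=True)
class HandlingPolicy:
    share_internally: bool
    share_externally: bool
    external_requires_deidentification: bool = False


# Hypothetical policy table, one handling policy per classification label.
POLICIES = {
    Label.PUBLIC: HandlingPolicy(share_internally=True, share_externally=True),
    Label.RESTRICTED: HandlingPolicy(share_internally=True, share_externally=False),
    # Assumed outcome of the stakeholder discussion: partners may receive
    # confidential data, but only after it has been de-identified.
    Label.CONFIDENTIAL: HandlingPolicy(
        share_internally=True,
        share_externally=True,
        external_requires_deidentification=True,
    ),
}


def may_share(label: Label, recipient_is_internal: bool, deidentified: bool = False) -> bool:
    """Decide whether data with this label may be sent to the recipient."""
    policy = POLICIES[label]
    if recipient_is_internal:
        return policy.share_internally
    if not policy.share_externally:
        return False
    return deidentified or not policy.external_requires_deidentification


print(may_share(Label.RESTRICTED, recipient_is_internal=False))                    # False
print(may_share(Label.CONFIDENTIAL, recipient_is_internal=False))                  # False
print(may_share(Label.CONFIDENTIAL, recipient_is_internal=False, deidentified=True))  # True
```

Keeping the policy in one table means that a change agreed by the stakeholders is a one-line edit rather than a change scattered across tools.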

And so then you have to have this data handling policy in place. We sat down and helped them work out data handling policies, etc. Or we reviewed it with them, I should say. Again, they'd done much of that work. And where we really came in was the hard work of operationalizing that.

So there was now a data classification scheme in place. The organization knew what all their data was, what they collected internally and what they generated, and they knew what classification they wanted to give it all: some of it restricted, some of it internal, etc. They had also already determined how it needed to be handled. Now we came in to help them operationalize it.

We took the data classification policy and we said, alright, these are the 10 different data types that you have, etc. And we scoured their environment for that data. Then we labeled it with their same classification scheme. The stuff that they said was confidential and sensitive, we labeled and tagged as such; the stuff they said was public, we did the same; and so with restricted. And then we started operationalizing that: all of the internal restricted information isn't allowed to be mailed, so we made sure that their mail-blocking technologies knew how to read and pick up the restricted tags so that those things wouldn't happen.
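Illustration (not from the conversation): the scan, tag, and block steps described above can be sketched as follows. The two regular expressions, the mapping from data type to label, and the `classify` and `allow_outbound_mail` helpers are hypothetical; real discovery products use far more robust detection than a pair of patterns, and this is not a description of Spirion's implementation.

```python
import re

# Illustrative detectors only; real discovery tooling validates matches
# and covers many more data types.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical mapping from detected data type to classification label.
LABEL_FOR_TYPE = {"email": "restricted", "us_ssn": "confidential"}
SEVERITY = {"public": 0, "restricted": 1, "confidential": 2}


def classify(text: str):
    """Return (label, detected_types), keeping the most severe label found.

    Defaulting to "public" when nothing is detected is a simplification
    for the example; a real policy might default to a stricter label.
    """
    found = {name for name, pattern in PATTERNS.items() if pattern.search(text)}
    label = "public"
    for name in found:
        candidate = LABEL_FOR_TYPE[name]
        if SEVERITY[candidate] > SEVERITY[label]:
            label = candidate
    return label, found


def allow_outbound_mail(attachment_labels) -> bool:
    """A mail gateway check: block if any attachment is tagged restricted or above."""
    return all(SEVERITY[label] < SEVERITY["restricted"] for label in attachment_labels)


documents = {
    "press_release.txt": "New product launch announced today.",
    "memo.txt": "Questions to jane@example.com before Friday.",
    "trial_record.txt": "Participant SSN 123-45-6789, contact jane@example.com.",
}
tags = {name: classify(body)[0] for name, body in documents.items()}
print(tags)
print(allow_outbound_mail([tags["press_release.txt"]]))
print(allow_outbound_mail([tags["press_release.txt"], tags["memo.txt"]]))
```

The point of the tag is that downstream controls, such as the mail gateway here, only need to read a label rather than re-inspect the content.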

And then they also had policies. They had similar customer data, but not the same type. They [inaudible 44:04] manufacturing world, but they had similar sensitive customer data. And that information wasn't allowed to be shared with third parties. So we automated some restrictions on where that data could be stored, so that if someone were to try and save it to a location that was shared, it would automatically be moved from there, etc.

We continued that exercise until all the things were in place. And over the course of a few months of tuning and improving that, we were able to give them this knowledge of how their data was leaving their endpoints in particular, but equally, the comfort of knowing, okay, it's leaving, and we want it to leave. But we also now know what it is, where it is, and how it's protected. So it leaving and moving around freely is no longer the problem. Now we've found it, we've identified it, and we're protecting it operationally.

Erik: So there's a point that you mentioned that I want to return to, which is this issue of organizational alignment. I can think of two different scenarios here. One is the top-down corporate scenario, where corporate is saying, okay, we want to map the data organization-wide, and then come up with policies and frameworks around that, and there are a lot of alignment issues there. The other option could be a product manager, or the R&D department, or the marketing department, so some function, or maybe the general manager of a country like China, who says, okay, we want to control the China data.

So this could also be done by some business unit or some function or some product group within the organization. Then you would have a simpler situation, with less complexity in terms of alignment. But you could also end up with a proliferation of different approaches taken by different parts of the organization. Can you share, on the one hand, whether there is a best practice, one or the other? And then, in reality, what do you see as being the more common way for companies to approach data? Is it more top-down, or is it more bottom-up from different units within the organization?

Gabe: Companies tend to take different approaches. From a best practice standpoint, strictly speaking, there are very prescriptive ways that you can go about this. NIST, the National Institute of Standards and Technology, is a US-based organization, so it is US-centric, but they have a privacy framework and they have a cybersecurity framework. And those certainly serve as very good best practices for how you can approach this.

The question of whether you approach it from the top-down or the bottom-up, I think is best left to the type of organization you might be. Some companies operate a bit more nimbly than others, in which case, taking the top-down approach, I think, works well for those nimble organizations versus the bottom-up, which has to be quite a bit more tactical. That works well if maybe you're not quite so nimble. If you are nimble, then you can get through that exercise faster. There's an argument to be made there for sure.

So there isn't a one-size-fits-all. But I certainly would start by looking at things like the NIST Privacy Framework as a way to approach this, as a very structured framework, whether you come at this from the top-down or the bottom-up. Although I should say that NIST does kind of approach it from both levels. It is split into multiple different sections. Some of them are very policy-focused. Some of them are very technology-focused. And so it does come at it from both ends.

Erik: So, Gabe, what are we missing here? I think we've covered a fair amount of territory already. Any critical angles that we haven't touched on yet here?

Gabe: Critical angles, I think we've touched on them. But I think the thing I'd reiterate is acting now. If we're not doing something about it now, if you're not doing something about it now, if everyone's not doing something about getting a handle on their sensitive data now, the problem will only get worse. The best time to have done it was yesterday. The second best time is right now. That's how that saying goes.

And so I would say the same thing is true for the IoT world. If you are engineering products and not thinking about the privacy and security of the individuals' data that you have in those systems, that you're generating, the best time to do that certainly was yesterday. And the only better time than that is right now. Where is that data? Can you keep up with the scale at which you're generating that sensitive data to also find it and protect it? And if not, why aren't you acting now?

Erik: I was talking with a company a few months ago that developed some technology, I forget exactly what, but it was something like keys for anonymizing data on devices. They did a study for their customers of the cost of building security into the product versus the cost of slapping security on after the product has been launched to market. And it was something like a 12x increase in cost for security if you try to do it after the product is already out in the market versus if you build it in during the design process.

So certainly, if you think your product might succeed, then build it in from the beginning. Don't get it out to market and then try to handle that situation once it scales because that's going to be very, very messy. Gabe, what's the best way for people to reach out to you or to the organization at large if they're interested in following up on this conversation?

Gabe: Yeah, absolutely. So you can reach out to me directly. You can find me online and on LinkedIn, Gabriel Gumbs. I'm also on Twitter at Gabrielgumbs. You can find Spirion online as well in all those locations, Spirion on LinkedIn. We are at www.spirion.com. Stop on by and grab some literature and check out all the great things that we do to secure a company’s data and protect the personal data of people everywhere.

You can also hop on by and drop in on our podcast as well. Every Wednesday, Cameron Ivey and I launch a new episode of Privacy Please, and you can join the conversation there. You can email us and interact with us online on all those channels as well.

Erik: Privacy Please, very cool, I'll check that out. Gabe, thank you so much for the time today.

Gabe: Thank you. The pleasure was mine.

Erik: Thanks for tuning in to another edition of the Industrial IoT Spotlight. Don't forget to follow us on Twitter at IotoneHQ, and to check out our database of case studies on IoTONE.com. If you have unique insight or a project deployment story to share, we'd love to feature you on a future edition. Write us at erik.walenza@IoTone.com.