Ep. 066
Event streaming architectures enabling IoT applications beyond messaging
Kai Waehner, Enterprise Architect, Confluent
Tuesday, July 14, 2020

In this episode, we discuss event streaming technologies, hybrid edge-cloud strategies, and real-time machine learning infrastructure. We also explore how these technologies are applied at Audi, Bosch, and E.ON.

Kai Waehner is an Enterprise Architect and Global Field Engineer at Confluent. Kai’s main area of expertise lies within the fields of Big Data Analytics, Machine Learning, Hybrid Cloud Architectures, Event Stream Processing and Internet of Things. References: www.kai-waehner.de

Confluent, founded by the original creators of Apache Kafka®, pioneered the enterprise-ready event streaming platform. To learn more, please visit www.confluent.io. Download Confluent Platform and Confluent Cloud at www.confluent.io/download.

_________

Automated Transcript

[Intro]

Welcome to the Industrial IoT Spotlight, your number one spot for insight from industrial IoT thought leaders who are transforming businesses today, with your host, Erik Walenza.

Welcome back to the Industrial IoT Spotlight podcast. I'm your host, Erik Walenza, CEO of IoT ONE. Our guest today is Kai Waehner, enterprise architect and global field engineer with Confluent. Confluent is an enterprise event streaming platform built by the original creators of Apache Kafka for analyzing high data volumes in real time. In this talk, we discussed event streaming at the edge and in the cloud, and why hybrid deployments are typically the best solution. We also explored how to monitor machine learning infrastructure in real time, and we discussed case studies from Audi, Bosch, and E.ON. If you find these conversations valuable, please leave us a comment and a five-star review. And if you'd like to share your company's story or recommend a speaker, please email us at team@iotone.com. Thank you.

[Erik]

Kai, thank you so much for joining me today.

[Kai]

Thanks for having me, Erik. Great to be here.

[Erik]

So Kai, the discussion here is going to be a little bit more technical than usual, which I'm looking forward to. But before we get into the details, I want to learn a little bit about where you're coming from. I think you've had some interesting roles. You're currently an enterprise architect and global field engineer, so I'd like to learn what exactly that means. And previously you were a technology evangelist, both with your current company, Confluent, and with TIBCO Software. So I also want to understand a bit more about what that actually means in terms of how you engage with companies. Can you just give a quick brief on what it is that you do with Confluent?

[Kai]

Yeah, sure. So I'm working in an overlay role, which means I speak to maybe 100 to 150 customers a year. If there is no travel ban, I travel all over the world, and IoT and industrial IoT is a big part of that. I talk to these customers to solve their problems. While it's technology under the hood, we try to solve business problems; otherwise there is no business value, and I think that's what we will also discuss today. So what I really do is analyze the scenarios where our customers have challenges and problems, and how we can help them with event streaming. My background is that I've worked for different integration vendors in the past, and that is very similar to what I do today with event streaming. The key challenge typically is to integrate many different systems and technologies. That's machines and real-time sensors on one side, but also the traditional enterprise software systems, both for IoT, like an ERP system, and also customer relationship management or big data analytics on the other side. That's where I see the overview of these architectures and how event streaming fits in.

[Erik]

Okay, so you have kind of a technical-business interface role, where you're trying to understand the problem and then determine what architecture might be right to support it.

[Kai]

So I'm exactly at that middle point. I talk to both sides, up to the executive level, but also to the engineers on the other side who need to implement it.

[Erik]

During those initial conversations, how much do you get into the completely nontechnical topics, like how an end user might put in bad data? These are almost HR topics, or topics related to the human aspect of how a solution might be used. Do you get into those early on, or is it more that once you get into implementation, you figure out what those other challenges are and address them as you go?

[Kai]

No, it's really early stage. I mean, we talk to our customers on different levels, both on the business side and on the technical side. Before we have something like a pilot project or proof of concept, we have already talked to many different people at every level, from very technical up to management, to understand the problem. So we plan this ahead of time. It's not just about the technology and how to integrate with machines and software, but really how to process the data and what the value is out of that.

[Erik]

And then do you have a very specific vertical focus, or are you quite horizontal in terms of the industries that you cover?

[Kai]

We are not industry-specific. Event streaming to continuously process data is used in every industry. Having said that, given the nature of industrial IoT, with machines producing continuous sensor data all the time, and more and more of it, industrial IoT is of course one of our biggest industries. But it's really not limited to that. We also work with banks, insurance companies, and telcos. In the end they have very different use cases, but under the hood, from a technology perspective, it's often very similar.

[Erik]

Yeah. One of the issues that's both interesting but I suppose also challenging is that there's almost an infinite variety of things that you could analyze in the real world, right? I suppose there's also some kind of 80/20 rule. Do you see a short list of five or ten use cases that constitute 80% of the work you do, or is it actually much more varied than that?

[Kai]

It really varies, and it depends on how you use it; that's what we will discuss later today. In some use cases, all the data is processed for analytics, for example the traditional use cases like predictive maintenance or quality assurance. But as more and more of these industrial solutions produce so much data, sometimes the use case is more technical: you deploy the solution at the edge in the factory to pre-filter, because there is so much data that you cannot process all of it. So event streaming gets the sensor data, pre-filters and preprocesses it, and then ingests maybe 10% of it into an analytics tool for further use cases. So there are many different use cases, but in the end it's typically about getting some kind of value out of the data. I think that's really the key challenge today: most of these factories and plants produce more and more data, but people cannot use it today. And that's where we typically help, by connecting these different systems.

[Erik]

And is it right to say that Confluent is built on Apache Kafka, or that that's the solution you use? Can you just describe for everybody, what is Apache Kafka?

[Kai]

That's a good point, and it also explains how these are related. Apache Kafka was created at LinkedIn, the tech company in the US, around 10 years ago. They built this technology because there was nothing else on the market which could process big data sets in real time. We had integration middleware for big data on the one side for 20 years, and we had real-time messaging systems for 20 years, but we didn't have technologies which could combine both. That's what LinkedIn built 10 years ago, and after they had it in production, they open sourced it. That is exactly what Apache Kafka is: it can continuously process millions of events per second, at scale, reliably. When they open sourced it, for the first few years only the other tech companies used it, like Netflix or Uber or eBay.

However, because there was nothing else on the market and there was a need for this kind of data processing all over the world, in all industries, today most of the Fortune 2000 use Apache Kafka in different projects. With that in mind, five years ago Confluent was created by the founders, who were the creators of Apache Kafka. They got venture capital from LinkedIn and from some Silicon Valley investors and founded Confluent with the idea of making Kafka production ready. The tech giants often can run things by themselves, but Confluent really helps to improve Kafka and build an ecosystem and tooling around it, and of course also adds the services and support, so that the traditional companies, as I always call them, can also run mission-critical workloads with Kafka, because they need help from a software vendor.

[Erik]

Okay, very interesting. So this is a little bit of the Red Hat business model, right? Building enterprise solutions on top of open source software. It seems like that's a trend, because open source has a lot of benefits in terms of being able to debug and so forth, but at some point people don't want to figure it all out for themselves; they need a service provider.

[Kai]

Yes, and that's exactly how it works. It's exactly like Red Hat. The idea is that everybody can use Kafka, and many people use it even for mission-critical workloads without any vendor, because they have the expertise themselves. On the other side, the tech companies like LinkedIn also contribute to the framework, because it's an open framework; everybody contributes and can leverage it. That's exactly what we are doing too. We make most of the contributions to Kafka; we have many full-time committers just for this project. But in addition to that, in the real world, like in industrial IoT, you also get questions about compliance, security, 24/7 operations, and guarantees. This is where the traditional companies, like in industrial IoT, simply have different requirements than a tech company which runs everything in the cloud. And this is exactly where Confluent comes in: to provide not just a framework and support, but also the tooling and expertise so that you can deploy it according to your SLAs and your environment, which can be anywhere, whether in a factory, hybrid, or in the cloud.

[Erik]

Okay, very interesting. Well, let's get into the topic then. Maybe a starting point is just the question: what is event streaming? We have a lot of different terminologies around analytics. People use "real-time analytics" a lot, and I think you also use that terminology on your website to some extent. How would you compare real-time analytics to event streaming?

[Kai]

Defining those two terms is really very important, because there are so many terms which overlap, and often different vendors and projects use the same word for different things. This is really one of the key lessons learned from all my customer meetings: define the terms in the beginning. So when I talk about event streaming, it really means to continuously process data; that's the short version. Some data sources produce data, and these can be sensors for real-time data, but it can also be a mobile app where you get an event from a user's button click. It's an event which is created, and then you consume these events and continuously process them. That's the main idea. Other terms for this are real-time analytics, stream processing, or streaming analytics. But the really important point is that it's not just messaging. I sometimes get upset when people say Kafka is a messaging framework. That's really the key point here: it's not. Yes, you can send data from A to B with Kafka, and people use it for that a lot, but it's much more, because you can also process the data, and you can build stateless and stateful applications with Apache Kafka. That's really the key difference. So in summary, Kafka is built to continuously integrate with different systems, real time, batch, and other communication paradigms, and to process the data in real time, at scale, highly reliably. That, in the end, is what I mean by event streaming.
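
To make the produce/consume idea concrete, here is a minimal sketch using the confluent-kafka Python client. The broker address, topic name, and payload are illustrative assumptions, not details from the episode.

```python
from confluent_kafka import Consumer, Producer

# Produce an event: a temperature reading keyed by machine ID.
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("sensor-events", key="machine-42", value='{"temp_c": 71.3}')
producer.flush()

# Consume and continuously process the same stream of events.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "quality-assurance",   # consumer groups let you scale out
    "auto.offset.reset": "earliest",   # start from the beginning of the log
})
consumer.subscribe(["sensor-events"])
while True:
    msg = consumer.poll(1.0)           # event by event, not request/response
    if msg is None or msg.error():
        continue
    print(msg.key(), msg.value())      # react to each event as it arrives
```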

[Erik]

Okay, great, that's very clear. And then there's another term which is maybe not as common: event-driven architecture. Are you familiar with this? Would you say that's another thing that overlaps heavily with event streaming, or is it a particular flavor, or what would be the difference there?

[Kai]

Yes, it totally overlaps. Event streaming is more the concept, and event-driven architecture, as the name says, is the architecture behind it. How it works in the end is that you really think in terms of events, and an event can be a technical thing, like a log event from a machine, or it can be a customer interaction from the user interface. All of these things are events, and then you process them event by event. That is really the key of this foundation, and it's definitely important to understand, because no matter whether you come more from the software business or more from the industrial IoT and OT business, in the past 20 years you typically stored information in a database. In the beginning it was something like an Oracle database or a file system; later you talked more about big data analytics or cloud services. But the big point is that you always store the data in a database, where it is at rest, and you wait until someone consumes it with [inaudible] or with another client. For many use cases this is more or less a "too late" architecture. What event streaming and event-driven architectures do is allow you to consume the data while it's in motion, while it's hot. This is especially relevant for industrial IoT, where you want to continuously process, monitor, and act on sensor data and other interactions. That is really the key foundation, and the difference from traditional architectures with databases and web services and all these other technologies you know from the past.

[Erik]

And that maybe brings us to the first deep-dive topic of the conversation, which is event streaming at the edge versus hybrid versus cloud deployments. You just mentioned that there are unique requirements around, for example, an autonomous vehicle, where a tenth of a second can be quite impactful. My assumption is that, obviously, you can deploy this across all of these, but it was initially developed primarily for cloud deployment. So I assume that edge deployments are significantly more challenging, just given the limited compute capacity and so forth. How do you evaluate deployments across these, let's say, edge, cloud, and hybrid options?

[Kai]

Yes, that's a very important discussion. In the beginning, yes, Kafka was designed for the cloud, because LinkedIn built it, and that's the big advantage of all these tech companies: they build new services completely in the cloud, and most of them focus just on information, not a physical thing like in industrial IoT, so it's very different. But even at that time, cloud 10 years ago was very different from today. You still had to spin up your own machines in the cloud, like a Linux instance on AWS, so it was not that different from an on-premise deployment. With that in mind, today you have all the options. At Confluent, on the one side we have Confluent Cloud, which is a fully managed service in the cloud that you use in a serverless way; you don't manage it.

You just use it. Having said that, 90% or so of Kafka deployments today are self-managed, and not just in the cloud, but on premise, either in data centers or at the edge. This is especially true for industrial IoT, where you want and need to do edge processing directly in the factory. So there are all these different deployment options. We have use cases with just edge analytics and processing in a factory, for use cases like quality assurance in real time. But in industrial IoT we also see many hybrid use cases, where on the one side you do edge processing, as I mentioned before, either just for preprocessing and filtering, or maybe even for building business applications at the edge, but then you also replicate data to another data center or the cloud for the analytics. These are all very complementary, and especially in industrial IoT it's a very common use case to have a hybrid architecture, because you need edge processing for some things. That is not just for latency, but also for cost. People often learn the hard way how expensive it is to ingest all the data into the cloud and process it there, especially if you really want to see all the sensor data before you delete it again. That's why these hybrid use cases are the most common deployments we see in industrial IoT.
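
As a sketch of the edge pre-filtering pattern described here, the loop below consumes raw sensor readings at the edge and forwards only the interesting fraction to a topic that would be replicated to the cloud. The broker address, topic names, and threshold are hypothetical.

```python
import json

from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "edge-broker:9092",
    "group.id": "edge-filter",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["raw-sensor-data"])
producer = Producer({"bootstrap.servers": "edge-broker:9092"})

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    reading = json.loads(msg.value())
    # Forward only anomalous readings; maybe 10% of the raw stream
    # ends up on the topic that is replicated to the cloud.
    if reading.get("temp_c", 0.0) > 80.0:
        producer.produce("filtered-for-cloud", key=msg.key(), value=msg.value())
    producer.poll(0)  # serve delivery callbacks without blocking
```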

[Erik]

Yeah. Let's see, was it last week or two weeks ago? I was on the line with the CTO of FogHorn. Are you familiar with the company FogHorn?

[Kai]

I even listened to it.

[Erik]

Oh, okay, great. One of the things they were emphasizing was the challenge of doing machine learning at the edge, just due to the limited compute power there. How do you view this? When you're in a conversation with a client, and the client is discussing their business requirements, how do you assess what is actually possible to do at the edge? And where at the edge are we talking about? At the sensor, which maybe has very limited compute, or at the gateway, or at the local server? How do you drive that conversation to understand what is possible from a technical perspective based on their business?

[Kai]

Yeah, that's a good question, and this discussion of course always has to happen. We really start from the business perspective: what's your problem and what do we want to solve? Then we can dive deep into what might be a possible solution, or maybe there are different options, not just one thing you have to do. If you want to do predictions with machine learning and AI and all these buzzwords, there is typically a separation between model training, which means looking at historical data to find insights and patterns, and then deploying this model somewhere to do predictions. That is the most common scenario we see, that these are separated from each other. So on one side you typically ingest the sensor data into a bigger data lake or store, where you do the training to find insights.

That can be in a bigger data center, where you have more compute power, and often that is in the cloud. That's the one part, because for training you really need more infrastructure, so you cannot, and often shouldn't, do it directly at the edge device, which is smaller. But when you have done the training with more compute power, then the model scoring, the predictions, really depends on the use case, and this can be deployed much closer to the edge. Here we see different scenarios: you can do the predictions in the cloud or in the data center, or you can embed the model into a lightweight application. So from a technology perspective, the model training is done, for example, in a big data lake with Hadoop or Spark or cloud machine learning services; there are many options there. Then comes the model deployment.

The deployment can be a Java application, for example, which is really scalable in a distributed system, or on the other side you can use, for example, C or C++ with a Kafka client from Confluent and deploy it really at the edge, like on a microcontroller, if it's very lightweight. This of course also depends on the machine learning technology you use, but most modern frameworks have options here. To give you one example, we see a lot of demand for TensorFlow, one of these cutting-edge deep learning frameworks, released by Google. Here you also have different options: you can train a model and deploy it, and maybe it's too big and really has to be deployed in a data center; or on the other side, you can use TensorFlow Lite and export the model to run, for example, in a mobile client with JavaScript, or really on an embedded device with C. So you have all these options, and it depends on the use case.
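
Here is a sketch of that last option: scoring events at the edge with an exported TensorFlow Lite model. It assumes the tflite-runtime package, a hypothetical anomaly_model.tflite file, and events that carry a "features" array; none of these names come from the episode.

```python
import json

import numpy as np
import tflite_runtime.interpreter as tflite  # lightweight runtime for edge devices
from confluent_kafka import Consumer

# Load the exported model once at startup; the file name is hypothetical.
interpreter = tflite.Interpreter(model_path="anomaly_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

consumer = Consumer({
    "bootstrap.servers": "edge-broker:9092",
    "group.id": "edge-scoring",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["sensor-events"])
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    features = np.array([json.loads(msg.value())["features"]], dtype=np.float32)
    interpreter.set_tensor(inp["index"], features)
    interpreter.invoke()
    print("prediction:", interpreter.get_tensor(out["index"])[0])
```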

[Erik]

And I guess right now, from a fundamental technology perspective, we have trends moving in both directions that are making it easier to do compute at both levels of the architecture. You have improving hardware at the edge, so greater compute power there. You also potentially have 5G, and maybe people would disagree with this, making it more cost effective to move data to the cloud, or if not more cost effective, at least offering better latency and bandwidth, which would allow you to do more of those real-time solutions without relying on the edge. Do you see any trend, based on the underlying technology dynamics, that would drive us towards doing more work at the edge or more work in the cloud? Obviously it's still going to be hybrid, but do you see a direction one way or the other?

[Kai]

Actually, no, because it really depends on the use case. And again, it's important to define terms like "real time," because there are different opinions on what that means. But in general, I can give you one example of where it will always be in this mixed state. If you have different plants all over the world, on the one side you want to do real-time analytics, like predictive maintenance or quality assurance. Those things should happen at the edge. It doesn't make sense to replicate all this data to the cloud to do the processing, for latency and cost reasons. Even with 5G, it's always more expensive to first send the data somewhere else and then get it back; that's expensive from both a cost and a latency perspective. So you want to do this analytics at the edge, in the factories.

However, having said that, for model training, or for doing other reports, or for integrating with other systems, or for correlating data between different plants, you want to answer questions like: we have one plant in China and one in Europe, so why is the same plant in China much more problematic? Then you have to correlate information to find out the differences in temperature spikes and environment. For this, the edge doesn't make sense, because you need to aggregate data from different sites, different regions, and here the cloud is typically the key, because you can elastically scale up and down and integrate with new interfaces. For this you want to be in the cloud, to replicate data in from many different other systems. I think the trend is that maybe two or three years ago, everybody talked about getting everything into the cloud, and the cloud providers of course wanted that too.

But now the trend is to do it in a hybrid way: the cloud for some use cases, but the edge for others. The best proof of this is to look at the big cloud providers. If you look at Amazon, Microsoft, Google, Alibaba, they all started with the story of putting everything into the cloud and doing all the IoT analytics there. But today all of these vendors release more and more edge processing tools, because it simply makes sense to keep some things at the edge.

[Erik]

Okay, great. That's actually a good transition to the next topic, which is event streaming for real-time integration at scale. What type of integration are we talking about? Are we talking about integrating data, or integrating systems?

[Kai]

That's a good question, and actually it can be both. First of all, to clarify: Kafka, or Confluent, doesn't do everything. Kafka is about event streaming, and that includes integration and processing of data, but especially in industrial IoT environments it also complements other solutions. If you're in a plant and want to integrate all these machines, or even connect directly to PLCs, you have different options. You can do direct integration to a PLC, so something like a Siemens S7, or Modbus, or you use a specific tool for that. To give you one specific example, in Germany people of course use a lot of Siemens equipment, so they have Siemens S3, S5, or S7 PLCs, and therefore you could use an IoT solution like Siemens MindSphere, which was built exactly for integrating these kinds of machines.

On the other side, that is probably not the best solution to integrate with the rest of the world, meaning your customer relationship management system, other databases, data lakes, or cloud services. So in most cases in industrial IoT, Kafka really complements other IoT platforms. It's more about the data integration and not so much about the direct system integration. But having said that, you can do it: we have customers which integrate directly to PLCs and machines, and on the other side also integrate directly into ERP systems like SAP, for example. This is always something you have to discuss in a deeper dive. There are all these options, and that's the great thing about Kafka and why people use it: it's open and flexible, and you can combine it with other systems. It's not a question of one or the other.
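
One common way to wire up such an integration without custom code is Kafka Connect. The sketch below registers Confluent's MQTT source connector through the Connect REST API; the host names, topic names, and the assumption that this connector is installed on the Connect worker are all illustrative.

```python
import requests

# Register an MQTT source connector with the Kafka Connect REST API.
connector = {
    "name": "plc-mqtt-source",
    "config": {
        "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
        "mqtt.server.uri": "tcp://edge-gateway:1883",  # MQTT broker in the plant
        "mqtt.topics": "plant1/plc/#",                 # subscribe to all PLC topics
        "kafka.topic": "plc-raw-data",                 # land the readings in one Kafka topic
        "tasks.max": "1",
    },
}
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())
```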

One last side note here, which might also be interesting for the listeners: many of the modern ERP and MES tools also run Kafka under the hood, because the software and enterprise vendors have understood the value of Kafka too, and they build their systems on it, because their systems have the same needs. The legacy approach of storing everything in a database with web services, like REST or SOAP, does not work for these new data sets, which are more real time and bigger. And that's the approach we see everywhere.

[Erik]

I guess at the IT level, integration is typically quite feasible. At the OT level, at least my understanding is that we still have some challenges around data silos that companies put up in order to protect market share. Do you see any trend here in terms of opening up the OT level to make integration across vendors easier? Or let me ask: when you're looking at a deployment, how significant a challenge is this? Is it something where you can always find a solution, and it's just a matter of putting in a bit of extra time, or is it a significant challenge?

[Kai]

It's definitely one of the biggest challenges, and that's why people want to change it. As I said in the beginning, when we talk to customers today, the big pain point is not having access to that data, because it's proprietary and not accessible. With newer infrastructures, the vendors are increasingly forced to use standards like OPC UA or [inaudible]. They don't want to do that, but otherwise their customers would really get in trouble, so the software vendors have to go in this direction a little bit. On the other side, as I said, there are technologies to integrate directly with PLCs, for example if you want a quick win: if you have all these machines in your plant, and you just want to get data out of them to monitor them and get reports, then you can also connect to the PLCs directly, something like a Siemens S7. Having said this, getting all this data out is definitely the biggest challenge, and this is also often why people come to us. They say: it's okay to do the last mile with a proprietary solution like Siemens MindSphere, but we are a global company with plants all over the world and many different technologies; we cannot use every proprietary vendor everywhere. And that's exactly what makes Kafka so strong: on the one side you can integrate with all the systems, but you also decouple them from each other. This means on one side you might have some Siemens, on another some GE or whatever, and elsewhere direct integrations. You can integrate with all of that, then correlate all this information and combine it with your MES, your ERP system, or your data lake. That is what makes Kafka so strong: it's open and flexible in how you integrate and what you use for the integration, either directly or with a complementary tool. And this is why we see Kafka used in IoT, but also in general for these use cases, because you can integrate with everything and still stay open and flexible.

[Erik]

Yeah, I suppose that's the real value of open source here: you have a large community that's problem solving and sharing the learnings, which you don't have with proprietary solutions. The next topic, and we've already touched on this a little bit, is the machine learning element. We've already discussed model training in a data lake that might be better hosted in the cloud, and so forth. But maybe the interesting topic here is: when you're implementing machine learning and segmenting it between different areas of your architecture, how do you view the future of machine learning for live data?

[Kai]

Yeah, that's a very good question, and it's often why people talk to us, because what we clearly see, and this is true for any industry, is that there is an impedance mismatch between the data science teams, which want to analyze data, build models, and do predictions, and the operations teams, which can be in the cloud or in a factory where it's really deployed at scale. I've seen too many customers where they got all the data out of the machines and into the cloud, so the data scientists could build great models, but then they could not deploy them in production. So you always have to think about this from the beginning: how do you give your data science people access to all the historical data, but also, before you even start, what are the SLAs for the later deployment? Does it have to be real time? What are the data sets: big data or small data? What are my SLAs? On production lines it's typically 24/7 and mission critical, and then you configure Kafka differently than when you run it just in the cloud for analytics, where it's okay if it's down for a few hours. With this in mind, the reason we see so much Kafka here is that there are huge advantages if you build this pipeline once with Kafka. Let's say you have Kafka at the edge to integrate with the machines, and you also replicate the data to the cloud for analytics. This pipeline with Kafka is mission critical and runs 24/7. Kafka is built as a system which handles problems: even if a node is down, or there is a network problem, Kafka handles that.

That's how it's built by nature, as a distributed system. It's not an active-passive system, and there is no maintenance downtime; that doesn't exist in Kafka. And if you have this Kafka pipeline, you can use it for both. You can use it for the ingestion into the analytics cloud, where the data scientists use the data in historical mode, in batch, for training or for interactive analysis. But the same pipeline can then be used for production deployments, because it runs mission critical. So you can easily use it to do predictions and quality assurance, because these applications run all the time, without downtime, even in case of failure. That's one of the key strengths: you can build one machine learning infrastructure for everything. Of course, some parts of the pipeline may use different technologies, but that's exactly the key.

The data scientists will almost always use a Python client, right? They typically do rapid prototyping with tools like Jupyter and scikit-learn; those are the frameworks data scientists use. On the other side, in production, on the production line, you typically don't deploy Python, for different reasons: it doesn't scale as well, and it's not as robust and performant. There you typically deploy something like a Java or C++ application. And with Kafka in the middle, which handles the backpressure and is also the decoupling system, you can use these different client technologies. The data scientists can use Python while the production engineers use Java, but you use the same stream of data for both.
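
As a sketch of the prototyping side of that split: a data scientist can read the same Kafka topic from Python and score events with a scikit-learn model trained offline, while a separate Java application consumes the identical stream in production. The model file, topic, and message layout are hypothetical.

```python
import json

import joblib  # loads a scikit-learn model trained offline, e.g. in Jupyter
from confluent_kafka import Consumer

model = joblib.load("quality_model.joblib")  # hypothetical pre-trained model

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "ds-prototype",        # separate group: does not affect production
    "auto.offset.reset": "latest",
})
consumer.subscribe(["sensor-events"])  # the same stream the Java app consumes
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    features = [json.loads(msg.value())["features"]]
    print("predicted class:", model.predict(features)[0])
```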

[Erik]

Do you also get involved in building the machine learning algorithms, or are you focused just on managing the flow of the data, with the client having some system that they use to analyze it?

[Kai]

So we are really building the real-time infrastructure, including data processing and integration, and then the data science teams, for example, choose their own technology. But this is also important to understand and point out: this is exactly the advantage, because all these teams are flexible and different. I actually had a customer call last week, and it is really the norm that different teams use different technologies. In the past, everybody tried to have one standard technology for this, but in the real world, one data science team uses a framework like TensorFlow, and another says, no, I'm using Google's ML services with something else. Because Kafka in the middle is the decoupling system, you're flexible regarding which technology you choose. So the reality is that most of our customers don't have one pipeline where you send all data from A to B; you typically have many different consumers, and this can include analytics tools, where you are really spoiled for choice depending on your problem and use case.

[Erik]

Okay, interesting. I think the next topic we wanted to get into is use cases, and I think that's pretty important, because understanding how this is actually deployed matters. But before we go into some end-to-end use cases in detail, I have a bit of a tangent, which is a question that a number of companies have asked me recently, and I don't have a good answer, so I'm hoping you have a better one. Are there any use cases for 5G that really make sense in 2020 or 2021? I've thought about this and talked to some people, and it seems like maybe augmented reality for industrial makes sense, because of the high bandwidth requirements and wireless requirements. And AGVs probably make sense once you make them more autonomous, because you have that same situation of latency, bandwidth, and wireless. But there don't seem to be so many yet.

My hypothesis was that over time, as 5G becomes deployed, maybe the OT architecture of factories will start to change. There will be fewer wires; you'll have the option to build greenfield sites more wirelessly. That might change the architecture, and then people would develop solutions specifically for this new connectivity architecture, and then you might say, okay, now it's providing real value. But aside from AGVs and AR, I was a little bit at a loss to identify anything that is really highly practical in the near term. Is there anything you've come across where you said, yes, 5G would really solve a real problem for one of your customers?

[Kai]

I think yes, because one of the biggest problems today is definitely network and data communication. Today, when I go to a customer's factory which has existed for 20 years, typically the way we get the data from the machines is something like a Windows server you connect to, where you get a CSV file with the data from the last hour, because there is no better connectivity for integration. So I definitely think that better networks in general will allow better architectures, also for OT at the edge. But having said this, I also see these discussions about 5G with different opinions. There is of course not just 5G; for factories there are other standards and possibilities for networking. And I think if 5G gets into industrial IoT, the bigger factories will build private 5G networks for that.

So that's also possible, and I think that's great. What I don't expect to see, at least from my customer conversations, is what the cloud vendors want: that you directly integrate all these 5G interfaces from the edge with the cloud. That's probably not going to happen, because of security and compliance and all these kinds of things. But for private 5G networks, I think this would be a huge step towards more modern architectures in OT. And that, of course, is the building block for getting more value out of the data, because today, again, the biggest problem in factories is that people don't get the data from the machines into other systems to analyze.

[Erik]

Okay, gotcha. And I guess in brownfield you still need some sort of hardware to deploy on the machines, but at least with 5G you can extract the data wirelessly. You can always just lay Ethernet, I suppose, right? But then that becomes a cost question.

[Kai]

Yeah, exactly, these are just different options. You somehow need to get the data out of these machines and production lines into other systems, and it can be with Ethernet or it can be with 5G. The best solution depends on cost, scalability, and so on.

[Erik]

Okay, great. But sorry for taking us down that tangent; let's go into some of these use cases. Actually, I won't mention any of these until you do, since I don't want to throw out names. But there's a connected car infrastructure. Should we start there?

[Kai]

Yeah, that's a good first example, and it also relates well to the 5G question. Let me explain. I think we can cover three or four use cases here, because for me, when I talk about event streaming, it's important to talk about different use cases, so that people see this is not just for one specific scenario. A connected car infrastructure is one great example we see at many customers, and Audi, the German automotive company, is one of them. We started building a connected car infrastructure with them around four years ago. What they actually did is they had the need to integrate with all the cars driving on the streets. They started with the A8, one specific, more luxury car, but they are now rolling it out to all the new cars. What happens is that all these cars connect to a streaming Kafka cluster in the cloud, so that you can do data correlation in real time on all that data. From a use case perspective, there's demand for things like after sales, right?

So they are always in communication with their customers for different reasons. On the one side, sending an alert that the engine has some strange temperature spikes, so the driver gets to the next repair shop. But also keeping the customer happy, doing cross-selling, or, as you know it from Tesla, you can even upgrade your car to get more horsepower. There are plenty of use cases, and you can even integrate with partner systems. For example, there is a restaurant on the German Autobahn where you're driving, and you make a recommendation: if you stop at lunchtime at this restaurant, you get 20% off, these kinds of things. The added value here is really not just getting the data out of the car into other systems, but correlating and using this data in real time, at scale, 24/7. That's exactly one of these use cases for Confluent. From a technical perspective, the cars are of course using 4G today, and this is a great example where, if you have 5G, you can do many more things, because the data transfer from the cars is still the most limiting factor regarding cost and latency and all these things.

[Erik]

Okay, very interesting. One of the topics, and maybe we don't have to speak specifically to Audi here, because this may be a bit more sensitive: once you get into these situations where you have aftermarket services, of course there's not just value for the OEM, but potentially value for a lot of different companies that might also want to sell services to this driver, this vehicle owner. So this becomes an issue of not just moving data, but also regulating who has access to the data, in which way, to what extent it's anonymized, what metadata is available, and so forth. Do you get into these discussions, where it gets into the legal and privacy questions of what you can do to monetize this data?

[Kai]

Yes, actually this is all part of the problem. Especially in Europe, in Germany, privacy is really, really hard, right? It's very different from the US, for example. So we get into these discussions all of the time, and you have to be secure and compliant, which is part of the conversation of course. You need to be, for example, GDPR compliant in Germany and Europe. So this is part of the problem, and you really need to think about it from the architecture perspective: who has access to what data? And that's also where, for example, Confluent comes into play, because with open source Kafka you would have to implement this by yourself, while with Confluent you have things like role-based access control and audit logs and all these kinds of features, which help you with multi-tenancy and all these questions.

With that in mind, this also brings up more problems and questions for all these vendors, because, as you said, not just Audi, or let's get away from Audi, but in general, an automotive company wants to get the added value, but so do the tier one and tier two suppliers. This is really a big discussion, and it's where all of these vendors have a lot of challenges today, and nobody knows where it's going. But today everybody is implementing their own connected car solution. If you Google for it, you will find many automotive companies, many suppliers, and also many third-party companies implementing this today, and nobody knows where it's going. I have already seen a few automotive companies where the car is not just sending data to one interface of one vendor, but to two or three different interfaces, because everybody wants to get the data out.

So this is really where, in the next years, things will definitely consolidate, and new business models will emerge. My personal opinion is that the only realistic future is that these different vendors partner more with each other, and that will happen, because it's not just about the automotive company, but also the suppliers. If you look at their innovations, they are all working on software. If you go to a conference, they are not talking about the hardware; they are talking about the software on top of it. This is really where the market is completely changing, because in this automotive example, in some years many people will not care whether it's an Audi or a Mercedes or a BMW, but how well it's integrated with their smartphone and the rest of their technology. This is a complete shift in the market, and we see it at every automotive and IoT company today.

[Erik]

Okay, very interesting. Yeah, this is a topic that comes up a lot with our customers, who are sometimes automotive tier one or tier two suppliers, and they face the challenge of getting data out from an OEM. You know: we produce the air filters or something, and the OEMs are never going to give us our data, but we have these business cases. So this is a very interesting discussion. Okay, the next one to cover is Bosch, a track and trace solution for construction. I think track and trace is very interesting because it's applicable to basically anybody who is managing assets that are in motion. What was the problem here, and what did you do with Bosch?

[Kai]

So that's another great example, and it also clarifies the different kinds of use cases. The first one was about getting all the data into the cloud for analytics and using the data, the normal hybrid one. The interesting part here is that this is not all real-time data or big data. In this use case it's really about smaller data sets, and also about request-response communication, not just streaming data. The use case is that Bosch has several different construction sites, which they use together with their partners, where they build new buildings, for example. On these sites you have a lot of devices and machines, and on the one side, of course, the new devices and machines have sensors which continuously send updates to the backend system.

But then they also had many different problems and use cases. For example, the workers in the construction area didn't know where a machine or device was, or when to do maintenance on a device and replace batteries or other things. So in this case it's really a track and trace system, where you monitor information from all the systems. And it's actually not just machines and devices, but also track and trace information about the work itself. Whenever a worker has finished something, he uses his mobile app. In that case it's not streaming data; he does a button click, and then an event is sent to the backend, where the data is stored and correlated. In this way Bosch built a solution where they really have the right information in the right context for each construction site.

This is important at the edge, which in the end is the construction site, but in the backend it's also important for management, for monitoring all the different projects. All this data also goes to analytics tools, because the data science team looks at all the construction sites, what's going on, and how to improve the products and services they offer, or the new products they build. This solution is deployed in the cloud, so they can integrate with all these different edge systems and store and correlate the information. And what's also important in this use case is that they don't just continuously process the data; they also store the data in Kafka, so that consumers can also consume old events. This is a part we didn't discuss yet, but it's important.

In Kafka, or in event streaming systems, everything is append-only. It's an event-based, guaranteed-order log of events, and that means you can also read older data. So a data scientist doesn't have to consume all data in real time like other consumers; they can say, give me all the data from this construction site from the last few months, and then correlate it with the last three months from another construction site. Maybe they see that this construction site had some specific problems, and then they can find out what the problems were. So this is another great use case, because it's hybrid, it's not big data, and it's not only real-time data, but Kafka still makes so much sense for the integration and processing of these events.
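
Because the log is append-only, a consumer can rewind it. Here is a minimal sketch of that replay, assuming a hypothetical single-partition topic named construction-events: ask the broker for the offset closest to a timestamp, then read forward from there.

```python
import time

from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "ds-replay",
})

# Find the offset on partition 0 that corresponds to roughly three months ago.
three_months_ago_ms = int((time.time() - 90 * 24 * 3600) * 1000)
tp = TopicPartition("construction-events", 0, three_months_ago_ms)
start = consumer.offsets_for_times([tp], timeout=10.0)
consumer.assign(start)  # start reading the historical events from that point

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    print(msg.timestamp(), msg.value())  # old events, in guaranteed order
```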

[Erik]

Yeah, that's a very interesting one from an end user perspective, because even within the construction site you have a number of different end users that have quite different requirements around the data: from the person looking for a tool, to the maintenance team, to management making decisions about how many assets they actually need, and so forth. And I suppose, coming back to your role: you're laying down the architecture, so you cover what architecture is necessary, but you're not advising them on these individual use cases. Is that correct? Or do you ever get involved in advising which use cases might make sense?

[Kai]

That's where we also help, because we have the experience from all the other customers. So we also do consulting and help with the engagement and approach. We are not doing the project itself; that's typically what a partner does, or what they do by themselves. We really help with the event streaming part and the infrastructure, but only from the Kafka perspective; we are not doing the whole project. And that's maybe also important: as I said before, event streaming is not competitive but complementary to a total solution. For example, for the management team, which has some BI tools in the backend, that is not Kafka, right? That's where you connect your traditional BI tool, like Tableau or Power BI or Qlik, all of these vendors, and connect the two parts of the data. So this is really complementary.

[Erik]

Okay, great. Yeah, we were working with a European construction company about a month ago on track and trace. We were surveying how, in China, companies are able to ramp up operations at construction sites by tracking where people are: are people grouping together, are people wearing masks, et cetera. So it's kind of a track and trace for people, and it's been extremely effective in China. And then the question is, how do we translate this to the European market, where this would probably all be highly illegal? Okay, the last one we wanted to look into was energy: a distribution network for smart home and smart grid. So this is a completely different set of problems. What was the background of this case?

[Kai]

Yeah, so one example is E.ON, which is an energy provider. These kinds of companies also have a completely changing business model, and that's often where Kafka comes into play, to really reinvent the company. The situation is that in the past, they only produced their own energy, like nuclear energy. Obviously this is changing towards more green energy, but the business model also had to change, because they cannot just sell energy anymore. They see more and more customers, or end users, who produce their own energy, for example with solar panels on their houses. Often they produce more energy than they even use, so they want to sell it. For this, E.ON has built a streaming IoT platform, which is also hybrid: some of the analytics is in the cloud, but some other processing happens more at the edge.

They are now more like a distribution platform. On one side they still integrate with their own energy systems to sell their energy and do the accounting, billing, and monitoring, and, as I mentioned, this is still in real time, and the system can handle even the bigger data sets. But on the other side, they now also integrate directly with smart homes and smart grids and other infrastructures, so they can get into the system of the end user, like a customer who has a smart home. With this they are providing many more services. In this case, for example, you could sell your solar energy to another person, and they provide the platform for that. And this is really just one of the examples; they have tens of them, because these energy companies have to completely change, and in a way reinvent, their business models, and this is where Kafka helps.

It works so well because, again, on one side it's real-time data, so you can scale this and process data continuously, but on the other side it also decouples the systems. The smart home system is completely decoupled from the rest. Sometimes it sends an update, like sensor information, to the system, so the system knows: hey, this house has produced a lot of energy, we can sell it now, so please distribute it somehow. And this is again where many different characteristics come into play. On one side it's hybrid: you do analytics in the cloud, and also edge integration. On the other side, this is a mission-critical system; it has to run 24/7, so it's distributed over different geo-locations. This infrastructure is really the critical center of their system, to integrate with their own infrastructure, but also with all the customers and end users, and of course with partners. It's the same strategy as in automotive: the future of these companies is not to build everything by themselves, but to complement their offering with partner systems which are very good in one specific niche, while they provide the distribution platform for that.

[Erik]

Okay, yeah. And this is quite a contrast of systems, right? You have a mission-critical utility, and then you have your grandfather's home. I suppose you have a lot of different types here, because we're not always talking about enterprise scale with smart grids; we're also talking about home deployments, with probably quite a range of different technologies, different connectivity solutions, and so forth. Was this a challenge, or is it already fairly standardized, so that when they install a solar deployment on a home, the right connectivity infrastructure is already there for an easy integration? Or is that a challenge?

[Kai]

In this case it's much easier than in plants and factories, because here you don't have the challenge that every vendor is proprietary and doesn't really want to let the data out. It's also not 30-year-old machines like on a production line; it's maybe one or two year old small devices. These manufacturers use somewhat more modern technologies, and the difference also is that they want you to integrate with other systems. So there is typically a standard interface or API, something like MQTT or HTTP. This is actually pretty straightforward to integrate, because the business model and the integration idea are much different than in production lines and plants. The challenge is more that, again, some of these interfaces are real time and sensor-based, while others are more pull-based, where you just query the system every hour. And this is exactly what Kafka was built for; it's not just a messaging system, it also has integration capabilities. So it's pretty straightforward with Kafka to integrate these different technologies and communication paradigms, and still correlate all these different data sets and protocols to get value out of them, and send an alert or whatever the use case requires.

[Erik]

Okay, interesting. Yeah, I was reading an article maybe a month or so ago which said that, I think it was in Germany or the UK, the percentage of energy on the grid from renewables spiked up to something like 33%, which was a significant high. It was due to a few factors, I think, like lower energy demand and lower air pollution because factories were shut down, and a few others. But I think that's something that five years ago people projected to be kind of an apocalypse, right? You couldn't handle that kind of swing in renewable energy. I suppose Kafka is part of the reason that energy grids are now able to handle a lot more variance in the load than they were designed for ten years ago.

[Kai]

Well, things are really changing; every year you see new innovation there. And Kafka really is at the heart of many of these different infrastructures. Often you don't see it, because it's under the hood, right? It's not just that these typical end-user projects are using Kafka; the software and technology vendors also use Kafka under the hood for building new products.

[Erik]

Okay, this has been really interesting. What have we missed here? What else is important for people to understand about event streaming?

[Kai]

The most important thing is really that today it's much more than just ingesting data into a data lake; that's what people knew it for in the last five years. Today, Kafka and event streaming are mostly used for mission-critical systems. That's what 95% of our customers do, and that's why they come to us: because we have the expertise with Kafka and built many parts of it. And it doesn't matter if it's just at the edge or really a global deployment. We provide technologies so that you can deploy Kafka globally. We have many industrial customers which run plants all over the world, and you can still replicate and integrate in real time, with big data sets, across the world. There are different components and different architectural options, with different SLAs of course, but this is really the key point to take away from this session for industrial IoT.

[Erik]

Kai, thank you so much for taking the time. Last question from my side: how should people reach out to you?

[Kai]

I would be glad if you connect with me on LinkedIn or Twitter; I'm very active there with a lot of updates about use cases and architectures. And of course, you can check out my blog at www.kai-waehner.de, or check the links in the show notes. I blog a lot about IoT, every week or two, with different use cases and architectures around event streaming.

[Erik]

Perfect, then we'll put those links in the show notes. Kai, thanks again.

[Kai]

You're welcome. Great to be here.

[Outro]

Thanks for tuning in to another edition of the Industrial IoT Spotlight. Don't forget to follow us on Twitter at IoTONEHQ, and to check out our database of case studies on iotone.com/casestudies. If you have unique insight or a project deployment story to share, we'd love to feature you on a future edition. Write us at team@iotone.com.
