Microsoft Community Insights Podcast
Welcome to the Microsoft Community Insights Podcast, where we explore the world of Microsoft technologies and interview experts in the field who share insights, stories, and experiences in the cloud.
If you would like to watch the video version, you can find it on YouTube below:
https://youtube.com/playlist?list=PLHohm6w4Gzi6KH8FqhIaUN-dbqAPT2wCX&si=BFaJa4LuAsPa2bfH
Hope you enjoy it
Episode 43 - Real Time Analytics in Fabric with Thrushna Matharasi
We explore how real-time analytics in Microsoft Fabric turns raw events into decisions within seconds while keeping the strength of batch for complete, trusted reporting. From OneLake layering to agentic AI, we share practical patterns, pitfalls, and skills to get started fast.
• OneLake bronze, silver, gold layering for reliability
• Event Hubs to OneLake pipeline setup
• real-time dashboards, monitoring and alerting
• hybrid architecture for BI and operational analytics
• data quality rules, schema checks and replay
• skills to start with Fabric using SQL
• common streaming pitfalls and latency issues
• roadmap to agentic AI that lets users talk to data
• personal journey, community work and speaking plans
Hiya, welcome to the Microsoft Community Insights Podcast, where we share insights from Microsoft community experts. I'm Nicholas, and I'll be your host today. In this podcast, we will dive into the world of real-time analytics in Microsoft Fabric. Today we have a special guest, Thrushna Matharasi. Sorry if I pronounce it wrong. Could you please introduce yourself?
SPEAKER_00:Yeah, sure. Good morning, good afternoon, and good evening to all the listeners joining this podcast. I'm Thrushna Matharasi, Director of Engineering at Solera Holdings. I've spent over a decade in the data space across multiple industries, be it finance, healthcare, banking, app services, and transportation. My core expertise lies in developing unified data platforms and real-time operational systems that directly drive business value. Throughout my career, I've focused on transforming data into actionable insights that helped organizations unlock new revenue opportunities, optimize pricing strategies, and improve customer monetization. By leveraging real-time analytics, I've enabled companies to make data-driven decisions that accelerated revenue growth and created sustainable competitive advantage in their market space.
SPEAKER_01:Okay, so before we get started, what drew you into the world of real-time data engineering? How did you get started?
SPEAKER_00:How did I get started? All right, yeah, sure, I can add that. Back when we were doing app services and applications at Digital Turbine, we were using both batch and real-time applications. But before we dive deeper into real-time applications, I would like to pause and break down the difference between real-time and batch, and where and how it evolved. For batch analytics, technically we just collect the data from different sources and then publish or process it at scheduled intervals. And the use cases were very old school: historical trend analysis, sales reports by month or by year or by day, mileage reports, etc. They were all going back through the history of the data and trying to make analytics out of it. But with real-time streaming, what has changed is that we process the data immediately as it arrives, within milliseconds to seconds. The use cases for this are mostly IoT sensor data that we receive for monitoring and alerting, and, in today's world, compliance and violations, and how quickly we can address those and get back to our customers as we receive the data. So that's the basic difference. When I was working at Digital Turbine, one of the biggest players in the app industry, we were processing most of the data, event data, using Databricks and Kafka streaming, and then we ingested the data into a data lake or Delta Lake. And we had a structure very similar to what we have in Fabric now, the medallion architecture: bronze, silver, and gold.
And ultimately the purpose of that was that we wanted to read out of it as quickly as we could, to maintain reliability for the end customers. So that's the background of how it evolved.
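To make the batch-versus-streaming distinction concrete, here is a minimal pure-Python sketch; the event shape, field names, and speeding threshold are all made up for illustration. A batch job summarizes everything collected so far on a schedule, while a streaming handler reacts to each event the moment it arrives.

```python
# Hypothetical telemetry events; field names are illustrative only.
events = [
    {"device_id": "sensor-1", "metric": "speed", "value": 62.0},
    {"device_id": "sensor-2", "metric": "speed", "value": 48.5},
    {"device_id": "sensor-1", "metric": "speed", "value": 71.2},
]

def batch_report(collected_events):
    """Batch style: run on a schedule over everything collected so far."""
    total = sum(e["value"] for e in collected_events)
    return {"count": len(collected_events), "avg": total / len(collected_events)}

def on_event(event, alerts):
    """Streaming style: react to each event the moment it arrives."""
    if event["value"] > 70:  # e.g. a made-up speeding threshold
        alerts.append((event["device_id"], event["value"]))

alerts = []
for e in events:  # in a real system this loop is the stream consumer
    on_event(e, alerts)

print(batch_report(events))  # scheduled, historical view
print(alerts)                # immediate, per-event view
```

The same events feed both views; the difference is purely *when* the processing happens, which is the point the guest is making.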
SPEAKER_01:So what does a typical day look like at your workplace, Thrushna?
SPEAKER_00:Oh, it's a lot of data, I would say. It starts off with the different pipelines that we have built. In the real world, nothing just pauses for you; it keeps running, and you have multiple issues that come up, something breaking, a production issue, or new builds and new pipeline deployments. And we're always strategizing about how we want to see the future of the company. So it's mostly a lot of strategy meetings and making sure the customer 360 is achieved.
SPEAKER_01:Okay, so earlier you said there's a difference between real time and, is it batch?
SPEAKER_00:Correct, yes.
SPEAKER_01:Batch. So could you explain the difference? Because I didn't quite get the difference.
SPEAKER_00:Yeah, sure. Let me start with batch. It's been there for quite some time, and you have a lot of data coming in from many, many source systems, transactional systems like SQL Server, Postgres, IBM DB2, Oracle, you name it, and you get that data from the different source systems as your company publishes it. The data is always processed over a period of time; it's not processed as it arrives. It is usually scheduled at different intervals to ensure the data is batched up from the previous offset, and then you are collecting the data maybe daily or hourly.
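The "batched up from the previous offset" idea can be sketched as a watermark the job remembers between scheduled runs. This is a hypothetical pure-Python illustration, not any specific connector; it assumes a source table with a monotonically increasing id column.

```python
# Made-up source rows standing in for a transactional table.
source_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 25.0},
    {"id": 3, "amount": 7.5},
    {"id": 4, "amount": 12.0},
]

def run_batch(rows, last_offset):
    """Pick up only rows newer than the previous run's offset."""
    new_rows = [r for r in rows if r["id"] > last_offset]
    new_offset = max((r["id"] for r in new_rows), default=last_offset)
    return new_rows, new_offset

# First scheduled run (e.g. an hourly job): only two rows exist yet.
batch1, offset = run_batch(source_rows[:2], last_offset=0)
# Next run sees the full table but picks up only what arrived since.
batch2, offset = run_batch(source_rows, last_offset=offset)

print(len(batch1), len(batch2), offset)
```

Each run processes only the delta since the previous run, which is what makes scheduled batch ingestion cheap and repeatable.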
SPEAKER_01:So it's similar to like a cron job.
SPEAKER_00:Correct.
SPEAKER_01:Like a schedule. Okay, go ahead.
SPEAKER_00:It's a scheduled job. Whereas when you talk about real-time analytics, the data is processed immediately as it arrives from a streaming platform; you just have the data landed in your landing layer. Here we are talking about OneLake, where Microsoft Fabric has different layers of data processing within OneLake: bronze, silver, and gold. The data gets curated and processed and put into the different layers for the end customer to utilize. And that's the key of real time: it doesn't just stop at any one of these layers. Once the data gets into bronze, we immediately process it and push it to silver, then we curate and validate it, and the gold layer sees it in micro-batches. Technically we are talking about having the real-time data available for time-series dashboarding. It's not like a regular report that you want to see, where it doesn't matter whether it's real time or batch.
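A minimal sketch of one micro-batch moving through the medallion layers described above, with made-up event fields and cleaning rules; a real Fabric/OneLake pipeline would do this with Spark or pipelines rather than plain Python dicts.

```python
# Raw, as-landed events (bronze layer): everything arrives as strings
# and may contain junk. Field names are invented for illustration.
bronze = [
    {"vehicle": "truck-7", "speed": "63.0"},
    {"vehicle": "truck-9", "speed": "not-a-number"},
    {"vehicle": "truck-7", "speed": "58.4"},
]

def to_silver(raw):
    """Clean and type the raw rows; drop what can't be parsed."""
    out = []
    for r in raw:
        try:
            out.append({"vehicle": r["vehicle"], "speed": float(r["speed"])})
        except ValueError:
            pass  # in a real pipeline this would be quarantined for replay
    return out

def to_gold(silver_rows):
    """Aggregate the curated rows into dashboard-ready averages."""
    agg = {}
    for r in silver_rows:
        v = agg.setdefault(r["vehicle"], {"n": 0, "total": 0.0})
        v["n"] += 1
        v["total"] += r["speed"]
    return {k: v["total"] / v["n"] for k, v in agg.items()}

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each micro-batch flows bronze → silver → gold the same way, so the dashboard layer is always reading curated, typed data rather than raw events.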
SPEAKER_01:So it can be used for monitoring and alerting, or whatever.
SPEAKER_00:Exactly. So there is a key difference between the two, and when to use which is very, very important. In many cases, companies are taking a hybrid approach where they modernize the stack to have both real-time and batch processing. Real time is used for immediate alerting and monitoring, like you've said, or for targeting certain customers for monetization. Batch is usually for comprehensive reporting and analytics, right?
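The hybrid pattern can be sketched as a fan-out: every event is checked immediately on the real-time path and also landed for later batch reporting. The service names and error threshold below are illustrative, and the lists stand in for real sinks like OneLake or a warehouse.

```python
batch_store = []  # would be OneLake / a warehouse table in practice
alerts = []

def handle(event, threshold=100):
    """Fan one event out to both the real-time and batch paths."""
    # Real-time path: decide immediately, as the event arrives.
    if event["errors"] > threshold:
        alerts.append(f"ALERT {event['service']}: {event['errors']} errors")
    # Batch path: land the raw event for scheduled, comprehensive reporting.
    batch_store.append(event)

for ev in [{"service": "api", "errors": 3},
           {"service": "api", "errors": 150},
           {"service": "web", "errors": 12}]:
    handle(ev)

print(alerts)
print(len(batch_store))
```

Only the spike crosses the alert threshold immediately, while all three events remain available for the end-of-day batch report.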
SPEAKER_01:That's great. Could you mention a use case that you often use it for?
SPEAKER_00:Like I said, in the modern architecture platform that Fabric serves, you have events coming through Event Hubs, then a streaming data platform that sits in between, and ultimately you push that data to OneLake. Once you have the data landed in OneLake, you have the bronze, silver, and gold layers. And once you get to the gold layer, you can use that for immediate BI reporting, or the real-time dashboarding feature that Microsoft has within the platform as well. So that helps on both fronts: for anything real time, you can plug into any layer, be it bronze, silver, or gold, to process and run your machine learning or deep learning initiatives; and anything that's delayed reporting, like I said, can still be batched out through BI dashboards or Power BI.
SPEAKER_01:Okay. There's often a fine line between data and AI, because you can still integrate them. So are you doing anything with AI in the workplace, with data?
SPEAKER_00:Yeah. With the latest stack that we're moving to, we are trying to build a new application. I can't give the name, but it's a fleet application. The biggest problem we were trying to solve over a period of time was having one unified data platform for our customers, so we are integrating all the data into OneLake. It will be a data lake with all the data streaming in from different systems. From there, we are trying to provision our application to read the data and add agentic AI question answering, where you can just talk to the data. That's the feature we're trying to implement to give the end customer the flexibility of talking to our data.
SPEAKER_01:Okay. From your experience, is there any best practice for refining your data, from your time as an engineer working in data engineering?
SPEAKER_00:Yeah. When I say refining the data, I will go back to the core principles of having the data curated well before it gets to the gold layer. According to the standards, we should always start by having the raw data stored in a landing area, a landing lake area, where we can at any time go back and find the diffs between what has been ingested today versus what was there before, in a fraction of a second. So it's always good to have that landing area. Once that's there, the best standard we should always apply is ensuring data cleanliness. Without that, any data you process further downstream is not valuable. When I say data cleanliness, we are talking about validating the data format and data types, ensuring the quality of the data is there, and making sure we are not missing data from the previous batches that we ingested. These are very core: we should always have a healthy state of data in our databases at the silver layer before further processing it downstream. And the biggest consideration when we are doing data cleanliness is running quality checks in terms of data rules. When we have defined the data rules we want to apply, we have to always test them before we push the data to the next layers when we are building the gold layer.
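Those quality rules, schema and type checks plus a completeness check against the previous batch, might look roughly like the sketch below; the expected schema, field names, and tolerance are made-up examples, not a prescription.

```python
# Hypothetical expected schema: column name -> required Python type.
EXPECTED = {"trip_id": int, "miles": float}

def check_schema(row):
    """Reject rows with missing/extra columns or wrong types."""
    return set(row) == set(EXPECTED) and all(
        isinstance(row[k], t) for k, t in EXPECTED.items()
    )

def quality_gate(batch, previous_count, tolerance=0.5):
    """Return (passed_rows, ok). Fail the gate if far fewer rows arrived
    than in the previous batch, which may indicate missing data."""
    passed = [r for r in batch if check_schema(r)]
    ok = previous_count == 0 or len(passed) >= previous_count * tolerance
    return passed, ok

batch = [
    {"trip_id": 1, "miles": 12.3},
    {"trip_id": 2, "miles": "oops"},  # wrong type: rejected at the gate
    {"trip_id": 3, "miles": 4.0},
]
passed, ok = quality_gate(batch, previous_count=2)
print(len(passed), ok)
```

The point is the ordering: rules run, and are themselves tested, before anything is promoted toward the gold layer, so downstream consumers only ever see rows that passed the gate.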
SPEAKER_01:Okay. If someone wanted to be a data engineer or work with Fabric, what kind of experience or skills do you think they need?
SPEAKER_00:I guess they wouldn't need any huge experience for that. They should be good with SQL, and everything else is very self-explanatory. I've also tried a lot of the videos that are available online. You could just build a quick MVP with the tool and start building things out of the box. There are a lot of freely available videos to just get started. I think it's not too hard. You should just have a SQL background, and I think anybody could get started there.
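For a sense of the SQL level being described, here is a typical starter aggregation. sqlite3 stands in here for a Fabric warehouse or SQL analytics endpoint, and the table and column names are invented for illustration.

```python
import sqlite3

# An in-memory table playing the role of a small warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 40.0), ("east", 60.0)],
)

# A typical beginner aggregation: totals per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
print(rows)
```

If GROUP BY, JOINs, and window functions are comfortable, the SQL side of Fabric (warehouses, SQL endpoints, even KQL's relational subset) is mostly a matter of learning the tooling around it.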
unknown:Yeah.
SPEAKER_01:Do you have any recommendations?
SPEAKER_00:I guess the recommendation is to follow the steps right during the setup. The bigger challenge you'll have, especially when you're trying to set up the streams or the event hub, is that if it's not organized in the right fashion, you will end up with latencies and issues along those lines. So that's one takeaway on the streaming side of things. Always structuring the data right is very important, and what goes in and what goes out of the environment is also very important.
SPEAKER_01:Yeah. Before you got started working with data, was there anything that interested or excited you about this field of data engineering?
SPEAKER_00:Yeah, so I have been in the engineering world for, like I said, over a decade, but I always loved data, and the problems that data could solve were what interested me. I started my career very early as a report developer, and the kind of problems I was trying to solve, especially on the reporting side, were interesting. Eventually I moved on from reporting and found that designing enterprise solutions and data modeling were interesting, so I explored there. Then I felt like, now I want to learn more and do data engineering: where is the data coming from? Where is the source of it? How do things work? So I moved into the data engineering world, and I kept going down that route. But I think the key takeaway is that data can solve a lot of problems. Until 2013 or 2014, it wasn't taken seriously. But after that, things started moving, and with AI, I think data is going to be a key player.
SPEAKER_01:Yeah, because in order to solve any AI problem, your data has to come first. You have to know how to refine your data; it's like refining your prompt as well. So data really matters. In our podcast, we always like to get to know our guests. So aside from being passionate about data engineering, do you have any hobbies in your spare time?
SPEAKER_00:Yeah, I guess I love cooking, and I also hike a lot. Those are the two main hobbies I love. I also like interior design; I've tried to explore a little bit of it, but I felt like I didn't have time for it. When I retire, those are the things I want to do: become a chef and have an interior design company.
SPEAKER_01:Okay. Are you part of any data communities, like events or conferences as well?
unknown:Yeah.
SPEAKER_00:Yeah, I do that after work. I'm a co-chair for CDO Magazine, and I'm also part of AI groups that are hosted in the Dallas area, so I usually try to be around the events there.
SPEAKER_01:Yeah, I think there are quite a lot of events on that side, in America, about AI or data, so it's quite cool. Would you ever speak at one about data, whether it's Fabric or anything?
SPEAKER_00:Can you come again? What was the question?
SPEAKER_01:Would you ever speak at one? Have you spoken at one?
SPEAKER_00:I've had many speaking opportunities, yes. There's a Microsoft community event hosted in Orlando, so I'm going there, and I have one event coming up, like I said, the week after this, the CDO Magazine one, where I'm the co-chair.
SPEAKER_01:I know a friend who went to FabCon in Vegas or something. I think that was on around the same time, in September, all about Fabric and data. So it's quite good to know. Thanks a lot for joining this episode, Thrushna. Hopefully everyone gets to know more about the importance of using data, even in the world of AI now, and how you can get started. Hopefully you got some tips and tricks from this episode. Stay tuned for the next episode. Thank you.
SPEAKER_00:Thank you. Bye.