
In Episode 97 of Great Things with Great Tech, Anthony Spiteri hosts Kishore Gopalakrishna, CEO and Co-Founder of StarTree, to discuss how real-time analytics is revolutionizing industries, supercharging AI capabilities, and redefining observability. Kishore shares his journey from DOS gaming and early programming in Bangalore to building pioneering distributed systems at Yahoo and LinkedIn, where he developed Apache Pinot—a cutting-edge real-time analytics platform. StarTree leverages Pinot to empower global giants like Uber, LinkedIn, Walmart, and DoorDash with instant, scalable insights. Learn how real-time analytics is reshaping digital experiences, powering next-gen AI agents, and enabling businesses to make smarter, faster decisions in milliseconds.
Did you know every time you order food, book a ride, or even check who viewed your profile, real-time analytics is powering your experience behind the scenes?
In this episode of Great Things with Great Tech, we dive deep into the power of real-time analytics with Kishore Gopalakrishna, CEO and Co-founder of StarTree. StarTree leverages Apache Pinot, a high-performance real-time analytics database, revolutionizing how leading companies like Uber, LinkedIn, Walmart, and Etsy provide instant insights and personalized experiences at massive scale.
Kishore shares his journey from a gaming enthusiast fascinated by distributed systems to building mission-critical platforms at Yahoo and LinkedIn, eventually creating Apache Pinot. Discover how StarTree is powering billions of real-time queries per week, enabling businesses to enhance customer interactions, optimize operational decisions, and supercharge modern AI and observability.
Key Takeaways:
- How real-time analytics transforms industries, enabling instantaneous insights and rapid decision-making.
- The evolution from traditional databases to highly efficient columnar, real-time analytics systems.
- Real-world applications of Apache Pinot, from consumer apps to enterprise observability and operational excellence.
- How real-time data is accelerating innovations in AI, specifically through Real-Time Retrieval-Augmented Generation (RAG).
- The future of analytics: seamless data ingestion, enhanced concurrency, and the growing demand for sub-second response times.
Links & Resources:
StarTree: https://startree.ai
Kishore Gopalakrishna on LinkedIn: https://www.linkedin.com/in/kgopalak/
Apache Pinot: https://pinot.apache.org
☑️ Support the Channel: https://ko-fi.com/gtwgt
☑️ Be on #GTwGT: Contact via Twitter @GTwGTPodcast or visit https://www.gtwgt.com
☑️ Subscribe to YouTube: https://www.youtube.com/@GTwGTPodcast?sub_confirmation=1
Check out the full episode on our platforms:
Spotify: https://open.spotify.com/episode/2l9aZpvwhWcdmL0lErpUHC?si=x3YOQw_4Sp-vtdjyroMk3Q
Apple Podcasts: https://podcasts.apple.com/us/podcast/darknet-diaries-with-jack-rhysider-episode-83/id1519439787?i=1000654665731
Follow Us:
Website: https://gtwgt.com
Twitter: https://twitter.com/GTwGTPodcast
Instagram: https://instagram.com/GTwGTPodcast
☑️ Music: https://www.bensound.com
Transcript: "(00:00) did you know that every time you order food book a ride or even check who's viewed your LinkedIn profile powerful realtime analytic engines are making it happen in milliseconds do you ever wonder what it takes to really run those platforms Apache Pinot does just that handling billions of queries every week from the top web platforms in existence in this episode of Great Things with Great Tech I'm joined by Kishore Gopalakrishna CEO and co-founder of StarTree to explore how real time analytics is reshaping entire industries powering (00:33) next generation AI agents and transforming observability in the age of data and AI it's never been more important to access process and leverage data in real time this is episode 97 of Great Things with Great Tech with StarTree hey Kishore welcome to the show and thanks for joining it's it's a great time to be talking about real time data analytics as I said in my intro um before we get into StarTree and everything that you've done it's it's a great journey but let's let's go back in time to start about that journey and what got you into (01:11) computing where were your early days i believe you're from Bangalore um just give us a little bit about yourself as a bit of a background and how you ended up in in the states working for LinkedIn and now founding a really innovative company in real time data analytics that's awesome uh thank you so much Anthony for having me on the show it's it's a pleasure um I think uh if I have to talk about how I got to US I think that itself will be another episode but I'll keep that really short i mean it's my my then my wife now but then my (01:43) girlfriend she was in uh she was studying her doing a masters here so that's kind of what really got me to got me to US and um what got me into computer science I think that's uh another story by itself but it's really gaming um so back in college I was I really got excited about uh gaming i'm electrical by background uh but I got 
so excited with gaming that I uh started building some of these games and uh that's kind of what got me excited into programming okay so that's uh even though I didn't do my computer science (02:17) that got me got me excited into coding and uh learning about all the different coding languages go ahead what sort of what sort of games so what what what games we number one a lot of you you you'll probably guess my age by talking about Well I think we're I think we're about we're almost the same right but yeah let's play that let's play that game tell me the game and I'll guess your age yeah so this this is Pac-Man and Paratroopers and uh Tetris and games like that so yeah exactly just the 2D DOS games that you (02:48) would build and um and I think these were the times where NFS Need for Speed and all those things other things were also very popular good then Half-Life um those Yeah yeah Half-Life is still I still play Half-Life actually i don't know if this is as a very very quick side note because this is interesting um the RTX edition of Half-Life 2 came out last week which was the the enhanced Nvidia version with all the beautiful graphics and and so I'm actually playing it again it's it's it feels like a different game with like (03:18) beautiful sort of still the same graphics but all the AI innovation that Nvidia brings into their RTX technology just brings it it's crazy it's crazy good i recommend it if you if you if you want to get back into it and have some fun go there no I I don't think I can i if I if I get sucked in yes too much time i'm I'm big big time into gaming again but um yeah I I can't really afford that right now of course we've got we've got busier things to do is it isn't it sad how we kind of give up a little bit of those (03:45) things but that's so that so that sent you So that sent you sent you off so you started to code these games you obviously had an interest in it um and then you know from the point of view of going to university 
again did you do computer science or anything or engineering you said no it's it's just from there I I directly got into jobs and I started off with yeah with Oracle in India and then when I came to US I started off with at Yahoo i think that's where I really got interested in distributed systems i was uh working in (04:13) the Hadoop projects and in ZooKeeper and all these other systems and to me the distributed system concept itself got me really excited and I think it's especially the part where I mean it's kind of Leslie Lamport's quote which says like a distributed systems is one where you kind of uh machine that um one machine failing renders an completely another machine useless right which is which is very interesting And that's kind of what got me interested in distributed systems because there's so many ways a system can actually fail and (04:49) the fact that you have to design a system that handles all these failures and yet still be functional uh was a pretty big challenge and I saw it from outside when Hadoop was happening i didn't really work on those systems but uh that's kind of what got me excited and really wanted to build a distributed system from scratch uh to understand all these things and then I got to LinkedIn and that's where got the opportunity to build the systems from uh from ground up and Espresso was the first system uh this is like a MongoDB uh equivalent uh (05:23) NoSQL data store okay that we built at uh LinkedIn built a cluster management framework as part of that called Apache Helix uh which allows you to build a lot more distributed systems very easily so that you don't have to worry about all the fault tolerance part and from then I got an opportunity to build Apache Pinot i think this was a a very interesting um challenge that we had at LinkedIn where we as a company wanted to provide insights to millions and hundreds of millions of LinkedIn users right i think that was the first time a company (06:00) embarked on this kind 
of a mission to say like hey analytics should not be restricted to just the people within LinkedIn uh why can't we show this to our users and I think that was a big hit um where they you saw who viewed my profile on LinkedIn i think that was the flagship project where we started this where you said like hey just give them the insights on who is viewing their profile and something which is as simple as that was a was a big success at LinkedIn a lot of people started seeing hey I I see this profile I worked with (06:35) him I I worked with him at this company he's from my college and then all of a sudden the number of connections started increasing so there were a lot of these second order effects and that's kind of what led to creation of Apache Pinot that's interesting yeah so you know a lot of necessity in these projects inside of the Yahoos and LinkedIn obviously have have led to such systems being built and interesting that you build it internally um and then also it's for me it's interesting that you know it's all sort of there's a lot of (07:05) Apache here a lot of open source and stuff is that how it started within like was that always the idea when you're building these things within these within these organizations to um do you build them to only benefit internally to start with and then at what point does a decision come where you want to then say open it up on an Apache license and whatnot i'm very intrigued with that obviously yeah actually that's a that's a great point and I I think every company has a different um reason for open sourcing and I think LinkedIn was one of the best (07:34) I've I've seen i think um what we realized I there was an open source committee back in LinkedIn as well and I was a part of that and what we found was um actually promoting open-source uh culture actually had a lot of benefits i mean one the project itself would be really good because you're now docu writing a lot of documentation you're making sure that your code is 
actually seen by hundreds of thousands of developer outside so you actually focus on the quality of the code documentation and everything else and the second part (08:08) is also on um the hiring so you actually end up attracting the best talent so while one way of thinking is like hey why are we giving this software away for free for other companies to use but the second order effects of open sourcing is very very high and I think that's something that LinkedIn understood very early on and we actually we almost had the mantra internally like hey why are we not open sourcing this versus why are you open sourcing this so I think it was a great uh uh especially for a developer (08:42) it's it's great to see something like that where you're constantly promoting open source I think there was a great analogy I think uh one of the VPs back at LinkedIn who used to say is It's like how you keep your house when the guests come in right open source is equivalent to that right it's like you always tidy up your house make sure that it's it's the best version of the house that you're presenting so so open source is equivalent to that and we kind of saw it firsthand and when we asked a developer to open source it they would actually (09:16) just go ahead and clean up all the things write the proper documentation comment the code recipes i think it just had a lot of benefits uh so I think it's it's always good i think a lot of people think about open source as hey why are we doing this who is it going to benefit but hat something part of that it's actually a great part of that right because now you are um because when you open source something people are using it in ways that you had never thought about right i mean that's one of the things that we saw and when they do that they stumble upon bugs and uh gaps and friction and other things that you had never (10:45) anticipated so in a way it's actually making it better and better you're getting more testing more coverage and at 
the same time it's actually ma d like hey let's figure out what it takes to build an efficient system to serve this kind of a use case where we can provide uh the freshest analytics on lots of data coming in streaming as well as to millions and millions of users uh without compromising the latency i think that in shortly like freshness concurrency and latency i think those are the three key things that we wanted to solve all at once and once we built this we came down from 1,000 nodes to like (12:26) 75 nodes with the traffic g hat hosting game and to think that you (13:26) would have had to sit there and do a thousand servers a thousand nodes for that one particular you know feature it blows my mind in in that sort of scale it's just it's just completely different scale but out of that that's the necessity that's where you go how can we make this better how can we make it more efficient how can we if we're going to scale this to you know 50 million 100 200 million even a billion users like how are we going to make tha h we are kind of um the creators and the co-authors of the project we the idea is that hey this is a project that can continue even after the um the creators actually are not (15:01) invested in that so that's kind of what ASF provides which kind of has the governance model uh behind it make sure that everything is done in the right way um so I think that's the benefit of ASF and I've been involved with lot of lot of ASF projects before as well and I think there is I think now the ASF was the on pps that you see either Uber Eats or if you're taking a Uber ride a lot of numbers that you see on those apps are all being powered by Pinot as well so Uber was the next company that took this technology did massive um number of projects which are very very innovative i mean think about a (16:37) restaurant owner getting analytics in real time right i think that's that's an amazing story by itself and seeing like what kind of change such a technology can bring 
in this uh indust industry so that's n if it is 5 10 minutes later you already ordered it and probably yep needs to be instant that's a really good analogy to think about the use case yeah yeah so I I think that's uh that's how Uber was able to leverage uh this and then same with LinkedIn as well um but what we realized was a lot of companies should be doing it but they're not uh because they're again even though they have Kafka they're just putting it in the data lake and then (18:15) they have this latency which is it's going to ou can think (19:18) of system issues all these that's good yeah two different use you've obviously started at the consumer sort of front end let's let's that's let's that's let's let's understand the the end user who's clicking on the on the application make money there but then you've really talked about that other use case or a couple of use cases to actually help be more efficient in in whatever enterprise you're doing at that point as well so there's two different use cases for the real tim ng at an aggregate level so now they can see like if within this zip code or within this area what is the supply and demand that's happening so if there is (20:42) less demand can we actually send out more coupons so that more people actually order or if there is too much demand can we shrink the restaurant number of restaurants that we show so there is this is exactly the same data but you can actually think of so many different use cases one is trying to make money the other one is trying to s y very long i don't think that's that's new i mean all these uh databases existed Vertica and all these column databases existed for a long time i think it's the the rise of SSDs that actually allowed us to build a different architecture for the OLAP systems so you don't have to because you're no longer (22:16) limited by the disk and you have SSD systems um you can actually do a lot of random access u and um you don't have to res um completely rely on 
sequential access that allowed us to think tand a little bit about what the difference is that's a really good way to do it okay so StarTree so talk through the founding um the name i'm always interested in the name of the company and how that came about i've got a little bit of an idea given Stars and whatnot but yeah um talk about StarTree and the founding and then the name yeah no absolutely as I as I mentioned that the really the reason for us to start was um there are a lot of companies that (23:49) should be doing the same and on the name based on the the star-tree (24:54) index which is one of the most powerful indexes in Pinot wow okay there you go yeah makes sense right so it was it's a name based out of a really cool functionality and feature in the product i love that yeah and and it makes sense and then um what's also interesting is is that so again you late 2019 the company started um obviously and we talked about this a little bit in the pre-show you know that we don't talk about the dark days of COVID much anymore be product uh over Zoom and then how do you even make this happen so there's so many unknowns that uh I think it's uh it was it was a scary (26:22) situation and I think we we had to make some um uh tough choices in terms of where we hire how we actually get to the next step and how what kind of a bar we need to spec um put in terms of our hiring so for almost first 6 months uh we didn't hire anyone beyond the founding team it was uh it was very hard because a lot of people wanted remote and peopl just an organic exponential sort of growth it it's it's a combination of both i think the first was definitely the organic explosion in terms of just the Pinot usage uh because there were so many companies that suddenly got uh got an idea of like hey this is this is (28:00) something uh that allows us to do um something that we were not able to do before i think that was the key thing because they were always uh ideas was were never a problem people had 
a lot of ideas of okay this is the product I about for example today in uh just the delivery and food and logistics area you have like Uber DoorDash CloudKitchens Zomato Grab Careem so many of these companies are actually leveraging these technologies and so someone ordering something today at any point of time is probably leveraging Pinot in one way or the other it's probably like the technology that a lot of people leverage (29:43) and use but don't know exists in a way that perspective you're rattling off these names and they're all na id of that ETL so what this means is if you have some streaming systems like Kafka Redpanda Kinesis or any of these you don't need another system in between to pull the data from there and write it to Pinot so we kind of embedded the ingestion into the system as a first class citizen so you just point at the source and you forget it and we take care of everything in terms of ingesting now it's a it makes the problem a lot harder because you now have to think about so many different things in ter makes sense connect the connectors connectors connectors are the ones that can be configured so as new systems come (32:20) like let's say Kinesis or Pub/Sub or Event Hubs we end up building those connectors but you just build once for one of every uh main uh system that comes out there but we do the hard work there in terms of making it easy for the end users that was a point I was Yeah so you guys are doing that work yeah that that's that's awesome they talked about ETL ETL comes up a lot recent u have you probably have like hundreds of thousands maximum but I think what lot of companies are our users and customers are doing is think about generating embeddings for your events right every event that is happening today it's a (33:55) structured event but imagine turning that into an embedding model right and that is actually something that people can actually use and now they can actually search for a lot of lot of different things related 
to that um and that allows them to build lot of se now you are no longer limited by people clicking on it it's like agents are like constantly querying the system to actually make better decisions for you so we feel that the the need for higher concurrency and lower latency is going to continue to keep growing um growing exponentially in terms of uh based on the products that we see in the AI space yeah it makes (35:33) sense and we talked about like earlier in the pre-show we talked about the the real time RAG or real time retrieval augmente nk one other thing that we are excited is in the observability space observability yeah yeah because we Pinot is uh increasingly being used in uh logs metrics and traces as um data as well and if you kind of think about it there's the way people are asking questions on the log is going to change right i mean people used to do all these uh complex queries and they need to understand the semantics of how how the log is structured they need to do awk and sed and grep and all these (37:08) piping piping on of knowledge plus new information that's being available every second and I think (38:19) that's that's super exciting for us and I Yeah that's awesome yeah I think that's a really good use case i've I've talked about this in a sense of you know um observability and natural language observability which I think just reduces as someone who's historically had to look at dashboards and work out what's going on visually and then you got to go down the rabbit hole of trying to work that out if you um you got you guys have got a data summit coming up um and then the company's got so much growth I think and it's really an amazing story i I love everything it started from computer games we're talking about Yahoo LinkedIn and now (39:49) you're just touching almost everyone who's ordering food around the world so Zry it's a great story thank you so much for being on episode 97 of Great Things with Great Tech thank you so much Anthony for 
having me on the show it was a pl"
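
Illustrative sketch (not part of the episode): around 20:42 the transcript describes looking at supply and demand per zip code on the same event stream, sending more coupons when demand is low and showing fewer restaurants when demand is too high. The Python below is a minimal in-memory sketch of that idea only. It is not Apache Pinot's or StarTree's API; the `Event` fields, the event kinds "order" and "courier_online", and the action names are all hypothetical, and a real deployment would presumably express this as aggregate queries over ingested events rather than in-process counters.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One streaming event. Field names and kinds are hypothetical."""
    zip_code: str
    kind: str  # "order" counts as demand, "courier_online" counts as supply


class ZipCodeBalance:
    """Keeps running supply and demand counts per zip code."""

    def __init__(self) -> None:
        self._demand: Counter = Counter()
        self._supply: Counter = Counter()

    def ingest(self, event: Event) -> None:
        """Update the counters as soon as an event arrives."""
        if event.kind == "order":
            self._demand[event.zip_code] += 1
        elif event.kind == "courier_online":
            self._supply[event.zip_code] += 1
        else:
            raise ValueError(f"unknown event kind: {event.kind!r}")

    def suggested_action(self, zip_code: str) -> str:
        """Compare demand with supply for one zip code."""
        demand = self._demand[zip_code]  # Counter returns 0 for unseen keys
        supply = self._supply[zip_code]
        if demand < supply:
            return "send_coupons"  # too little demand: encourage orders
        if demand > supply:
            return "show_fewer_restaurants"  # too much demand: shrink the list
        return "no_change"


if __name__ == "__main__":
    balance = ZipCodeBalance()
    stream = [
        Event("94105", "courier_online"),
        Event("94105", "courier_online"),
        Event("94105", "order"),
        Event("10001", "order"),
        Event("10001", "order"),
        Event("10001", "courier_online"),
    ]
    for event in stream:
        balance.ingest(event)
    for zip_code in ("94105", "10001"):
        print(zip_code, balance.suggested_action(zip_code))
```

The point of the sketch is that one stream of events can feed both a per-user view and an aggregate operational view; the thresholds here are a plain comparison chosen for brevity, not a rule stated in the episode.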