January 27, 2025

Powerful SQL Data Management with Yellowbrick | Episode #94

Join us as we explore how Yellowbrick’s Kubernetes-powered SQL data platform is transforming enterprise data management. Mark Cusack, CTO of Yellowbrick, discusses their hybrid and multi-cloud capabilities, innovations in real-time analytics and AI integration, and how they outperform competitors like Snowflake and Redshift. Discover their journey from on-premises appliances to a fully cloud-native architecture and learn about their new Community Edition.

Great Things with Great Tech!

Revolutionizing Data Warehousing on Kubernetes with Mark Cusack, CTO of Yellowbrick

Discover how Yellowbrick’s Kubernetes-powered SQL data platform is transforming enterprise data management. Mark Cusack discusses their hybrid and multi-cloud capabilities, real-time analytics, and AI integration, highlighting advantages over competitors like Snowflake and Redshift. Learn about their journey from on-premises appliances to a cloud-native architecture and the introduction of their free Community Edition.

Uncover how their Kubernetes-powered platform is revolutionizing the data warehousing space.

Key Topics:

  • Yellowbrick’s hybrid cloud and multi-cloud capabilities
  • The unique Private Data Cloud approach
  • Innovations in real-time analytics, AI integration, and streaming workloads
  • Why Yellowbrick outperforms competitors like Snowflake and Redshift

Links:
☑️ Web: https://yellowbrick.com
☑️ Crunchbase: https://www.crunchbase.com/organization/yellowbrick-data
☑️ Sign Up: https://yellowbrick.com/community-edition

 

☑️ Support the Channel: https://ko-fi.com/gtwgt
☑️ Be on #GTwGT: Contact via Twitter @GTwGTPodcast or visit https://www.gtwgt.com
☑️ Subscribe to YouTube: https://www.youtube.com/@GTwGTPodcast?sub_confirmation=1

Check out the full episode on our platforms:
☑️ YouTube: https://youtu.be/kmB_pjGb5Js
☑️ Spotify: https://open.spotify.com/episode/2l9aZpvwhWcdmL0lErpUHC?si=x3YOQw_4Sp-vtdjyroMk3Q
☑️ Apple Podcasts: https://podcasts.apple.com/us/podcast/darknet-diaries-with-jack-rhysider-episode-83/id1519439787?i=1000654665731

Follow Us:
☑️ Website: https://gtwgt.com
☑️ Twitter: https://twitter.com/GTwGTPodcast
☑️ Instagram: https://instagram.com/GTwGTPodcast

☑️ Music: https://www.bensound.com

Transcript

Transcript: "(00:00) As data continues to shape our world, the importance of data warehouses will continue to grow. Imagine a data warehouse that scales effortlessly, runs faster, and keeps your data secure, all while offering real-time insights. In this episode of Great Things with Great Tech, I'm joined by Mark Cusack, CTO of Yellowbrick Data. We'll explore how Yellowbrick's Kubernetes-powered platform is revolutionizing data warehousing with hybrid and multi-cloud capabilities, high performance, and complete control of your data. If you're looking for a more (00:33) efficient way to manage your data, this conversation is for you. This is episode 94 of Great Things with Great Tech, with Mark Cusack of Yellowbrick.

Welcome, Mark Cusack, CTO of Yellowbrick. So Mark, just before we get into a bit of your background and Yellowbrick, what is a data warehouse? Let's explain that first, because I think in today's day and age data is so critical, so important, and definitely this notion of storing data in a place where you can do more stuff with it is obviously what companies are (01:17) looking for. So how does data warehousing fit into that?

Yeah, so a data warehouse is a centralized repository where lots of data from operational systems all over your business (could be supply chain systems, financial systems, transactional systems that are at the front-facing end of your business, closing transactions, doing ATM things, and all this kind of stuff) all flows into this centralized data warehouse for longer-term analysis. And so data warehousing is typically associated with descriptive (01:51) analytics: you're looking for patterns that happened in the last four or five quarters of your business, and then you use those patterns to make predictions and business decisions to guide where your enterprise is going.

Yeah, and what sort of sources are they typically coming in from? Is it generated from line-of-business
applications, or someone's other type of front-end app, or is it just basically all of the above?

It can be all of the above, but typically there are (02:22) OLTP systems there. It's very highly structured data; they're online transaction processing sources that end up storing these closed transactions into a data warehouse. And the data warehouse is typically highly structured data. SQL is the language that's used to process that data and access it, but you arrange the data in particular schemas in the way you model the data that's in there. So you'll hear things like star schemas and snowflake schemas, and these are really just ways that (02:53) we, for efficiency, model the data, store the data, and separate it to do that analysis more effectively.

Right, okay. We'll touch on the specifics around the SQL side of it, how the data is accessed, and how to optimize, and that's kind of why a company like Yellowbrick exists, right, in terms of optimizing that data and making it easier and more efficient for businesses to get that data out. But let's start with Yellowbrick itself. Give us a little bit of an introduction to (03:22) Yellowbrick: what you guys do and what your value prop is.

Yeah, so Yellowbrick provides an SQL data platform for satisfying these data warehousing use cases, and a number of other use cases as well, which we can get into later on. The company's been around for around 10 years, founded in 2014. We first went to market in 2017, and we went to market with an SQL database that ran on our own hardware. So we shipped an appliance that we'd install in a customer's own data center. It was very turnkey in terms of its operation: this thing would stand up and (03:54) run, and they'd be able to do all of the data warehousing they would want to do in there. About three years ago we actually took the software that
runs within that appliance, ran it on the cloud, and turned it into a Kubernetes-based application that now runs on all three public clouds. So we've actually gone on this journey from on-prem hardware into the cloud, and there's a ton of good reasons why we did that, partly because at the time cloud hardware really wouldn't tap into some of the efficiencies that we have (04:26) in our technology. It does now, so a lot of the goodness and the IP that we used to do on-prem we can apply to cloud now.

Yes, okay, so that's a really interesting sort of pivot, but again, being there for 10 years. When you say you deployed an appliance, I'm just thinking about a 1U server. It probably had the proprietary software on there, and then it allowed the data that you talked about before to be ingested into the appliance and sit there for the purpose of the data warehousing, and then the SQL querying at the other (04:56) end.

Yeah, that's exactly right. One of the things that we really focused on was very, very small-footprint appliances at that time. In fact, we still sell appliances today, and I can talk about that journey and where we're going in the future on that. But it was a blade form factor, so you'd have these really high-density converged compute and storage units that get deployed in the data center and run our software, all NVMe SSD-based storage. We're an MPP scale-out data warehouse, so what that (05:27) really means is we're really, really good at scaling and parallelizing workloads, whether data loading workloads or query workloads, across all of these blades, or in the cloud across a whole bunch of compute instances.

Yeah, I understand. So, doing the research, the lineage or the DNA of the company is quite interesting. Why don't you go into that, because you've been there for, I think you mentioned, about four years. Am I
right in saying, roughly? Yeah. And you were at (05:58) Teradata before, through an acquisition of a company that you co-founded as well. I'm interested in your background just a little bit, and I want to talk about your journey in this world as well, but give us the lineage of Yellowbrick, because I think a lot of people will know where the lineage comes from.

Yeah, so the founders of Yellowbrick, Neil Carson and Mark Brinicombe, were at Fusion-io, so they were very heavily invested in the storage space and flash storage, and then they obviously spun (06:27) up and looked at how they could add value by taking their knowledge of storage and applying that up the chain in terms of analytics. Okay, so what was really interesting was that when Neil and Mark were at Fusion-io, there were a lot of data warehousing companies, like Teradata and others, that wanted to take advantage of flash storage in their data warehousing appliances. What they were doing was simply taking these NVMe SSDs and replacing the spinning disks that they had in these (06:58) data warehouse appliances, and then expecting a boost in throughput and a reduction in data access latency and things like that. What they found was they weren't getting that, because unlike a pure storage use case, when you're doing data warehousing you're pulling vast amounts of data out of storage and then pushing that, typically via buffer cache in DRAM, on its journey into the CPU. And what they found was they were getting their bandwidth tied up in the (07:32) memory channels that flow from the storage into the buffer cache in DRAM and back out into the CPUs. There just wasn't enough bandwidth there to take advantage of the SSDs. And so the core IP that really led to Yellowbrick, this idea, was: well, let's bypass main memory, and as we run queries
we take data straight out of NVMe SSDs and any… like Yellowbrick does. So that's kind of what I took from that, right? We're not just talking about storage here; it's actually compute and memory and smarts and the software.

Right, very much. When you look at the (09:09) real-world deployments of data warehousing out in companies, their workload balance can be very, very different. It can be I/O-bound, or it could be CPU-bound, or it could be balanced as well. And so you have to have a s… important as well. I mean, there are three core legs to the data warehousing stool: there's storage, networking, and compute, right? Those are critical. So yeah, when you go to the cloud there are opportunities and there are drawbacks as well. The main opportunity is I've suddenly got almost limitless amounts of storage at (10:38) my fingertips, and limitless amounts of compute. I can make my compute elasti… for (11:39) example.

So yeah, and with that, the scaling and the efficiency that the public cloud offers is obviously something you can take control of, right? And I guess that allows you to also potentially address customer needs without having the bottleneck of an appliance. So this is my positive on the cloud: a customer who buys an appliance is kind of limited to the bounds of that appliance, and right-sizing, by the by, if you're in t… lineup by being able to deploy on-prem and in all clouds, because we think there's a lot of businesses out there that (13:06) want to right-size and right-place their workloads, for financial reasons as well as data sovereignty reasons and others.

Yeah, data sovereignty is huge. You mentioned GDPR in the intro part as well. So being the hybrid cloud data warehouse platform is kind of your stance, right?
So, and building on Kubernetes as well, I d… Kubernetes today, it's bare metal, but early next year we're going to take that Kubernetes-based software and bring it back home and run it not on appliance hardware but on commodity hardware on-prem. So, very excited about that. And (14:40) so again, I think the future market is people looking for that cloud-like experience, but in some cases they want to run it themselves in their own data center.

Yeah, we've talked about the compute, the networking, obviously very important from a data per… gigabytes of data, then Postgres is a great solution, and it's free, right? But if you want to scale out and have enterprise-class resilience and performance, and not a bank-breaking price, then we're worth a good look. And a lot of that is driven by the efficiency: we can do a lot of data warehousing in a tiny amount of (16:15) hardware, whether it's on the cloud or on-prem.

Yeah, because typically when someone thinks about a warehouse, I think, you know, massi… S applications. So yeah, as the nature of those sources changes, data warehouses have had to adapt over time to be able to support these new forms.

Yeah, it's interesting. That's good, really getting through the nuts of the why and the what of the data warehouse here. So here's a question: why another data warehouse vendor? There's a few of them out there, obviously. I think the biggest part that we're probably going to touch on is the fact that… S and selling it back to your customers at a marked-up value, and then (18:48) that's what you mainly worry about. And we felt that we didn't want to make money from customers like that. We thought we could pass on cost savings and improvements in cloud performance directly to customers by saying: here's the software, we'll give you an automated experience to deploy in your own cloud account. So the data plane and control plane are
totally owned by you, Mr. Customer, Mrs. Customer.

And what that potentially offers in terms of just recharging or rebilling or passing on the cost that you're incurring for your own consumption from the SaaS platform: you don't get the (20:19) benefits of multi-tenancy, which drives efficiencies in revenue as such, but it doesn't sound like that's what you're about, which I think is quite a unique proposition.

Yeah, in fact a lot of our customers, for security reasons, don't want multi-tenancy. They want their data com…

What industries are you hitting? What type of use cases is the right fit for Yellowbrick?

Yeah, and in fact one common theme across a lot of our customers is that they want their data warehousing to be reliable, at massive scale typically. So they're enterprises that range in (21:48) terms of verticals from some of the top global insurance companies, the biggest hedge funds in the world, the largest telcos, the biggest credit card companies, and a lot of government… toothbrushes to aircraft carriers and nuclear reactors and stuff like that, and all those little bits that go into it. So all that.

Isn't that cool? Nice and track. Yeah, cool stuff, and very important in today's world. I think that is interesting in itself. And in terms of the real-time analytics and workload management, is it typically someone that's getting in there and sucking the data out (23:25) themselves for SQL queries? I'm picturing data scientists sitting the… ack of, you know, in kind of real time. And then, as you say, we're being used as a foundational piece within an application stack as well. A good example of that is a customer called LexisNexis, and they have a product called ThreatMetrix, which does real-time fraud detection for e-commerce sites. So if you swipe your credit card online, buy anything online, the chances are that transaction will ultimately be analyzed by Yellowbrick to help that
e-commerce business decide whether that transacti… everybody, every company on the planet has got some sort of cloud strategy, or is in some way on some (25:53) part of their customer journey. But in many cases there are workloads that will never go to the cloud, either because in a particular region of the world there is no cloud provider, and so that workload will remain on-prem for the foreseeable future. In some cases data is considered too sensitive to leave the walls of a data center. And so Yellowbrick is pretty much one of the onl… e 2004. So what do you think it looks like now compared to then? What's the biggest difference in this warehousing? Was it even called data warehousing back in 2004?

Oh, it was. Data warehousing goes back into the 1980s, so that was a very, very (27:23) established term. And in fact, what's really interesting is sometimes we forget the historic things that were done and we end up reinventing the wheel, and I think that's happened quite a few times in the world… en source as well, and we're making the stack entirely open. And then you give up control entirely and you move to a SaaS model, where basically you say, I'm going to give all my data to a data warehouse SaaS provider to manage on my behalf, and then I will pay them to access my data, which sounds very odd. But what you've got is total (28:47) convenience, but you've lost control of your data. And what we're doing at Yellowbrick is saying, well, you can have the conven… hey all have some generative AI initiative going on, and typically the use cases are a chatbot or a virtual assistant. So when I talk to some of my insurance customers, they're working on virtual assistants to help support their actuaries when they're running claims processing and things like that for insurance payouts. So they want smart assistants that can help make that task more efficient and
let them get to the data they need to support those c… g other work around natural language to SQL conversion. Again, that's available in open source today, and we've got some (31:20) other ideas of how we're applying this for financial control and workload analysis and other areas too.

So in the context of a chatbot, when RAG comes into play, retrieval-augmented generation, as part of that whole process the agent is calling typically an API or some sort of data source. So in that sense your customers in a chat… uter, whatever it is. But from the Yellowbrick perspective, is it putting any additional pressure because of the nature of the data that they're trying to pull?

Well, what's interesting is it's putting pressure, I think, on our roadmap more than anything. And, you know, (32:54) when you give the RAG examples involving Yellowbrick today, they're typically using some open or internet-available LLM, like the OpenAI suite or whatever, but… adjacent. Yeah, we call it bring your own LLM, BYO-LLM. Yeah, that's exactly it: you bring your own containerized LLM that runs on hardware in your own data center. So very much.

Ah, very, very interesting. So yeah, it's more of a natural fit than anything, I think, because ultimately what feeds these models, or the chatbots which (34:27) leverage the model, the RAG, is the data. And so if you can get that data quickly and efficiently, you're going t… operational system is being streamed into something like… as well in real time, which means you can make real-time inference based on that data as well. So I think data warehouses are at the center point of a lot of core business decisions, and a great place to locate your future generative AI workloads.

Actually, that's interesting. So the data streaming in as opposed to coming out. So just level set that just quickly
because I think that's important, especially in the analysis, to something being right on the front line, going like, I need to make a business decision now based on… I get it, (36:52) and speed and efficiency is the key there, right? So if you're taking data from an Internet of Things system, pulling and streaming it in, you want it to be stored efficiently, in a structure that lets you guys quickly pull it out at the other end as efficiently as possible. That's for me the value prop, and that… to be able to elastically scale this stuff in their own data center. So bring all of that cloud-like experience, all those cloudy features of agility and elasticity, and deploy them in their own data center, and Kubernetes enables that. And what's really exciting (38:15) is this isn't "if we build it, will they come?" We've got pull-through from new prospects and customers asking us and running us ragged doing this as quickly as possible, because they want it nee… is bringing your own container-based analytics. This is a way of bringing third-party applications and container-based software and integrating them with Yellowbrick in a very, very easy way. There's a couple of, I would say, catch-up things we're doing, because we're certainly not perfect. We haven't got a really good story around support for geospatial (39:49) analytics. We don't have that in the product today. We're working on it; we'll deliver the first p… 0:48) gtwgt.com for more great content and all past episodes.

If you enjoyed this episode, make sure to subscribe on your favorite podcast platform and on YouTube. Please spread the word, and if you feel like it, drop a review. Thanks for joining us, and we'll see you next time on Great Things with Great Tech. [Music]"