Okay people, we have to talk about Lakebase from Databricks. People are
furious. [Music]
I mean, let's be real. The future is here and it is eating us up and spitting
our bones out. And the Lakebase is here. And the internet went crazy. I
mean, just when you thought the lakehouse was the latest and greatest
concept and technology, not to be outdone anytime soon, Databricks went
and had their little party and they announced a Lakebase, and there are a lot
of mad people, and it's pretty funny. I mean, it was funny when they announced
Lakebase, which by the way is managed OLTP Postgres at scale inside the
Databricks platform. We'll talk about that more, but man, when they released
that, people were mad and went crazy. I could just hear the clickety-clack of
all the fingers, and all the CTOs and middle managers just eating this thing
up. They are going to adopt Lakebase. It's the best thing ever and I think
it's pretty cool. A lot of people called foul, pulled the red flag,
and said, "Hey, foul, man, what's going on? This is not new." But let's get into
it. I mean, let's be honest. Lakebase has been a decade in the making. That's
the truth.

What is Lakebase? Databricks announced Lakebase. It's a fully managed
Postgres OLTP engine that lives inside the Databricks platform. You can
provision it as a Databricks instance. So it's like a new type of compute.
It's Postgres with row-level transactional semantics, basically meaning
this is a Postgres database that scales infinitely, quote unquote. It's
basically a Postgres cluster inside your analytics platform. So it's Postgres
compatible. It has managed change data capture. They provide unified
governance. So it's inside the Databricks platform with Unity Catalog. So
it's tied into all the governance and security. It has lakehouse hooks. So of
course, you know, you can hook it up to Databricks Apps, feature engineering,
serving, SQL warehouses, all that kind of thing. And it's elastic scale, you
know, it separates storage and compute. So the idea behind Lakebase is now you
have, right, this OLTP, online transactional, super-low response time, but
it's at scale, aka, you know, a Postgres cluster hidden back there somewhere,
but it's kind of simplified and it just sits there and you can use it
however you want.

So again, why would people throw fits over this? Why would one side be saying
this is the best thing ever and the other people saying, "Hey, you suck, this
is not a big deal"? What is going on? I mean, to get some
context on this, let's just do a little history lesson. Okay, let's talk about
OLAP versus OLTP. So analytics versus online transaction processing. These
two have always been at odds since the beginning of time, and a lot of companies
and platforms run both of these things at the same time. I mean, the story
begins with the advent of relational databases: SQL Server, Oracle, MySQL,
Postgres, right? And that's kind of where the classic data warehouse came
along. The roots of our current lakehouse started there, in concepts like
Kimball's Data Warehouse Toolkit. You know, we didn't have tools like we have
today. You had to use SQL Server, Postgres, Oracle, whatever, to run your
OLAP analytics, your data warehouse, and that was really basically modeling
data differently inside those systems designed for OLTP, designed for
low-latency transactional databases, not necessarily analytics. So we solved
that problem with data modeling.

Circa 2007, you know, the world was going full-bore SQL Server everything.
OLTP databases were where all the transactional processing took place, and we
used data modeling to shift into an OLAP mindset that was well suited to serve
up analytics on a large scale. But, you know, there were only a few people
running Hive and Pig and HDFS. Not that many people were doing it. You know,
they were making SQL Server work. That simply wasn't enough from a storage or
compute viewpoint, and that's where the data lake
came in. You know, S3 got popular, people started pounding in Parquet files,
which got super popular at that time. Avro, CSV, JSON files, dumping them in
S3. Data lake, baby. Put Spark, BigQuery, Athena on top of it and now we have
an analytics system. Terabytes of messy data. We can crunch it and spend some
money.

Then came the lakehouse and it saved us from that. You know, Iceberg, Delta
Lake, Databricks came on the scene. It was like, yeah, now this is screaming.
We don't have to have anything to do with Postgres for analytics and this is
great. It scales. It's fast.

You know, forever the line between OLAP and OLTP has been clear. And now
Lakebase muddies that water. I mean, here's a
little image for all you hobbits who didn't already know it. But, you know,
if you don't know that, you probably shouldn't be watching this channel. I
mean, let's be honest. Recently, you know, we've had like Aurora clusters,
things like that, but things have been pretty different, you know. So, what is
Lakebase? Lakebase is trying to narrow that gap. It's trying to narrow that gap
and solve business problems. I think a lot of people are complaining about
Lakebase, which I understand. It's not really anything new. It's the
implementation that's new. They're presenting it in a new way that provides
business value. That's what they're trying to do. So, if you want OLTP, you
want OLAP, Databricks is saying we can give you both of those inside our
platform. I mean, it's obviously smart, right? I mean, when you did
architecture drawings in the past, you would have those things totally
separate. You would maybe spin up an Aurora cluster in AWS over here, and then
you've got your Databricks or Snowflake over here. What Databricks is saying
now is like, hey, those two things can be inside the same box. They don't have
to be different, you know. So they're trying to break down barriers between
data and increase innovation and reduce technical barriers. That's what Lakebase
does. So again, traditional architecture is a lot of pumping data back and
forth between Postgres and Delta Lake or Iceberg, back and forth, ETL
pipelines, right? I mean, that's what we've been doing for 20 years. With
Lakebase, you know, these things sit next to each other. You can treat them
inside this Databricks system like the same thing. They're just a different
data source. You want the low-latency transactional thing? Here's your
Lakebase Postgres thingy. And you want to do your analytics at petabyte
scale? Here's your lakehouse thingy. And they work together, and it's almost
like using the same tool except they have different use cases, and, you know,
that's got obvious benefits.

I mean, I don't really feel like going into a tech overview of Lakebase. I
will probably do that at some point. It's on my list. I'll get there. But
basically, if you're interested, Databricks offers Lakebase as like a type of
instance compute, right? You can start it and run it. Going to be expensive,
boy, you know that. I mean, it does have some limitations. The fine print
tells us it's got like a 2 TB max for an instance. I don't know.
That's actually not a lot of data. It's not going to work for a lot of people.
And also, 1,000 concurrent connections will turn some people off. I don't know
what to say. I'm sure that'll go up. I mean, the type of people that use this
stuff, you'd think they'd want, you know, something a little bit better, but
whatever. We'll see how it goes.

Of course, Databricks made it easy to sync data to and from Unity Catalog
tables. So again, like I talked about, breaking down the barriers and going
between the lakehouse and the Lakebase is super easy, you know. And again, you
can access your now-Postgres cluster database thingy with your SQL editor,
with a Databricks notebook, with external tools, like you could with anything
else. Like, you can just, you know, use some Python, get a Postgres package,
and connect to it like you could, or you can just play around in a Databricks
notebook. I mean, that's kind of cool.

I mean, the cost, let's not talk about the cost, man. 40 cents per DB CPU hour
plus the storage cost. Oh boy, man. Hide your wallets, kids. Dang it, it's
expensive.
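To put a rough number on that 40 cents an hour, here is a back-of-the-envelope sketch in Python. The only figure taken from the video is the $0.40 per compute-unit-hour rate. The number of compute units, the always-on 730 hours a month, and the storage rate are all placeholder assumptions, so check your own pricing page before trusting any of it.

```python
from dataclasses import dataclass

# Average hours in a month (8,760 hours per year / 12). Assumes always-on.
HOURS_PER_MONTH = 730


@dataclass
class LakebaseCostInputs:
    # Hypothetical: how many billable compute units the instance consumes.
    compute_units: float
    hours_per_month: float = HOURS_PER_MONTH
    # The 40-cent figure quoted in the video, per compute-unit-hour.
    rate_per_unit_hour: float = 0.40
    # Storage is billed separately. The rate below is a placeholder (0.0),
    # not a real price: fill it in from your own pricing page.
    storage_gb: float = 0.0
    storage_rate_per_gb_month: float = 0.0


def estimate_monthly_cost(inputs: LakebaseCostInputs) -> float:
    """Return a rough monthly cost in dollars: compute plus storage."""
    compute = (
        inputs.compute_units
        * inputs.hours_per_month
        * inputs.rate_per_unit_hour
    )
    storage = inputs.storage_gb * inputs.storage_rate_per_gb_month
    return round(compute + storage, 2)
```

Under those assumptions, `estimate_monthly_cost(LakebaseCostInputs(compute_units=2))` comes out to 584.0 dollars a month for compute alone, before any storage.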
But again, is it really that expensive? I don't know. If you are running an
Aurora database or something, or running a Postgres cluster, that's not
exactly a cakewalk and it's expensive anyways. So, I suppose if you're a CTO
or some engineering leader, you could look at that and say, "Hey, this
architecture is way simpler. We're going to delete all sorts of code. This is
going to increase innovation right here." You know, I get it. You know,
it's going to be worth it to some people.

I mean, I did a little survey when I wrote about Lakebase. I'll link to that
Substack post that I wrote about it, and it's pretty interesting. I was like,
"What do you think the future of Lakebase is?" Most people are like, "I don't
know."
50% of the people that took it said, you know, I don't know what's going to
happen. You know, another quarter of them said, "Nah, it's not going to
happen." And then another quarter said, "Give it to me." So, the world seems
split. Who knows what'll happen.

Lakebase is very much a database product. Again, they're just making it super
easy for Databricks people to do stuff. Is it earth-shattering? Yes and no.
You know, scalable Postgres clusters have been around for a long time. In this
format? Not really. That, you know, this is new. They're the first ones to
say, "Dude, look, it's right here next to your lakehouse. These things can
basically talk to each other. You can use my platform and do massive analytics
at scale, but you can also use, you know, this Postgres cluster to do
low-latency stuff, and it's right here." And yeah, I get it, man. This is more
of like a business thing, an innovation thing. That's what Databricks is
doing. They want to provide you a one-stop shop to do everything, and they are
adding OLTP as a piece of that shop. That's what Lakebase is. What do
you think? Let me know.
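On the "use some Python, get a Postgres package and connect to it" point from earlier, here is a minimal sketch of what that could look like with psycopg2, a common Postgres driver. Nothing here is Lakebase-specific. The host, user, password, and the default database name `postgres` are assumptions, and the helper names `lakebase_conn_params` and `fetch_server_version` are made up for illustration. Use whatever endpoint and credentials your own instance gives you.

```python
def lakebase_conn_params(host, user, password, dbname="postgres", port=5432):
    """Build keyword arguments for psycopg2.connect().

    Every value is supplied by the caller. The defaults are ordinary
    Postgres defaults, not anything confirmed for Lakebase.
    """
    return {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
        # Assumption: a managed endpoint will want an encrypted connection.
        "sslmode": "require",
    }


def fetch_server_version(params):
    """Connect, run SELECT version(), and return the version string."""
    # Imported here so the helper above works without the driver installed.
    import psycopg2  # third-party: pip install psycopg2-binary

    conn = psycopg2.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT version()")
            return cur.fetchone()[0]
    finally:
        conn.close()
```

To try it, build the parameters with your own values, for example `lakebase_conn_params("your-host", "your-user", "your-password")`, and pass the result to `fetch_server_version`. If it comes back with a Postgres version string, it really is just Postgres on the other end.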