LakeBase from Databricks Is Changing Everything and People Are Mad!
2025-07-01 09:32
Lakebase from Databricks is causing quite a stir, drawing comparisons to the data **lakehouse**. The excitement highlights the ongoing evolution of **data architect** roles and the future of **data engineering**. Learn more about the **data lakehouse** and related concepts in this discussion of the future of **big data**.
Subtitles

Okay people, we have to talk about Lakebase from Databricks. People are

furious. [Music]

I mean, let's be real. The future is here and it is eating us up and spitting

our bones out. And Lakebase is here. And the internet went crazy. I

mean, just when you thought the lakehouse was the latest and greatest

concept and technology, not to be outdone anytime soon, Databricks went

and had their little party and they announced Lakebase, and there are a lot

of mad people and it's pretty funny. I mean, it was funny when they announced

Lakebase, which by the way is managed OLTP Postgres at scale inside the

Databricks platform. We'll talk about that more, but man, when they released

that, people were mad and went crazy. I could just hear the clickety-clack of

all the fingers and all the CTOs and middle managers just eating this thing

up. They are going to adopt Lakebase. It's the best thing ever and I think

it's pretty cool. A lot of people called foul, pulled the red flag,

and said, "Hey, foul, man, what's going on? This is not new." But let's get into

it. I mean, let's be honest. Lakebase has been a decade in the making. That's

the truth. What is Lakebase? Databricks announced Lakebase. It's a fully

managed Postgres OLTP engine that lives inside the Databricks platform. You can

provision it as a Databricks instance. So, like, it's a new type of compute. It has

Postgres row-level transactional semantics. That basically means

this is a Postgres database that scales infinitely, quote unquote. It's basically

a Postgres cluster inside your analytics platform. So it's Postgres

compatible. It has managed change data capture. They provide unified

governance. So it's inside the Databricks platform with Unity Catalog. So

it's tied into all the governance and security. It has lakehouse hooks. So of

course, you know, you can hook it up to Databricks Apps, feature engineering,

serving, SQL warehouses, all that kind of thing. And it's elastic scale; you know, it

separates storage and compute. So the idea behind Lakebase is that now you have,

right, this OLTP, online transactional, super low-latency response time, but it's at

scale, a.k.a., you know, a Postgres cluster hidden back there somewhere, but it's

kind of simplified, and it just, like, sits there and you can use it

however you want. So again, why would people throw fits over this? Why would one

side be saying this is the best thing ever and the other side saying, "Hey, you

suck, this is not a big deal"? What is going on? I mean, to get some

context on this, let's just do a little history lesson. Okay, let's talk about

OLAP versus OLTP. So analytics versus online transactional processing. These

two have always been at odds since the beginning of time and a lot of companies

and platforms run both of these things at the same time. I mean the story

begins with the advent of relational databases, SQL Server, Oracle, MySQL,

Postgres, right? And that's kind of where the classic data warehouse came

along. The roots of our current lakehouse started there in concepts like

Kimball's Data Warehouse Toolkit. You know, we didn't have tools like we have

today. You had to use SQL Server, Postgres, Oracle, whatever to run your

OLAP analytics, data warehouse, and that was really basically data modeling

things differently inside those systems designed for OLTP, designed for

low-latency transactional databases, not necessarily analytics. So we solved that

problem with data modeling. Circa 2007, you know, the world's going full-bore SQL

Server everything. OLTP databases, they are where all the transactional

processing took place, and we used data modeling to shift into an OLAP mindset

that was well suited to serve up analytics on a large scale. But, you know,

there were only a few people running Hive and Pig and HDFS. Not that many

people were doing it. You know, they were making SQL Server work. That simply

wasn't enough from a storage or compute viewpoint, and that's where the data lake

came in. You know, S3 got popular, people started pounding in Parquet files, which

got super popular at that time. Avro, CSV, JSON files, dumping them in an S3 data

lake, baby. Put Spark, BigQuery, Athena on top of it and now we have an analytics

system. Terabytes of messy data. We can crunch it and spend some money. Then

came the lakehouse, and it saved us from that. You know, Iceberg, Delta Lake, Databricks

came on the scene. It was like, yeah, now this is screaming. We

don't have to have anything to do with Postgres for analytics and this is

great. It scales. It's fast. You know, forever the line between OLAP and

OLTP has been clear. And now Lakebase muddies that water. I mean, here's a

little image for all you hobbits who didn't already know it. But, you know,

if you don't know that, you probably shouldn't be watching this channel. I

mean, let's be honest. Recently, you know, we've had like Aurora clusters,

things like that, but things have been pretty different, you know. So, what is

Lakebase? Lakebase is trying to narrow that gap. It's trying to narrow that gap

and solve business problems. I think a lot of people are complaining about

Lakebase, which I understand. It's not really anything new. It's the

implementation that's new. They're presenting it in a new way that provides

business value. That's what they're trying to do. So, if you want OLTP, you

want OLAP, Databricks is saying we can give you both of those inside our

platform. I mean, it's obviously smart, right? I mean, when you did architectural

drawings in the past, you would have those things totally separate. You would

maybe spin up an Aurora cluster in AWS over here, and then you've got your

Databricks or Snowflake over here. What Databricks is saying now is like, hey,

those boxes, those two things, can be inside the same box. They don't have

to be different, you know. So they're trying to break down barriers between

data and increase innovation and reduce technical barriers. That's what Lakebase

does. So again, traditional architecture is a lot of, like, pumping data

back and forth between Postgres and Delta Lake or Iceberg, back and forth, ETL

pipelines, right? I mean, that's what we've been doing for 20 years. With Lakebase,

you know, these things sit next to each other. You can treat them

inside this Databricks system like the same thing. They're just a different

data source. You want the low-latency transactional thing? Here's

your Lakebase Postgres thingy. And you want to do your analytics at petabyte

scale? Here's your lakehouse thingy. And they work together, and it's almost like

using the same tool, except they have different use cases, and, you know, that's

got obvious benefits. I mean, I don't really feel like going into a tech

overview of Lakebase. I will probably do that at some point. It's on my list.

I'll get there. But basically, if you're interested, Databricks offers Lakebase

as, like, a type of instance compute, right? You could start it and run it. Going to

be expensive. Boy, you know that. I mean, it does have some limitations.

The fine print tells us it's got like a 2 TB max for an instance. I don't know.

That's actually not a lot of data. It's not going to work for a lot of people.

And also 1,000 concurrent connections will turn some people off. I don't know

what to say. I'm sure that'll go up. I mean, the type of people that use this

stuff, you'd think they'd want, you know, something a little bit better, but

whatever. We'll see how it goes. Of course, Databricks made it easy to sync

data to and from Unity Catalog tables. So again, like I talked about, breaking

down the barriers and going between the lakehouse and Lakebase is super

easy, you know. And again, you can access your Postgres cluster database

thingy with your SQL editor, with a Databricks notebook, with external tools, like

you could with anything else. Like, you can just treat it like, you know, use

some Python, get a Postgres package, and connect to it like you could, or you can

just play around in a Databricks notebook. I mean, that's kind of cool. I

mean, the cost. Let's not talk about the cost, man. 40 cents per DB CPU hour plus

the storage cost. Oh boy, man. Hide your wallets, kids. Dang it, it's expensive.
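
To put that rate in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the 40-cent hourly rate quoted above applies per unit of compute, ignores the storage cost entirely, and uses made-up usage figures (unit count, hours per day, days per month). The function name `monthly_compute_cost` is purely illustrative and not anything from Databricks.

```python
def monthly_compute_cost(rate_per_unit_hour: float,
                         units: int,
                         hours_per_day: float = 24.0,
                         days: int = 30) -> float:
    """Estimate compute spend for one month, excluding storage.

    rate_per_unit_hour: price per compute unit per hour (0.40 in the video).
    units: number of compute units kept running (an assumption).
    hours_per_day, days: how long the instance stays up (assumptions).
    """
    return round(rate_per_unit_hour * units * hours_per_day * days, 2)


RATE = 0.40  # dollars per unit-hour, as quoted in the video

# One unit running around the clock for a 30-day month.
always_on = monthly_compute_cost(RATE, units=1)

# Four units, but only during an 8-hour workday, 22 working days.
business_hours = monthly_compute_cost(RATE, units=4, hours_per_day=8, days=22)

print(f"always on, 1 unit:       ${always_on:.2f}")
print(f"business hours, 4 units: ${business_hours:.2f}")
```

Swapping in real unit counts and uptime is the whole point; the numbers here are only placeholders for comparison against whatever an Aurora or self-managed Postgres cluster costs today.
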

But again, is it really that expensive? I don't know. If you are running an

Aurora database or something or running a Postgres cluster, that's not exactly

like a cakewalk, and it's expensive anyways. So, I suppose if you're a CTO

or some engineering leader, you could look at that and say, "Hey, this

architecture is way simpler. We're going to delete all sorts of code. This is

going to increase innovation all right here." You know, I get it. You know,

it's going to be worth it to some people. I mean, I did a little survey

when I wrote about Lakebase. I'll link to that Substack

post that I wrote about it and it's pretty interesting. I was like, "What do

you think the future of Lakebase is?" Most people are like, "I don't know."

50% of the people that took it said, you know, I don't know what's going to

happen. You know, another quarter of them said, "Nah, it's not going to

happen." And then another quarter said, "Give it to me." So, the world seems

split. Who knows what'll happen. Lakebase is very much a database product.

Again, they're just making it super easy for Databricks people to do stuff. Is

it earth-shattering? Yes and no. You know, scalable Postgres clusters have

been around for a long time. In this format? Not really. That, you know, this

is new. They're the first ones to say, "Dude, look, it's right here next to

your lakehouse. Like, these things basically can talk to each other. You

can use my platform and do massive analytics at scale, but you can also

use, you know, this Postgres cluster to do low-latency stuff and it's right

here." And yeah, I get it, man. This is more of like a business thing, an

innovation thing. That's what Databricks is doing. They want to provide

you a one-stop shop to do everything, and they're adding OLTP as a piece of that

shop. That's what Lakebase is. What do you think? Let me know.
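
As a footnote to the point about grabbing a Postgres package in Python and connecting like you would to any other Postgres database, here is a minimal sketch. It assumes the instance accepts standard Postgres connections. The host, database, user, and password are placeholders, both helper names are hypothetical, and `psycopg2` is just one common third-party driver, not necessarily the one the video has in mind.

```python
from urllib.parse import quote


def build_postgres_uri(host: str, dbname: str, user: str, password: str,
                       port: int = 5432, sslmode: str = "require") -> str:
    """Assemble a standard Postgres connection URI.

    User, password, and database name are percent-encoded so characters
    such as '@' or '/' do not break the URI.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}?sslmode={sslmode}"
    )


def fetch_one(uri: str, sql: str = "SELECT 1"):
    """Run one query and return the first row.

    Needs the third-party psycopg2 package and a reachable database,
    so it is imported here and never called at import time.
    """
    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect(uri)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder values only; replace with your own instance details.
    uri = build_postgres_uri(
        host="my-instance.example.com",
        dbname="appdb",
        user="me@example.com",
        password="not-a-real-password",
    )
    print(uri)
    # row = fetch_one(uri)  # uncomment once the URI points at a real database
```

Building the URI separately from connecting keeps the part you can check offline apart from the part that needs a live database and credentials.
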