339

March 29th, 2021 × #GraphQL#databases#performance

Hasty Treat - What is the n+1 problem?

Wes and Scott discuss the n+1 database query problem, explaining what it is and solutions like batching IDs, DataLoader, and Prisma.

or
Topic 0 00:25

What is the n+1 problem?

Wes Bos

CSF.

Scott Tolinski

Welcome to Syntax. In this Monday, hasty treat. We're gonna be talking all about A common question that is typically seen in things like interviews or perhaps just a general thing that you're gonna run into at some point in your development career, And it involves with the amount of times that you need to hit to load data. And this is called the n plus one.

Scott Tolinski

I don't know if it's called the n plus one, but it is an n plus one issue. And we're gonna be talking all about what the heck of that is or why it might be relevant to you, when you might come I'm across it, and what are the some of the solutions here that are out there? My name is Scott Tielinski. I'm a full stack developer from Denver, Colorado, and with me as always is Wes.

Wes Bos

Hey, everybody. I'm excited to talk about this specific n plus one problem.

Scott Tolinski

I am Excited to talk about it because it's not one of my favorite things in the whole world, and it's definitely one of those traps that's really easy to get into, especially in GraphQL, and It's exceedingly easy to just drop yourself into an n plus one issue and then have to learn about it. And there's not a ton of resources out there. It's like, oh, Oh, yeah. This thing that I might not even know is going on is happening. Oh, guess I gotta figure it out. Alright. This episode is sponsored by a couple of great companies.

Sponsor mentions Hasura and Sentry

Scott Tolinski

It is sponsored by Hasura as well as Sentry.

Scott Tolinski

Now Hasura is really, really cool. Now What it is is it's a data service that basically allows you instant access to get jamming on a database.

Scott Tolinski

And it opens up all sorts of API CRUD operations for you just out of the box. So let's say if you wanted to to get started with a GraphQL API, you didn't wanna have to think about a database. You didn't wanna have to think about Any of those layers, you could just create a HESR instance which uses Postgres, and it allows you to quickly and easily create all of those routes that you might need to use and Queries and mutations. I think that they're really good sponsor for this episode because it really helps you with some of these maintaining our GraphQL API situations.

Scott Tolinski

So Hasura gives you instant GraphQL on your data source, whether that is a Postgres or a Postgres family of databases, SQL, BigQuery, any of those things. I've only ever used it with Postgres, but it was very, very easy to get up and running. You don't need to write your own GraphQL server, which is a definitely a a pinpoint for a lot of people. And Hasura generates a whole bunch of stuff for you. Like I mentioned, all of those mutations and queries that you might want to have for your CRUD operations.

Scott Tolinski

The GraphQL servers are also the real time is really super easy with Hasura.

Scott Tolinski

And if you don't want to run your Stuff on Hasura's site, you can use an open source version of Hasura 2 and run it yourself. It's really pretty slick.

Scott Tolinski

So if you want to try Hasura, head on over to Hasura, h a s u r a dot info forward slash free trial.

Scott Tolinski

Use the coupon code tryhasura.

Scott Tolinski

That's n. T r y h a s u r a, all one word. Give that a go. And, man, this thing is super duper cool. So give it a try at hasura.infoforward/free trial and give it a rip. It's really, really, really cool. Also gonna talk about

Wes Bos

Sentry, which does Error and exception monitoring also does performance monitoring. I'm gonna talk today about their error and exception tracking. So what what it does is you install it in your app. They support literally every language out there, .net, PHP, Flask, iOS, JavaScript. Specifically, They can get into React, Laravel, Node. You you name it. They support that type of thing. So you install this thing into your application. Usually, just paste a couple lines of code, And it gives you information about what went wrong when something happens. So syntax error, 500 error, Cannot read property x of undefined. It will give you these little breadcrumbs that show you what happened, the what did the user click In order to make this error actually happen, which Git repo commit is this thing tied to specifically? Who should fix it? Who is assigned to this? How often is this happening? What browser is this happening on? All the information you need in order to Both jump on issues before your your users email you about something not working as well as just, like, clear insights into What was actually causing this? So check it out. Century dot I o. Use a coupon code tasty treat for 2 months for free. Thanks so much to Century for sponsoring.

Wes Bos

Alright. So let's talk about the n plus one problem. I thought we'd go through this because this sign that comes up in interview questions, or This is like a very computer sciency problem, and it does like Scott says, it pops up a lot in GraphQL Because often in your database, you have, relationships from content. So let's Say you wanna fetch a list of 10 podcasts, and then each podcast has 1 or 2 or 3 hosts.

Topic 2 05:00

Explain n+1 problem with podcast/host example

Wes Bos

You have a relationship between your data type, which is podcasts, and you have, that will have 1 or more hosts. Right? And then you need to grab a list of names for the podcast hosts, and then maybe you also wanna Go even further. And with GraphQL, it's really easy to be like, oh, well, I also would like to grab a list of other podcasts this person hosts. Right? And and from those podcasts, maybe grab the host for that. And then you can go you can go infinitely nested in GraphQL.

Wes Bos

And then sometimes we forget that there actually is a database behind the scenes in GraphQL that is actually running all of these queries.

Topic 3 06:10

n+1 happens in GraphQL with nested queries

Wes Bos

And and what can happen is that, Alright. A list of 10 podcasts. That's 1 query in your database. But then what happens is that it says, oh, for the 1st podcast, I need to get the list of Hosts for that podcast. And then the second one, it it does another one. So even though you just you did 1 query for 10 podcasts, It might have gone ahead and done 11 requests to your database because for every podcast that you have, you need to go ahead and Do another request to get the a list of hosts where the the host podcast is in this array. And that is what is referred 2 as the n plus one problem, meaning that for every n things that you have, it's possible that you Exponentially increase the amount of queries that go to your database, and that could lead to a potential issues in being that your Your requests are very, very slow. Right? Because if it takes a 100, 200 milliseconds for 10 podcasts, okay, but then then you gotta query Maybe 20 authors. Before you know it, you've got 5, 6 second requests happening Mhmm. And that's way too slow. Right? Can I give a,

Topic 4 07:18

Real example with Level Up Tutorials data

Scott Tolinski

an explanation of it too is how it relates even to the level of tutorial site, how we we fit it? Yeah. Yeah, please. So If you think about it like this, again, anytime you have data relationships, especially in a GraphQL context, this is could be a problem because the relationship is being taken place at the API layer. Right? So let's say we were to say, give me every tutorial in this specific series. There's 24 tutorials. But then I need to get Some information about the series itself. In the series, maybe the series title or it's whatever. So then I would say, alright. On each of these tutorials, also give me the the The playlist to DOTS title. Right? And what that does is, like you mentioned, is every single time I have to load the tutorial, then that takes that tutorial and it takes it to the next resolver, and that next resolver then does the playlist query.

Scott Tolinski

That playlist query has now been executed 20 some times or however many times exists in that playlist. However many videos are, even though The data that's coming back from that playlist query is going to be the same every single time. We're basically having 22 unnecessary DB calls because of that relationship at the the API layer. If this sounds over over your head or something like that, If you work in GraphQL for a couple months, you're gonna hit this error. So or it's not an error. You're gonna hit this problem where it's an over fetching From your your database problem. And is it crucial? It's gonna be crucial to understand long term, and it's gonna be crucial to fix long term. But there are many queries on the level of trail site where we're just kind of eating the cost because the queries are fast enough right now. Yeah. And some of the solutions are a little bit they're they're time intensive to implement. So That's a little bit of a a personal anecdote about how this can come across in any sort of normal database or normal data solution. You have a playlist. You have videos. Query the videos. Great playlist. It's that relationship happening at the API layer. Yeah. That I think that's a really good point that you said. Like, the solution often is don't worry about it because it's fast enough. You're really not having any queries that this thing is becoming a significant issue, and sometimes people like to just sit around and talk about this issue a lot more than is actually affecting you. So the 1st solution is don't worry about it. Is it actually an issue for you, or Yeah. Is this just a theoretical issue that is popping up in in your specific use case.

Topic 5 09:39

Solution is to batch IDs and query once

Wes Bos

The way that you actually do solve this thing is that let's go back to our podcast example. We have a list of 10 podcasts, And before you query the actual authors, in each podcast, you'll have, like, IDs, which are our relationship to a host.

Wes Bos

And what you can do is you loop over all 10 of those.

Wes Bos

And for each one, you don't go off and fetch the hosts. You just keep an array of all of the hosts that you do need to look up. So after you've looped over 10 podcasts, you're now gonna have an array of a. Anywhere from 8 to 30 podcast hosts that you say, okay. Now I have an array of, podcast hosts that I need to go look up, And then you can do a single request saying, give me all the podcast hosts that where their IDs match Any of them in this array of 8 to 30 IDs, then that's that's only 2 queries. And then you come back, and then you can go and and hydrate those back And so you can say, alright, the here's the ID for for Scott. It's 1, 2, 3, 4. Now put Scott's data in this relationship. Oh, here's Wes 4, 5, 6. Let me put Wes's info in this one, and that's just just two two things. And with MySQL and a lot of other languages, you can even go even further and do, like, left joins and go pretty complicated on this type of thing. Again, maybe not necessarily needed. Sometimes 2 queries is totally fine.

Wes Bos

So that's the solution to the n plus one problem. Save an array of IDs and then look them up in one go.

Topic 6 11:11

Tools like DataLoader can help solve n+1

Scott Tolinski

Now the next question we have here is, like, what do you Actually use, like, when it comes to tech. Right? Yeah. Are are you rolling that by hand? I'm not rolling that by hand. Maybe I am at some points, but I don't want to. You know? What are some of the the ones that you've heard of to solve this, Scott? So the big one is DataLoader. Right? There's this thing made by Facebook called DataLoader.

Scott Tolinski

One of the biggest pain points in GraphQL, especially when you're getting started with, like, Apollo like, I I feel like Apollo server has, like, a thing about And plus 1 queries. And their solution is just use DataLoader. And then you go to DataLoader's you go to DataLoader, and DataLoader n. It has no mention of Apollo or it has no mention of any of these other things. It's just like kind of abstract.

Scott Tolinski

The repo is And this is no shade on DataLoader because it is in an kind of an intense project. But the ReadMe for DataLoader let me see what this actually comes ends up coming out To be, I'm seeing how many lines of code this thing is, is 700 lines of code or 700 lines in the read me. And it is just a huge a file of kind of wishy washy explanations on on stuff. Again, no shade there. It's just that this is a very generalized tool. And so for me, I would come into DataLoader, and people just be just use DataLoader.

Scott Tolinski

Oh, okay. Let's see, some examples of that. Let's I'm trying to see examples here. So what's nice is that I the GraphQL API layer that we started using, which is Mercurius, which is a Fastify based, it's very, very good. It's very fast. They actually have data loader built into it, and it makes it really easy. You just have a you know, when you register your API, you'd say, you know, here's my schema, here's my resolvers.

Scott Tolinski

With Mercurius, you say, here's my schema, here's my resolvers, and here's my loaders.

Scott Tolinski

And it's basically like an object of what those loaders are, and you get f. Even access to, like, turning off or on caching for them. So be curious, really shout out to not only making it a part of your documentation, but making it easy to parse and understand. And it, like, says right up front, a loader is a utility to avoid the one plus end problem in GraphQL.

Wes Bos

Thank you. Thank you. Here's how to use it. I think the answer to a lot of these is your software will take care of it for you, which is ideal because, like, this is not something that A front end dev should be having to to deal with. Right? So Right. The MongoDB Ruby API has this thing called eager loading, so does Laravel.

Wes Bos

They have eager loading built in. So if you know ahead of time that you could possibly have an n plus one problem, which is why we're doing this show, Then you just have to use this thing called eager loading, and it will fetch them in the way that that we described. You can also use, like, aggregation. Yeah. Yes. That's what I was just gonna say. If if you're using MongoDB aggregation again, this this problem applies to all databases. But if you're using MongoDB, You can use aggregation pipelines, and aggregations basically is allowing you to run these Multiple queries being like, look up these and then group them and then count them. And then for the property author, look up the authors based on the ID, and You can basically do these really, really big queries all inside MongoDB because MongoDB has its own query language. You can even run produced inside of MongoDB, which is really cool, and that is a 100,000 times faster than fetching the data And running the reduces in your own JavaScript on the server. Like like, if I, like, run, like, stats on my entire year's worth of sales, and I wanna see how much I made and How many free courses and and grouping the courses by and how much average the average cost per course and all of that data? If I query all that data and then loop over it, it's megabytes of data, a JSON, and then I have to loop over. It's very, very slow. But if you run that actually all in the database, It's much faster, and, also, you don't have to worry about those possible problems.

Scott Tolinski

Yeah. This it's it's so funny because The aggregation pipeline in MongoDB is exceedingly useful and very, very good.

Topic 7 15:09

MongoDB aggregations can solve n+1

Scott Tolinski

But, man, is that syntax of 2 sometimes. You I feel like I cannot write it without, like, living in that documentation for a little bit. I just have, like, a folder of

Wes Bos

Previously used, aggregations.

Wes Bos

Yeah. Because, like, once you get into it, you you understand it, but, like, at once every 3 or 4 months, you're like, oh, how do I How do I look up how do I count the number of times that this course

Scott Tolinski

has sold? I keep forgetting. Yeah. And I I found the documentation to be, Along with any of Mongo's documentation, a little iffy.

Scott Tolinski

But it's okay. It's definitely doable. Have you used Mongoose has some sort of populate. Have you used populate in Mongoose?

Wes Bos

I have. Yes. That's way way better syntax, I think. I use that that quite a bit. So the way populate works is if you query the you say, find podcasts where the date is in the last f. 10 days and limit 20 of them. And then you just tag on a dot populate, and you say populate hosts. And it will go ahead and populate all the data for those. I don't know what it does Under the hood? I don't think that it breaks it down to an aggregation. I think it does. Oh, really? Mongo d b well, actually sorry.

Scott Tolinski

I've read I've read 1 sentence, and I kinda half read it. And I saw the word dollar sign lookup, and I was like, oh, yeah. Of course. It no. I I don't know if it does either. It'd be under the hood. It says, MongoDB has the join like syntax, dollar sign lookup aggregation operator. Mongoose has a more powerful alternative called populate. Yeah. So I don't know what it uses under the hood, but, it's definitely for the same thing if you're using Mongoose, which we are. Yeah. I'm using Mongoose as well. I use it quite a bit. I don't have

Wes Bos

Very many relationships where this could be a problem, so I've not run into it specifically myself. But I would like to look into whether what populate does use. And the the way you could tell what it uses is you just turn on query logging in Mongoose.

Topic 8 16:58

Mongoose populate helper

Wes Bos

And then every time that you make a query, it will log out to the Your console, what the query was. And if you're seeing 7 or 8 queries run versus 1, then you know, okay, it's running multiple queries for me In order to to get all of this data, and it'll also tell you how fast those queries run. And it's important to test those on a Hosted version of MongoDB and not your local version because the local version will be much faster because there's no there's no network transit time when that happens. That's also one of the neat things about

Scott Tolinski

man, Apollo keeps changing the names of everything. I don't know if it's Apollo engine. It used to be Apollo engine

Wes Bos

Where you can explore

Topic 9 17:52

Apollo Studio monitors n+1

Scott Tolinski

what is taking so long. Yeah. So there's an Apollo product. Like, Apollo, if you're always wondering how They make money. This is kind of one of the ways that they make money is to say, you know, here, you can use our our free tools for creating GraphQL stuff, But we also have these like paid tools that are really full featured and help you. Right? It looks like is it called Apollo Studio now? Yeah. It looks like it's called Apollo Studio. It it helps you build, validate, secure your organization's data graph. And one of the things that this does is it gives you a ton of metrics on potential n plus one errors issues. So I've used this in the past. It did work phenomenally well with our API. Now that I'm not on Apollo anymore for our server, I wonder If there's a different solution for that, but that's what I I've used in the past, and it's very nice. Awesome. Last thing I'm gonna put here is Prisma. I don't know if you would call them an ORM or whatever, but, basically, they The whole world confuses me, that language. Yeah. Prisma sits on top of multiple databases

Wes Bos

And we'll give you an a GraphQL API.

Wes Bos

So they're responsible for the n plus one problem, And they have a talk. I've only watched a little bit of it. It's about half an hour. I'm gonna watch it, once I get a little bit more time, but they say how Prisma solves the n plus one problem in GraphQL resolvers. So if you use Prisma, then they just take care of all that for you. Sick. Alright. Hopefully, that was helpful.

Topic 10 18:56

Prisma solves n+1 for you

Scott Tolinski

If you get a job because you nailed what the n plus one problem is, then send Scott and I $5 each. If they say, what is the n plus one error? You just say, Too many DB calls hits DB too much, too much, and that that's it. I mean, that's what I would say, like, just pounding like a caveman. It's too much DB.

Scott Tolinski

Too much DB. Yeah. That's really it.

Wes Bos

Alright. That is all we have for you. Thanks so much for tuning in, and we will catch You on Wednesday. Peace.

Scott Tolinski

Peace. Head on over to syntax.fm for a Full archive of all of our shows. And don't forget to subscribe in your podcast player or drop a review if you like this show.

Share

Play / pause the audio
Minimize / expand the player
Mute / unmute the audio
Seek backward 30 seconds
Seek forward 30 seconds
Increase playback rate
Decrease playback rate
Show / hide this window