792

July 8th, 2024 × #SEO #Sitemaps #Web Development

Perfect Sitemaps for SEO

Wes and Scott discuss why you need a sitemap, what should be in it, and how to generate and submit it properly for SEO.


Transcript

Scott Tolinski

Welcome to Syntax. In this Monday Hasty Treat, we're gonna be talking about how you can have the perfect sitemap for your application. So that way you can land some SEO scores that are awesome, but not only that, get all your pages listed accurately and all the things we love about search engines. My name is Scott Tolinski. I'm a developer from Denver, and with me, as always, is Wes Bos. What's up? Hey. Hi. Good. I'm kinda back. I recorded 1 before this, but I was on paternity leave for a couple weeks. And,

Wes Bos

shout out to CJ for filling in there. But it feels good to be back in the horse

Topic 1 00:34

Wes is back from paternity leave

Scott Tolinski

and, recording back in my office. It's real nice. It does feel good to be back in the horse. I also am back. I was at JS Nation and Yeah. How's that? Amsterdam. Incredible. Amazing conference. Maybe the best conference I've been to in terms of, like, organization structure and all that stuff. 1 of the things I really love for me as a speaker is they put every single thing that I was supposed to be a part of on my Google Calendar. That way, I had everything. I knew where I was supposed to be at all times. And, for somebody like me, that's really, really necessary.

Wes Bos

I love that because you don't have to fight with the time zones either. You just, like, you just look at your phone. Where do I need to be right now? Although, while I was there, I literally missed the boat.

Topic 2 01:15

Scott almost missed the boat to a conference

Scott Tolinski

So that because there's a there was a boat to the boat, and I missed the boat to the boat. Like, I ran up to it, and everyone's like, they were like like, it was like 1 of those cruise ship things. They're, like, leaving. But, luckily, 1 of the attendees there had a car, and he's like, don't worry. We'll drive to it. So we drove to the big boat. I honestly wish I would have missed the boat because that boat was hot, and there was a lot of people in it. And it was like Yeah. It felt like ESLint. Yeah. It felt like it was going to sink. I was very, I'm very nervous. But, you know, other than that, it was awesome, man. And then we got to explore Amsterdam for a week, and then we flew over to Italy and went to Venice and Florence. Like, I did the trip with my kids, so it was, like, a really cool experience for them getting to see all that stuff. Awesome. Yeah. But, also, 1 nice thing about being back is that I get to see all of the errors that occurred while I was gone in our Sentry, which that always happens. So, if you wanna be able to track all the stuff that happens while you're away, so that way you can fix it when you get back or maybe get the alerts while you're on vacation and have them go to somebody else to have them fix it, head on over to sentry.io. It tracks all your errors and the issues in your application, and it really makes solving bugs easy, is really what it comes down to. And this show is brought to you by Sentry. So thank you so much, Sentry.

Topic 3 02:33

Scott built a new website and realized he needs to optimize for SEO

Scott Tolinski

So let's talk about site maps. While I was on the plane to the conference, I decided to build a new website for myself as part of, like, my talk, and it's not done. It's still dusty. There's still a lot of, like I haven't even touched the mobile layout for this thing yet. But what I wanted is I wanted to have all of my demos be, like, embeds in this website. And Yeah. I've never had a, like, a blog or anything. My websites have always just been very simple. And this time, I was like, well, if I'm gonna do all these demos for the talk, they're gonna all be embedded iframes. Let's kinda have this be a blog of some kind. So I made it a blog. And the next thing you know, I'm thinking, well, now I got a blog.

Topic 4 03:21

Scott wonders why we need sitemaps and what should be in them nowadays

Scott Tolinski

I should be really thinking about SEO on this thing. So sitemap time. And then it got me thinking, what about sitemaps? It's something that I've kind of just done on every site. I have a a site map sort of route for SvelteKit that I just port over to each application Yeah. Or I I run a generator or something, and I just I just use a site map.

Scott Tolinski

And I began thinking about the individual fields. Like, what's actually necessary? What do I need in here? And how has it changed over time? Like, what is a site map doing for me? So I figured this would be a really great episode to just get into all about site maps. Why do you need them? What can you do with them? What do you even need within a site map in, well, 2024?

Wes Bos

Yeah. That's good. And, you've built the initial site map for the Syntax website as well, and that was really nice because I've been going through not anymore, but probably over the last 6 months, I've been watching the Google Webmaster Tools trying to get our content indexed. Since we made, like, a pretty major shift in terms of, like, additional pages, there was a lot more content on the website from the old 1. It was kinda interesting to see, like, how do you tell Google, hey.

Wes Bos

There's now, what, an extra 1500 pages on this website.

Scott Tolinski

Yeah. And that's really the answer: the sitemap. So you might be wondering, why do I need a sitemap? What does it do for me? How does it improve SEO? All of those things. Well, 1st and foremost, a site map really allows for better indexing and monitoring of your indexed pages.

Topic 5 04:51

Sitemaps help search engines index and monitor pages better

Scott Tolinski

So it lets the search engines know exactly the map of your site, hence the name site map. It lets it lets the search engines know the structure of them. And in turn, it allows the search engines to also be aware of potential changes and updates to certain pages and when to essentially relook at those pages.

Scott Tolinski

And with that improved idea of structure for your application comes along better SEO. Now a site map itself doesn't, like, help with the ranking itself. Right? It's not going like, having a site map isn't just going to get you ranked high, but it's going to help the crawlers find the relevant content on your applications

Topic 6 05:30

Sitemaps don't help ranking but help crawlers find relevant content

Wes Bos

and understand, like, what the general structure of your application is without it having to guess. Right? Yeah. Yeah. You can't use it to trick Google into pages that are not linked from anywhere. Like, Google still has to be able to find that this is a page you're telling me about, but where have you linked from it? Right? Is it is it being linked from another website? Is it being internally linked from inside of the page? Like, for for us, it was the transcript page, which is Mhmm. It was a brand new page, and I wanted all of that to be indexed because it's a lot of lot of good information. And that's, like, very good for SEO if you're searching for a specific topic. In fact, I find that when I Google for a specific syntax episode, I'll often the transcript page will actually come up before the actual show notes page because the transcript page has literally every word we've we've spoken inside of it. Yeah. But, initially, I had a hard time getting those, like, indexed from Google. And, it was a mix of, like, how often is it updated, should I be crawling this page, and all the stuff we'll talk about today.

Scott Tolinski

Yeah. Totally. So 1st and foremost, let's get started. What kind of formats do you need, or can you use for a sitemap? You know, I always personally just thought it was XML only. And looking into it, you can I'm surprised to see this. So yeah. So according to Google and all the information out there, you can have a sitemap that's in XML, like what most of them are. You can also have a site map that's just RSS.

Topic 7 06:55

Plain text sitemap is just URLs, one per line

Scott Tolinski

So if you're publishing an RSS feed of your entire application, you can use that as the site map. And 3rd, you can have a text file, a TXT file, which is literally just 1 page per line.

Scott Tolinski

Incredibly simple. Right? Just straight up 1, full URL per line. That's it. That works as a site map. That's a totally valid site map. And, what's the file called? Sitemap dot txt? Correct. Yep.
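The plain-text format described here is simple enough to sketch in a few lines. This is only an illustration: the `pageUrls` list and the `example.com` addresses are made up, and a real site would pull its URLs from its routes or database.

```typescript
// Hypothetical list of pages; the URLs are placeholders, not real Syntax routes.
const pageUrls: string[] = [
  "https://example.com/",
  "https://example.com/blog/perfect-sitemaps",
  "https://example.com/about",
];

// A text sitemap is nothing more than one fully qualified URL per line.
function buildTextSitemap(urls: string[]): string {
  return urls.join("\n") + "\n";
}

const textSitemap = buildTextSitemap(pageUrls);
console.log(textSitemap);
```

Serving that string as `sitemap.txt` gives up the extra metadata that the XML format can carry, which is the trade-off discussed in the rest of the episode.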

Wes Bos

And does a sitemap have to be I might be getting ahead of ourselves right now, but does a sitemap have to be named sitemap.xml? Or That's a good question. Like, is there a meta tag that you can put?

Topic 8 07:23

Sitemap can be named anything, not just sitemap.xml

Scott Tolinski

This is 1 I have to Google. I did not even think about this. It's something you just always do. You know? Which is funny. You can name your site map anything you want according to Google. You just have to, when you submit it to the search engines, developer consoles, or whatever they are, search consoles, you just specify the path to it. I had no idea. I would have assumed that you would need to name it sitemap just based on the convention. But according to Google, you could name it anything you want, and you can place it anywhere on any route. So it doesn't need to be just forward slash sitemap.

Wes Bos

That's really handy if, for whatever reason, you don't have control over top level routes because your application doesn't allow you to do that. It would be nice to be able to do that. I probably would still try my darnedest to make it sitemap.xml because, like robots.txt, it's a standard

Topic 9 08:09

But sitemap.xml is a standard worth following

Scott Tolinski

that There's no reason not to follow it. You absolutely can. Right? Yeah. So on top of that, you might be wondering why. Would I use .txt or RSS or XML? You'd use RSS if you already had an RSS feed for your entire application. It exists. You don't need to make another 1. XML is probably what you're going to go with anyways because, as you'll see, it allows for more information. It also allows for indexing of, like, media itself, whereas you can't do that with the text file.

Topic 10 08:54

XML sitemap is most flexible and allows more metadata

Scott Tolinski

The types of things and we'll even get into after this, like, which additional metadata even makes sense in a site map today. But XML is you know, it's the most verbose of these options, probably, maybe other than RSS. I mean, a TXT is, like, super simple. But XML gives you flexibility to be able to add more information, so you're probably gonna use XML anyways.

Scott Tolinski

Apparently, the limit for a sitemap is 50,000 URLs on a sitemap and 50 megabytes. If you go over 50,000 URLs, you can create multiple different sitemaps. You can have many sitemaps.

Wes Bos

Yeah. If you if you go and peruse I I do this a little bit myself to find unlisted URLs on websites.

Wes Bos

Like, my wife was really excited about this dress coming out once, so I wrote a little scraper that would download the site map every so often, and the site map often lists even all the images that were uploaded to the website, all of the pages that are on the website. And, often, those pages are public, but they're not linked anywhere just yet. So it's kinda security by obscurity. So you can download the site map and see all of the pages of the website, and you can sort of peruse through that looking for unlisted pages.

Wes Bos

But often, especially with, like, Shopify websites, you'll find sort of like an index site map, and then it links off to tags site map and product site map and blog post site map. Each 1 has their own site map.

Scott Tolinski

Yep. Yeah. And that that's particularly useful when you have a whole lot of content.
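To make the 50,000 URL limit and the index sitemap idea concrete, here is a rough sketch of splitting a long URL list into chunks and pointing an index file at each chunk. The file names (`sitemap-1.xml` and so on) and the `example.com` origin are assumptions for illustration; the `sitemapindex` element and namespace come from the sitemaps.org protocol.

```typescript
const MAX_URLS_PER_SITEMAP = 50000;

// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build the index file that links off to each child sitemap.
function buildSitemapIndex(sitemapUrls: string[]): string {
  const entries = sitemapUrls
    .map((loc) => `  <sitemap><loc>${loc}</loc></sitemap>`)
    .join("\n");
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`,
    entries,
    `</sitemapindex>`,
  ].join("\n");
}

// Hypothetical site with 120,000 pages, which needs three child sitemaps.
const allUrls = Array.from({ length: 120000 }, (_, i) => `https://example.com/p/${i}`);
const urlChunks = chunk(allUrls, MAX_URLS_PER_SITEMAP);
const childSitemapUrls = urlChunks.map((_, i) => `https://example.com/sitemap-${i + 1}.xml`);
const sitemapIndexXml = buildSitemapIndex(childSitemapUrls);
console.log(sitemapIndexXml);
```

Splitting by content type (products, tags, blog posts), as in the Shopify example, works the same way; only the grouping rule changes.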

Scott Tolinski

You also want for your URLs in a sitemap to be fully qualified, which means you need the whole dang URL.

Scott Tolinski

Relative URLs aren't gonna cut it. You want that HTTPS in there. You want the whole deal. You want that whole dang URL to make sure that that's what it's looking for. Right? It's not looking for anything relative here. Let's talk about fields and metadata for XML based site maps. And then I've got a pop quiz for you, Wes. We have several different potential metadata fields. Let's say priority, change frequency, and last mod.
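On the point about fully qualified URLs: if your data only stores paths, the standard `URL` constructor can resolve them against your origin before they go into the sitemap. A small sketch, where `SITE_ORIGIN` and the sample paths are placeholders:

```typescript
// Assumption: the site's canonical origin. Replace with your own.
const SITE_ORIGIN = "https://example.com";

// Resolve a relative path against the origin; already-absolute URLs pass through unchanged.
function toAbsoluteUrl(pathOrUrl: string, origin: string = SITE_ORIGIN): string {
  return new URL(pathOrUrl, origin).href;
}

console.log(toAbsoluteUrl("/shows/792"));
console.log(toAbsoluteUrl("https://example.com/about"));
```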

Topic 11 11:06

Last mod date is the only sitemap field search engines use now

Scott Tolinski

Which of those actually matter in 2024?

Wes Bos

Priority, change frequency, and last modified.

Wes Bos

I would say, like, priority doesn't matter because the days of telling Google what's important are are are over. They can figure that out themselves.

Wes Bos

I'm gonna say frequency is important because if you have a page that is frequently updated, that needs to be reindexed every hour or something, versus something that's like a blog post and you'll never update that again...

Scott Tolinski

The answer is that change frequency is completely ignored by both Bing and Google.

Scott Tolinski

Priority is also ignored by Google, as you mentioned, but Bing does use priority.

Topic 12 11:50

Bing uses sitemap priority but Google ignores it

Scott Tolinski

So it does kind of make sense to still use priority. It doesn't make any sense to use change frequency, but, apparently, Bing does use priority. Google ignores it.

Scott Tolinski

Last mod, though, is the 1 that all of them care about.

Scott Tolinski

And last mod only matters if you're using it consistently.

Scott Tolinski

So, like, let's imagine every single time you update your entire application, the sitemap rebuilds and sets all of your pages to be last mod today. You know? Okay. Then that's not good. You really want it to truly be when this page was last modified for Google to actually use that in an important way to know which content has been recently updated. So, yeah, this was new to me because I personally used priority and changefreq, but did not use last mod. So I was, like, using the 2 that didn't matter. Oh.
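The difference between an honest last mod and a "rebuilt today" last mod comes down to where the date comes from. In this sketch the `Page` shape and its `updatedAt` field are hypothetical stand-ins for whatever your content store records; the key point is that the date is read from the page's data and never from `new Date()` at build time.

```typescript
// Hypothetical page record; `updatedAt` is when the content itself last changed.
interface Page {
  path: string;
  updatedAt: Date;
}

// Emit a <url> entry whose <lastmod> reflects the page's own modification date.
function urlEntry(origin: string, page: Page): string {
  const loc = new URL(page.path, origin).href;
  // toISOString() is always UTC; the first 10 characters are YYYY-MM-DD.
  const lastmod = page.updatedAt.toISOString().slice(0, 10);
  return `<url><loc>${loc}</loc><lastmod>${lastmod}</lastmod></url>`;
}

const sampleEntry = urlEntry("https://example.com", {
  path: "/blog/perfect-sitemaps",
  updatedAt: new Date("2024-07-08T00:00:00Z"),
});
console.log(sampleEntry);
```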

Topic 13 12:45

We should update the Syntax sitemap fields

Scott Tolinski

What about the Syntax website? Maybe we need to change that? I believe the Syntax website has priority, and that's it. Maybe it does have change frequency because we should definitely change that, though. It's a bit of a pain to do that, but we should definitely change it. It's change frequency and priority

Wes Bos

are on the syntax 1. And you're telling us we you only need last mod?

Scott Tolinski

Yeah. And even then last mod is like you know, it only helps if you have content that is changing. Right? Yeah. If your content is published once and doesn't change, last mod isn't really helping you out that much. Okay. Yeah. So maybe maybe just nuke them all and like, be because we've been going through

Topic 14 13:25

Getting all Syntax content indexed recently got much easier

Wes Bos

the Syntax website, and it's crazy looking at the webmaster tools, like, over the last 6 months, a year, getting all of the pages indexed and finally to a point where Google knows about every single page. Because, like, even when we migrated, there was a point where, like, you couldn't find specific episodes on Google. Like, it was not finding them at all, so we had to really work at that. But then Google changed their algorithm recently, and I posted a tweet about this. We've mentioned it on the last episode as well.

Wes Bos

The amount that we're showing up on search results is just we went right up with that algorithm change. So, yeah.

Wes Bos

We're not even doing the best practice here, and Google's, like, obviously showing our stuff a lot more frequently,

Scott Tolinski

recently, which is good. And you're not gonna get docked for using priority and change frequency. It's just that, Google does not care about them. Priority, like I said, Bing Bing has it, so you might as well have it anyways. What types of things shouldn't be in your sitemap? Typically, dynamic user pages and account pages, so, like, my account or, my account for like, the stuff that's not going to really need to be indexed. Right? You you don't want a search engine to take you to the my account page because that's something that's hidden behind the login wall. Right? So any dynamic user pages shouldn't be in your sitemap. Yeah. URLs with parameters. So parameter based URLs should not be in the sitemap because Yeah. If you think about it, parameters can be anything. Right? That's another 1 we had to figure out with the canonical

Wes Bos

URLs on the syntax website is we have forward slash shows, and that needs to be indexed.

Topic 15 14:56

Parameters and future/unpublished pages should not be in the sitemap

Wes Bos

And then we also have forward slash shows and type equals hasty, tasty, or supper. That needs to be indexed.

Wes Bos

But the pages of every single 1 of them, like page 1, page 2, etcetera, those don't need to be indexed because well, no. The pages do need to be indexed, but some of the search filters do not need to be indexed. And I remember I had to write a very complex thing to sort of figure out what the canonical URL was because there's unlimited combinations of the query params of, like, pages, how many per page. That was the other 1. And a couple other filters that, like, there's unlimited. And if you go into the Google Webmaster Tools, it says something like 6,000 pages are not being indexed Yeah. Because you told us not to.

Wes Bos

And I was like, good. Like, those yeah. We don't want you to index the page 4 of 15 per page.
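One way to tame the "unlimited combinations of query params" problem is an allowlist: keep only the parameters that define a distinct, indexable page and drop everything else. This is a guess at the general shape, not the actual Syntax implementation; the parameter names `type`, `page`, and `perPage` are assumptions based on the conversation.

```typescript
// Assumption: only these parameters produce pages worth indexing.
const CANONICAL_PARAMS = ["type", "page"];

// Rebuild the URL with allowlisted params in a fixed order, dropping the rest.
function canonicalUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  const kept = new URLSearchParams();
  for (const name of CANONICAL_PARAMS) {
    const value = url.searchParams.get(name);
    if (value !== null) kept.set(name, value);
  }
  url.search = kept.toString();
  url.hash = "";
  return url.href;
}

console.log(canonicalUrl("https://example.com/shows?perPage=15&page=4&type=hasty"));
console.log(canonicalUrl("https://example.com/shows?perPage=15"));
```

Because the allowlist also fixes the parameter order, two URLs that differ only in how their query string is arranged collapse to the same canonical form.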

Scott Tolinski

Yep.

Scott Tolinski

Redirects also should not be in your sitemap or duplicate or disallowed pages,

Wes Bos

things that you have, like, being blocked in your robots.txt. I got 1 more here, and this is a problem we had with these shows.

Topic 16 16:17

Only published, non-redirect pages should be included

Wes Bos

The basically, the way that we create our site map is we just query the database for all the shows, and we query the database for all the guests. And, basically, anything that's a page, we just query it and and use a function to generate the URL for it. Right? But in that case, we were we forgot to filter out for future shows, and it was telling Google, hey. There's a page here.

Wes Bos

And then Google would go to that URL, and it would find this page is coming soon.

Wes Bos

And that was a bit of an issue because when it was published, then Google would would not know about the content until it eventually crawled that page again. So we had to filter that out and say,

Scott Tolinski

only show pages that are obviously in a published state Yes. And not, you know, a future 1. Yeah. Right. That's a good 1 here. So, practically, how do you make a sitemap? Well, there's lots of ways. If you're using a meta framework, many of these meta frameworks these days have, like, a generator plug in that will just do it for you. I know there's certainly 1 for SvelteKit, and I can't imagine there's not 1 for Next as well. That seems like something that'd be very obvious to have. So you could just Google Next.js sitemap or any of that. And oftentimes, what they do, the way they work is that it's a part of, like, a build step. You put in, like, a post build script. Once your application is built, it would run a post build script. It would scan the routes the way that the application works, and then it would publish your site map for you. It's hard to do with the last modified. It was the, yeah, the last mod. So that 1 might be 1 that you might have to, you know, work in how you do it dynamically. That gets a little bit easier in, like, a CMS based site if you have that kind of metadata property on your field.
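The future-shows bug mentioned above is the kind of thing a single filter can guard against. A minimal sketch, assuming each record carries a publish date; the `Show` shape and its `publishedAt` field are hypothetical, not the real Syntax schema.

```typescript
// Hypothetical show record.
interface Show {
  slug: string;
  publishedAt: Date;
}

// Keep only shows whose publish date has already passed.
// `now` is a parameter so the behavior is easy to check with a fixed date.
function publishedShows(shows: Show[], now: Date = new Date()): Show[] {
  return shows.filter((show) => show.publishedAt.getTime() <= now.getTime());
}

const sampleShows: Show[] = [
  { slug: "perfect-sitemaps", publishedAt: new Date("2024-07-08T00:00:00Z") },
  { slug: "a-future-episode", publishedAt: new Date("2024-07-15T00:00:00Z") },
];

const visibleShows = publishedShows(sampleShows, new Date("2024-07-10T00:00:00Z"));
console.log(visibleShows.map((show) => show.slug));
```

Doing the same filtering in the database query itself would avoid loading unpublished rows at all.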

Topic 17 17:00

Ways to generate a sitemap: meta framework built-ins or custom route

Scott Tolinski

The way that I've always built the site map, in I I don't personally feel like I need these generators, although the generators are great because they can, save you from missing things.

Scott Tolinski

But, essentially, you're just creating an XML file. And the way that I've always done it is just have a route that returns an XML file that I've created as a string. So you create the string. You load in your collections from data. You loop over them, and you output those routes. In that way, you have full control over it. You could even insert the last mod property if you have an updated at field in that data, and then that becomes dynamic and easy for you to do, instead of these generated ones, which have a little bit harder time with that type of thing. So that's how I've always done it. But, again, the generated ones are nice and simple and make sure you don't forget anything, or you can create it yourself. I don't recommend just making these by hand. That would take forever.
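The "route that returns an XML string" approach might look roughly like the following. This is a sketch and not Scott's actual code: the `Post` shape, the `/blog/` path prefix, and the in-memory `posts` array are all assumptions. In SvelteKit the handler would be exported as `GET` from a `+server.ts` file; it is left as a plain function here so the block stands alone.

```typescript
// Hypothetical content record with an "updated at" field.
interface Post {
  slug: string;
  updatedAt: Date;
}

const ORIGIN = "https://example.com"; // assumption: your site's origin

// Stand-in for loading a collection from a CMS or database.
const posts: Post[] = [
  { slug: "perfect-sitemaps", updatedAt: new Date("2024-07-08T00:00:00Z") },
  { slug: "tips & tricks", updatedAt: new Date("2024-06-01T00:00:00Z") },
];

// URLs inside XML must have characters like & escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Loop over the collection and concatenate one <url> entry per item.
function buildSitemapXml(origin: string, items: Post[]): string {
  const urls = items
    .map((post) => {
      const loc = escapeXml(new URL(`/blog/${post.slug}`, origin).href);
      const lastmod = post.updatedAt.toISOString().slice(0, 10);
      return `  <url>\n    <loc>${loc}</loc>\n    <lastmod>${lastmod}</lastmod>\n  </url>`;
    })
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}

// Route handler: return the string with an XML content type.
function sitemapRoute(): Response {
  return new Response(buildSitemapXml(ORIGIN, posts), {
    headers: { "Content-Type": "application/xml" },
  });
}

console.log(buildSitemapXml(ORIGIN, posts));
```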

Scott Tolinski

Even, like, even my website, Tolinski, has, like, you know, like, 70 links or something on it, 70 pages or something already, and I just made the dang site. So it it just feels like that would be a hard thing to do. So find find a generator or a way to do it that's baked into your platform. Typically, every platform has a way.

Topic 18 18:49

Hand-writing a large sitemap takes too much effort

Wes Bos

Yeah. Like, they have this concept of pages. If it's, like, totally from scratch, like your personal website or the Syntax website where, like, there's no concept of a page, right, you can just like Scott said, you can concatenate a string and throw it out the door. I would probably keep an array of pages and just store them as, like, objects and then grab some sort of, like, JSON to XML plug in off npm and then convert it out the door.

Topic 19 19:37

Store sitemap pages as data objects first before outputting as XML

Wes Bos

Sitemaps are pretty simple, so I don't know if that's overkill versus just concatenating a string or not. But when it comes to, like, oh, did I already add this 1? You know, does this URL exist previously? Well, let me search for it in the array.

Wes Bos

If that's the case, then it's sometimes nicer to deal with, like, an actual data object first. And then right before you kick it out the door, convert it to XML because that sucks working with XML.
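The "data objects first, XML last" idea can be sketched with a small collection class. Instead of a JSON-to-XML package from npm, this version serializes by hand to stay dependency-free, and it uses a `Map` keyed by URL so the "did I already add this one?" check is a direct lookup. The `SitemapPage` shape and class name are made up for illustration.

```typescript
// Hypothetical page object kept in memory until the final serialization step.
interface SitemapPage {
  loc: string;
  lastmod?: string;
}

class SitemapCollection {
  private pages = new Map<string, SitemapPage>();

  // Has this URL been added already?
  has(loc: string): boolean {
    return this.pages.has(loc);
  }

  // Add a page unless its URL is already present; returns true if it was added.
  add(page: SitemapPage): boolean {
    if (this.pages.has(page.loc)) return false;
    this.pages.set(page.loc, page);
    return true;
  }

  // Convert to XML only at the very end. Assumes `loc` values are already XML-safe.
  toXml(): string {
    const urls = Array.from(this.pages.values())
      .map((page) => {
        const lastmod = page.lastmod ? `<lastmod>${page.lastmod}</lastmod>` : "";
        return `  <url><loc>${page.loc}</loc>${lastmod}</url>`;
      })
      .join("\n");
    return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
  }
}

const collection = new SitemapCollection();
collection.add({ loc: "https://example.com/", lastmod: "2024-07-08" });
collection.add({ loc: "https://example.com/shows" });
const addedAgain = collection.add({ loc: "https://example.com/" });
console.log(addedAgain, collection.toXml());
```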

Scott Tolinski

Yeah. For sure. So what do you do once you have it? You validate it. There's a ton of these validation things online. You just search site map validator, paste in your site map link. It'll typically tell you if something's broken in it. Even if you submit it to webmaster tools, it will tell you what's wrong with it.

Scott Tolinski

And then after you have it, you'll want to submit it to your search engines so that way they become aware of it. So the 2 that you really need to worry about are the Bing webmaster tools, bing.com/webmasters/about.

Topic 20 20:21

Validate sitemap with online tools before submitting it

Scott Tolinski

We have the link in the show notes or Google Search Console.

Scott Tolinski

Both of those 2 are are are the 2 big ones. Right? Because I believe DuckDuckGo even uses Bing's, search.

Scott Tolinski

If I'm not correct about that, please somebody update me. But either way, these are the 2 places you wanna create an account, submit them. I know Google requires you to have a TXT record within your DNS to connect your site, then you submit the site map. Bingo. Bingo. It is all good. And it typically tells you again, like, which pages are being indexed, and it gives you some good feedback on your application. So if you have anything that's a long running website that you want to be indexed, these webmaster tools and Google Search Console are 2 things that you're gonna want to be familiar with regardless. You know what's 1 thing?

Topic 21 20:44

Submit sitemap to Bing and Google webmaster tools

Wes Bos

I was just looking at our search console, and it says discovered videos.

Wes Bos

That's probably worth doing. I often wonder that. You know, like, you go to the video tab of Google search, how to get your video to show up on that tab. I thought it was a mix of, like, the proper XML or what's that? LD JSON? Yeah. JSON-LD, which is for linking data. That's used sort of like a meta tag. But instead of putting it in the head of the document, you simply just dump the JSON into the body, and Google will pick it up there. But it looks like there's also specific video tags for sitemap.xml, which will tell Google about videos, which is neat.
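For the video tags mentioned here, the sketch below shows roughly what a video entry looks like when generated from data. The tag names and the `video` namespace follow Google's video sitemap extension as best I recall it, so check the current Search Central documentation before relying on them; the `VideoEntry` shape and all URLs are placeholders.

```typescript
// Hypothetical description of one video and the page it lives on.
interface VideoEntry {
  pageUrl: string;
  thumbnailUrl: string;
  title: string;
  description: string;
  contentUrl: string;
}

function escapeXmlText(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// One <url> entry with a nested <video:video> block.
function videoUrlEntry(video: VideoEntry): string {
  return [
    `  <url>`,
    `    <loc>${escapeXmlText(video.pageUrl)}</loc>`,
    `    <video:video>`,
    `      <video:thumbnail_loc>${escapeXmlText(video.thumbnailUrl)}</video:thumbnail_loc>`,
    `      <video:title>${escapeXmlText(video.title)}</video:title>`,
    `      <video:description>${escapeXmlText(video.description)}</video:description>`,
    `      <video:content_loc>${escapeXmlText(video.contentUrl)}</video:content_loc>`,
    `    </video:video>`,
    `  </url>`,
  ].join("\n");
}

// The urlset declares the extra video namespace alongside the standard one.
function buildVideoSitemap(videos: VideoEntry[]): string {
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">`,
    videos.map(videoUrlEntry).join("\n"),
    `</urlset>`,
  ].join("\n");
}

const videoSitemap = buildVideoSitemap([
  {
    pageUrl: "https://example.com/shows/792",
    thumbnailUrl: "https://example.com/thumbs/792.jpg",
    title: "Perfect Sitemaps for SEO",
    description: "Why you need a sitemap & what goes in it.",
    contentUrl: "https://example.com/video/792.mp4",
  },
]);
console.log(videoSitemap);
```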

Scott Tolinski

Sick.

Wes Bos

Cool. Yeah.

Wes Bos

1 more tip I have here is cache them. Your sitemap can be 1 of the largest files that is accessible on your website. And if they are generated on demand, that can be very taxing on your database. Yes. It's literally querying every single record in your database in a lot of cases and looping over it, or at least pages. And then that file itself is fairly large because it's all text. Right? And it's possibly an attack vector against your bill, both your database bill as well as, if you're using something like Render or Vercel to generate the sitemap.xml and you don't have the proper caching headers on those, then that could be somewhere where somebody could just continually hit it, and it will cause a very large bandwidth bill on your end. So throwing caching headers on it, putting a CDN in front of it, probably a good idea.
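The caching tip has two layers: avoid rebuilding the sitemap on every request, and tell browsers and CDNs they may keep a copy. This sketch shows both under simple assumptions. The one-hour lifetime, the `expensiveBuildSitemap` stand-in, and the specific `Cache-Control` values are illustrative choices, not recommendations from the episode.

```typescript
const ONE_HOUR_MS = 60 * 60 * 1000;

let cached: { xml: string; expiresAt: number } | null = null;
let buildCount = 0; // only here to show how often the expensive path runs

// Stand-in for querying every record and rendering the XML.
function expensiveBuildSitemap(): string {
  buildCount += 1;
  return `<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>`;
}

// Return the cached XML while it is fresh; rebuild once it has expired.
// `now` is a parameter so the expiry logic can be exercised with fixed times.
function getSitemap(now: number = Date.now()): string {
  if (cached !== null && cached.expiresAt > now) return cached.xml;
  const xml = expensiveBuildSitemap();
  cached = { xml, expiresAt: now + ONE_HOUR_MS };
  return xml;
}

// Headers a route could send so browsers cache for an hour and shared caches (CDNs) for a day.
const sitemapHeaders: Record<string, string> = {
  "Content-Type": "application/xml",
  "Cache-Control": "public, max-age=3600, s-maxage=86400",
};
```

With headers like these and a CDN in front, repeated hits on the sitemap URL are mostly answered from cache and rarely reach the database.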

Topic 22 23:10

Cache sitemaps to avoid heavy DB and bandwidth loads

Scott Tolinski

Sick. Cool. Well, that's all I got for site maps. If y'all have any additional tips or tricks, leave a comment in the video below. But that should be enough, and you can find this video on YouTube if you wanna leave that comment. youtube.com/@syntaxfm.

Scott Tolinski

That's all I got for you. Thanks so much for watching or listening.

Wes Bos

Grab a t-shirt. sentry.shop.

Wes Bos

Peace.
