Sitemaps
from Wikipedia

Sitemaps is a protocol in XML format meant for a webmaster to inform search engines about URLs on a website that are available for web crawling. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs of the site. This allows search engines to crawl the site more efficiently and to find URLs that may be isolated from the rest of the site's content. The Sitemaps protocol is a URL inclusion protocol and complements robots.txt, a URL exclusion protocol.

History

Google first introduced Sitemaps 0.84 in June 2005 so web developers could publish lists of links from across their sites.[1] Google, Yahoo! and Microsoft announced joint support for the Sitemaps protocol in November 2006.[2] The schema version was changed to "Sitemap 0.90", but no other changes were made.

In April 2007, Ask.com and IBM announced support for Sitemaps.[3] Google, Yahoo!, and MSN also announced auto-discovery for sitemaps through robots.txt. In May 2007, the state governments of Arizona, California, Utah and Virginia announced they would use Sitemaps on their web sites.[4]

The Sitemaps protocol is based on ideas[5] from "Crawler-friendly Web Servers,"[6] with improvements including auto-discovery through robots.txt and the ability to specify the priority and change frequency of pages.

Purpose

Sitemaps are particularly beneficial on websites where:

  • Some areas of the website are not available through the browsable interface[7]
  • Webmasters use rich Ajax, Silverlight, or Flash content that is not normally processed by search engines
  • The site is very large and web crawlers might overlook some of the new or recently updated content[7]
  • The site has a huge number of pages that are isolated or not well linked together[7]
  • The site has few external links[7]
  • The site contains a large amount of rich media content (such as video or images) or is included in Google News.[8]

File format

The Sitemap Protocol format consists of XML tags, and the file itself must be UTF-8 encoded. A Sitemap can also be a plain text list of URLs, and either form can be compressed in gzip (.gz) format.

A sample Sitemap that contains just one URL and uses all optional tags is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://example.com/</loc>
        <lastmod>2006-11-18</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.8</priority>
    </url>
</urlset>

The Sitemap XML protocol is also extended to provide a way of listing multiple Sitemaps in a 'Sitemap index' file. The maximum Sitemap size of 50 MiB (uncompressed) or 50,000 URLs[9] means this is necessary for large sites.

An example of a Sitemap index file referencing one separate sitemap follows.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd"
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>https://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2014-10-01T18:23:17+00:00</lastmod>
   </sitemap>
</sitemapindex>

Element definitions

The definitions for the elements are shown below:[9]

Element Required? Description
<urlset> Yes The document-level element for the Sitemap. The rest of the document after the '<?xml version>' element must be contained in this.
<url> Yes Parent element for each entry.
<sitemapindex> Yes The document-level element for the Sitemap index. The rest of the document after the '<?xml version>' element must be contained in this.
<sitemap> Yes Parent element for each entry in the index.
<loc> Yes Provides the full URL of the page or sitemap, including the protocol (e.g. http, https) and a trailing slash, if required by the site's hosting server. This value must be shorter than 2,048 characters. Note that ampersands in the URL need to be escaped as &amp;.
<lastmod> No The date that the file was last modified, in ISO 8601 format. This can display the full date and time or, if desired, may simply be the date in the format YYYY-MM-DD.
<changefreq> No How frequently the page may change:
  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

"Always" is used to denote documents that change each time that they are accessed. "Never" is used to denote archived URLs (i.e. files that will not be changed again).

This is used only as a guide for crawlers, and is not used to determine how frequently pages are indexed.

Does not apply to <sitemap> elements.

<priority> No The priority of that URL relative to other URLs on the site. This allows webmasters to suggest to crawlers which pages are considered more important.

The valid range is from 0.0 to 1.0, with 1.0 being the most important. The default value is 0.5.

Rating all pages on a site with a high priority does not affect search listings, as it is only used to suggest to the crawlers how important pages of the site are to one another.

Does not apply to <sitemap> elements.

Support for the elements that are not required can vary from one search engine to another.[9]

Google ignores <priority> and <changefreq> values.[10]

Other formats

Text file

The Sitemaps protocol allows the Sitemap to be a simple list of URLs in a text file. The file specifications of XML Sitemaps apply to text Sitemaps as well; the file must be UTF-8 encoded, and cannot be more than 50 MiB (uncompressed) or contain more than 50,000 URLs. Sitemaps that exceed these limits should be broken up into multiple sitemaps with a sitemap index file (a file that points to multiple sitemaps).[11]
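As an illustration, a minimal text sitemap (hypothetical URLs) is simply one absolute URL per line:

```text
https://www.example.com/
https://www.example.com/products.html
https://www.example.com/contact.html
```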

Syndication feed

A syndication feed is a permitted method of submitting URLs to crawlers; this is advised mainly for sites that already have syndication feeds. One stated drawback is this method might only provide crawlers with more recently created URLs, but other URLs can still be discovered during normal crawling.[9]

It can be beneficial to have a syndication feed as a delta update (containing only the newest content) to supplement a complete sitemap.

Search engine submission

If Sitemaps are submitted directly to a search engine (pinged), it will return status information and any processing errors. The details involved with submission will vary with the different search engines. The location of the sitemap can also be included in the robots.txt file by adding the following line:

Sitemap: <sitemap_location>

The <sitemap_location> should be the complete URL to the sitemap, such as:

https://www.example.org/sitemap.xml

This directive is independent of the user-agent line, so it doesn't matter where it is placed in the file. If the website has several sitemaps, multiple "Sitemap:" records may be included in robots.txt, or the URL can simply point to the main sitemap index file.
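Such a robots.txt might look like the following sketch (the paths and sitemap names are illustrative):

```text
User-agent: *
Disallow: /private/

Sitemap: https://www.example.org/sitemap1.xml
Sitemap: https://www.example.org/sitemap2.xml
```

Here the Sitemap records sit outside any user-agent group, reflecting that the directive is independent of the user-agent line.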

The following lists the sitemap submission URLs for a few major search engines:

  • Baidu: https://zhanzhang.baidu.com/dashboard/index (help page: Baidu Webmaster Dashboard; markets: China, Singapore)
  • Bing (and Yahoo!): https://www.bing.com/webmaster/ping.aspx?siteMap= (help page: Bing Webmaster Tools; market: global)
  • Yandex: https://webmaster.yandex.com/site/map.xml (help page: Sitemaps files; markets: Russia, Belarus, Kazakhstan, Turkey)

Sitemap URLs submitted through these submission endpoints need to be URL-encoded, for example: replace : (colon) with %3A and / (slash) with %2F.[9]
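This encoding can be reproduced with Python's standard library. A minimal sketch, assuming the Bing ping endpoint listed above; the helper function name is illustrative:

```python
from urllib.parse import quote

# Bing ping endpoint from the submission list above.
BING_PING = "https://www.bing.com/webmaster/ping.aspx?siteMap="

def submission_url(sitemap_url: str) -> str:
    """Percent-encode a sitemap URL and append it to the ping endpoint.

    safe="" forces reserved characters to be escaped too, so ":" becomes
    %3A and "/" becomes %2F, as the protocol requires for submissions.
    """
    return BING_PING + quote(sitemap_url, safe="")

encoded = submission_url("https://www.example.org/sitemap.xml")
# The colon and slashes of the sitemap URL are escaped as %3A and %2F.
```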

Google retired sitemap submissions using URLs in late 2023.[12]

Limitations for search engine indexing

Sitemaps supplement and do not replace the existing crawl-based mechanisms that search engines already use to discover URLs. Using this protocol does not guarantee that web pages will be included in search indexes, nor does it influence the way that pages are ranked in search results. Specific examples are provided below.

  • Google - Webmaster Support on Sitemaps: "Using a sitemap doesn't guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling. However, in most cases, your site will benefit from having a sitemap, and you'll never be penalized for having one."[13]
  • Bing - Bing uses the standard sitemaps.org protocol, and its handling of sitemaps is very similar to Google's.
  • Yahoo - After the search deal commenced between Yahoo! Inc. and Microsoft, Yahoo! Site Explorer has merged with Bing Webmaster Tools.

Sitemap limits

Sitemap files have a limit of 50,000 URLs and 50 MiB (52,428,800 bytes) per sitemap, and can be compressed using gzip to reduce bandwidth consumption. Multiple sitemap files are supported, with a Sitemap index file serving as an entry point. A Sitemap index file may not list more than 50,000 Sitemaps, must be no larger than 50 MiB, and can likewise be compressed; a site can have more than one Sitemap index file.[9]
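A minimal sketch of how these limits shape sitemap generation: splitting a large URL list into 50,000-URL chunks, each destined for one sitemap file referenced from an index. The function and file names are illustrative, and the 50 MiB size cap is not checked here:

```python
def split_for_sitemaps(urls, max_per_file=50_000):
    """Partition a URL list into sitemap-sized chunks.

    Returns (files, index_entries): files is a list of (filename, urls)
    pairs, and index_entries lists the sitemap filenames that a sitemap
    index file would reference. Only the 50,000-URL cap is enforced.
    """
    chunks = [urls[i:i + max_per_file] for i in range(0, len(urls), max_per_file)]
    files = [(f"sitemap{n}.xml", chunk) for n, chunk in enumerate(chunks, start=1)]
    return files, [name for name, _ in files]

files, index_entries = split_for_sitemaps(
    [f"https://example.com/page{i}" for i in range(120_000)]
)
# 120,000 URLs -> three sitemap files behind one index.
```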

According to Google, a single property in Google Search Console can include up to 500 sitemap index files. Additionally, sitemaps that are referenced in a sitemap index file must be located in the same directory as the sitemap index file, or in a subdirectory lower in the site hierarchy.[14]

Best practice for optimising a sitemap index for search engine crawlability is to ensure the index refers only to sitemaps as opposed to other sitemap indexes. Nesting a sitemap index within a sitemap index is invalid according to Google.[15]

Additional sitemap types

A number of additional XML sitemap types outside of the scope of the Sitemaps protocol are supported by Google to allow webmasters to provide additional data on the content of their websites. Video and image sitemaps are intended to improve the capability of websites to rank in image and video searches.[16][17]

Video sitemaps

Video sitemaps indicate data related to embedding and autoplaying, preferred thumbnails to show in search results, publication date, video duration, and other metadata.[17] Video sitemaps are also used to allow search engines to index videos that are embedded on a website, but that are hosted externally, such as on Vimeo or YouTube.

Image sitemaps

Image sitemaps are used to indicate image metadata, such as licensing information, geographic location, and an image's caption.[16]

Google News Sitemaps

Google supports a Google News sitemap type for facilitating quick indexing of time-sensitive news subjects.[18][19]

Multilingual and multinational sitemaps

In December 2011, Google announced annotations for sites that want to target users in multiple languages and, optionally, countries. A few months later Google announced on its official blog[20] that it was adding support for specifying the rel="alternate" and hreflang annotations in Sitemaps. Compared with HTML link elements, until then the only option, the Sitemaps approach offers advantages such as a smaller page size and easier deployment for some websites.

For example, suppose a site targets English-language users through https://www.example.com/en and Greek-language users through https://www.example.com/gr. Until then, the only option was to add the hreflang annotation either in the HTTP header or as HTML link elements on both URLs, like this:

<link rel="alternate" hreflang="en" href="https://www.example.com/en" />
<link rel="alternate" hreflang="gr" href="https://www.example.com/gr" />

Alternatively, one can use the following equivalent markup in the Sitemap itself. Note that the xhtml namespace must be declared on the enclosing <urlset> element:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <url>
        <loc>https://www.example.com/en</loc>
        <xhtml:link rel="alternate" hreflang="gr" href="https://www.example.com/gr" />
        <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en" />
    </url>
    <url>
        <loc>https://www.example.com/gr</loc>
        <xhtml:link rel="alternate" hreflang="gr" href="https://www.example.com/gr" />
        <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en" />
    </url>
</urlset>

from Grokipedia
A sitemap is a structured file, typically in XML format, that lists the URLs of a website's pages along with optional metadata such as the last modification date, change frequency, and relative priority, to help search engines discover, crawl, and index site content more efficiently. The Sitemaps protocol was introduced in 2005 by Google to address challenges in crawling large or dynamically generated websites, and it gained broader adoption in 2006 when Yahoo! and Microsoft announced joint support, leading to the establishment of sitemaps.org as the official collaborative resource. Sitemaps conform to a specific XML schema that requires elements like <urlset> and <loc> for each URL (limited to 2,048 characters and from a single host), while optional tags such as <lastmod> (in W3C datetime format), <changefreq> (values like "always", "hourly", "daily", "weekly", "monthly", "yearly", or "never"), and <priority> (a decimal from 0.0 to 1.0, defaulting to 0.5) provide additional guidance for crawlers. Each sitemap file is limited to 50,000 URLs or 50 megabytes (uncompressed), with support for gzip compression; for larger sites, a separate sitemap index file can reference up to 50,000 individual sitemaps. Website owners submit sitemaps to search engines via webmaster tools such as Google Search Console, by adding a directive in the site's robots.txt file, or through HTTP requests, enabling faster discovery of new or updated pages that might lack internal links. Benefits include improved indexing for sites with over 500 pages, those featuring rich media like images or videos, news content, or international versions in multiple languages, though small, well-linked sites may not require them. Specialized sitemap variants exist for images, videos, and news, extending the protocol's utility beyond basic URL lists. All sitemaps must be UTF-8 encoded and entity-escaped to ensure compatibility with parsers.

Fundamentals

Definition and Purpose

A sitemap is a file or structured source that lists the URLs of a website's pages, videos, images, and other files to inform search engines about content available for crawling and indexing. Accessing a publicly available sitemap.xml file allows one to view these listed URLs, which can reveal the website's subdirectory structure through the paths included in those URLs. The protocol enables webmasters to provide structured information about site organization and relationships between resources, supplementing traditional link-based discovery methods. The XML format serves as the standard under the official Sitemaps protocol, supported by major search engines including Google, Bing, and Yahoo!.

The core purpose of sitemaps is to assist search engines in discovering new or updated content that might otherwise be overlooked, especially on large, dynamic, or poorly linked sites. For SEO purposes, an XML sitemap should include all important pages to improve indexing by search engines. Sitemaps achieve this by including metadata such as the last modification date (<lastmod>), expected change frequency (<changefreq> values like "daily" or "monthly"), and relative priority (<priority> on a 0.0-1.0 scale) for each URL. This guidance helps optimize crawling efficiency, allowing search engines to prioritize high-value pages and allocate resources more effectively.

Key benefits include minimizing crawl-budget waste (the limited resources search engines dedicate to site exploration) by directing bots toward important content and away from irrelevant paths. Sitemaps help with discovery of new content, potentially accelerating indexing, though times can vary from hours to weeks depending on factors like site size and crawl budget. They boost overall visibility in search results without dependence on internal hyperlinks alone. In contrast to robots.txt files, which specify access permissions to block or allow crawling of certain directories, sitemaps emphasize content suggestion and metadata to enhance discovery and indexing.

History

The concept of sitemaps first emerged in the late 1990s as part of early practices aimed at improving user navigation on increasingly complex websites. Publishers and guides, such as the Web Style Guide, recommended including hierarchical site maps, often simple pages or diagrams, to help visitors understand site structure and locate content efficiently. By the early 2000s, with the rapid growth of search engines, these user-focused maps began evolving toward machine-readable formats to assist automated crawling and indexing, addressing inefficiencies in discovering new or updated pages across large sites.

A key milestone came in June 2005, when Google introduced the initial Sitemaps protocol (version 0.84) in XML format, enabling webmasters to submit lists of URLs along with metadata like last modification dates and change frequencies to guide crawlers more effectively. This addressed post-boom challenges such as incomplete crawling of dynamic or poorly linked content. In November 2006, Google, Yahoo!, and Microsoft jointly announced support for the protocol, formalizing it under version 0.9 and establishing sitemaps.org as the central documentation site managed by a working group of representatives from these companies. The protocol saw rapid extensions to support specialized content: a news extension was added in November 2006 to prioritize timely articles with publication timestamps, a video extension in December 2007 to include details like duration and thumbnails, and an image extension in April 2010 for enhanced media discovery. These developments were driven by Google engineers, notably Vanessa Fox, who contributed to launching sitemaps.org and building the associated Webmaster Central tools to facilitate adoption.

In recent years, the protocol has remained stable, with ongoing maintenance by major search engines but no significant overhauls. A notable change occurred in June 2023, when Google deprecated the Sitemap Ping Endpoint, a mechanism for notifying engines of updates, which ceased functioning by December 2023; webmasters are instead encouraged to rely on direct sitemap submissions via tools such as Google Search Console and on accurate lastmod tags for discovery.

Core Formats

XML Sitemap Protocol

The XML Sitemap Protocol defines a standardized XML format for listing URLs to facilitate discovery by crawlers. It specifies a root <urlset> element that encapsulates all entries, with each individual URL represented as a child <url> element. The protocol mandates inclusion of the namespace declaration xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" in the <urlset> tag to ensure compatibility and validation. Within each <url> element, the <loc> tag is required and contains the canonical URL of the page, limited to 2,048 characters; because it includes the full path, it can expose subdirectory information to anyone accessing the sitemap. Optional elements include <lastmod>, which records the last modification date in W3C Datetime format (a profile of ISO 8601); <changefreq>, indicating update frequency with values such as "always", "hourly", "daily", "weekly", "monthly", "yearly", or "never"; and <priority>, a floating-point value from 0.0 to 1.0 that suggests relative importance within the site (defaulting to 0.5 if omitted). These components provide metadata hints to crawlers without guaranteeing specific crawling behavior. Sitemap files following this protocol are typically named sitemap.xml and placed at the website's root for easy access. They must be encoded in UTF-8 and adhere to the XML 1.0 specification, with a maximum uncompressed size of 50 megabytes (52,428,800 bytes) and no more than 50,000 URLs per file. Validation against the official XML schema at http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd ensures conformance, as demonstrated in this basic example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2005-01-01</lastmod>
    </url>
    <url>
        <loc>http://www.example.com/page1.html</loc>
    </url>
</urlset>

Unlike HTML sitemaps designed for human navigation, the XML format is machine-readable and optimized exclusively for search engine processing, omitting any presentational elements. Detailed specifications for individual elements, such as the precise usage of <loc>, are covered in the element definitions below.

Element Definitions

The XML Sitemap protocol defines a structured set of elements to describe URLs on a website, enabling search engines to understand the site's content more efficiently. The root element, <urlset>, serves as the container for all URL entries in the file and must include the namespace attribute referencing the protocol standard. Specifically, it is declared as <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">, ensuring compliance with the schema for validation. This element encapsulates the entire sitemap and must be the outermost tag, with the file encoded in UTF-8 to handle international characters properly.

Each individual URL is represented by the <url> element, which acts as a wrapper for the details of a single page or resource. This element is required for every entry and must contain exactly one child <loc> element, though it may also include optional sub-elements like <lastmod>, <changefreq>, and <priority>. The <url> tag provides a logical grouping, allowing search engines to parse the sitemap as a list of discrete entries without ambiguity. Multiple <url> elements are nested within the <urlset>, forming the core body of the file.

The <loc> element is the mandatory core of each <url> entry, specifying the absolute URL of the page being referenced. It must be a fully qualified URL, starting with a protocol such as http or https, limited to 2,048 characters in length, and excluding fragment identifiers (e.g. no "#section" parts). For instance, a valid <loc> might be <loc>https://www.example.com/products/widget</loc>, and all values within the sitemap must be entity-escaped, such as replacing "&" with "&amp;". Relative URLs are not permitted, as they cannot be resolved unambiguously by search engine crawlers.

Optionally, the <lastmod> element indicates the date and time of the last significant modification to the page, helping search engines prioritize recrawling. It follows the W3C datetime format, such as <lastmod>2025-11-09T14:30:00+00:00</lastmod> for a precise timestamp or a simpler <lastmod>2025-11-09</lastmod> for just the date (YYYY-MM-DD). This value should reflect content changes rather than metadata updates or sitemap generation times, and it is distinct from HTTP headers like If-Modified-Since, which search engines may use independently.

The <changefreq> element provides a hint about the expected update frequency of the page, using one of the predefined values: always, hourly, daily, weekly, monthly, yearly, or never. For example, <changefreq>weekly</changefreq> suggests moderate change, guiding crawlers on scheduling but serving only as a non-binding suggestion, as search engines may adjust based on other factors. This element is optional and should be used judiciously so that infrequently updated pages are not misrepresented as frequently changing.

Similarly optional, the <priority> element assigns a relative importance score to the URL within the context of the same website, expressed as a decimal value from 0.0 (lowest) to 1.0 (highest), with a default of 0.5 if omitted. An example is <priority>0.8</priority>, indicating higher priority than the site average but not implying any ranking influence across different sites. Priorities are site-relative only, and setting all entries to 1.0 negates any useful differentiation.

A complete example of a <url> entry incorporating all elements for a hypothetical page might appear as follows:


<url>
    <loc>https://www.example.com/products/widget</loc>
    <lastmod>2025-11-09T14:30:00+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
</url>

This snippet would be nested within a <urlset> for the full file. Common errors in implementing these elements include using invalid date formats in <lastmod>, such as non-W3C-compliant strings like "11/09/2025", which may cause search engines to ignore the value; providing relative URLs in <loc>, like "/products/widget" instead of a full absolute path; and exceeding the 2,048-character limit for <loc>, leading to rejection of the entry. Additionally, failing to entity-escape special characters or omitting the required <loc> within a <url> can render the sitemap unparseable.
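These pitfalls lend themselves to automated checking. The sketch below (Python, with an illustrative function name and a simplified W3C-datetime pattern) flags the errors described:

```python
import re

# Simplified W3C datetime pattern: YYYY-MM-DD with an optional time and zone.
W3C_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)

def check_url_entry(loc, lastmod=None):
    """Return a list of protocol violations for a single <url> entry (sketch only)."""
    errors = []
    if not loc.startswith(("http://", "https://")):
        errors.append("loc must be an absolute URL including the protocol")
    if len(loc) > 2048:
        errors.append("loc exceeds the 2,048-character limit")
    if lastmod is not None and not W3C_DATETIME.match(lastmod):
        errors.append("lastmod is not in W3C datetime format")
    return errors
```

For example, check_url_entry("https://www.example.com/products/widget", "2025-11-09T14:30:00+00:00") returns an empty list, while a relative path or a date like "11/09/2025" is reported.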

Alternative Formats

Plain Text Sitemaps

Plain text sitemaps provide a basic method for listing website URLs in a non-structured format: a single text file with one absolute URL per line and no accompanying metadata such as last modification dates, change frequencies, or priorities. These files must use the .txt extension and be encoded in UTF-8 to ensure proper parsing by crawlers. This format is particularly suitable for small websites or legacy systems requiring minimal maintenance, as it avoids the complexity of XML tagging while still enabling basic URL discovery. Both Google and Bing officially support plain text sitemaps for crawling and indexing purposes, allowing webmasters to notify search engines of site content without advanced features. To create such a sitemap, webmasters can use any standard text editor to compile a list of absolute URLs, ensuring the file does not exceed 50,000 URLs or 50 MB in uncompressed size; for larger sites, multiple files can be generated and referenced accordingly. For instance, a simple three-page site might use the following content in its sitemap.txt file:

https://www.example.com/
https://www.example.com/about.html
https://www.example.com/contact.html

This approach emphasizes straightforward compilation, often via manual entry or basic scripting tools. The primary advantage of plain text sitemaps lies in their simplicity, enabling quick creation and deployment even in resource-constrained environments without the need for XML validation or specialized generators. However, this format lacks the rich metadata available in the XML sitemap protocol, which limits its ability to guide crawlers on update priorities or frequencies, potentially reducing overall crawl efficiency. Plain text sitemaps pre-date the XML sitemap protocol, which was jointly standardized by Google, Yahoo!, and Microsoft in 2006, and were commonly used for early URL submissions to Yahoo's search index.
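A plain text sitemap is simple enough to generate with a short script. The following Python sketch (file names and the URL list are illustrative) writes one URL per line in UTF-8 and splits across multiple files once the 50,000-URL cap is exceeded:

```python
# Minimal sketch: write a plain text sitemap (one absolute URL per line,
# UTF-8, at most 50,000 URLs per file). Names and URLs are illustrative.
urls = [
    "https://www.example.com/",
    "https://www.example.com/about.html",
    "https://www.example.com/contact.html",
]

MAX_URLS_PER_FILE = 50_000

for i in range(0, len(urls), MAX_URLS_PER_FILE):
    chunk = urls[i : i + MAX_URLS_PER_FILE]
    # Use a single sitemap.txt for small sites, numbered files otherwise.
    if len(urls) <= MAX_URLS_PER_FILE:
        name = "sitemap.txt"
    else:
        name = f"sitemap-{i // MAX_URLS_PER_FILE + 1}.txt"
    with open(name, "w", encoding="utf-8") as f:
        f.write("\n".join(chunk) + "\n")
```

For the three-URL list above this produces a single sitemap.txt identical to the example shown earlier.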

RSS and Atom Feeds

RSS and Atom feeds, originally designed for content syndication, can be adapted to function as sitemaps by search engines when they include elements pointing to site URLs. This adaptation allows feeds in RSS 2.0 or Atom 0.3/1.0 formats to notify crawlers of available pages, which is particularly useful for sites already generating such feeds for content distribution. Google began supporting RSS and Atom feeds as sitemaps in September 2005, enabling publishers to leverage existing infrastructure for improved discoverability.

Key requirements for using these feeds as sitemaps include embedding full, absolute URLs to site pages via the <link> element in RSS items or Atom entries, rather than relying solely on feed item descriptions or relative paths. Additionally, including a modification timestamp, such as <pubDate> in RSS or <updated> in Atom, helps search engines prioritize crawling based on recency. Feeds should be placed in the site's root directory to facilitate easy discovery by crawlers, and they must adhere to the respective syndication standards while also serving sitemap purposes.

One primary advantage of RSS and Atom feeds as sitemaps is their ability to provide automatic updates for dynamic content, such as blog posts or articles, ensuring search engines receive notifications of changes without manual intervention. This dual-purpose functionality benefits both end-users subscribing to content updates and crawlers seeking fresh URLs, making it ideal for frequently updated sites like blogs or news portals. However, RSS and Atom feeds have notable limitations when used as sitemaps, as they typically only encompass recent content, often the last 10 to 500 items, rather than an exhaustive list of all site pages. Unlike dedicated XML sitemaps, they lack support for priority levels or change frequency indicators, which can reduce their effectiveness for comprehensive site mapping. For instance, a basic RSS feed adapted for sitemap use might resemble the following snippet, where <link> elements point to full URLs and <pubDate> provides timestamps:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <link>https://www.example.com/</link>
    <description>Site description</description>
    <pubDate>Wed, 01 Jan 2025 00:00:00 GMT</pubDate>
    <item>
      <title>Article Title</title>
      <link>https://www.example.com/article1</link>
      <pubDate>Wed, 01 Jan 2025 12:00:00 GMT</pubDate>
      <description>Article summary</description>
    </item>
  </channel>
</rss>
```

This structure allows discovery of linked pages but does not extend to older or static content. Compatibility varies across search engines, with full support in Google and Bing, where RSS 2.0 and Atom 0.3/1.0 feeds are processed similarly to XML sitemaps for URL discovery and crawling prioritization. Bing explicitly accepts these formats alongside XML and plain text, treating them as valid sitemap submissions. Other engines may offer partial support, but RSS and Atom feeds are not intended as a complete replacement for full XML sitemaps, especially for large or static sites requiring broad coverage.

Submission and Indexing

Submitting to Search Engines

Sitemaps can be submitted to search engines through two primary methods: automatic discovery, by placing the file at the website's root directory or referencing it in the robots.txt file, and direct submission via dedicated webmaster tools. Automatic discovery allows search engine crawlers to locate the sitemap without manual intervention; for instance, adding a line like Sitemap: https://example.com/sitemap.xml to the robots.txt file enables major engines to find and process it during routine crawls. Direct submission provides more control and immediate notification, typically through web-based consoles where site owners verify ownership before adding the sitemap URL.

For Google, sitemaps are submitted via Google Search Console by navigating to the Sitemaps section, entering the sitemap URL (or index file), and clicking submit; this method is recommended over deprecated alternatives and is a key step toward better SEO outcomes, such as faster indexing of pages. Bing accepts submissions through Bing Webmaster Tools under the Sitemaps tool, where users paste the sitemap URL and submit it after site verification. Yandex uses its Webmaster tools, selecting Indexing > Sitemap files to enter and submit the sitemap URL. These consoles support sitemap index files, which consolidate multiple sitemaps into a single reference file for easier management of large sites; engines process the index to access individual sitemaps. Sitemaps must be accessible via HTTP or HTTPS, ensuring crawlers can fetch them without authentication or redirection issues. A notable change occurred with the retirement of Google's sitemap ping endpoint in late 2023, when notifications via http://www.google.com/ping?sitemap=URL ceased to function, shifting emphasis to console submissions and auto-discovery for efficient crawling signals.
Tools facilitate submission for non-technical users; for example, the Yoast SEO plugin for WordPress automatically generates and enables XML sitemaps, integrating submission options directly within the dashboard for seamless delivery to search engines. Online generators like XML-Sitemaps.com allow users to create and download sitemaps, which can then be uploaded to the site's root directory or submitted manually. Verification of submission occurs through console reports, which display processing status, last access dates, discovered URLs, and any errors such as invalid formats or access issues. For dynamic sites with frequent content updates, such as news platforms, resubmitting the sitemap daily ensures timely crawling of new pages, while static sites may require updates only after significant changes. Multi-engine support follows unified guidelines from sitemaps.org, which outline compatible formats and encourage cross-submission to engines like Google, Bing, and Yandex for broader indexing coverage.
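As a small illustration of the auto-discovery mechanism, the following Python sketch extracts Sitemap declarations from a robots.txt body; the `sitemap_urls` helper and the sample robots.txt content are assumptions made for the example:

```python
# Sketch: collect Sitemap declarations from a robots.txt body. The field
# name is case-insensitive and may appear multiple times in one file.
robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/sitemap-news.xml
"""

def sitemap_urls(robots_body):
    urls = []
    for line in robots_body.splitlines():
        # Split only at the first colon so URLs (which contain ':') survive.
        field, _, value = line.partition(":")
        if field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls

print(sitemap_urls(robots_txt))
```

A crawler performing auto-discovery does essentially this after fetching /robots.txt, then fetches each declared sitemap URL.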

Indexing Limitations

Sitemaps serve as suggestions to search engines about URLs available for crawling and potential indexing, but they do not guarantee that any listed pages will be included in search results. Search engines like Google evaluate each URL based on factors such as content quality, duplication, and adherence to webmaster guidelines, often prioritizing high-value pages within limited crawl budgets. For instance, Google allocates crawl resources based on site size, update frequency, and server performance, meaning even sitemap-submitted URLs may remain unvisited if resources are constrained.

Several key constraints can prevent indexing despite sitemap inclusion. Pages marked with a noindex meta tag or HTTP header directive will not be indexed, as this explicitly signals search engines to exclude them from results, overriding any sitemap recommendation. Similarly, resources blocked by robots.txt directives remain inaccessible for crawling, and sitemaps cannot bypass these restrictions: search engines respect disallow rules and will not fetch or index such content. Low-value or thin content, such as duplicate pages or those lacking substantial user benefit, is also frequently ignored, as engines enforce quality thresholds to maintain result relevance.

In terms of effectiveness, sitemaps primarily accelerate discovery for new or orphaned pages that lack strong internal or external links, potentially reducing the time to indexing compared with reliance on natural crawling alone. However, for sites with robust linking structures, the impact on overall indexing rates is often minimal, as search engines already efficiently traverse well-connected content. Common pitfalls further limit sitemap utility: including non-canonical URLs or pages with noindex directives can trigger warnings or rejection of the sitemap file, wasting processing resources and potentially harming crawl efficiency.
Over-submission of unchanged sitemaps consumes unnecessary quota in webmaster tools and may dilute focus on truly updated content, indirectly straining crawl budgets. Engine-specific behaviors also highlight varying reliance on sitemaps: Bing places greater emphasis on sitemaps for comprehensive discovery in large or deep sites, using them to ensure full coverage amid AI-powered search demands. As of 2025, major engines like Google have intensified focus on content quality over URL quantity, with core updates penalizing low-value content and rewarding signals of authoritative, user-focused pages.

Specifications and Limits

Size and URL Constraints

Sitemaps adhere to strict size and content constraints to ensure efficient processing by crawlers. According to the Sitemaps protocol, each individual sitemap file is limited to a maximum of 50,000 URLs and must not exceed 50 MB (52,428,800 bytes) in uncompressed size. These limits apply to the XML content before any compression, helping to prevent overload on server resources during crawling. Additionally, each URL specified in the <loc> element must be fewer than 2,048 characters in length, and all URLs within a sitemap must belong to the same host as the sitemap file itself.

For sites exceeding these per-file limits, the protocol recommends using a sitemap index file, which employs the <sitemapindex> root element to reference up to 50,000 individual sitemap files, each conforming to the standard constraints. The index file itself is also capped at 50 MB uncompressed. Sitemap indexes must only link to sitemaps on the same site, enabling scalable organization without violating core limits. Major search engines like Google and Bing enforce the 50,000 URL and 50 MB thresholds strictly to maintain crawling efficiency, and Yandex applies the same standard limits per sitemap file, recommending sitemap index files for larger sites.

To manage large-scale sites within these bounds, sitemaps can be compressed using gzip, which typically reduces file sizes by 60-90% for XML content, aiding efficient transmission, and divided into logical subsets such as dated archives (e.g., sitemap-2025-11.xml) or categorized collections (e.g., sitemap-products.xml). The protocol advises against including redirecting URLs or those with excessive parameters in sitemaps, as they may lead to processing errors, emphasizing canonical, direct links instead.
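The splitting and compression strategy above can be sketched in Python. The domain, file names, and `write_sitemaps` helper are illustrative assumptions, not part of the protocol:

```python
import gzip
from datetime import date

# Sketch under the protocol limits: split a large URL list into files of at
# most 50,000 entries, gzip each, and emit a sitemap index referencing them.
MAX_URLS = 50_000
BASE = "https://www.example.com/sitemaps/"  # illustrative hosting location

def write_sitemaps(urls, max_urls=MAX_URLS):
    today = date.today().isoformat()
    index_entries = []
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        chunk = urls[start : start + max_urls]
        body = ['<?xml version="1.0" encoding="UTF-8"?>',
                '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
        body += [f"  <url><loc>{u}</loc></url>" for u in chunk]
        body.append("</urlset>")
        name = f"sitemap-{n}.xml.gz"
        with gzip.open(name, "wt", encoding="utf-8") as f:
            f.write("\n".join(body))
        index_entries.append(
            f"  <sitemap><loc>{BASE}{name}</loc><lastmod>{today}</lastmod></sitemap>")
    index = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
             *index_entries,
             "</sitemapindex>"]
    with open("sitemap_index.xml", "w", encoding="utf-8") as f:
        f.write("\n".join(index))
    return [f"sitemap-{i}.xml.gz" for i in range(1, n + 1)]

files = write_sitemaps([f"https://www.example.com/p/{i}" for i in range(60_000)])
```

With 60,000 URLs this yields two gzipped sitemaps (50,000 and 10,000 entries) plus a sitemap_index.xml referencing both.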

Best Practices

To create effective sitemaps, automate their generation using content management system (CMS) plugins like Yoast SEO for WordPress or tools such as Screaming Frog for broader sites, ensuring dynamic updates for large inventories without manual intervention. Include only canonical, indexable URLs, such as primary versions of pages with absolute paths like https://www.example.com/product-page.html, while excluding duplicates, redirects, or non-public content to guide crawlers efficiently. Always update the <lastmod> element with precise, verifiable dates in ISO 8601 format (e.g., 2025-11-09) to signal recent changes and prioritize recrawling.

For maintenance, resubmit sitemaps to search engines via Search Console or robots.txt references after significant site updates, such as adding new content or restructuring, to prompt fresh crawling. Regularly monitor for errors in Search Console's Sitemaps report, addressing issues like fetch failures or invalid URLs promptly to maintain crawl efficiency. Avoid including pages marked with noindex directives, as this can confuse crawlers and dilute the sitemap's value.

Optimization involves using <priority> and <changefreq> elements judiciously, though Google ignores them in favor of other signals; reserve higher priorities (e.g., 0.8-1.0) for high-value pages like homepages or key landing pages if targeting engines beyond Google. Prioritize inclusion of revenue-driving or user-critical pages to focus crawler budget on impactful content. Integrate with structured data markup on individual pages, such as Product or Article schemas, to enhance rich result eligibility, as sitemaps alone do not embed structured data. In 2025, ensure sitemap compatibility with mobile-first indexing by listing a single preferred URL version (mobile or responsive) per entry, avoiding separate desktop/mobile variants to align with Google's primary rendering focus. Test sitemap URLs using Google Search Console's URL Inspection tool to verify crawlability and indexing status before submission.
Track key metrics like indexing rates and error percentages through Search Console, aiming to keep error rates below 10% by resolving issues such as malformed XML or inaccessible files, which directly correlates with improved indexing coverage. For e-commerce sites, create separate sitemaps for product catalogs to manage large volumes (e.g., one for active products, another for images), respecting size limits while highlighting seasonal or high-traffic items. News sites should refresh sitemaps weekly, or more frequently for breaking content, to include recent articles, ensuring timely indexing without exceeding per-file URL caps.
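The inclusion rules above (canonical URLs only, no noindex pages, precise ISO 8601 lastmod values) can be sketched as a filtering step before serialization. The `pages` records and their `canonical`/`noindex` fields are hypothetical inputs for illustration:

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Hypothetical page records from a CMS or crawl database.
pages = [
    {"url": "https://www.example.com/product-page.html", "canonical": True,
     "noindex": False, "modified": datetime(2025, 11, 9, tzinfo=timezone.utc)},
    {"url": "https://www.example.com/product-page.html?ref=ad", "canonical": False,
     "noindex": False, "modified": datetime(2025, 11, 9, tzinfo=timezone.utc)},
    {"url": "https://www.example.com/internal-draft.html", "canonical": True,
     "noindex": True, "modified": datetime(2025, 11, 8, tzinfo=timezone.utc)},
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for page in pages:
    if not page["canonical"] or page["noindex"]:
        continue  # duplicates and noindex pages stay out of the sitemap
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = page["url"]
    # ISO 8601 timestamp with timezone offset, e.g. 2025-11-09T00:00:00+00:00
    ET.SubElement(url, "lastmod").text = page["modified"].isoformat()

xml_out = ET.tostring(urlset, encoding="unicode")
```

Of the three records, only the canonical, indexable page survives the filter, which is exactly the behavior the best practices recommend.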

Specialized Types

Image and Video Sitemaps

Image and video sitemaps extend the standard XML sitemap protocol to provide search engines with detailed information about media content on a website, facilitating better discovery and indexing of images and videos. These extensions use dedicated namespaces and elements that can be embedded directly within the <url> tags of a conventional sitemap or housed in separate files, such as sitemap-images.xml or sitemap-videos.xml. By including media-specific metadata, these sitemaps help prioritize content for rich search features, such as thumbnails and enhanced previews, improving visibility in image and video search results.

For images, the extensions are defined in the namespace http://www.google.com/schemas/sitemap-image/1.1. The core structure involves the <image:image> element, which encapsulates details for a single image and can appear multiple times under each <url>. The required <image:loc> element specifies the absolute URL of the image file itself. Historically, additional elements like <image:title> for a short descriptive title, <image:caption> for contextual text, and <image:geo_location> for latitude and longitude coordinates were supported to enrich image understanding; however, these have been deprecated since August 2022 in favor of simpler structures and alternative best practices like descriptive alt text in HTML. Up to 1,000 <image:image> entries are permitted per <url>, allowing sites with image galleries to associate multiple assets with a single page. The following XML snippet illustrates an embedded image extension for a page featuring a gallery:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/gallery-page.html</loc>
    <lastmod>2025-11-09</lastmod>
    <image:image>
      <image:loc>https://example.com/images/photo1.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://example.com/images/photo2.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```

When the deprecated elements were in use, titles and captions were recommended to be concise, ideally under 100 characters, to maintain processing efficiency. Today, focusing on <image:loc> ensures compatibility while aiding in discovering images that might be loaded dynamically via JavaScript or hidden from standard crawling. This approach enhances the potential for images to appear as thumbnails in search results, driving more targeted traffic to media-rich pages.

Video sitemaps similarly leverage the namespace http://www.google.com/schemas/sitemap-video/1.1 and wrap content in the <video:video> element, which supports up to 1,000 instances per <url>. Essential tags include <video:content_loc>, which points to the direct URL of the video file in supported formats like MP4 or WebM; <video:thumbnail_loc> for a representative image preview; <video:title> for a brief, engaging name; <video:description> for a summary of the content; and <video:duration>, specified as an integer value in seconds representing the video's length. These elements provide context that helps search engines evaluate relevance and quality for video-specific queries. An example of a video extension within a standard sitemap entry for a page hosting a tutorial video is shown below:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://example.com/video-tutorial.html</loc>
    <lastmod>2025-11-09</lastmod>
    <video:video>
      <video:content_loc>https://example.com/videos/tutorial.mp4</video:content_loc>
      <video:thumbnail_loc>https://example.com/thumbs/tutorial.jpg</video:thumbnail_loc>
      <video:title>Tutorial on Web Development</video:title>
      <video:description>A beginner's guide to building websites with HTML and CSS.</video:description>
      <video:duration>300</video:duration>
    </video:video>
  </url>
</urlset>
```

Titles and descriptions should be kept succinct, with titles ideally under 100 characters, to optimize for display in search interfaces without truncation. The benefits of video sitemaps are particularly pronounced for SEO, as they enable videos to surface in rich results like video carousels, especially following Google's 2006 acquisition of YouTube, which expanded video indexing capabilities across hosted and embedded content. This integration has made explicit video metadata crucial for competing in unified video search ecosystems. Google has provided full support for image sitemaps since April 2010 and video sitemaps since December 2007, allowing webmasters to submit them via tools like Search Console for prioritized crawling. Bing offers partial compatibility, accepting standard XML sitemaps that may include these extensions but without dedicated processing for image- or video-specific tags, relying instead on general discovery. For optimal results, sites should validate sitemaps against the official schemas and monitor indexing status through the respective tools.
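Full validation against the official schemas normally uses an XSD-aware tool; as a lighter-weight sketch, the following Python snippet checks only well-formedness and that the expected root element and image namespace are present (the embedded sample sitemap is illustrative):

```python
import xml.etree.ElementTree as ET

# Lightweight sanity check, not full XSD validation: parse the sitemap and
# confirm the expected root element and media namespace usage.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

sitemap_xml = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/gallery-page.html</loc>
    <image:image><image:loc>https://example.com/images/photo1.jpg</image:loc></image:image>
  </url>
</urlset>"""

root = ET.fromstring(sitemap_xml)  # raises ParseError if not well-formed
assert root.tag == f"{{{SITEMAP_NS}}}urlset"
# ElementTree exposes namespaced tags as "{namespace}localname".
image_locs = [e.text for e in root.iter(f"{{{IMAGE_NS}}}loc")]
print(image_locs)
```

A sitemap that fails even this check will certainly be rejected by search engines, so it makes a cheap pre-submission gate.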

News Sitemaps

News sitemaps are a specialized extension of the standard XML sitemap protocol designed specifically for news publishers to accelerate the discovery and indexing of timely articles by search engines like Google. They utilize the namespace http://www.google.com/schemas/sitemap-news/0.9 to incorporate news-specific metadata within each <url> entry, enabling faster crawling of fresh content that meets strict timeliness criteria. This format helps ensure that news content appears promptly in search results and news aggregators, prioritizing relevance and recency over general web pages.

The core structure of a news sitemap embeds a <news:news> parent element inside each <url> tag, which contains required sub-elements for publication details and article metadata. The <news:publication> element is mandatory and includes <news:name>, specifying the exact publication name as recognized on news.google.com (without parentheses or variations), and <news:language>, using an ISO 639-1 or ISO 639-2 code such as "en" or "zh-cn". Additionally, <news:publication_date> must be provided in W3C datetime format (e.g., "2025-11-09" or "2025-11-09T12:00:00-08:00") to indicate the article's publication time, while <news:title> captures the article's headline in plain text. Optional elements enhance discoverability, such as <news:keywords> for up to five comma-separated terms relevant to the content (e.g., "election, politics, results"), and <news:geo_targeting> using ISO 3166-1 alpha-2 codes like "US" for location-specific targeting.

To qualify for inclusion, news sitemaps must adhere to stringent requirements: articles can only be listed if published within the last 48 hours. Approval in the Google Publisher Center is recommended for publishers seeking full inclusion in Google News features, where they can verify ownership and manage content. Keywords should be limited to fewer than five terms to maintain focus, avoiding overly broad or unrelated phrases.
News sitemaps are capped at 1,000 <news:news> entries each, with no support for <priority> or <changefreq> tags, as these are irrelevant for ephemeral content; exceeding the limit requires splitting into multiple files via a sitemap index. Publishers are encouraged to update sitemaps hourly or as new articles publish to reflect real-time flows, removing outdated entries promptly. The primary purpose of news sitemaps is to fast-track indexing in Google News, signaling high-priority content for immediate crawling and reducing latency in surfacing breaking stories. They also support Accelerated Mobile Pages (AMP) through the optional <news:amp> tag, which points to a mobile-optimized AMP version of the article, improving load times on mobile devices. For a breaking news article, a representative XML snippet might appear as follows, incorporating keywords and geo-targeting for a U.S. election story:

```xml
<url>
  <loc>https://example.com/2025-election-results</loc>
  <news:news>
    <news:publication>
      <news:name>Example News</news:name>
      <news:language>en</news:language>
    </news:publication>
    <news:publication_date>2025-11-09T08:00:00-05:00</news:publication_date>
    <news:title>2025 Election: Key Results and Analysis</news:title>
    <news:keywords>election, results, politics, vote</news:keywords>
    <news:geo_targeting>US</news:geo_targeting>
    <news:amp>https://example.com/amp/2025-election-results</news:amp>
  </news:news>
</url>
```

This example ensures compliance with schema requirements while highlighting timely metadata for efficient indexing.
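The 48-hour freshness window and 1,000-entry cap described above can be sketched as a filtering step before the XML is generated; the article records and the `eligible_articles` helper are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Sketch of the eligibility rules: keep only articles published within the
# last 48 hours, newest first, capped at 1,000 entries per sitemap file.
MAX_ENTRIES = 1_000
WINDOW = timedelta(hours=48)

def eligible_articles(articles, now=None):
    now = now or datetime.now(timezone.utc)
    fresh = [a for a in articles if now - a["published"] <= WINDOW]
    fresh.sort(key=lambda a: a["published"], reverse=True)
    return fresh[:MAX_ENTRIES]

# Hypothetical article records.
now = datetime(2025, 11, 9, 12, 0, tzinfo=timezone.utc)
articles = [
    {"loc": "https://example.com/2025-election-results",
     "published": datetime(2025, 11, 9, 8, 0, tzinfo=timezone.utc)},
    {"loc": "https://example.com/last-week-recap",
     "published": datetime(2025, 11, 2, 8, 0, tzinfo=timezone.utc)},
]
print([a["loc"] for a in eligible_articles(articles, now)])
```

Only the article from the last 48 hours survives the filter; the week-old recap would be dropped from the news sitemap even though it remains in the site's regular sitemap.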

Advanced Configurations

Multilingual Support

Sitemaps support multilingual websites through the integration of hreflang annotations, which allow webmasters to specify alternate language and regional versions of pages directly within the XML structure. This is achieved by including <xhtml:link> elements as children of each <url> entry, using the rel="alternate" attribute paired with hreflang to indicate the language or locale (e.g., hreflang="en" for English or hreflang="es" for Spanish). These annotations must be bidirectional, meaning each variant page links to all others in the set, including a self-referential link to its own URL. The sitemap namespace must include the XHTML extension: xmlns:xhtml="http://www.w3.org/1999/xhtml". Webmasters can approach multilingual sitemaps in two primary ways: using a single sitemap file that encompasses all language variants or creating separate sitemap files for each language, which are then linked together via a sitemap index file. The single-file method consolidates all <url> entries with their respective <xhtml:link> annotations, making it suitable for smaller sites, while separate files improve organization for larger, language-diverse sites and can reference the index for submission to search engines. Best practices include always adding self-referential hreflang tags (e.g., pointing back to the page's own <loc>), supporting region-specific codes like en-US for American English versus en-GB for British English, and incorporating a default variant with hreflang="x-default" for users whose language or region does not match any specified alternate. Fully qualified absolute URLs should be used in all <loc> and <xhtml:link href> attributes to avoid resolution issues. Key challenges in implementing multilingual sitemaps involve ensuring consistency and avoiding errors that could lead search engines to ignore the annotations. 
For instance, languages must not be mixed within a single <url> entry; each entry should represent one primary language version with links to alternates. Incorrect language codes (ISO 639-1 codes are used for languages and ISO 3166-1 Alpha-2 codes for regions) or missing bidirectional links can invalidate the cluster. Validation is essential and can be performed using tools like Google's URL Inspection tool in Search Console to check whether hreflang signals are recognized during crawling, or third-party validators such as the Hreflang Tags Testing Tool from TechnicalSEO.com. An example XML snippet for a sitemap supporting English and Spanish variants of a page might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en/article/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/article/" />
    <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/article/" />
    <xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/en/article/" />
  </url>
  <url>
    <loc>https://example.com/es/article/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/article/" />
    <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/article/" />
    <xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/en/article/" />
  </url>
</urlset>
```

This structure ensures all variants are discoverable and properly annotated. Search engines like Google and Bing utilize these hreflang annotations in sitemaps to deliver results based on the user's language and region preferences, enhancing relevance for international audiences.
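Because every variant must carry the full, identical set of alternate links (including a self-reference), generating the entries programmatically avoids the most common bidirectionality mistakes. This Python sketch builds the <url> fragments from a variant map; the `hreflang_entry` helper and URLs are illustrative:

```python
# Sketch: generate bidirectional hreflang annotations for a set of language
# variants, including a self-referential link and an x-default fallback.
variants = {
    "en": "https://example.com/en/article/",
    "es": "https://example.com/es/article/",
}
x_default = variants["en"]  # fallback for unmatched languages/regions

def hreflang_entry(page_url):
    lines = ["<url>", f"  <loc>{page_url}</loc>"]
    for lang, href in variants.items():  # every variant links to the full set
        lines.append(
            f'  <xhtml:link rel="alternate" hreflang="{lang}" href="{href}" />')
    lines.append(
        f'  <xhtml:link rel="alternate" hreflang="x-default" href="{x_default}" />')
    lines.append("</url>")
    return "\n".join(lines)

entries = [hreflang_entry(url) for url in variants.values()]
```

Since every entry is emitted from the same `variants` map, the annotations stay symmetric by construction: adding a third language means adding one dictionary entry, not editing every existing block by hand.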

Sitemap Indexes

Sitemap indexes enable large-scale websites to organize and reference multiple individual sitemap files, addressing the protocol's constraints on file size and URL count. They serve as a central hub for managing extensive URL inventories, such as those exceeding 50,000 URLs, by linking to category-specific or segmented sitemaps, for example those covering products, blog posts, or images. This approach facilitates efficient crawling and indexing for search engines, particularly on enterprise sites with millions of pages.

The structure of a sitemap index file uses an XML root element <sitemapindex> with the namespace http://www.sitemaps.org/schemas/sitemap/0.9, containing one or more <sitemap> child elements. Each <sitemap> must include a <loc> element specifying the URL of an individual sitemap file, and may optionally include a <lastmod> element in W3C datetime format to indicate the last modification date of that sitemap. All files must be UTF-8 encoded, and the referenced sitemaps must belong to the same site as the index. This format has been supported since the initial protocol version 0.9.

Implementation involves naming the index file conventionally as sitemap_index.xml (or similar, such as sitemap-index.xml) and placing it in the website's root directory for automatic discoverability by search engines, which commonly check standard locations like /sitemap.xml or /sitemap_index.xml. Sitemaps referenced in the index should reside in the same directory as the index file or in a subdirectory of it, so that they fall within the index's scope. For submission, the index file is provided to search engines, which then process the linked sitemaps. Limits for sitemap indexes include a maximum of 50,000 <sitemap> entries per index file and a total uncompressed file size of 50 MB (or the equivalent when gzipped). Google limits the number of sitemap index files that can be submitted per site to 500 via Search Console.
Recursive indexing, where an index links to another index, is permitted by the protocol but supported only to limited depths by major search engines; for instance, Google processes up to one level of nesting (index to index to sitemaps) but does not recommend structures beyond two levels, to avoid processing inefficiencies. The following example illustrates a basic sitemap index file linking to three sub-sitemaps for products, blog posts, and images:

xml

<?xml version="1.0" encoding="UTF-8"?> <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <sitemap> <loc>https://www.example.com/sitemaps/products.xml</loc> <lastmod>2025-11-01</lastmod> </sitemap> <sitemap> <loc>https://www.example.com/sitemaps/blog.xml</loc> <lastmod>2025-11-08</lastmod> </sitemap> <sitemap> <loc>https://www.example.com/sitemaps/images.xml</loc> <lastmod>2025-11-09</lastmod> </sitemap> </sitemapindex>

<?xml version="1.0" encoding="UTF-8"?> <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <sitemap> <loc>https://www.example.com/sitemaps/products.xml</loc> <lastmod>2025-11-01</lastmod> </sitemap> <sitemap> <loc>https://www.example.com/sitemaps/blog.xml</loc> <lastmod>2025-11-08</lastmod> </sitemap> <sitemap> <loc>https://www.example.com/sitemaps/images.xml</loc> <lastmod>2025-11-09</lastmod> </sitemap> </sitemapindex>

This structure simplifies maintenance for large sites by allowing modular updates to individual sitemaps without regenerating a single massive file, improving crawl efficiency and reducing server load during updates.
