Sitemaps
Sitemaps is a protocol in XML format meant for a webmaster to inform search engines about URLs on a website that are available for web crawling. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs of the site. This allows search engines to crawl the site more efficiently and to find URLs that may be isolated from the rest of the site's content. The Sitemaps protocol is a URL inclusion protocol and complements robots.txt, a URL exclusion protocol.
History
Google first introduced Sitemaps 0.84 in June 2005 so web developers could publish lists of links from across their sites.[1] Google, Yahoo! and Microsoft announced joint support for the Sitemaps protocol in November 2006.[2] The schema version was changed to "Sitemap 0.90", but no other changes were made.
In April 2007, Ask.com and IBM announced support for Sitemaps.[3] Google, Yahoo!, and MSN also announced auto-discovery for sitemaps through robots.txt. In May 2007, the state governments of Arizona, California, Utah and Virginia announced they would use Sitemaps on their web sites.[4]
The Sitemaps protocol is based on ideas[5] from "Crawler-friendly Web Servers,"[6] with improvements including auto-discovery through robots.txt and the ability to specify the priority and change frequency of pages.
Purpose
Sitemaps are particularly beneficial on websites where:
- Some areas of the website are not available through the browsable interface[7]
- Webmasters use rich Ajax, Silverlight, or Flash content that is not normally processed by search engines
- The site is very large, and web crawlers might overlook some of the new or recently updated content[7]
- The site has a huge number of pages that are isolated or not well linked together[7]
- The site has few external links[7]
- The site contains a large amount of rich media content (such as video or images) or is included in Google News[8]
File format
The Sitemap Protocol format consists of XML tags. The file itself must be UTF-8 encoded. Sitemaps can also be just a plain text list of URLs. They can also be compressed in .gz format.
A sample Sitemap that contains just one URL and uses all optional tags is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
<lastmod>2006-11-18</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
</urlset>
The Sitemap XML protocol is also extended to provide a way of listing multiple Sitemaps in a 'Sitemap index' file. The maximum Sitemap size of 50 MiB (uncompressed) or 50,000 URLs[9] means this is necessary for large sites.
An example of a Sitemap index file referencing one separate sitemap follows.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2014-10-01T18:23:17+00:00</lastmod>
</sitemap>
</sitemapindex>
Element definitions
The definitions for the elements are shown below:[9]
| Element | Required? | Description |
|---|---|---|
| <urlset> | Yes | The document-level element for the Sitemap. The rest of the document after the '<?xml version>' element must be contained in this. |
| <url> | Yes | Parent element for each entry. |
| <sitemapindex> | Yes | The document-level element for the Sitemap index. The rest of the document after the '<?xml version>' element must be contained in this. |
| <sitemap> | Yes | Parent element for each entry in the index. |
| <loc> | Yes | Provides the full URL of the page or sitemap, including the protocol (e.g. http, https) and a trailing slash, if required by the site's hosting server. This value must be shorter than 2,048 characters. Ampersands in the URL need to be escaped as &amp;. |
| <lastmod> | No | The date that the file was last modified, in ISO 8601 format. This can be the full date and time or, if desired, simply the date in the format YYYY-MM-DD. |
| <changefreq> | No | How frequently the page may change: always, hourly, daily, weekly, monthly, yearly, or never. "Always" denotes documents that change each time they are accessed; "never" denotes archived URLs (i.e. files that will not be changed again). This is used only as a guide for crawlers, and is not used to determine how frequently pages are indexed. Does not apply to <sitemapindex> files. |
| <priority> | No | The priority of that URL relative to other URLs on the site. This allows webmasters to suggest to crawlers which pages are considered more important. The valid range is from 0.0 to 1.0, with 1.0 being the most important; the default value is 0.5. Rating all pages on a site with a high priority does not affect search listings, as it is only used to suggest to the crawlers how important pages of the site are to one another. Does not apply to <sitemapindex> files. |
Support for the elements that are not required can vary from one search engine to another.[9]
Google ignores <priority> and <changefreq> values.[10]
Other formats
Text file
The Sitemaps protocol allows the Sitemap to be a simple list of URLs in a text file. The file specifications of XML Sitemaps apply to text Sitemaps as well; the file must be UTF-8 encoded, and cannot be more than 50 MiB (uncompressed) or contain more than 50,000 URLs. Sitemaps that exceed these limits should be broken up into multiple sitemaps with a sitemap index file (a file that points to multiple sitemaps).[11]
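As a minimal illustration (assuming a hypothetical three-page site on example.org), such a text Sitemap contains nothing but absolute URLs, one per line:
https://www.example.org/
https://www.example.org/about.html
https://www.example.org/contact.html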
Syndication feed
A syndication feed is a permitted method of submitting URLs to crawlers; this is advised mainly for sites that already have syndication feeds. One stated drawback is that this method might only provide crawlers with recently created URLs, but other URLs can still be discovered during normal crawling.[9]
It can be beneficial to have a syndication feed as a delta update (containing only the newest content) to supplement a complete sitemap.
Search engine submission
If Sitemaps are submitted directly to a search engine (pinged), it returns status information and any processing errors. The details involved with submission vary between search engines. The location of the sitemap can also be included in the robots.txt file by adding the following line:
Sitemap: <sitemap_location>
The <sitemap_location> should be the complete URL to the sitemap, such as:
https://www.example.org/sitemap.xml
This directive is independent of the user-agent line, so it doesn't matter where it is placed in the file. If the website has several sitemaps, multiple "Sitemap:" records may be included in robots.txt, or the URL can simply point to the main sitemap index file.
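For example, a robots.txt file referencing two sitemaps might look like this (the file paths are hypothetical), with the Sitemap records valid anywhere in the file:
User-agent: *
Disallow: /private/
Sitemap: https://www.example.org/sitemap.xml
Sitemap: https://www.example.org/sitemap-news.xml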
The following table lists the sitemap submission URLs for a few major search engines:
| Search engine | Submission URL | Help page | Market |
|---|---|---|---|
| Baidu | https://zhanzhang.baidu.com/dashboard/index | Baidu Webmaster Dashboard | China, Singapore |
| Bing (and Yahoo!) | https://www.bing.com/webmaster/ping.aspx?siteMap= | Bing Webmaster Tools | Global |
| Yandex | https://webmaster.yandex.com/site/map.xml | Sitemaps files | Russia, Belarus, Kazakhstan, Turkey |
Sitemap URLs submitted using the sitemap submission URLs need to be URL-encoded, for example:
replace : (colon) with %3A,
replace / (slash) with %2F.[9]
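For instance, combining the Bing endpoint above with the example sitemap URL from earlier yields the following encoded submission request (illustrative only):
https://www.bing.com/webmaster/ping.aspx?siteMap=https%3A%2F%2Fwww.example.org%2Fsitemap.xml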
Google retired sitemap submission via ping URLs in late 2023.[12]
Limitations for search engine indexing
Sitemaps supplement and do not replace the existing crawl-based mechanisms that search engines already use to discover URLs. Using this protocol does not guarantee that web pages will be included in search indexes, nor does it influence the way that pages are ranked in search results. Specific examples are provided below.
- Google - Webmaster Support on Sitemaps: "Using a sitemap doesn't guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling. However, in most cases, your site will benefit from having a sitemap, and you'll never be penalized for having one."[13]
- Bing - Bing uses the standard sitemaps.org protocol, and its handling of sitemaps is very similar to Google's, described above.
- Yahoo - After the search deal between Yahoo! Inc. and Microsoft commenced, Yahoo! Site Explorer was merged into Bing Webmaster Tools.
Sitemap limits
Sitemap files have a limit of 50,000 URLs and 50 MiB (52,428,800 bytes) per sitemap. Sitemaps can be compressed using gzip, reducing bandwidth consumption. Multiple sitemap files are supported, with a Sitemap index file serving as an entry point. Sitemap index files may not list more than 50,000 Sitemaps, must be no larger than 50 MiB, and can be compressed. More than one Sitemap index file is allowed.[9]
According to Google, a single property in Google Search Console can include up to 500 sitemap index files. Additionally, sitemaps that are referenced in a sitemap index file must be located in the same directory as the sitemap index file, or in a subdirectory lower in the site hierarchy.[14]
Best practice for optimising a sitemap index for search engine crawlability is to ensure the index refers only to sitemaps as opposed to other sitemap indexes. Nesting a sitemap index within a sitemap index is invalid according to Google.[15]
Additional sitemap types
A number of additional XML sitemap types outside the scope of the Sitemaps protocol are supported by Google to allow webmasters to provide additional data on the content of their websites. Video and image sitemaps are intended to improve the capability of websites to rank in image and video searches.[16][17]
Video sitemaps
Video sitemaps indicate data related to embedding and autoplaying, preferred thumbnails to show in search results, publication date, video duration, and other metadata.[17] Video sitemaps are also used to allow search engines to index videos that are embedded on a website, but that are hosted externally, such as on Vimeo or YouTube.
Image sitemaps
Image sitemaps are used to indicate image metadata, such as licensing information, geographic location, and an image's caption.[16]
Google News Sitemaps
Google supports a Google News sitemap type for facilitating quick indexing of time-sensitive news subjects.[18][19]
Multilingual and multinational sitemaps
In December 2011, Google announced annotations for sites that want to target users in many languages and, optionally, countries. A few months later, Google announced on its official blog[20] that it was adding support for specifying the rel="alternate" and hreflang annotations in Sitemaps. Compared with HTML link elements, which had been the only option until then, the Sitemaps approach offered several advantages, including a smaller page size and easier deployment for some websites.
One example of a multilingual sitemap follows. Suppose a site targets English-language users through https://www.example.com/en and Greek-language users through https://www.example.com/gr. Until the change, the only option was to add the hreflang annotation either in the HTTP header or as HTML link elements on both URLs, like this:
<link rel="alternate" hreflang="en" href="https://www.example.com/en" />
<link rel="alternate" hreflang="gr" href="https://www.example.com/gr" />
But now, one can alternatively use the following equivalent markup in Sitemaps:
<url>
<loc>https://www.example.com/en</loc>
<xhtml:link
rel="alternate"
hreflang="gr"
href="https://www.example.com/gr" />
<xhtml:link
rel="alternate"
hreflang="en"
href="https://www.example.com/en" />
</url>
<url>
<loc>https://www.example.com/gr</loc>
<xhtml:link
rel="alternate"
hreflang="gr"
href="https://www.example.com/gr" />
<xhtml:link
rel="alternate"
hreflang="en"
href="https://www.example.com/en" />
</url>
References
- ^ Shivakumar, Shiva (2005-06-02). "Google Blog: Webmaster-friendly". Archived from the original on 2005-06-08. Retrieved 2021-12-31.
- ^ "Major Search Engines Unite to Support a Common Mechanism for Website Submission". News from Google. November 16, 2006. Retrieved 2021-12-31.
- ^ Pathak, Vivek (2007-05-11). "The Ask.com Blog: Sitemaps Autodiscovery". Ask's Official Blog. Archived from the original on 2007-05-18. Retrieved 2021-12-31.
- ^ "Information for Public Sector Organizations". Archived from the original on 2007-04-30.
- ^ M.L. Nelson; J.A. Smith; del Campo; H. Van de Sompel; X. Liu (2006). "Efficient, Automated Web Resource Harvesting" (PDF). WIDM'06.
- ^ O. Brandman, J. Cho, Hector Garcia-Molina, and Narayanan Shivakumar (2000). "Crawler-friendly web servers". Proceedings of ACM SIGMETRICS Performance Evaluation Review, Volume 28, Issue 2. doi:10.1145/362883.362894.
- ^ a b c d "Learn about sitemaps | Search Central". Google Developers. Retrieved 2021-06-01.
- ^ "Build and submit a sitemap". Google Developers. Retrieved 2025-10-26.
- ^ a b c d e f "Sitemaps XML format". Sitemaps.org. 2016-11-21. Retrieved 2016-12-01.
- ^ "Build and submit a sitemap". Google Developers. Retrieved 2025-10-26.
- ^ "Build and submit a sitemap - Search Console Help". Support.google.com. Retrieved 30 November 2020.
- ^ "Sitemaps ping endpoint is going away". 2025-04-04. Retrieved 2025-04-04.
- ^ "About Google Sitemaps". 2016-12-01. Retrieved 2016-12-01.
- ^ "Build and submit large sitemaps". Google Developers. Retrieved 2025-10-26.
- ^ "Sitemaps report - Search Console Help". support.google.com. Retrieved 2020-04-15.
- ^ a b "Image Sitemaps". Google Search Console. Retrieved 28 December 2018.
- ^ a b "Video Sitemaps". Google Search Console. Retrieved 28 December 2018.
- ^ Bigby, Garenne. "Why You should be using a Google News Sitemap". Dyno Mapper. Retrieved 28 December 2018.
- ^ "Google News Sitemaps". Google Search Console. Retrieved 28 December 2018.
- ^ "Multilingual and multinational site annotations in Sitemaps". Google Webmaster Central Blog. Pierre Far. May 24, 2012.
External links
- Official website
- Google news groups
- Sitemaps
- Webmaster help - Sitemap Archived 2006-12-21 at the Wayback Machine
Sitemaps
The XML format requires a root <urlset> element and a <loc> for each URL (limited to 2,048 characters and from a single host), while optional tags such as <lastmod> (in W3C datetime format), <changefreq> (values like "always," "hourly," "daily," "weekly," "monthly," "yearly," or "never"), and <priority> (a decimal from 0.0 to 1.0, defaulting to 0.5) provide additional guidance for crawlers.[1] Each sitemap file is limited to 50,000 URLs or 50 megabytes (uncompressed), with support for gzip compression, and for larger sites, a separate sitemap index file can reference up to 50,000 individual sitemaps.[1]
Website owners submit sitemaps to search engines via tools like Google Search Console, by adding a directive in the site's robots.txt file, or through HTTP requests, enabling faster discovery of new or updated pages that might lack internal links.[1][2] Benefits include improved indexing for sites with over 500 pages, those featuring rich media like images or videos, news content, or international versions in multiple languages, though small, well-linked sites may not require them.[2] Specialized sitemap variants exist for images, videos, and news, extending the protocol's utility beyond basic URL lists.[2] All sitemaps must be UTF-8 encoded and entity-escaped to ensure compatibility with search engine parsers.[1]
Fundamentals
Definition and Purpose
A sitemap is a file or structured data source that lists the URLs of a website's pages, videos, images, and other files to inform search engines about content available for crawling and indexing.[2][1] Accessing a publicly available sitemap.xml file allows one to view these listed URLs, which can reveal the website's subdirectory structure through the paths included in those URLs.[1] This protocol enables webmasters to provide structured information about site organization and relationships between resources, supplementing traditional link-based discovery methods.[2] The XML format serves as the standard under the official Sitemaps protocol, supported by major search engines including Google, Bing, and Yahoo.[4] The core purposes of sitemaps are to assist search engines in discovering new or updated content that might otherwise be overlooked, especially on large, dynamic, or poorly linked sites. For SEO purposes, an XML sitemap should include a list of all important pages to improve indexing by search engines.[2] They achieve this by including metadata such as the last modification date (<lastmod>), expected change frequency (<changefreq>), and relative priority (<priority>) for each URL.
History
The concept of sitemaps first emerged in the late 1990s as part of early web design practices aimed at improving user navigation on increasingly complex websites. Publishers and guides, such as the Web Style Guide, recommended including hierarchical site maps—often as simple HTML pages or diagrams—to help visitors understand site structure and locate content efficiently.[6] By the early 2000s, with the rapid growth of search engines, these user-focused maps began evolving toward machine-readable formats to assist automated crawling and indexing, addressing inefficiencies in discovering new or updated pages across large sites.
A key milestone came in June 2005 when Google introduced the initial Sitemaps protocol (version 0.84) in XML format, enabling webmasters to submit lists of URLs along with metadata like last modification dates and change frequencies to guide search engine crawlers more effectively.[7] This addressed post-search engine boom challenges, such as incomplete crawling of dynamic or poorly linked content. In November 2006, Google, Yahoo!, and Microsoft jointly announced support for the protocol, formalizing it under version 0.9 and establishing sitemaps.org as the central documentation site managed by a working group of representatives from these companies.[7]
The protocol saw rapid extensions to support specialized content: a news extension was added in November 2006 to prioritize timely articles with publication timestamps, followed by image extensions in April 2010 for enhanced media discovery, and video extensions in December 2007 to include details like duration and thumbnails.[8][9][10] These developments were driven by Google engineers, notably Vanessa Fox, who contributed to launching sitemaps.org and building the associated Webmaster Central tools to facilitate adoption.[11]
In recent years, the protocol has remained stable with ongoing maintenance by major search engines, though without significant overhauls. A notable change occurred in June 2023 when Google deprecated the Sitemap Ping Endpoint—a mechanism for notifying engines of updates—which ceased functioning by December 2023, encouraging reliance on direct sitemap submissions via tools like robots.txt and accurate lastmod tags for discovery.[3]
Core Formats
XML Sitemap Protocol
The XML Sitemap Protocol defines a standardized XML format for listing website URLs to facilitate discovery by search engine crawlers. It specifies a root <urlset> element that encapsulates all entries, with each individual URL represented as a child <url> element. The protocol mandates inclusion of the namespace declaration xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" in the <urlset> tag to ensure compatibility and validation.[1]
Within each <url> element, the <loc> tag is required and contains the canonical URL of the page, limited to 2,048 characters, which includes the full path and thus can potentially expose subdirectory information to anyone accessing the sitemap. Optional elements include <lastmod>, which records the last modification date in W3C Datetime format (equivalent to ISO 8601); <changefreq>, indicating update frequency with values such as "always", "hourly", "daily", "weekly", "monthly", "yearly", or "never"; and <priority>, a floating-point value from 0.0 to 1.0 that suggests relative importance within the site (defaulting to 0.5 if omitted). These components provide metadata hints to crawlers without guaranteeing specific crawling behavior.[1]
Sitemap files following this protocol are typically named sitemap.xml and placed at the website's root directory for easy access. They must be encoded in UTF-8 and adhere to XML 1.0 specifications, with a maximum uncompressed size of 50 megabytes (52,428,800 bytes) and no more than 50,000 URLs per file. Validation against the official schema at http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd ensures conformance, as demonstrated in this basic example for listing URLs:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
</url>
<url>
<loc>http://www.example.com/page1.html</loc>
</url>
</urlset>
Unlike HTML sitemaps designed for human navigation, the XML format is machine-readable and optimized exclusively for search engine processing, omitting any presentational elements. Detailed specifications for individual elements, such as the precise usage of `<loc>`, are covered in the element definitions section.[1]
Element Definitions
The XML Sitemap protocol defines a structured set of elements to describe URLs on a website, enabling search engines to understand the site's content more efficiently. The root element, `<urlset>`, serves as the container for all URL entries in the file and must include the namespace attribute to reference the protocol standard. Specifically, it is declared as `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`, ensuring compliance with the schema for validation. This element encapsulates the entire sitemap and must be the outermost tag, with the file encoded in UTF-8 to handle international characters properly.[1]
Each individual URL is represented by the `<url>` element, which acts as a wrapper for the details of a single page or resource. This element is required for every entry and must contain exactly one child `<loc>` element, though it may also include optional sub-elements like `<lastmod>`, `<changefreq>`, and `<priority>`. The `<url>` tag provides a logical grouping, allowing search engines to parse the sitemap as a list of discrete entries without ambiguity. Multiple `<url>` elements are nested within the `<urlset>`, forming the core body of the file.[1]
The `<loc>` element is the mandatory core of each `<url>` entry, specifying the absolute URL of the page being referenced. It must be a fully qualified URL, starting with a protocol such as HTTP or HTTPS, limited to 2,048 characters in length, and excluding fragment identifiers (e.g., no "#section" parts). For instance, a valid `<loc>` might be `<loc>https://www.example.com/products/widget</loc>`, and all values within the sitemap must be entity-escaped, such as replacing "&" with "&amp;". Relative URLs are not permitted, as they prevent universal accessibility across search engine crawlers.[1]
Optionally, the `<lastmod>` element indicates the date and time of the last significant modification to the page, helping search engines prioritize recrawling. It follows the W3C datetime format, such as `<lastmod>2025-11-09T14:30:00+00:00</lastmod>` for a precise timestamp or a simpler `<lastmod>2025-11-09</lastmod>` for just the date (YYYY-MM-DD). This value should reflect content changes rather than metadata updates or sitemap generation times, and it is distinct from HTTP headers like If-Modified-Since, which search engines may use independently.[1]
The `<changefreq>` element provides a hint about the expected update frequency of the page, using one of the predefined enumeration values: always, hourly, daily, weekly, monthly, yearly, or never. For example, `<changefreq>weekly</changefreq>` suggests moderate changes, guiding crawlers on scheduling but serving only as a non-binding suggestion, as search engines may adjust based on other factors. This element is optional and should be used judiciously to avoid presenting infrequently updated pages as frequently changing ones.[1]
Similarly optional, the `<priority>` element assigns a relative importance score to the URL within the context of the same website, expressed as a decimal value from 0.0 (lowest) to 1.0 (highest), with a default of 0.5 if omitted. An example is `<priority>0.8</priority>`, indicating higher priority than the site average but not implying global ranking influence across different sites. Priorities are site-relative only, and setting all entries to 1.0 negates any useful differentiation.[1]
A complete example of a `<url>` entry incorporating all elements for a hypothetical page might appear as follows:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.example.com/products/widget</loc>
<lastmod>2025-11-09</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
The entry is shown nested within <urlset> to form a complete sitemap file.[1]
Common errors in implementing these elements include using invalid date formats in <lastmod>, such as non-W3C compliant strings like "11/09/2025", which may cause search engines to ignore the value; providing relative URLs in <loc>, like "/products/widget" instead of a full absolute path; or exceeding the 2048-character limit for <loc>, leading to truncation or rejection of the entry. Additionally, failing to entity-escape special characters or omitting the required <loc> within a <url> can render the sitemap unparseable.[1]
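To illustrate these pitfalls, the first entry below uses a relative URL and a locale-style date, while the second shows the corrected form (the URL is hypothetical):
<url>
<!-- Invalid: relative URL and non-W3C date format -->
<loc>/products/widget</loc>
<lastmod>11/09/2025</lastmod>
</url>
<url>
<!-- Valid: absolute URL and W3C YYYY-MM-DD date -->
<loc>https://www.example.com/products/widget</loc>
<lastmod>2025-11-09</lastmod>
</url>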
Alternative Formats
Plain Text Sitemaps
Plain text sitemaps provide a basic method for listing website URLs in a non-structured format, consisting of a single text file with one absolute URL per line and no accompanying metadata such as last modification dates, change frequencies, or priorities. These files must use the .txt extension and be encoded in UTF-8 to ensure proper parsing by search engine crawlers.[12] This format is particularly suitable for small websites or legacy systems requiring minimal maintenance, as it avoids the complexity of XML tagging while still enabling basic URL discovery. Both Google and Bing officially support plain text sitemaps for crawling and indexing purposes, allowing webmasters to notify search engines of site content without advanced features.[12][13]
To create a plain text sitemap, webmasters can use any standard text editor to compile a list of absolute URLs, ensuring the file does not exceed 50,000 URLs or 50 MB in uncompressed size; for larger sites, multiple files can be generated and referenced accordingly. For instance, a simple three-page site might use the following content in its sitemap.txt file:
https://www.example.com/
https://www.example.com/about.html
https://www.example.com/contact.html
RSS and Atom Feeds
RSS and Atom feeds, originally designed for web syndication, can be adapted to function as sitemaps by search engines when they include elements pointing to site URLs. This adaptation allows feeds in RSS 2.0 or Atom 0.3/1.0 formats to notify crawlers of available pages, particularly useful for sites already generating such feeds for content distribution. Google began supporting RSS and Atom feeds as sitemaps in September 2005, enabling publishers to leverage existing infrastructure for improved discoverability.[15][1]
Key requirements for using these feeds as sitemaps include embedding full, absolute URLs to site pages via the <link> element in RSS or Atom entries, rather than relying solely on feed item descriptions or relative paths. Additionally, including a modification timestamp—such as <pubDate> in RSS or <updated> in Atom—helps search engines prioritize crawling based on recency. Feeds should be placed in the site's root directory to facilitate easy discovery by crawlers, and they must adhere to the respective syndication standards while serving sitemap purposes.[1][12]
One primary advantage of RSS and Atom feeds as sitemaps is their ability to provide automatic updates for dynamic content, such as blog posts or news articles, ensuring search engines receive notifications of changes without manual intervention. This dual-purpose functionality benefits both end-users subscribing to content updates and search engine crawlers seeking fresh URLs, making it ideal for frequently updated sites like blogs or news portals.[16][13]
However, RSS and Atom feeds have notable limitations when used as sitemaps, as they typically only encompass recent content—often the last 10 to 500 items—rather than an exhaustive list of all site pages. Unlike dedicated XML sitemaps, they lack support for priority levels or change frequency indicators, which can reduce their effectiveness for comprehensive site mapping. For instance, a basic RSS feed adapted for sitemap use might resemble the following snippet, where <link> elements point to full URLs and <pubDate> provides timestamps:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Example Site</title>
<link>https://www.example.com/</link>
<description>Site description</description>
<pubDate>Wed, 01 Jan 2025 00:00:00 GMT</pubDate>
<item>
<title>Article Title</title>
<link>https://www.example.com/article1</link>
<pubDate>Wed, 01 Jan 2025 12:00:00 GMT</pubDate>
<description>Article summary</description>
</item>
</channel>
</rss>
Submission and Indexing
Submitting to Search Engines
Sitemaps can be submitted to search engines through two primary methods: automatic discovery by placing the file at the website's root directory or referencing it in the robots.txt file, and direct submission via dedicated webmaster tools. Automatic discovery allows search engine crawlers to locate the sitemap without manual intervention; for instance, adding a line like Sitemap: https://example.com/sitemap.xml to the robots.txt file enables major engines to find and process it during routine crawls.[19][1] Direct submission provides more control and immediate notification, typically through web-based consoles where site owners verify ownership before adding the sitemap URL.[12]
For Google, sitemaps are submitted via Google Search Console by navigating to the Sitemaps section, entering the sitemap URL (or index file), and clicking submit; this method is recommended over deprecated alternatives and is a key step for better SEO outcomes, such as faster indexing of pages.[12] Bing accepts submissions through Bing Webmaster Tools under the Sitemaps tool, where users paste the sitemap URL and submit it after site verification.[13] Yandex uses its Webmaster Tools, selecting Indexing > Sitemap files to enter and submit the sitemap URL. These consoles support sitemap index files, which consolidate multiple sitemaps into a single reference file for easier management of large sites; engines process the index to access individual sitemaps.[1] Sitemaps must be accessible via HTTP or HTTPS protocols, ensuring crawlers can fetch them without authentication or redirection issues.[12]
A notable change occurred with the retirement of Google's sitemap ping endpoint in late 2023, where notifications via http://www.google.com/ping?sitemap=URL ceased to function, shifting emphasis to console submissions and auto-discovery for efficient crawling signals.[3] Tools facilitate submission for non-technical users; for example, the Yoast SEO plugin for WordPress automatically generates and enables XML sitemaps, integrating submission options directly within the dashboard for seamless delivery to search engines.[20] Online generators like XML-Sitemaps.com allow users to create and download sitemaps, which can then be uploaded to the root directory or submitted manually.[21]
Verification of submission occurs through console reports, which display processing status, last access dates, discovered URLs, and any errors such as invalid formats or access issues.[22] For dynamic sites with frequent content updates, such as news platforms, resubmitting the sitemap daily ensures timely crawling of new pages, while static sites may require updates only after significant changes.[12] Multi-engine support follows unified guidelines from sitemaps.org, which outline compatible formats and encourage cross-submission to engines like Google, Bing, and Yandex for broader indexing coverage.[1]
Indexing Limitations
Sitemaps serve as suggestions to search engines about URLs available for crawling and potential indexing, but they do not guarantee that any listed pages will be included in search results. Search engines like Google evaluate each URL based on factors such as content quality, duplication, relevance, and adherence to webmaster guidelines, often prioritizing high-value pages within limited crawl budgets. For instance, Google's crawl budget allocates resources based on site size, update frequency, and server performance, meaning even sitemap-submitted URLs may remain unvisited if resources are constrained.[2][12]
Several key constraints can prevent indexing despite sitemap inclusion. Pages marked with a noindex meta tag or HTTP header directive will not be indexed, as this explicitly signals search engines to exclude them from results, overriding any sitemap recommendation. Similarly, resources blocked by robots.txt directives remain inaccessible for crawling, and sitemaps cannot bypass these restrictions—search engines respect disallow rules and will not fetch or index such content. Low-value or thin content, such as duplicate pages or those lacking substantial user benefit, is also frequently ignored, as engines apply policies to maintain result quality.[23][24][25]
In terms of effectiveness, sitemaps primarily accelerate discovery for new or orphaned pages that lack strong internal or external links, potentially reducing the time to indexing compared to reliance on natural crawling alone. However, for sites with robust linking structures, the impact on overall indexing rates is often minimal, as search engines already efficiently traverse well-connected content.[2][26]
Common pitfalls further limit sitemap utility. Including non-canonical URLs or pages with noindex directives can trigger warnings or rejection of the sitemap file, wasting processing resources and potentially harming crawl efficiency. Over-submission of unchanged sitemaps consumes unnecessary quota in webmaster tools and may dilute focus on truly updated content, indirectly straining crawl budgets.[16][27]
Engine-specific behaviors highlight varying reliance on sitemaps. Bing places greater emphasis on sitemaps for comprehensive discovery in large or deep sites, using them to ensure full URL coverage amid AI-powered search demands. As of 2025, major engines like Google have intensified focus on content quality over URL quantity, with core updates penalizing low-value content and rewarding signals of authoritative, user-focused pages.[28][29]
Specifications and Limits
Size and URL Constraints
Sitemaps adhere to strict size and content constraints to ensure efficient processing by search engine crawlers. According to the official Sitemaps protocol, each individual sitemap file is limited to a maximum of 50,000 URLs and must not exceed 50 MB (52,428,800 bytes) in uncompressed size.[1] These limits apply to the XML content before any compression, helping to prevent overload on server resources during crawling. Additionally, each URL specified in the <loc> element must be fewer than 2,048 characters in length, and all URLs within a sitemap must belong to the same host as the sitemap file itself.[1]
For sites exceeding these per-file limits, the protocol recommends using a sitemap index file, which employs the <sitemapindex> root element to reference up to 50,000 individual sitemap files, each conforming to the standard constraints.[1] The index file itself is also capped at 50 MB uncompressed. Sitemap indexes must only link to sitemaps on the same site, enabling scalable organization without violating core limits. Major search engines like Google and Bing enforce these 50,000 URL and 50 MB thresholds strictly to maintain crawling efficiency.[30][31]
Yandex enforces the standard limits of 50,000 URLs and 50 MB uncompressed per sitemap file, recommending the use of sitemap index files for larger sites.[32] To manage large-scale sites within these bounds, sitemaps can be compressed using gzip, which typically reduces file sizes by 60-90% for XML content, aiding efficient transmission, and divided into logical subsets such as dated archives (e.g., sitemap-2025-11.xml) or categorized collections (e.g., sitemap-products.xml). The protocol advises against including redirecting URLs or those with excessive parameters in sitemaps, as they may lead to processing errors, emphasizing canonical, direct links instead.[1]
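A brief sketch of that segmentation strategy, assuming the hypothetical file names mentioned above, ties the subsets together through a standard index file:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.example.com/sitemap-2025-11.xml</loc>
</sitemap>
<sitemap>
<loc>https://www.example.com/sitemap-products.xml</loc>
</sitemap>
</sitemapindex>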
Best Practices
To create effective sitemaps, automate their generation using content management system (CMS) plugins like Yoast SEO for WordPress or tools such as Screaming Frog for broader sites, ensuring dynamic updates for large inventories without manual intervention.[26][12] Include only canonical, indexable URLs—such as primary versions of pages with absolute paths like https://www.example.com/product-page.html—while excluding duplicates, redirects, or non-public content to guide crawlers efficiently.[12][26] Always update the <lastmod> element with precise, verifiable dates in ISO 8601 format (e.g., 2025-11-09) to signal recent changes and prioritize recrawling.[12]
For maintenance, resubmit sitemaps to search engines via Google Search Console or robots.txt after significant site updates, such as adding new content or restructuring, to prompt fresh crawling.[12] Regularly monitor for errors in Search Console's Sitemaps report, addressing issues like fetch failures or invalid URLs promptly to maintain crawl efficiency.[12] Avoid including pages marked with noindex directives, as this can confuse crawlers and dilute the sitemap's value.[12][26]
Optimization involves using <priority> and <changefreq> elements judiciously, though Google ignores them in favor of other signals; reserve higher priorities (e.g., 0.8-1.0) for high-value pages like homepages or key landing pages if targeting engines beyond Google.[12] Prioritize inclusion of revenue-driving or user-critical pages to focus crawler budget on impactful content.[26] Integrate sitemaps with schema markup on individual pages—such as Product or Article schemas—to enhance rich result eligibility, as sitemaps alone do not embed structured data.[12]
In 2025, ensure sitemap compatibility with mobile-first indexing by listing a single preferred URL version (mobile or responsive) per entry, avoiding separate desktop/mobile variants to align with Google's primary rendering focus.[12] Test sitemap URLs using Google Search Console's URL Inspection tool to verify crawlability and indexing status before submission.[12]
Track key metrics like indexing rates and error percentages through Google Search Console, aiming to keep error rates below 10% by resolving issues such as malformed XML or inaccessible files, which directly correlates with improved discoverability.[12][26]
For e-commerce sites, create separate sitemaps for product catalogs to manage large volumes (e.g., one for active inventory, another for images), respecting size limits while highlighting seasonal or high-traffic items.[26] News sites should refresh sitemaps weekly—or more frequently for breaking content—to include recent articles, ensuring timely indexing without exceeding per-sitemap URL caps.[26][12]
Specialized Types
Image and Video Sitemaps
Image and video sitemaps extend the standard XML sitemap protocol to provide search engines with detailed information about media content on a website, facilitating better discovery and indexing of images and videos. These extensions use dedicated namespaces and elements that can be embedded directly within the <url> tags of a conventional sitemap or housed in separate files, such as sitemap-images.xml or sitemap-videos.xml. By including media-specific metadata, these sitemaps help prioritize content for rich search features, such as thumbnails and enhanced previews, improving visibility in image and video search results.[33][34]
For images, the extensions are defined in the namespace http://www.google.com/schemas/sitemap-image/1.1. The core structure involves the <image:image> element, which encapsulates details for a single image and can appear multiple times under each <url>. The required <image:loc> element specifies the absolute URL of the image file itself. Historically, additional elements like <image:title> for a short descriptive title, <image:caption> for contextual text, and <image:geo_location> for latitude and longitude coordinates were supported to enrich image understanding; however, these have been deprecated since August 2022 in favor of simpler structures and alternative best practices like descriptive alt text in HTML. Up to 1,000 <image:image> entries are permitted per <url>, allowing sites with image galleries to associate multiple assets with a single page.[33][35]
The following XML snippet illustrates an embedded image extension for a page featuring a gallery:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://example.com/gallery-page.html</loc>
<lastmod>2025-11-09</lastmod>
<image:image>
<image:loc>https://example.com/images/photo1.jpg</image:loc>
</image:image>
<image:image>
<image:loc>https://example.com/images/photo2.jpg</image:loc>
</image:image>
</url>
</urlset>
Embedding only the required <image:loc> ensures compatibility while aiding Google in discovering images that might be loaded dynamically via JavaScript or hidden from standard crawling. This approach enhances the potential for images to appear as thumbnails in search results, driving more targeted traffic to media-rich pages.[33][36]
Video sitemaps, similarly, leverage the namespace http://www.google.com/schemas/sitemap-video/1.1 and wrap content in the <video:video> element, which supports up to 1,000 instances per <url>. Essential tags include <video:content_loc>, which points to the direct URL of the video file in supported formats like MP4 or WebM; <video:thumbnail_loc> for a representative image preview; <video:title> for a brief, engaging name; <video:description> for a summary of the content; and <video:duration>, specified as an integer value in seconds representing the video's length. These elements provide context that helps search engines evaluate relevance and quality for video-specific queries.[34]
An example of a video extension within a standard sitemap entry for a page hosting a tutorial video is shown below:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
<url>
<loc>https://example.com/video-tutorial.html</loc>
<lastmod>2025-11-09</lastmod>
<video:video>
<video:content_loc>https://example.com/videos/tutorial.mp4</video:content_loc>
<video:thumbnail_loc>https://example.com/thumbs/tutorial.jpg</video:thumbnail_loc>
<video:title>Tutorial on Web Development</video:title>
<video:description>A beginner's guide to building websites with HTML and CSS.</video:description>
<video:duration>300</video:duration>
</video:video>
</url>
</urlset>
News Sitemaps
News sitemaps are a specialized extension of the standard XML sitemap protocol designed specifically for news publishers to accelerate the discovery and indexing of timely articles by search engines like Google News.[38] They utilize the namespace http://www.google.com/schemas/sitemap-news/0.9 to incorporate news-specific metadata within each <url> entry, enabling faster crawling of fresh content that meets strict timeliness criteria.[39] This format helps ensure that breaking news appears promptly in search results and news aggregators, prioritizing content relevance and recency over general web pages.[38]
The core structure of a news sitemap embeds a <news:news> parent element inside each <url> tag, which contains required sub-elements for publication details and article metadata. The <news:publication> element is mandatory and includes <news:name>, specifying the exact publication name as recognized on news.google.com (without parentheses or variations), and <news:language>, using an ISO 639-1 or ISO 639-2 code such as "en" or "zh-cn".[38] Additionally, <news:publication_date> must be provided in W3C datetime format (e.g., "2025-11-09" or "2025-11-09T12:00:00-08:00") to indicate the article's ISO 8601-compliant publication time, while <news:title> captures the article's headline in plain text.[40] Optional elements enhance discoverability, such as <news:keywords> for up to five comma-separated terms relevant to the content (e.g., "election, politics, results"), and <news:geo_targeting> using ISO 3166-1 alpha-2 codes like "US" for location-specific targeting.[38]
To qualify for inclusion, news sitemaps must adhere to stringent requirements: articles can only be listed if published within the last 48 hours. Approval in the Google Publisher Center is recommended for publishers seeking full inclusion in Google News features, where they can verify ownership and manage content.[38][41] Keywords should be limited to fewer than five terms to maintain focus, avoiding overly broad or unrelated phrases.[38] Sitemaps are capped at 1,000 <news:news> entries each, with no support for <priority> or <changefreq> tags, as these are irrelevant for ephemeral news content; exceeding limits requires splitting into multiple files via a sitemap index.[38] Publishers are encouraged to update sitemaps hourly or as new articles publish to reflect real-time news flows, removing outdated entries promptly.[38]
The primary purpose of news sitemaps is to fast-track indexing in Google News, signaling high-priority content for immediate crawling and reducing latency in surfacing breaking stories.[38] They also support accelerated mobile pages (AMP) through the optional <news:amp> tag, which points to a mobile-optimized AMP version of the article URL, improving load times on devices.[38]
For a breaking news article, a representative XML snippet might appear as follows, incorporating keywords and geo-targeting for a U.S. election story:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
<loc>https://example.com/2025-election-results</loc>
<news:news>
<news:publication>
<news:name>Example News</news:name>
<news:language>en</news:language>
</news:publication>
<news:publication_date>2025-11-09T08:00:00-05:00</news:publication_date>
<news:title>2025 Election: Key Results and Analysis</news:title>
<news:keywords>election, results, politics, vote</news:keywords>
<news:geo_targeting>US</news:geo_targeting>
<news:amp>https://example.com/amp/2025-election-results</news:amp>
</news:news>
</url>
</urlset>
Advanced Configurations
Multilingual Support
Sitemaps support multilingual websites through the integration of hreflang annotations, which allow webmasters to specify alternate language and regional versions of pages directly within the XML structure. This is achieved by including <xhtml:link> elements as children of each <url> entry, using the rel="alternate" attribute paired with hreflang to indicate the language or locale (e.g., hreflang="en" for English or hreflang="es" for Spanish).[42] These annotations must be bidirectional, meaning each variant page links to all others in the set, including a self-referential link to its own URL. The sitemap namespace must include the XHTML extension: xmlns:xhtml="http://www.w3.org/1999/xhtml".[42]
Webmasters can approach multilingual sitemaps in two primary ways: using a single sitemap file that encompasses all language variants or creating separate sitemap files for each language, which are then linked together via a sitemap index file. The single-file method consolidates all <url> entries with their respective <xhtml:link> annotations, making it suitable for smaller sites, while separate files improve organization for larger, language-diverse sites and can reference the index for submission to search engines.[43] Best practices include always adding self-referential hreflang tags (e.g., pointing back to the page's own <loc>), supporting region-specific codes like en-US for American English versus en-GB for British English, and incorporating a default variant with hreflang="x-default" for users whose language or region does not match any specified alternate. Fully qualified absolute URLs should be used in all <loc> and <xhtml:link href> attributes to avoid resolution issues.[42]
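A minimal sketch of the separate-file approach, assuming hypothetical per-language sitemap names, uses an ordinary index file; each referenced sitemap would still carry the full set of bidirectional <xhtml:link> annotations described above:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/sitemap-en.xml</loc>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap-es.xml</loc>
</sitemap>
</sitemapindex>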
Key challenges in implementing multilingual sitemaps involve ensuring consistency and avoiding errors that could lead search engines to ignore the annotations. For instance, languages must not be mixed within a single <url> entry; each entry should represent one primary language version with links to alternates. Incorrect language codes (using ISO 639-1 for languages and ISO 3166-1 Alpha 2 for regions) or missing bidirectional links can invalidate the cluster. Validation is essential and can be performed using tools like Google's URL Inspection tool in Search Console to check if hreflang signals are recognized during crawling, or third-party validators such as the Hreflang Tags Testing Tool from TechnicalSEO.com.[42][44]
An example XML snippet for a sitemap entry supporting English and Spanish variants of a page might look like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://example.com/en/article/</loc>
<xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/article/" />
<xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/article/" />
<xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/en/article/" />
</url>
<url>
<loc>https://example.com/es/article/</loc>
<xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/article/" />
<xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/article/" />
<xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/en/article/" />
</url>
</urlset>
Sitemap Indexes
Sitemap indexes enable large-scale websites to organize and reference multiple individual sitemap files, addressing the protocol's constraints on file size and URL count. They serve as a central hub for managing extensive URL inventories, such as those exceeding 50,000 URLs, by linking to category-specific or segmented sitemaps like those for products, blog posts, or images. This approach facilitates efficient crawling and indexing for search engines, particularly on enterprise sites with millions of pages.[1][30]
The structure of a sitemap index file uses an XML root element <sitemapindex> with the namespace http://www.sitemaps.org/schemas/sitemap/0.9, containing one or more <sitemap> child elements. Each <sitemap> must include a <loc> element specifying the URL of an individual sitemap file, and may optionally include a <lastmod> element in W3C datetime format to indicate the last modification date of that sitemap. All files must be UTF-8 encoded, and the referenced sitemaps must belong to the same site as the index. This format has been supported since the initial protocol version 0.9.[1]
Implementation involves naming the index file conventionally as sitemap_index.xml (or similar, such as sitemap-index.xml) and placing it in the website's root directory for automatic discoverability by search engines, which commonly check for standard sitemap locations like /sitemap.xml or /sitemap_index.xml. Sitemaps referenced in the index should reside in the same directory or a subdirectory relative to the index file to ensure proper hierarchy. For submission, the index file URL is provided to search engines, which then process the linked sitemaps.[1]
Limits for sitemap indexes include a maximum of 50,000 <sitemap> entries per index file and a total uncompressed file size of 50 MB (or equivalent when gzipped). Google limits the number of sitemap index files that can be submitted per site to 500 via Search Console.[30] Recursive indexing—where an index links to another index—is permitted by the protocol but supported only to limited depths by major search engines; for instance, Google processes up to one level of nesting (index to index to sitemaps) but does not recommend structures beyond two levels to avoid processing inefficiencies.[1][30]
The following example illustrates a basic sitemap index file linking to three sub-sitemaps for products, blog, and images:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.example.com/sitemaps/products.xml</loc>
<lastmod>2025-11-01</lastmod>
</sitemap>
<sitemap>
<loc>https://www.example.com/sitemaps/blog.xml</loc>
<lastmod>2025-11-08</lastmod>
</sitemap>
<sitemap>
<loc>https://www.example.com/sitemaps/images.xml</loc>
<lastmod>2025-11-09</lastmod>
</sitemap>
</sitemapindex>
References
- https://developers.google.com/search/blog/2014/10/best-practices-for-xml-sitemaps-rssatom
- https://developers.google.com/search/docs/crawling-indexing/sitemaps/large-sitemaps
