Everything you need to know about multilingual and multinational sitemaps

Foreword

Many websites support multiple languages, and some serve multiple countries, each with several languages. These versions might be variations of the main website or exist as standalone applications. Even though they have a lot in common and share most of their functionality, their behavior may vary. This is especially true for content: each country has its own restrictions, rules, and regulations. As a result, pages differ from one version to another, and a search engine like Google should be able to index every version correctly.

This can be solved with a sitemap. If you haven't heard of sitemaps before, a sitemap is basically a way to tell Google which pages to crawl. There are two ways to implement sitemaps for multilingual and multinational websites:

  • use one global sitemap (alternate language pages)
  • use multiple sitemaps, one per country-language version.

URL structure

Before looking at the specific sitemap implementations, I want to highlight one important concept: the URL structure for multilingual and multinational websites. It's important to get it right at the very beginning, since it can be challenging (although possible) to change it later.

Each specific version of the website should have a language-country code in the URL. The preferred method (used by a lot of companies, but not required) is the following:

  • http://www.example.com/en/index.html - English version of the website
  • http://www.example.com/us/en/index.html - United States English version of the website
  • http://www.example.com/de/de/index.html - German version for Germany
  • http://www.example.com/be/fr/index.html - French version for Belgium

The examples above use the following country and language codes:

  • "en" for English
  • "us" for the United States
  • "de" for Germany and German
  • "fr" for French (also can be used for France)
  • "be" for Belgium
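To make the scheme concrete, here is a minimal Python sketch that assembles such URLs. The helper name `localized_url` and the `BASE_URL` constant are my own choices for illustration and not part of any library; the country segment is treated as optional, as in the first example above.

```python
from typing import Optional

# Assumed base address, taken from the example URLs in this article.
BASE_URL = "http://www.example.com"


def localized_url(page: str, language: str, country: Optional[str] = None) -> str:
    """Build <base>/<country>/<language>/<page>; the country segment is optional."""
    segments = [BASE_URL]
    if country:
        segments.append(country)
    segments.append(language)
    segments.append(page.lstrip("/"))
    return "/".join(segments)


if __name__ == "__main__":
    print(localized_url("index.html", "en"))        # English, no country
    print(localized_url("index.html", "en", "us"))  # English for the United States
    print(localized_url("index.html", "de", "de"))  # German for Germany
    print(localized_url("index.html", "fr", "be"))  # French for Belgium
```

Keeping the URL construction in one place like this makes it easier to stay consistent across pages, which matters because the same paths show up again in the sitemaps.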

Language codes come from the ISO 639-1 standard and country codes from ISO 3166-1 alpha-2, and there are multiple resources online where you can look up the codes you need.

Now that the URL structure is clear, let's take a look at the actual ways of adding sitemaps.

One global sitemap approach

This approach is especially useful when all country/language versions are variations of the main website. Google's SEO documentation also describes it, as a way to indicate alternate language pages.

The core idea is to have a base site in the main language. Very often it's an English version with the following URL structure - http://www.example.com/en/index.html. All other versions are considered alternates of the main site, for example, http://www.example.com/de/index.html, http://www.example.com/fr/index.html, etc.

With this approach it's possible to have only one sitemap for all country-language variations. It should be located at the base level: http://www.example.com/sitemap.xml. In the sitemap, every page gets one entry that lists its language versions as xhtml:link elements with rel="alternate" and an hreflang attribute.

The hreflang attribute takes the same language and country codes that were introduced in the previous section.

Each page requires its own <url></url> block with a <loc> tag holding the page URL on the "base site", plus the alternate URLs for each language version.
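As a rough illustration, a global sitemap of this shape can be generated with a short Python script. This is a sketch under my own assumptions: the `VERSIONS` mapping, the page names, and the function names are hypothetical, and the English site is assumed to be the base version. The base page is also listed among its own alternates, since Google's guidelines ask for every variant, including the page itself, to appear in the list.

```python
from typing import Iterable
from xml.sax.saxutils import escape, quoteattr

BASE_URL = "http://www.example.com"
# hreflang value -> URL prefix of that site version (illustrative values)
VERSIONS = {"en": "en", "de": "de", "fr": "fr"}
BASE_VERSION = "en"  # assumed main language


def url_entry(page: str) -> str:
    """One <url> block: <loc> for the base site plus one alternate per version."""
    loc = f"{BASE_URL}/{VERSIONS[BASE_VERSION]}/{page}"
    lines = ["  <url>", f"    <loc>{escape(loc)}</loc>"]
    for code, prefix in VERSIONS.items():
        href = f"{BASE_URL}/{prefix}/{page}"
        lines.append(
            f'    <xhtml:link rel="alternate" hreflang={quoteattr(code)} href={quoteattr(href)}/>'
        )
    lines.append("  </url>")
    return "\n".join(lines)


def global_sitemap(pages: Iterable[str]) -> str:
    """Wrap all <url> blocks in a <urlset> that declares the xhtml namespace."""
    header = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n'
        '        xmlns:xhtml="http://www.w3.org/1999/xhtml">'
    )
    body = "\n".join(url_entry(page) for page in pages)
    return f"{header}\n{body}\n</urlset>\n"


if __name__ == "__main__":
    print(global_sitemap(["index.html", "contact.html"]))
```

The xhtml namespace declaration on <urlset> is what allows the xhtml:link elements to appear inside a regular sitemap.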

The downside of this approach is that all variations are tightly coupled to the base version. If a page exists in a language version but not on the main site, it has no entry in the sitemap and may not be indexed. For everything to be indexed properly, all versions should have the same set of pages.

Multiple sitemaps approach

In some cases, it's not possible to have one sitemap for all countries and languages, for instance when the sites are more independent and have pages of their own that don't exist on their siblings. This is where country-language sitemaps come in (in this context I also call them individual sitemaps).

Create individual sitemaps

In this situation, each country-language sitemap should be hosted in its own subfolder, for example the English sitemap under http://www.example.com/us/en/ and the Spanish sitemap under http://www.example.com/us/es/. An individual sitemap is no different from the sitemap of a simple single-language website: a <urlset> with one <url> entry, holding a <loc>, for each page of that version.
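For illustration, the sitemap of our "Spanish version" could be produced by a small Python function like the one below. The function name and the page names are hypothetical, so treat this as a sketch rather than a finished tool.

```python
from typing import Iterable
from xml.sax.saxutils import escape


def individual_sitemap(site_root: str, pages: Iterable[str]) -> str:
    """Return a plain sitemap listing the given pages under one site root."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for page in pages:
        loc = f"{site_root.rstrip('/')}/{page.lstrip('/')}"
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(loc)}</loc>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical pages of the Spanish version for the United States.
    print(individual_sitemap("http://www.example.com/us/es/", ["index.html", "contacto.html"]))
```

Since there are no alternate links here, each version is free to list pages that exist nowhere else.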

However, we can end up with a lot of sitemaps for the different country-language versions, for example:

  • http://www.example.com/us/en/sitemap.xml
  • http://www.example.com/us/es/sitemap.xml
  • http://www.example.com/be/fr/sitemap.xml
  • http://www.example.com/be/nl/sitemap.xml
  • ….

All of them should be submitted to the search engines, and doing that manually requires a lot of work. The sitemap index file exists for that purpose.

Create sitemap index

It's a file that stores references to all individual sitemaps to simplify the submission process. It has a <sitemapindex> root element with one <sitemap> entry per individual sitemap.

Each reference URL is stored in a <loc> tag inside its <sitemap> entry.

The sitemap index should be stored in the root folder of the entire website.
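A sitemap index for the individual sitemaps listed earlier can be sketched in Python as follows. The function name `sitemap_index` is my own, and the list of sitemap URLs is only the example set from this article.

```python
from typing import Iterable
from xml.sax.saxutils import escape


def sitemap_index(sitemap_urls: Iterable[str]) -> str:
    """Return a sitemap index with one <sitemap><loc> entry per sitemap URL."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(url)}</loc>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(sitemap_index([
        "http://www.example.com/us/en/sitemap.xml",
        "http://www.example.com/us/es/sitemap.xml",
        "http://www.example.com/be/fr/sitemap.xml",
    ]))
```

The output would be saved as http://www.example.com/sitemap.xml, so that a single submission covers every country-language sitemap.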

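Having a sitemap index allows you to submit all sitemaps at once. It can be done through Google Search Console (formerly Webmaster Tools) or, for search engines that accept ping requests, by requesting the following URL: <searchengine_URL>/ping?sitemap=http://www.example.com/sitemap.xml. Note that Google retired its own ping endpoint in 2023, so for Google the console is the way to go.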
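If you build the ping URL in code, the sitemap address should be percent-encoded as a query parameter. Here is a minimal Python sketch that only constructs the URL and sends nothing; the host `https://searchengine.example` is a placeholder for <searchengine_URL>, and `ping_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def ping_url(search_engine_url: str, sitemap_url: str) -> str:
    """Build <searchengine_URL>/ping?sitemap=<encoded sitemap URL>."""
    query = urlencode({"sitemap": sitemap_url})  # percent-encodes ':' and '/'
    return f"{search_engine_url.rstrip('/')}/ping?{query}"


if __name__ == "__main__":
    # Placeholder host; substitute the search engine's real address.
    print(ping_url("https://searchengine.example", "http://www.example.com/sitemap.xml"))
```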

That's pretty much it. I hope this article helped you better understand how to manage multilingual and multinational sitemaps. Happy coding :-)