Introduction

There are quite a few myths related to Single Page Applications (SPAs) and SEO, for example:

Today’s reality is that SPAs and SEO can easily coexist and grow alongside each other. The best way to dispel the surrounding myths is to present evidence to the contrary:

| SPA Website | Note | Declared Sitemap | Proof of Indexing |
| --- | --- | --- | --- |
| crisp-react.winwiz1.com | Demo website. Full-stack deployment. | sitemap.xml | Link, Screenshot |
| jamstack.winwiz1.com | Demo website. Jamstack deployment. | sitemap.xml | Link, Screenshot |
| virusquery.com | Production website. | sitemap.xml | Link, Screenshot |

The Links in the table point to Google searches based on the site: keyword. Only indexed pages can appear in the search results, so if a page does show up, you know it has been indexed. However, not every indexed page is guaranteed to appear. If a page doesn’t show up, you can perform a more specific site: search restricted to that page only, like this.
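For instance, a per-page site: query URL can be constructed as follows. This is a minimal sketch; the function name and the example page URL are illustrative, not taken from the article:

```javascript
// Build a Google "site:" search URL that checks whether a single page
// is indexed. The page URL passed in is only an example.
function siteSearchUrl(pageUrl) {
  return (
    "https://www.google.com/search?q=" +
    encodeURIComponent("site:" + pageUrl)
  );
}

// e.g. siteSearchUrl("https://jamstack.winwiz1.com/first")
```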

The Screenshots shown in the table come from Google Search Console (GSC). It’s the primary tool to request and verify indexing.

Some screenshots show indexed pages as “Indexed, not submitted in sitemap”. This is confusing since the pages are in fact included in sitemap.xml. The wording means Googlebot initially discovered the pages without the sitemap and keeps crawling them without regard to the available sitemap.xml.

The rest of this article explains the steps aimed at getting a similar GSC screen for your SPA website.

Groundwork

Prerequisites

Deploy a SPA

In this section we are going to deploy a React SPA to Cloudflare Pages using the following steps:

Use Menu > Pages > Create a project. You will be asked to authorize read-only access to your GitHub repositories, with an option to narrow the access to specific repositories. Select the repository you pushed to GitHub at the previous step and, on the "Set up builds and deployments" screen, provide the following information:

| Configuration option | Value |
| --- | --- |
| Production branch | `master` |
| Build command | `yarn build:jamstack` |
| Build output directory | `client/dist` |

Add the following environment variable:

| Environment variable | Value |
| --- | --- |
| `NODE_VERSION` | `16.13.1` |

Optionally, you can customize the "Project name" field. It defaults to the GitHub repository name and is used to create a 'per-project subdomain' e.g. <project-name>.pages.dev.

After completing the configuration, click on the "Save and Deploy" button. When the deployment pipeline finishes, the website will be partially functional. Point a browser to https://<project-name>.pages.dev/first to check the site is online.

| Type | Name | IPv6 Address |
| --- | --- | --- |
| AAAA | subdomain | `100::` |

Replace subdomain with either a subdomain name, e.g. jamstack, to deploy to jamstack.<your-domain>.com, or with the apex, e.g. @, if you prefer to use the root domain. Check that the "Proxy status" of the record is set to "Proxied".

The "Proxied" status ensures the DNS record won't become public. Cloudflare will create another public DNS record that ensures the requests for subdomain are routed to Cloudflare datacenters. Once handled there, the requests would have been dropped since the address 100:: is used to discard traffic. But in reality, the requests will be handled by the Worker we are about to create at the next step.

Request Indexing

In this section, we’ll ask Google to:

  1. Confirm that each SPA page can be indexed,
  2. Accept a request to index each SPA page.

Both the confirmation and the acceptance will be obtained using the Google URL Inspection Tool, which is part of GSC.

Perform the following steps:

The last 3 steps will have to be repeated for all the pages of each SPA.

Follow-up

You can use the “URL Inspection” menu to monitor whether the page has been indexed. It can take from a few days to a couple of weeks for the page to be added to the Google index, at which time the response will state: “URL is on Google”. When that happens, you can double-check that the page was indexed by performing a Google search using the site: keyword.

Finally, it will take up to another week for the indexed pages to appear in the indexing report under the GSC “Coverage” menu.

Under the Hood

Let’s dissect what the Cloudflare Worker does and find out when and why it is needed.

Plain SPA

Google has no trouble rendering a SPA represented by a typical HTML file whose nearly empty <body> element contains references to scripts only. This markup makes sense for CSR since the HTML and the DOM are generated at run time in the browser’s memory. You can observe such markup on the demo websites mentioned at the beginning of the article: just jump to the second SPA and right-click on the page to ‘View page source’.

The page that belongs to the second SPA is indexed, which suggests the concern about the alleged fundamental trouble Google has with an ‘empty’ SPA page has no merit. The only difference is the extra ‘Rendering’ stage in the indexing pipeline.

Also, when you click on the "TEST LIVE URL" link in GSC to inspect that particular page, a screenshot of the page appears in the right pane. It proves the scripts were duly run, resulting in proper rendering of the ‘empty’ page.

Selectively Prerendered SPA

Now let’s switch from a plain SPA to one with the landing page prerendered. Crisp React builds such an SPA to combine landing-page prerendering with CSR for the other pages. This is done to improve performance.

Once the landing page is prerendered, the SEO troubles start. The indexing pipeline sees the HTML markup generated by prerendering and decides it can optimise the scripts away. It assumes there is no need to run any client-side scripts because the markup, and therefore the page content, is already there.

No client-script execution amounts to no CSR and no SPA page switching. As a result, GSC reports under the “Coverage” menu that all the non-landing pages are treated as duplicates of the landing page and not indexed, despite all the pages having different content.

The remedy was simple: once HTMLRewriter in the Cloudflare Worker was used to strip out the prerendered HTML markup, all internal (i.e. non-landing) SPA pages were indexed.
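The stripping step can be modelled as follows. HTMLRewriter itself is only available inside the Workers runtime, so this sketch imitates it with a plain string transform; the `app-root` mount-point id is an assumption for illustration, not taken from the article:

```javascript
// Empty the SPA mount point so the crawler sees a 'plain' SPA page and
// has to execute the client-side scripts to obtain the content.
// Assumption: the prerendered markup lives inside <div id="app-root">.
// Inside a Worker the equivalent is done streamingly with HTMLRewriter:
//   new HTMLRewriter()
//     .on("div#app-root", { element(el) { el.setInnerContent(""); } })
//     .transform(response)
function stripPrerenderedMarkup(html) {
  // Lazy match up to the first closing </div>; a real implementation
  // must handle nested <div>s, which HTMLRewriter does for you.
  return html.replace(/(<div id="app-root">)[\s\S]*?(<\/div>)/, "$1$2");
}
```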

An attentive reader might ask whether this amounts to website cloaking. The answer is no. Google provides definitions for ‘hybrid rendering’ and ‘dynamic rendering’. Both result in Googlebot and users receiving different content, which is effectively cloaking, albeit permitted by Google.

In our case, the rendering stage of the Google indexing pipeline sees what is rendered by the scripts, whereas users are presented with the output of React’s renderToString function, which yields the same HTML markup and page content.

Cloudflare Worker Functionality

The Worker is used to implement the following features:

Conclusion

Hopefully, this article sheds some light on the SPA & SEO topic. As you can see, the overall picture is quite bright and certainly not as bleak as frequently painted.

Getting your website indexed is only the first, though critical, stretch of the long SEO road. It’s highly recommended to ensure your SPA includes meaningful Structured Data. Crisp React assists with that by providing static and dynamic placeholders you can replace with data that contributes to your website’s SEO.
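As an illustration, a minimal piece of Structured Data might look like this. All values are placeholders, not taken from Crisp React:

```javascript
// Minimal JSON-LD Structured Data for a website; every value below is
// an illustrative placeholder.
const structuredData = {
  "@context": "https://schema.org",
  "@type": "WebSite",
  name: "Example SPA",
  url: "https://example.com/",
};

// In the page, this object would be serialized into a
// <script type="application/ld+json"> element in <head>.
const jsonLd = JSON.stringify(structuredData);
```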

Thank you for reading!