Understanding Mobile-First Indexing (2/3): The Long-Term Impact on SEO
January 29, 2017
By: Cindy Krum
Most of us in the digital space have heard the statistic that 90% of the world’s information was created in the past two years. It is a great illustration of the immense challenge that Google faces in pursuing its goal of cataloging all the world’s information. They have made significant progress, but the definition of ‘information’ is expanding. ‘Information,’ according to Google, now includes songs, movies, TV shows, apps, recipes, books and just about anything else you can think of. Google wants to know not just that the information exists, but exactly what it is and how to access and present it on a variety of devices and in a variety of formats, not just as a visual presentation in a browser.
Suffice it to say, crawling the web to index and rank it is getting to be a much bigger task. When companies add new content, they don’t remove old content. Beyond that, Google has conditioned us to be wary of ever removing content from our websites, presumably because maintaining an archive is best for users, but also in case the old content has links, social shares or other signals that are helping it continue to drive traffic. To keep up with the ever-increasing amount of digital content that Google would like to organize, they will have to make their process more efficient and limit their algorithmic evaluation set more stringently. They have already indicated a strong preference for sorting signals like Schema and other microformats, because these simplify crawling, decrease algorithmic effort and minimize the overhead that Google needs to continue indexing and ranking content. Now, with Mobile-First Indexing, factors like these will become even more important.
This article is the second part of a three-part series about Mobile-First Indexing. The first article focused on providing a simple and pragmatic interpretation of what Mobile-First Indexing will mean in the immediate term, and how webmasters and SEOs can update their existing websites to protect against any negative impacts of Mobile-First Indexing. This article will go deeper and focus on more theoretical concerns. It will detail the reasons Mobile-First Indexing is necessary and valuable for Google, how Google continues to push companies towards a new rubric of ranking signals and, finally, the important role that the cloud will play in the future of SEO. The final article in this series will give specific use-cases and information about the various Mobile-First development options that Google has been advocating, and favoring in its mobile search results. It will outline the pros and cons of each, and detail when and how they can be used to their maximum benefit.
Loss of the URL as The Foundation of Indexing
Historically, SEOs have talked about indexing in somewhat binary terms: content was in the index, or it was not (or, at one time, content was in the mobile index, the desktop index or neither). In SEO audits, technical problems may arise when too many or too few pages are admitted to the index; again, a question of what is or is not in the index. As SEOs, we have rarely had occasion to question how the index worked, until now. The change that Google is describing in the shift to Mobile-First Indexing is actually not about how things are admitted to or excluded from Google’s index, but instead about how things are organized within the index. Assuming this is the case, Google is using the word ‘index’ to mean ‘organize’ rather than simply ‘identify,’ so this change could be even more significant than most SEOs realize.
Mobile-First Indexing alludes to a future that is less dependent on URLs as the organizing mechanism for Google’s index. You must understand that indexes are essentially just databases. Before everything was digitized, the white pages were an alphabetical index of people, and the yellow pages were an alphabetical index of companies, pre-sorted by category. Similarly, the Dewey Decimal system was an index of the books that were present in a library, ordered numerically. Books could be in the index, not in the index or indexed incorrectly. What is important is that indexes are not free-for-alls; they have a unifying organizational principle based on an element that is extracted from a larger set of data, like a name, title, category or a numeric representation.
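To make the idea of an organizing key concrete, here is a minimal sketch in Python. It is only an illustration, not a description of how Google's index is built: the records, the field names and the `build_index` helper are all hypothetical. The same three items are indexed twice, once by URL and once by entity, and the item that has no URL only appears in the second index.

```python
from collections import defaultdict

# Illustrative records only; the field names are assumptions for this sketch.
records = [
    {"url": "https://example.com/menu", "entity": "Acme Coffee", "title": "Menu page"},
    {"url": "https://example.com/about", "entity": "Acme Coffee", "title": "About page"},
    {"url": None, "entity": "Acme Coffee", "title": "Opening-hours feed"},
]

def build_index(items, key):
    """Group item titles under the value extracted for `key`.

    Items that have no value for the key cannot be filed, so they are skipped.
    """
    index = defaultdict(list)
    for item in items:
        value = item.get(key)
        if value is not None:
            index[value].append(item["title"])
    return dict(index)

by_url = build_index(records, "url")        # the feed has no URL, so it is left out
by_entity = build_index(records, "entity")  # all three items are filed under one entity
```

The point of the sketch is simply that whatever element is chosen as the key determines what can be filed at all.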
Google has used URLs and URL structure, along with metadata and links, to organize content in their index, which is why SEOs have always operated under the maxim, “one URL for every piece of content”. Google has always been an ‘internet search engine’ and the internet is mostly consumed through web browsers that rely on URLs, but this is all changing. The internet is actually much larger, and contains much more information that cannot be presented in a browser. Huge amounts of data and information that are not HTML formatted are processed in the background of the Internet. This type of information is becoming critical for the Internet of Things (IoT) and Big Data style calculations. It is accessible only through APIs and direct access to the databases, and Google wants to be able to leverage this information in their algorithm.
Beyond that, mobile operating systems (OS) and browsers are getting less distinct. Both Spotlight Search and Google Now on Tap are aspects of the mobile OS that can search and surface content from the web and apps. In the case of Google Now on Tap, it appears that content is provided in feeds and APIs, without necessarily including a web page or URL. Once the URL requirement is removed, content from apps can compete on a level playing field with websites, which makes for a much better experience, with much more flexibility in terms of how and where information can be presented to a user. A Mobile-First Indexing mentality allows Google to further distance rankings from simple URLs and links, and focus more on things important to mobile experiences, like speed, rendering and engagement.
Many of the newest mobile-oriented development techniques that Google has been advocating actually muddle or de-emphasize the importance of URLs, site structure and links. Things like native apps, web apps, PWAs and AMP all obscure Google’s access to URLs and link data. AMP content does not have traditional URLs, but instead lives on a URL that Google generates and hosts, and Android Instant Apps, which just came out of beta, are expected to work the same way. In app indexing, deep-linked URIs are basically just bookmarks in the user-flow of an app. Similarly, PWAs leverage ongoing communication with the server to deliver content when it is requested, and different URLs are not needed to trigger different content.
For years Google has been trying to distance rankings from the link-economy that it created, and now they seem to be actively trying to stop SEOs and webmasters (or maybe themselves) from relying on URLs as the primary organizing principle in their index. Instead, Google has probably begun associating specific indexable pieces of content with signals like Schema, on-page structured markup and XML feeds.
Schema, Markup & XML Feeds
Schema and nested schema have been part of Google’s top SEO recommendations for a number of years, because they help provide a concrete and easy-to-crawl entity-understanding of the content on the page. Since the launch of the Hummingbird update, which focused on entities, voice-search and semantic understanding, Google has not communicated as actively about entity search, but with the connected home, it is going to become vital again. Schema is much easier for Google to crawl and understand than regular HTML, and with the transition to JSON-LD, it has become even faster because it is separate from the page code and available directly from the server. Google is so interested in Schema that they are now even requesting that certain kinds of Schema be added in app markup. This, on its own, is quite telling.
The shift from JSON to JSON-LD is important for Google’s larger understanding of the world. The ‘LD’ in ‘JSON-LD’ stands for ‘Linked Data’ because JSON-LD is not about individual pieces of metadata, but instead, it is about metadata in the context of other metadata. JSON-LD.org explains that: “Linked Data empowers people that publish and use information on the Web. It is a way to create a network of standards-based, machine-readable data across Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.” This is how Google plans on building a deeper understanding of content and content relationships without relying on links and URLs.
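As a rough illustration of what ‘metadata in the context of other metadata’ looks like, the sketch below builds two small JSON-LD nodes in Python and follows the link from one to the other. The `@context`, `@type` and `@id` keywords are part of JSON-LD, and `Recipe`, `Person`, `author` and `recipeIngredient` are schema.org terms; everything else (the example.com identifiers, the values, and the `resolve` and `as_script_tag` helpers) is made up for this example. It is a toy lookup, not a real JSON-LD processor.

```python
import json

# Hypothetical nodes; the identifiers and values are illustrative only.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "@id": "https://example.com/recipes/sourdough#recipe",
    "name": "Simple Sourdough Bread",
    "author": {"@id": "https://example.com/people/jane#person"},
    "recipeIngredient": ["flour", "water", "salt"],
}

author = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://example.com/people/jane#person",
    "name": "Jane Baker",
}

# A toy graph: every node is reachable by its @id.
graph = {node["@id"]: node for node in (recipe, author)}

def resolve(node, prop, graph):
    """Return the value of `prop`, following an @id reference if there is one."""
    value = node.get(prop)
    if isinstance(value, dict) and "@id" in value:
        return graph.get(value["@id"], value)
    return value

def as_script_tag(node):
    """Serialize a node the way JSON-LD is commonly embedded in an HTML page."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(node, indent=2)
        + "\n</script>"
    )
```

Calling `resolve(recipe, "author", graph)` returns the full `Person` node rather than a bare string, which is the ‘linked’ part of Linked Data: the recipe does not repeat the author’s details, it points to them.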
Google has been directly asking for information feeds from a larger number of sources. Most companies are happy to comply, because getting their feed of information directly to Google has direct benefits, like AccuWeather’s visual weather information that shows up at the top of a weather-related search result, or the flights and hotels that Google easily surfaces in their aggregation engine, or any of the million products that Google includes in Google Shopping PLAs. These feeds give Google exactly the information they need, in exactly the format that they want it, so that it can be added directly to their database and surfaced appropriately. Results from these sources do seem to be shown favor in mobile search results, because they provide a good user experience, but also because they are easy for Google to parse and display quickly.
Many SEOs may not realize that Google also uses XML feeds and JSON-LD data to ingest the list of deep-link maps for app indexing, to understand Schema relationships on a large, detailed website, and to understand things like sports scores, news, recipes and movie times. Methods like this are superior to old-fashioned crawling because they are so much more efficient. In both cases, webmasters essentially connect their database to a Google API.
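To show why a feed is so much cheaper to consume than a crawled page, here is a minimal Python sketch that turns database-style rows into an XML feed. The element names (`feed`, `entry`, `id`, `title`, `price`) are invented for this example and do not follow any particular Google feed specification; a real feed would need to match whatever schema the receiving service documents.

```python
import xml.etree.ElementTree as ET

def build_feed(items):
    """Serialize a list of flat dicts as a simple XML feed.

    Each dict becomes one <entry>, and each key becomes a child element,
    so the keys must be valid XML element names.
    """
    root = ET.Element("feed")
    for item in items:
        entry = ET.SubElement(root, "entry")
        for field, value in item.items():
            ET.SubElement(entry, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical product rows, as they might come out of a site's own database.
products = [
    {"id": "sku-1", "title": "Blue Mug", "price": 12.5},
    {"id": "sku-2", "title": "Red Mug", "price": 11.0},
]

feed_xml = build_feed(products)
```

Every field arrives already labeled, so the consumer does not have to render a page or guess which piece of text is the price.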
Mobile-First Really Means Cloud First
Google will also rely heavily on their own cloud hosting to facilitate Mobile-First Indexing. They are calling this change ‘Mobile-First’ Indexing, but that is probably a misnomer. It is arguably more accurately described as ‘cross-device-first’ or even ‘cloud-first’ indexing in the long run. Nearly every new Google announcement seems to be another front or ploy to get webmasters to host their content on Google servers. Even Google Play is actively encouraging Android app developers to test hosting their apps on the Google Cloud Platform, so that users can benefit from a speedier native app experience, as shown at the right. While this is optional now, you can expect cloud-hosting with Google to be heavily incentivised or even required in the future.
Google will push more and more assets into their cloud, because it allows them to decrease their reliance on crawling, and increase their understanding of how users engage with the content over time. In the cloud, without reliance on URLs for indexing, indexable content can be text, video, audio, image or anything else; a concept brilliantly illustrated by Emily’s tweet above. As the cost of cloud hosting continues to decrease, and the amount that can be stored continues to increase, the benefits Google can expect from pushing developers and webmasters to host content on their cloud become apparent. Cloud hosting content will dramatically improve Google’s ability to manage their own effectiveness and minimize their business overhead, but it also feeds into the Big-Data mentality that they love, and that they can profit from dramatically in the long run. Here are the concepts related to Google Cloud Hosting to consider:
Less Crawling: Hosting is much more efficient and effective than crawling. Crawling might be the least effective and most complex aspect of ranking the web, and it is difficult to scale. If Google hosts your content, they don’t have to crawl it as aggressively, because they know immediately when you make updates or upload new content. They can also use the frequency of content requests on their system to know what is popular and what is not. This is a better measure of engagement than links.
More Efficient Data Collection & Presentation: When all your content lives in Google’s cloud, it can be super-fast all the time. Google can use its own compression and caching algorithms. It can also detect the speed of the device requesting content and adapt what it sends to suit that device and its network connection, so content works across different devices and operating systems without hassle.
The speedier presentation that Google gets when webmasters cloud-host with them will benefit Google users when they access the cloud-hosted content, but it will also benefit Google. As shown in the diagram below, Google prefers to include rich visuals directly in the SERPs whenever they can. These visuals are especially important for improving the mobile search user experience and driving engagement in search results. When Google hosts content, it also makes it faster and easier for Google to present your content directly in SERPs like this. Google already incentivises this kind of cooperation with top rankings, especially in mobile search results.
Flexibility of Content: Part of making content indexable in a cross-device or cloud-first world is a deeper separation of content from its ‘intended-device presentation layer,’ making it device-agnostic and potentially even format-agnostic too. When clean, unencumbered content can be saved in a web-hosted database, developers can focus on creating the custom interfaces for a variety of potential presentation-devices and formats without having to constantly replicate the content for each device. It is a bit like taking the concept of separating content from design with CSS further, to include format and functionality instead of just the visual cues for presentation.
If Google had to crawl and index content that was tied to specific devices over and over again for each device, indexing and ranking it all, it would be a nightmare. Cloud-hosting device-agnostic content is much more scalable for Google’s indexing resources, but also for development resources within companies. Google will still need to know what content to rank well for each device, but as the number of potential searching devices grows, so does the number of potential use cases.
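The separation of content from its presentation layer can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the content record, the two device names and the renderer functions are all hypothetical. One stored record is handed to whichever renderer suits the requesting device, so the content itself is never duplicated.

```python
import html

# One device-agnostic content record (illustrative values).
content = {
    "headline": "Store hours",
    "body": "We are open from 9 am to 5 pm, Monday to Friday.",
}

def render_html(item):
    """Presentation for a browser: wrap the content in escaped HTML markup."""
    headline = html.escape(item["headline"])
    body = html.escape(item["body"])
    return f"<article><h1>{headline}</h1><p>{body}</p></article>"

def render_voice(item):
    """Presentation for a screenless device: plain text to be read aloud."""
    return f"{item['headline']}. {item['body']}"

# Hypothetical device types mapped to their presentation layer.
RENDERERS = {"browser": render_html, "voice": render_voice}

def present(item, device):
    """Pick the presentation layer for the device; the content is unchanged."""
    return RENDERERS[device](item)
```

Supporting a new kind of device means adding one renderer to the mapping, not creating and indexing another copy of the content.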
Understanding the Indexing of the Future
The difficulty with disconnecting content from a corresponding URL is the loss of a unique identifier for the index. URLs on the web work like product SKUs in an inventory database: there is a one-to-one correlation. So, how will Google manage their index without URLs? The best answer is most likely a relational database with entity understanding and AI. Google will still have the URLs that it currently has in the index, and they have no doubt already used those to begin creating an entity-understanding for all the companies and content in the current web index. They can merge that with Schema from the web and other information that they have from content they host on their cloud hosting platform, and then they will have a lot to work with. From there, Google can structure its understanding based on what it knows about the world in general, by leveraging Freebase/Wikidata, which they technically ‘retired’ a few years ago, but which may have actually just been re-purposed. Maybe now, instead of being built out by human editors, it is being built out by RankBrain.
There may already be indications of this popping into Google’s Search Suggest, as they try to disambiguate a broad query that could mean two different things. As you can see below, Google believes that the query for ‘Bread’ could be about two very different topics: a food or a band. You know this is their understanding of the word because you can see the disambiguation suggestions directly at the top of the Search Suggest options. What you have to understand is that each of these disambiguation options is there because it is associated with very specific information in Google’s Knowledge Graph. Bread the food is associated with certain ingredients, recipes, images, calorie counts and the like, whereas Bread the band is associated with songs, tour dates and very different-looking pictures. (Perhaps part of the purpose of Google’s image recognition engine was to help sort pictures for queries like this!)
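The kind of entity-keyed relational index speculated about above can be sketched with Python’s built-in `sqlite3` module. To be clear, this is a guess at the general shape, not Google’s actual schema: the table names, columns and rows are all invented. The point is that content hangs off an entity, and the URL is just an optional attribute rather than the unique identifier.

```python
import sqlite3

def build_index():
    """Create a tiny in-memory index where entities, not URLs, are the key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
        CREATE TABLE content (
            id INTEGER PRIMARY KEY,
            entity_id INTEGER REFERENCES entity(id),
            format TEXT,
            url TEXT  -- optional: feed or app content may have no URL
        );
        """
    )
    conn.executemany(
        "INSERT INTO entity VALUES (?, ?, ?)",
        [(1, "Bread", "Food"), (2, "Bread", "MusicGroup")],
    )
    conn.executemany(
        "INSERT INTO content (entity_id, format, url) VALUES (?, ?, ?)",
        [
            (1, "recipe", "https://example.com/sourdough"),
            (1, "nutrition-feed", None),
            (2, "song", None),
            (2, "tour-dates", "https://example.com/bread-tour"),
        ],
    )
    return conn

def disambiguate(conn, query):
    """Return the content formats known for each entity that shares a name."""
    rows = conn.execute(
        "SELECT e.kind, c.format FROM entity e "
        "JOIN content c ON c.entity_id = e.id "
        "WHERE e.name = ? ORDER BY e.id, c.id",
        (query,),
    )
    result = {}
    for kind, fmt in rows:
        result.setdefault(kind, []).append(fmt)
    return result
```

Looking up ‘Bread’ with `disambiguate(build_index(), "Bread")` returns one group of content for the food and another for the band, including the items that have no URL at all.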
The internet is changing. As technology expands and improves it is becoming more and more invisible. As more of our daily devices go online, they become less reliant on screens, browsers and direct-entry keyboards, and instead are all operated remotely from the cloud. Both Amazon Echo and Google Home let you control web-connected elements of your home and perform simple searches using just your voice, and they respond based on streams of data from the cloud. These devices can and do operate without the use of URLs, so Google must also begin to operate in a more presentation-agnostic way. Our devices are moving away from needing browsers, so Google’s index should not be organized based on URLs.
With structured data, especially in JSON-LD, XML feeds and APIs, Google is building a strong understanding of the world that is less reliant on URLs and links for organization and evaluation of the data. Beyond that, Google will continue pushing developers and SEOs to leverage their cloud-hosting services, often for free, because it offers such a significant benefit to Google’s ability to index that content. With more content in Google’s cloud, Google will always know when content is updated, because their server logs will show it and automatically trigger a re-crawl. Hosting the content also allows Google to understand more about user engagement; when they host all of the app information, they can know exactly what content was requested from the server, even if there was no new page URL requested, as might happen in a PWA. This is deeper engagement data than Google has ever had with web content before!
This article and the one before it have dissected what Google’s move towards Mobile-First Indexing will mean to the practice of SEO in the short and long term. The next article in the series will outline a number of new development options that are best suited for Mobile-First Indexing, and why. It will also describe how and when they should be used, and what steps to take to build them for the future evolution of Mobile-First Indexing.