The Leaked Google Documents: What Could They Say About Content?

The search marketing community has been on fire since Rand Fishkin, SEO legend and co-founder of Moz and SparkToro, published the leaked API documentation for Google’s Content Warehouse. The documentation contains 14,014 attributes (API features) that SEOs believe could be relevant to search ranking. According to Rand, he received the leaked documents from an anonymous source who has since come forward as Erfan Azimi, a search marketer and founder of EA Eagle Digital.

The leaked documents point to many internal systems and signals that SEOs have long believed the search giant employs in its ranking algorithms. These include Chrome user data, click-through rates, sandboxes, and more, all of which Google has publicly denied using.

As expected, there were questions about the legitimacy of the documentation. So Rand, Azimi, and Mike King, an SEO expert and the founder of iPullRank, reached out to their network of ex-Google employees, who confirmed that these are legitimate internal documents from Google.

The documentation is actually a repository of information for Google employees, outlining variables or attributes, their functions, and how to work with them. Nothing in the documentation shows how any of the attributes are used in the search ranking system. There’s also no proof of the elements being used currently, as some content of the documents has been deprecated. Still, there’s a treasure trove of insights on the elements and data points that Google collects, has collected at some point, and considers important.

Meanwhile, in a statement to Search Engine Land, Google confirmed that the leaked internal documents are genuine. However, it warned against making incorrect assumptions about search based on out-of-context, outdated, or incomplete information.

Still, many SEOs, like Mike, who has extensively analyzed the leaked documents, believe that Google has not been transparent about what it uses to rank content in the SERPs. 

Whether or not Google has been telling the truth is not our headache today. Instead, we are more concerned with the parts of the leaked documentation that could shed light on the critical ranking signals and features related to content. We’ve been digging through the documentation (hats off to Dixon for making the attributes searchable over here), and here, we’ll distil what we believe the leak reveals about content and how you could apply those insights to your SEO content strategy.

What The Leak Reveals About Content

Before we get into our findings, it’s important to reiterate that nothing in the leaked documentation is definitive. So, you should take EVERYTHING you see about the Google leak with a healthy dose of scepticism. SEOs see what they want to see, and we can’t say for sure how Google is using any of these data points. But if Google is looking at, storing and describing these features in its API docs, they are things we should be aware of and at least consider as part of our content optimization efforts.

Google Associates Content With Topics and Entities

As we’ve said many times on the InLinks blog (and as should come as no surprise to our users), the search giant has been analyzing and organizing the world’s information through entities (objects in a database) since it acquired Metaweb in 2010.

Much of the documentation describes how Google treats a topic as a representation of a knowledge graph entity.

There are mentions of Google looking out for a focus entity, other entities mentioned in a document and their connectedness when analyzing queries. It also recognizes that multiple entities can be found in a given document.

There are also mentions of a topicality score that represents how connected entities are within your content and serves as a relative ranking signal between different documents for an entity.

The fact that Google stores and calculates these values aligns with the data supporting an entity-centric content strategy, the main foundation InLinks is built on. If you’re not already optimizing your content for entities, start today by digesting our guide on entity SEO and learning how to identify the right entities to target in your content.

The Search Engine Looks At Topical Clusters  

Attributes such as tundraClusterId and onsiteProminence could be indicative of how Google categorizes websites and their pages as a whole, which might affect how individual pages are ranked based on their association with broader site clusters. This suggests that instead of creating unrelated content assets, continuing to build topical clusters that tell a cohesive story of what your website is about is the right way to create content.

Content Freshness

Many attributes, like freshboxArticleScores, lastSignificantUpdate, and bylineDate, indicate how Google looks at and could score fresh content. These attributes suggest that regularly updated articles and blog posts might be viewed as better quality content and might perform better in search.

The takeaway is that SEOs should prioritize content updates, as Google might be paying attention to fresh content. Remember that every content asset, no matter how good, has a limited useful lifespan for the user. So, SEOs should build processes that regularly audit the content library, refresh old content, prune irrelevant pieces and maintain high-quality standards throughout the site.
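One way to start such an audit process is a small script that flags pages that have not been updated for a while. The sketch below is only an illustration: the inventory rows, the `find_stale` helper and the 365-day threshold are all assumptions made for this example, and nothing in the leak states what age Google might treat as stale.

```python
from datetime import date

# Hypothetical content inventory: (URL path, date of last significant update).
# In practice this would come from your CMS or sitemap; these rows are made up.
INVENTORY = [
    ("/guides/entity-seo", date(2024, 5, 1)),
    ("/blog/old-post", date(2021, 3, 15)),
    ("/blog/recent-post", date(2024, 1, 10)),
]


def find_stale(inventory, today, max_age_days=365):
    """Return (path, age_in_days) for pages older than max_age_days, oldest first."""
    stale = [
        (path, (today - updated).days)
        for path, updated in inventory
        if (today - updated).days > max_age_days
    ]
    # Sort by age so the most neglected pages are reviewed first.
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, age in find_stale(INVENTORY, today=date(2024, 6, 1)):
        print(f"{path}: last updated {age} days ago")
```

The threshold is an editorial choice, not a ranking rule. A news section may need a much shorter window than an evergreen guide, so it is worth tuning per content type.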

Relevance Remains The Name of The Game

Google is likely placing significant emphasis on whether or not its quality raters can understand your content.

Attributes like relevanceScore suggest that a signal called relevance score is associated with content based on its topicality.

This suggests that you should focus on producing clear, helpful content around each topic and using schema-structured data to enhance the content in a machine-readable format that search engine bots can easily understand.
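As a rough illustration of what machine-readable markup can look like, the sketch below builds a schema.org Article object as JSON-LD. The `article_jsonld` helper, the headline, the author name and the dates are hypothetical placeholders; the property names (`headline`, `author`, `datePublished`, `dateModified`, `about`, `sameAs`) come from the schema.org vocabulary. It shows how an entity and an author can be declared explicitly, and it does not claim that Google uses these fields in any particular way.

```python
import json


def article_jsonld(headline, author_name, published, modified, about_name, about_url):
    """Build a schema.org Article description as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": published,
        "dateModified": modified,
        # "about" names the focus entity; "sameAs" points to a public reference for it.
        "about": {"@type": "Thing", "name": about_name, "sameAs": about_url},
    }


if __name__ == "__main__":
    # All values below are made-up examples.
    data = article_jsonld(
        headline="A Beginner's Guide to Entity SEO",
        author_name="Jane Doe",
        published="2024-01-10",
        modified="2024-06-01",
        about_name="Search engine optimization",
        about_url="https://en.wikipedia.org/wiki/Search_engine_optimization",
    )
    # The printed JSON would go inside a <script type="application/ld+json"> tag.
    print(json.dumps(data, indent=2))
```

Generating the markup from the same source as the page content helps keep the declared dates and author in step with what readers actually see.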

Content Authorship

The documentation indicates that Google collects author information for every piece of content and checks whether an entity mentioned on the page is known and is also the author of the content. It does this not only for news articles but also for other content types, such as work by scientists, doctors, and contributors to scientific papers.

Although this is far from conclusive, and does not by itself confirm Google’s E-E-A-T recommendations, it does show that authority appears to be valued in news, YMYL and evidence-based content. Hence, you should endeavour to use authors with demonstrated expertise in the topics you publish on.

Pay Attention to Content Engagement

In the document, signals like impressions, last longest click, good click, bad click, unicorn clicks, and unsquashed clicks are all considered metrics that NavBoost (one of Google’s ranking systems uncovered during Google’s antitrust trial with the Department of Justice) uses. 

This could mean that engagement metrics are also considered critical in ranking documents. Therefore, beyond creating great content, paying attention to how users behave when they visit your website is worthwhile. Ensure your site is optimized for mobile devices. Remove intrusive ads, maintain proper formatting, present the information the reader came for, and put effort into delivering an excellent experience to increase the time users spend on your site. 

All of these (and other signals aligned with user experience) may contribute to your site’s measure of helpfulness, which your users will appreciate and Google may ultimately reward.

What Do The Leaked Documents Change About SEO?

Absolutely nothing! Nothing in the documents is groundbreaking or different from what data-led SEOs have practised over the years. If anything, it confirms that we are headed in the right direction. Although the lexicon of ‘how’ has changed, the fundamental principles for search engine ranking have remained the same:

  • Prioritize high-quality and helpful content that contributes something different to search
  • Build good websites that users and search engines can navigate 
  • Use entities to connect your content to Google’s knowledge graph and schema markup to enhance the search engine’s understanding of it
  • Build quality links from contextually relevant sites 
  • And continue to practice sound technical and on-page SEO.

If you’ve been adhering to all these, Google will likely reward your site no matter how many times it updates its algorithm. 
