The SEO community is scrambling to get its collective head around an unprecedented leak of Google’s algorithm documentation. Through this leak, SEO agencies and consultants have been given a rare opportunity to see the inner workings of Google’s algorithm. The leak not only validates many long-standing theories, championed by SEO professionals and often denied by Googlers, but also provides new insights that could significantly impact SEO strategies.
The Leak
The leak itself was discovered on GitHub. Detailed documentation was distributed under the Apache 2.0 license, meaning anyone who downloaded it has the legal right to examine it. This data, confirmed to be no older than nine months, offers a snapshot of Google’s algorithm as it stood in August 2023 – although some speculate it could be as recent as March 2024. Out of the 14,000 attributes listed in the leak, approximately 7,300 are directly related to Google Search, with the remainder connected to other Google services like Maps and YouTube.
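To give a feel for how one might explore documentation at this scale, here’s a minimal Python sketch that tallies attributes by product area. Everything in it is an assumption for illustration – the module paths and attribute names are invented, and the real leak is a set of API reference documents rather than a tidy list of records.

```python
from collections import Counter

# Invented, simplified records for illustration only – the real leak is a
# set of API reference documents spanning thousands of modules, and these
# module paths and attribute names are not taken from it.
attributes = [
    {"module": "search.quality.signals", "attribute": "exampleQualityScore"},
    {"module": "search.indexing.docs", "attribute": "exampleFreshnessValue"},
    {"module": "maps.places.meta", "attribute": "examplePlaceCategory"},
    {"module": "video.content.meta", "attribute": "exampleWatchSignal"},
]

# Group by the top-level module prefix, which loosely maps to a product area
# (search, maps, video, and so on).
counts = Counter(attr["module"].split(".")[0] for attr in attributes)

for area, total in counts.most_common():
    print(f"{area}: {total} attributes")
```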
What This Means for SEO Professionals
The confirmation of several key attributes is a game-changer for SEO professionals. We’ve treated many of these concepts as gospel for years, based on our own experience and evidence shared within the industry. Google, however, has flat-out denied some of these claims, and yet we can see bullet-proof evidence to the contrary listed in these documents.
Site Authority and Domain Authority
These attributes confirm that having a strong, authoritative website positively influences search rankings – specifically, via a site-wide score that has a knock-on benefit for individual pages. Google’s engineers have long denied this, but the leak confirms what many in the SEO world have believed for years: your site’s authority can help your pages rank higher.
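To make that knock-on benefit concrete, here’s a minimal sketch of how a site-wide authority score could lift page-level scores. The formula, blend weight, and value ranges are entirely my own assumptions – the leak confirms a site-level score exists, not how Google combines it with anything else.

```python
def adjusted_page_score(page_score: float, site_authority: float,
                        blend_weight: float = 0.3) -> float:
    """Illustrative only: lift a page-level relevance score using a
    site-wide authority score. The blend weight is a made-up parameter,
    not a value from the leaked documentation."""
    return page_score * (1 + blend_weight * site_authority)

# The same page content on a high-authority site vs. a low-authority one.
print(adjusted_page_score(0.70, site_authority=0.9))  # ~0.89
print(adjusted_page_score(0.70, site_authority=0.1))  # ~0.72
```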
Toxic Links and Penalties
The debate over toxic links has also been settled. The leak confirms that bad backlinks can indeed harm your site’s rankings. This includes a page-level Penguin penalty, which reinforces the importance of ensuring your backlink profile is clean.
Google Sandbox
For a long time, there has been speculation about the existence of the Google Sandbox – a sort of probationary period for new websites. The leaked documents confirm its existence, showing that new domains, especially those repurposed from expired domains, are more likely to be affected. This sandbox effect serves as a filter to prevent spammy or harmful sites from ranking too quickly.
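One way to picture the sandbox is as a dampening factor applied while a domain is young. The probation length and linear ramp below are invented for illustration – the leak indicates a host-age signal exists, not how it’s applied.

```python
def sandbox_multiplier(domain_age_days: int, probation_days: int = 270) -> float:
    """Illustrative only: scale a ranking score down while a domain is
    'on probation', easing off linearly as it ages. Both the probation
    length and the linear ramp are assumptions, not leaked values."""
    if domain_age_days >= probation_days:
        return 1.0
    return domain_age_days / probation_days

print(sandbox_multiplier(30))   # ~0.11 – a new domain is heavily dampened
print(sandbox_multiplier(400))  # 1.0  – out of the sandbox
```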
Freshness as a Ranking Signal
While Google has downplayed the importance of content freshness, the leak indicates that it does play a role, particularly for current events and trending topics. However, it doesn’t appear to be the most sensitive ranking factor, meaning it may not drastically affect rankings across the board.
Trust Signals
Trust signals, such as links from reputable sources, are crucial. The documents highlight that Google uses these signals to protect against spam and ensure the credibility of content. This is especially important for sites in YMYL (Your Money or Your Life) niches, where trust is paramount.
Big Brands vs Small Sites
We’ve talked a lot about big-brand signals over the years, and it does make sense for Google to profile larger brands and use them as a measure of trust. The documentation indicates that smaller personal sites may be demoted in rankings, with preference given to larger brands or very high-authority sites, even where smaller publishers offer specialised knowledge or better-quality content.
Contextual and Modular Insights
One of the more fascinating revelations is how Google uses modular and contextual analysis to determine the relevance and importance of attributes. For instance, the context surrounding anchor text is crucial. Google tracks changes in anchor text, looking for signs of manipulation, which can be a negative ranking signal.
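As an illustration of what tracking anchor-text changes might look like, here’s a small sketch that flags a link whose anchor text churns suspiciously between crawls. The data shape and threshold are assumptions – the leak indicates anchor context and change history are stored, not the detection logic itself.

```python
def anchor_churn_rate(snapshots: list[str]) -> float:
    """Illustrative only: fraction of consecutive crawl snapshots in
    which a link's anchor text changed."""
    if len(snapshots) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(snapshots, snapshots[1:]))
    return changes / (len(snapshots) - 1)

# A stable editorial link vs. one being repeatedly re-optimised.
natural = ["useful SEO guide"] * 6
manipulated = ["cheap shoes", "buy shoes online", "best shoes 2024",
               "cheap shoes", "shoes sale", "buy cheap shoes"]

for name, history in [("natural", natural), ("manipulated", manipulated)]:
    rate = anchor_churn_rate(history)
    verdict = "suspicious" if rate > 0.5 else "ok"  # made-up threshold
    print(f"{name}: churn {rate:.2f} -> {verdict}")
```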
User-Generated Content (UGC)
The documents reveal that UGC signals have been significantly amplified. This might explain the surge in traffic to sites like Reddit, which have seen massive increases in organic traffic.
User Behaviour Data
The leak reveals Google’s use of click data and Chrome data as ranking factors, confirming long-held suspicions – particularly around click-through rate and dwell time as signals. The documents show that Google Search uses Chrome data to understand user interactions and preferences, which highlights the importance of creating content that not only attracts clicks but also retains user interest. Google retains up to 13 months of click data on websites and uses it to understand user preferences and behaviour, influencing rankings accordingly. High dwell time indicates users find the content engaging and valuable, positively impacting rankings; low dwell time can signal that the content isn’t meeting user expectations, potentially leading to lower rankings.
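To illustrate the kind of aggregation a 13-month click window implies, here’s a minimal sketch that averages dwell time for a page over that window. The record format, window length in days, and the “positive signal” cut-off are all assumptions for illustration – the leak describes click data being retained and used, not this calculation.

```python
from datetime import datetime, timedelta

# Hypothetical click log for one page: (timestamp, dwell time in seconds).
clicks = [
    (datetime(2023, 6, 1), 12),    # quick bounce
    (datetime(2023, 11, 15), 95),  # engaged visit
    (datetime(2024, 3, 2), 140),   # engaged visit
    (datetime(2022, 1, 10), 200),  # too old – falls outside the window
]

now = datetime(2024, 5, 28)
window = timedelta(days=13 * 30)  # roughly 13 months, per the leaked retention period

recent = [dwell for ts, dwell in clicks if now - ts <= window]
avg_dwell = sum(recent) / len(recent)

# Made-up threshold: treat a long average dwell time as a positive signal.
print(f"{len(recent)} clicks in window, average dwell {avg_dwell:.0f}s")
print("signal:", "positive" if avg_dwell >= 60 else "weak")
```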
Entity-Based Ranking
Google’s emphasis on entities – topics, people, places, and things – has been reinforced by the leak. The algorithm uses entity data to build a comprehensive understanding of content, making it essential for SEO professionals to focus on entity optimisation. This is a huge confirmation for concepts like topical hierarchies and authority, semantic SEO, and entity-based SEO tactics.
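As a rough illustration of entity-focused analysis, here’s a sketch that scores how completely a page covers the entities expected for a topic. The entity sets and the simple coverage ratio are my own assumptions – the leak confirms entity data is central, not that Google scores it this way.

```python
def entity_coverage(page_entities: set[str], topic_entities: set[str]) -> float:
    """Illustrative only: the share of a topic's expected entities that a
    page actually mentions – a crude proxy for topical completeness."""
    if not topic_entities:
        return 0.0
    return len(page_entities & topic_entities) / len(topic_entities)

# Invented entity sets for a page about making espresso.
topic = {"espresso", "portafilter", "grind size", "extraction", "crema", "tamping"}
page = {"espresso", "grind size", "extraction", "crema"}

print(f"coverage: {entity_coverage(page, topic):.0%}")  # 67%
```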
Navigating the Future of SEO
While this leak provides a wealth of information, it’s crucial to approach it with caution. The absence of a direct scoring system means we can’t definitively rank the importance of each attribute. However, this glimpse into Google’s algorithm allows SEO agencies to refine their strategies and validate their approaches with greater confidence.
Additional Insights from Industry Experts
Rand Fishkin originally broke the story. For those who don’t know or can’t remember, Rand is the founder of Moz and creator of the Domain Authority metric. He has stated that the leak has been confirmed by several sources to be legitimate, though it’s always wise to take such information with a pinch of salt. Google have since confirmed the leak themselves, while massively downplaying its significance. The documentation reveals 2,596 modules with 14,014 attributes, highlighting the complexity of Google’s ranking system.
Rand’s analysis highlighted the “UGC effort score,” which explains why forums and user-generated content sites like Reddit and Quora have seen increased traffic after recent updates. This score is a page quality signal, emphasising the importance of user engagement, and perhaps indicates the information is more recent, in line with these ranking shifts.
Another significant element shared by Rand is the presence of multiple variations of PageRank and the concept of “rank boost,” which tracks up to 13 months of click data. As above, this means that Google values user interactions and engagement over time.
Rand hasn’t been active in the SEO industry for many years, since his departure from Moz. However, he is still held in high regard by many SEO professionals, including myself. This leak has served to legitimise many of the concepts he’s pushed over the years; in cases where his theories drew criticism or scepticism, particularly from Google, the leak has validated his past knowledge and research.
Conclusion
This accidental leak has shaken the SEO world. It confirms many theories and provides new insights that can help shape future SEO strategies. While we should proceed with caution as to the weighting of these factors, one major element has stood out – we cannot trust the word of Google representatives. It makes sense: Google’s relationship with the SEO industry is a strange one, and I’m sure their preference would be to eliminate SEO altogether. It’s clear from the leaked documentation that many theories that have been “debunked”, “downplayed”, or outright denied over the years are in fact true.
For further insights and detailed analysis, feel free to reach out to our team.