Our Semantic Search Guide continually evolves. You are welcome to suggest edits and submit content for consideration.

Internal Link Audits

A well-crafted internal link structure improves the chances of your content being seen in the right place at the right time. Your internal link structure is also known as your "Link Graph", and it has three core elements:

  1. your navigational or menu link structure,
  2. your breadcrumb link structure, and
  3. the most important in today's search algorithms: the internal links in the body of the text.

The way in which your main content connects to other content on your website can have a profound effect on Google and users alike. In this internal link audit how-to guide, we'll dive straight into internal "body text" links first, because auditing these is the most complex problem to solve. We will also cover navigational menus and breadcrumbs at the end, for the more basic SEO checks.

A recent study found that website owners miss more than 80% of their internal link opportunities.

Even before Google migrated its ideas toward semantic search, links acted as important signposts for search algorithms. You can really understand the importance of links for Google's PageRank algorithm here. There are a few points about PageRank worth noting, though.

First, PageRank was calculated at the page level, not the domain level. This means that internal links play a big part in determining the strength of a page in terms of PageRank. Second, PageRank in its purest form has no context. A link should only have an effect on a search algorithm if it adds to the context in which it exists. Google did talk about "Topical PageRank", although it was not explicit at the time about the way it implemented it. One paper on Topical PageRank from Cambridge University shows how this works.

For search, the presence of links in a document collection adds valuable information over that contained in the text of the documents alone.

Jardine & Teufel
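To make the page-level point concrete, here is a minimal sketch of how PageRank can be computed over an internal link graph. The pages, damping factor and iteration count are illustrative assumptions, not Google's actual implementation.

```python
# Minimal page-level PageRank over an internal link graph (illustrative only).
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of internal pages it links to."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                # Each page shares its rank equally across its outgoing links.
                new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

internal_links = {
    "/": ["/seo/", "/blog/internal-links/"],
    "/seo/": ["/", "/blog/internal-links/"],
    "/blog/internal-links/": ["/seo/"],
}
print(pagerank(internal_links))
```

On a real site graph, the home page typically accumulates the most rank simply because almost every other page links to it, which is exactly the effect discussed below.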

There are also important reasons why internal links are relevant in the world of semantic search. By linking text closely to content about the entity, you are making life much easier for a reader to understand the meaning of an article and – just as importantly – you are helping Google and other search engines derive the meaning of your content. For example, if you talk about “Queen” on a page, are you talking about a band, a monarch, or a lifestyle choice? By linking to an article of content that has schema around this context, machines can readily identify the nature of the relationship between the two pieces of content.

In "15 advantages of using Internal Link Building for SEO", Fred Laurent makes a compelling argument for internal links.

Once your site relies on content, internal links are as essential to your visibility as external links, to:

Increase the number of long-tail keywords
Better respond to users’ queries
And ultimately, increase your visibility and your organic traffic

Fred Laurent

Combined, this means that ranking pages in the SERPs is much more effective for any search engine if internal links are taken into account as a major ranking factor. The challenge is to be able to see what the machines see as they run algorithms across all the pages, with pages that link to any given target page carrying more ranking and contextual relevance than pages several clicks away. This is largely why the home page carries the most weight in search: it is usually accessible from every other page on the site and therefore becomes the most important.

The Principal Strategies behind an Internal Linking Audit

The main idea behind an internal link audit is to increase the "contextual relevance" of all internal links, such that it is abundantly clear to any human and search engine alike where the authority for any important topic is on your site. That is to say, for every "head term" there should be a clear and agreed target page, and all other significant mentions of that topic should always link to that headline page. These pages are generally called "Pillar Pages", "Target Pages" or "Cornerstone Content", depending on the SEO you speak to or the technology you use. This means that when a blog post mentions an important idea, a link should exist within the text that talks about that idea.

Traditionally, SEOs have focussed only on the anchor text, but the surrounding text also gives a link context. It is the underlying meaning that is important. Try not to rely on tools that only generate exact-match anchor text, and audit the site to ensure this has not happened to an extreme in the past.

An exception to the idea that topics should link to their pillar page is when you have a more targeted page with a long-tail concept that is more appropriate. Links should be as specific as possible, and often it is the wording on the outgoing page that will reveal the most targeted link candidate. This is perhaps the hardest part of a link audit. The general strategy is to create a hierarchy of pages around a topic or idea. For example, you may have a headline target of "SEO", but then "On-Page SEO" could be a major pillar of the SEO group (sometimes called a silo). Another might be site speed and another might be backlinks. Grouping these pages makes sense, but strict silos are not always a good idea.

A strict silo means that you ONLY link pages within a silo to other pages within the same silo. This is rarely the best way to freely share concepts and topics around a site, although hard walls, such as pages in different languages or hotels in different cities, may well be a good reason to recommend strict silos.

In addition to strategic considerations, there are also a few other elements that need to be taken into account when conducting an internal link audit. They are:

  1. Listing internal links that result in 404 pages,
  2. listing links that redirect, and in a similar vein,
  3. listing links that do not point to the correct canonical URL (a scripted sketch of these checks follows the list).
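These hygiene checks are easy to script. Below is a rough, illustrative sketch (not a full crawler) that flags broken links, redirecting links and links whose destination declares a different canonical URL; the example URLs are assumptions.

```python
import requests
from bs4 import BeautifulSoup

def check_link(url):
    """Return a list of hygiene issues for one internal link destination."""
    response = requests.get(url, allow_redirects=True, timeout=10)
    issues = []
    if response.status_code == 404:
        issues.append("broken (404)")
    if response.history:  # one or more redirects were followed
        issues.append(f"redirects to {response.url}")
    soup = BeautifulSoup(response.text, "html.parser")
    canonical = soup.find("link", rel="canonical")
    if canonical and canonical.get("href") and canonical["href"] != response.url:
        issues.append(f"canonical points to {canonical['href']}")
    return issues

for link in ["https://example.com/old-page", "https://example.com/blog/"]:
    print(link, check_link(link) or "OK")
```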

Whatever strategy suits your audit, the hardest task is viewing your link graph. So we will cover tools to do this next.

Viewing your Internal links structure can be achieved with a number of tools. Without prejudice or preference, here are a few.

1. OnCrawl

OnCrawl is an enterprise-level crawler that offers some sophisticated Internal link analyses. In particular, Oncrawl has a metric called “InRank” which they use as a proxy for Google’s PageRank measurement, specifically for the internal links within the site.

OnCrawl shows Internal Link flows

2: Sitebulb

I find that Sitebulb has a great many ways to visualize Internal links. These make it very easy to not only visualize internal links but also to see where questionable internal links are diluting focus or (more likely) where internal pages are not linking to each other and should be.

A Crawl Tree visualization by SiteBulb
Alternative Visualizations in SiteBulb

3: Majestic

Recently, my previous company, Majestic, came out with a brand new way to visualize links on a page.

This new visualization shows how links are balanced on a web page. Internal links are in blue and external links are in orange. Each page is segmented into 40 sections, allowing you to see where the links are on the page.

You can see the overall look of the page and see which links are in the body and which are in the navigation. On the downside, Majestic does not render Javascript links at the moment. Also, this visualization really looks at the links out of a page, rather than the links into the page.

4: Screaming Frog

Every SEO's go-to tool, Screaming Frog lets you crawl any website. In doing so, it tracks all the internal links that it finds and allows you to sort them. The graphic above shows how to see all the internal links into a given page in one place. (The next tab also shows the outbound links from the same page.) Unfortunately, this does not separate out the body text links.

Define and Agree on the “Pillar Pages”

Your internal link audit will only be strategic if you first agree on the most important topics for your business: the ones that you want your business to rank for or be seen as an expert in. You should then select ONE pillar page for each of these main topics. If you can get to a site with one pillar page per topic, your internal link strategy will be cleaner. An internal link audit measures how effectively the site links topics in the text through to the related pillar pages.

One mistake that sites make is to create automated content that tries to create a page for almost every keyword variation. This is common when trying to cover (say) a trade for every town and city in a country. In this event, each city page is not really a pillar page! You could set the target for each of these pages to be the town or city in question, but you may not mention that town anywhere else on the website. On the other hand, clever use of maps might be able to show "nearby stores" for each town. This kind of tactic, though, is not considered further in this guide.

Once the pillar pages have been agreed upon, the contextual link audit has to centre around what percentage of possible internal links have already been created and how many are still to do. Because of this, we have developed an easy-to-understand new metric, specifically designed for Internal Link Audits, called “Internal Linking Score”.

Introducing “Internal Linking Score”

Given the 2022 study showing that the vast majority of internal linking opportunities are missed, we have developed an Internal Linking Score algorithm:

Internal linking score equation
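Based on the description later in this guide, where the score is the ratio between existing hard-coded internal links and the link opportunities detected by NLP, a plausible form of the score (an assumption, not necessarily the exact published formula) is:

$$\text{Internal Linking Score} = \frac{\text{existing hard-coded internal links}}{\text{total internal link opportunities detected}} \times 100$$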

For the link audit, the challenge is to find the link opportunities in the first place. This is the “gold” within a link audit. Whilst removing links or checking for broken links is interesting as part of an audit, it is a list of link recommendations that will have an immediate and actionable use.

Now that we have a methodology for scoring a site’s Internal Linking, and tools for visualizing internal links, the missing part of the audit is finding a list of internal links that could be added to the website’s content. These are known as “link opportunities”.

A good tactic for finding link opportunities is to use Google search itself. Before starting, it is assumed that there is already an agreed list of key topics the client wishes to rank for and that these topics have clearly defined target pages. Then for each phrase, do the following:

Search for [Keyword] site:sitename.com on Google. The site: command restricts the search to your client site. The target page really SHOULD already come to the top of the list, but if it doesn’t, then make the client aware that the target page should either be a different page or that the content on the target page needs to become more relevant to the topic.

Look at the remaining pages, in the order they appear on the screen, and find the keyword on each page (I use CTRL-F in my browser on Windows). Do not just link the keyword wherever it appears. Instead, find a suitable piece of text and add a link only if it seems appropriate from a user's perspective when reading that page. As this is an audit, I would suggest making a spreadsheet with the following columns (a scripted sketch of such a spreadsheet follows the list):

  • Target URL
  • Title of Target Page
  • Target Topic [keyword]
  • Source URL
  • Title of Source Page
  • Anchor text you propose to use for the link
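As mentioned above, here is a minimal scripted sketch of building that audit spreadsheet. The opportunity row is an illustrative assumption; in practice each row would come from your site: searches or a crawl.

```python
import csv

FIELDS = ["Target URL", "Title of Target Page", "Target Topic",
          "Source URL", "Title of Source Page", "Proposed anchor text"]

# Illustrative example row; real rows come from the site: search process above.
opportunities = [
    {
        "Target URL": "https://example.com/internal-links/",
        "Title of Target Page": "Internal Linking Guide",
        "Target Topic": "internal links",
        "Source URL": "https://example.com/blog/site-architecture/",
        "Title of Source Page": "Site Architecture Basics",
        "Proposed anchor text": "internal linking",
    },
]

with open("internal_link_audit.csv", "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(opportunities)
```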

Make use of our Entity-Based internal link diversity checker

InLinks now allows you to check how good your existing internal linking structure is. In our newest update, we have brought out a feature that compares the JS-injected links added by InLinks to the hard-coded ones already on your site.

InLinks has its own NLP, informed by a knowledge graph built from the bottom up. This means that the software reads your content and summarizes it based on entities. Because it already understands where your existing internal linking anchor texts are (in order to avoid duplicating any work already done), it can work out how well you are internally linking based on the entities you are targeting to a page.

For example, suppose you are offering a service and have one page dedicated to cost. InLinks will be able to find many different synonyms for this concept and take into account how well you are internally linking to this page. 'How well' here refers to anchor text diversity and accuracy.

How can InLinks help?

You now can see how your existing hardcoded links relate to entities, to get a better view of where you are starting in your journey to topical authority and great internal linking.

After targeting your entities, you can enter the internal linking tab and find the data here.

InLinks flags entity-based link duplication. For example, if you are always hardcoding links with exact match anchor texts then you will easily be able to see and decide on how to diversify these.

Why are exact match anchor texts bad?

Well, they’re not so much bad as they are ineffective and old-fashioned. Internal linking is one of your most important SEO tools, so diversifying anchor texts keeps the reader involved and provides the search engines with far more information on the context/topical authority of a page.

An example of an internal link audit


Here is dixonjones.com. Dixon has targeted the concepts of the internet, link building, PageRank, etc. to each of his most important pages. The red dots next to the percentages indicate that more than 80% of his existing links are duplicated on the topics of the internet and link building. In this case, 99% of all 143 hard-coded links to do with the internet are duplicated.

InLinks will try to dilute this by finding varied anchor texts and inserting them via the JS code. Even with the small number of pages I had brought in from dixonjones.net, InLinks had found 8 varied anchor texts.

Most importantly, this audit feature will show you where your problem areas are. As you continue to grow your site, it is great to have an overview of undiversified, problematic entities.

How can I do this myself?

To get started, bring in all of your pages and start targeting topics to pages. We have plenty of support on how to do this in the InLinks Academy.

Once you are set up, head over to the internal linking tab and find this information on the right-hand side. Clicking on it will let you see the exact placement and text of the existing hard-coded links.


Other ways to find Link Opportunities

Tools do exist to help find link opportunities at scale. Many, however, try to look up exact anchor text or keyword matches. These can prove very one-sided, as they tend to find only exact-match links with no context. They miss synonyms and tend to lack nuance. It is only more recently that semantic-based missing-link tools have come into being.

Of course, it is not necessary to use a tool to create internal links. You can easily create internal links within your content to other pages on your site. However, you will not achieve scale and will not be able to easily recalculate and redistribute these internal links when content is updated. This means that you are likely to miss many internal link opportunities that may be open to you. That said, here is a simple step-by-step process for creating internal links within WordPress.

Step 1: Find pages on the site that discuss a particular topic

The best way to do this is to type a keyword into Google followed by "site:yourdomain.com". So, to find the pages on this site that might be appropriate for internal links for the term "Internal links", I would type this into Google.

Step 2: Select your target page for your search term

Usually, the top result will be the page that you would want to have as your target page for the search term you have chosen because this is the one that Google already believes to be the most relevant. If you are writing new content, then, of course, you may choose the new page instead.

Step 3: Identify where other pages should link to the target page

You should insert the link somewhere around where the term is highlighted by Google in the search results. You do this by…

Step 4: Opening the page in edit mode in WordPress

In the introduction, we stated that we would also look at the Navigational menus and Breadcrumbs. Whilst these are very important, I left them to last because they are easier to visualize and understand.

Audit & Minimize Menu Blocks

Most of us tend to think of a website as having one menu structure, but most sites tend to have multiple menu blocks. One along the top is common, but there are usually several other menu blocks, as you can see from this example from the BBC News page.

There are at least 7 separate menus on this page

The question for the SEO audit is: when is it appropriate to include any particular menu block, and when should it be omitted? In general, "pillar" pages should seek to reduce the number of menu blocks, whilst "generic" content can be more liberal. This has the effect of making the pillar pages more focussed around their main topic, because menus on other pages link into the pillar page (giving context) but the pillar page does not reciprocate the menu link back.

Create template page styles in your CMS

Since different pages may have different menu blocks, the internal link audit should recommend a number of templates that a page can use. These templates will use different menu structures to help promote this strategy without the content writer needing to be overly concerned about the menu structure. For example:

A Default Post page structure can contain all the relevant menu bars. Unless there is an active reason to make the page a "target" page for SEO, the more freely the ideas flow through a website, the better.

A Pillar-Post page structure would normally strip out most of the least important menus. A menu for other related pages to the main target topic may still be appropriate and a top-level menu is always useful, but perhaps relegate all other content to a search box. This keeps outbound links more related to the content on the page.

A Vanilla Post Structure with an absolute minimum of menu blocks may be useful if there is absolutely only one desirable call to action for the user.

Two approaches to Breadcrumbs

Breadcrumbs are a special menu type that helps a user to easily navigate up and down a topic funnel. Not all websites use them, but they can be helpful for SEO as they naturally link connected ideas together if constructed sensibly. The two main methodologies are "by category" and "by tagging". If you have logical areas in your website, then category-based breadcrumbs are often simpler. Tags mean that the content writer (or you) will have to give every page a tag or set of tags that group the content into natural topics. Clicking on the tag (or crawling it with a bot) reveals a list of pages with that tag.

Whichever approach is used, best practice is to ensure that you give each post only one category or tag, and that you ALWAYS give a post a category or tag. Look for all the pages on the site which have been assigned the "default" category and make a list of any that have been incorrectly categorized or tagged. I bet there are a few! These should have the category changed, but if you use WordPress, this will also change the URL! (Check how this behaves in any other CMS you encounter.)

You can use a plugin like "Yoast" or "Redirection" to manage 301 redirects when these are changed, or you can manually force the redirects using htaccess, Cloudflare or several other methods. The important result is that the old URL does not return a 404 after changing the category or tag. It should 301 redirect to the new page, and there should not be two pages with the same content.
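As a quick illustration, an old category URL can be checked after the change with a few lines of code. The URL pairs below are assumptions; substitute the old and new URLs from your own category changes.

```python
import requests

url_changes = {
    "https://example.com/uncategorized/my-post/": "https://example.com/seo/my-post/",
}

for old_url, new_url in url_changes.items():
    # allow_redirects=False so we can inspect the redirect itself.
    response = requests.get(old_url, allow_redirects=False, timeout=10)
    location = response.headers.get("Location")
    if response.status_code == 301 and location == new_url:
        print(f"OK: {old_url} -> {new_url}")
    else:
        print(f"CHECK: {old_url} returned {response.status_code} (Location: {location})")
```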

In Summary

We have looked at the reasons why Internal Link Audits can help the performance of your website. We have proposed a number of tools you can use to conduct internal link audits and listed some pros and cons of each. Your audit should cover:

Low-hanging fruit, including:

  • Removing dead links to 404 pages
  • Minimizing redirects and links to non-canonical versions of pages.

Looking at a gap analysis between existing internal links to pillar pages and potential (missing) links to pillar pages.

In order to quantify and evaluate this, we have introduced the Internal Linking Score.

Providing an overview of the navigational link structure and how it plays into natural product or service area groupings.

Providing an overview of the effect of any Breadcrumb links and whether they help to guide the users to the pillar pages in a consistent manner.

Next, read How to automate your internal linking
Or read the full Internal Links guide


Internal Linking Guide

Internal linking is a skill that can elevate your site to new heights. This is a comprehensive guide on how to use links to optimize websites. It combines decades of expertise from the InLinks team in developing internal link roadmaps and strategies.

95% of websites fail at internal linking.

A Study of over 5,000 websites.

This is the result of an analysis conducted on more than 5,000 websites across the globe and the reason we’ve built this guide.

Internal Linking Definition

Internal linking, as opposed to backlinking, is the art and science of interconnecting content within your web site for SEO.

Our definition of Internal Linking at Inlinks.

Even if you have doubts about internal links, you might want to consider that internal linking is cited as one reason that Wikipedia does so well in the search engines, according to Lewandowski, D. and Spree, U., 2011. Ranking of Wikipedia articles in search engines revisited: Fair ranking for reasonable quality? Journal of the American Society for Information Science and Technology, 62(1), pp.117-132.

Internal Linking Guide Contents

First, let’s have a look at the main benefits.

Benefits of setting up an internal linking strategy

If you rely on content marketing to support the growth of your website traffic and rankings, then you will know that you need to produce well-written, high-quality posts to please Google.

And as the competition increases, the chances of ranking at the top of Google SERPs reduce. Good content is not enough. This is why you also need backlinks.

Implementing an internal linking strategy will also help search engines understand the structure of your site. It will show them what your important pages are.

Here’s a summary of the benefits that internal linking provides to SEO:

  • User experience: Internal links help your visitors to navigate through your content.
  • Link juice distribution: If you manage to get backlinks from other websites, internal links will help you distribute the link juice to your important pages.
  • Time on site: Adding relevant internal links in your content will make your users more likely to discover other related content, improving session duration.
  • Pageviews: Since relevant inlinks will encourage your visitors to continue their visit, they will mechanically increase the number of page views per session.
  • Crawl and indexing: A good internal link profile will help Googlebot and search engines better understand your site architecture and help them discover new pages.
  • Long-tail keywords: Using keyword-rich anchor texts with synonyms and phrases will improve the number of keywords your website is ranking for.
  • Rankings: As a consequence of the above benefits, your overall rankings will increase when you engage in internal linking optimization.

Featured Resource: 15 benefits of internal linking

How do you build internal links?

There are mainly two types of internal links: navigational and contextual.

Navigational links are the website’s primary navigational structure, the ones you’ll find in the website’s main menu, in the sidebars and footer. You’ll find the same navigational links structure throughout the website most of the time. They’re mainly used to help users navigate to category pages or company information pages.

Contextual links (or editorial links) are embedded in a page’s body text. These links are very useful for SEO as they help PageRank circulate between your pages. Semantically relevant phrases around the link will convey better SEO juice to the target page.

As soon as you’ve set up your navigational links, you need to start building your contextual links. Here is the process to follow:

Step 1: Define your cornerstone content for a given keyword.

This is probably the most critical step. Often, webmasters think that by having lots of web content on or about the same topic, they will rise to the top of the search engines. Nothing could be further from the truth if you do not give all that content a hierarchy through the links. Some SEOs call a lack of hierarchy "cannibalization": the search engines see multiple pages on the site that COULD all rank for a given topic, and if the MAIN page is not defined, no page has enough clarity or confidence to rank.

Actively decide which page should be the master page for a given phrase or topic. In doing so, you also decide that the other pages should NOT rank for that topic. Link to the cornerstone content whenever the topic is mentioned elsewhere.

For more information: How to associate target entities to web pages

Step 2: Find anchor text opportunities

Use your site’s search functionality to find other mentions of those keywords. You can also use the popular Google hack to do this. Search in Google for “Your keyword site:yoursite.com”. (That is to say, the SITE: command within a Google search will limit Google’s search results to the site you specify).

This latter approach is not practical if Google has not yet adequately indexed all the content on your website, so do use your site’s search function if it has one.

Step 3: Link mentions to the cornerstone page

Wherever you find your keyword mentioned on the site, link that keyword through to the cornerstone page. This is not as straightforward as it sounds. If your keyword is too specific, you may not find all the mentions in a search. Worse, you may end up using an increasingly unnatural "anchor text". (Anchor text is the text that the reader sees when looking at the link on the web page.) Avoid this.

Try to make sense to humans. For example, you may have a cornerstone page about "The Ritz Hotel, London", and the text "Tea at the Ritz" on a page about afternoon tea. You need to decide whether to use the words "The Ritz" or the whole phrase "Tea at the Ritz" as the anchor text. This should depend on whether there is another page about the concept of "Tea" or "Tea at Hotels". If not, then use the whole phrase.

Step 4: Repeat with varying keywords and synonyms.

Google often understands variations on a theme. For example, “Site, Website, and Domain” may (or may not) mean the same thing. It will depend on the context of their use.

Assuming you are not talking about another meaning for “site” and “domain”, let’s say you have a cornerstone page about “Websites”. You may also want to link mentions of “sites” and “Domains” to the same cornerstone content. Doing so should help Google see that these are similar concepts.
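A minimal sketch of this idea, assuming a simple synonym list rather than true NLP: find every mention of a topic and its variants so that each can be considered as a link to the same cornerstone page.

```python
import re

# Illustrative synonym set for a cornerstone page about "Websites".
cornerstone_synonyms = ["website", "websites", "site", "sites", "domain", "domains"]

def find_mentions(text, synonyms):
    """Return every mention of the topic or one of its variants, in order."""
    pattern = r"\b(" + "|".join(map(re.escape, synonyms)) + r")\b"
    return [match.group(0) for match in re.finditer(pattern, text, flags=re.IGNORECASE)]

paragraph = "Every domain needs a plan: your site should link related sites together."
print(find_mentions(paragraph, cornerstone_synonyms))
# -> ['domain', 'site', 'sites']
```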

Always follow best practices

There are several best practices to follow to make the most of your effort.

Here are the main ones you definitely should follow:

  • Serve the interests of your visitors: link to pages talking about similar subjects
  • Use relevant anchor texts and mix keywords with synonyms
  • Always use do follow internal links

Featured Resource: Internal Linking Best Practices

To properly audit your internal link structure, you should break down your audit into three main steps:

  1. Diagnose and fix problems
  2. Get an estimate of your internal linking score
  3. Identify your opportunities

1. Diagnose & Fix Issues

To properly optimize an existing internal link structure, you first need to go through potential issues and fix them.

Identify and fix broken links

Broken links go to non-existent resources, typically a 404 error page. You can get a clear overview of these broken links by running a crawl of your website using a tool like Screaming Frog or Sitebulb.

Once you've listed all your broken links, you have a few solutions:

  • Change the link destination to an existing page
  • or add a redirect from the non-existent page to a relevant one
  • or simply delete the link

Make sure your links do not cause content duplication. When done inconsistently, they can create duplicate versions of your pages.

This may happen when some of your links to a page end with slashes while others don’t, or when some start with the www version of the URL and others don’t.

Search engines might consider these pages duplicated if you didn’t set up redirect rules.

The best way to handle this issue is to set up the required redirect rules and fix internal links to ensure the consistency of your internal link structure.
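One way to keep internal links consistent is to normalise every internal URL to a single preferred form before comparing or fixing them. A small sketch follows; the preferred host and trailing-slash policy are assumptions for illustration.

```python
from urllib.parse import urlparse, urlunparse

PREFERRED_HOST = "www.example.com"  # assumption: the www version is canonical

def normalise(url):
    """Rewrite an internal URL to https, the preferred host and a trailing slash."""
    parts = urlparse(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunparse(("https", PREFERRED_HOST, path, "", parts.query, ""))

print(normalise("http://example.com/blog/internal-links"))
# -> https://www.example.com/blog/internal-links/
```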

Optimize your anchor texts

Any contextual link (embedded in a text paragraph) should incorporate a meaningful anchor text. Get rid of any "click here" anchor text. Use keywords and synonyms instead.

If some of your images are used for internal linking, as may happen with calls to action, then make sure that you've added a meaningful ALT attribute to these images.

Featured Resource: How to do an internal links audit

2. Compute your internal linking score

Whether you have a blog with hundreds of pages or a website incorporating a lot of text content, there is a simple way to know if you're in the 5% of websites with a perfectly optimized internal link structure.

We’ve developed a simple way to assess this with the internal linking score.

Here is the process to follow:

  1. Sign up for a free account on InLinks, then create a project.
  2. Import your pages to identify the named entities they contain.
  3. Associate your essential pages with the entities they relate to.

Then, in the links tab, you’ll see a bunch of statistics, including your overall score (for the selected pages), and a breakdown of this score topic by topic.

Internal linking score computation

You’ll find more details on this internal links score computation in our study about the state of internal linking, but basically, this score is the ratio between existing, hard-coded internal links and internal links opportunities detected by Natural Language Processing.
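As a worked illustration (assuming the score is expressed as a simple percentage of the total opportunities detected): if NLP detects 100 internal link opportunities across your pages and only 15 of them already exist as hard-coded links, the internal linking score would be 15/100 = 15%, leaving 85 links still to build.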

If you manage a website with hundreds of editorial pages, you probably have tons of link opportunities sleeping in your content. Building them out manually will take you days or weeks. Using a plugin to automate internal link building is not a solution either, as your links will suffer from exact-match anchor texts and a lack of context.

Do you want to optimize your internal linking? Turn your words into actionable data.

Think entities, not keywords, and you'll be able to interlink between posts in different languages. (BTW, if you're not sure what an entity is, have a look at the Entity SEO guide.)

3. Identify your opportunities

Once you have computed your internal linking score using InLinks, you also have a list of available link opportunities.

This list is obtained first by extracting named entities from your pages to build a Topic Map of your website, listing all topics (aka entities) mentioned in your content.

Example of Topic Map of an SEO agency website

The second step in identifying your link opportunities is associating your target pages with related entities. A tool like InLinks will tell you exactly where you’ve talked about these corresponding “target” entities and if there are links built to your target pages.

If no link has been made, InLinks will show this and build these missing opportunities by selecting the best anchor text on a given page.

Example of a list of internal links opportunities with proposed anchor texts

The main benefit of this entity-based approach is twofold:

  1. Missing link opportunities will be detected using entity synonyms. A keyword-based approach will only bring you a list of opportunities based on an exact match
  2. Anchor text suggestions will again use entity synonyms, context, and all the knowledge detected in your content to automatically build the missing links, enhancing your topical authority.

Moreover, you can define specific rules, such as “link only to entity A if entity B is also contained in the text”, to sculpt your internal links profile.
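An illustrative sketch of the kind of rule described above, assuming a simple text check rather than InLinks' actual rule engine: only propose a link for entity A when entity B also appears on the page.

```python
def should_link(text, entity_a, required_entity_b):
    """Propose a link for entity_a only if required_entity_b is also mentioned."""
    lowered = text.lower()
    return entity_a.lower() in lowered and required_entity_b.lower() in lowered

page_text = "Our afternoon tea menu is served daily at the Ritz."
print(should_link(page_text, "afternoon tea", "Ritz"))   # True
print(should_link(page_text, "afternoon tea", "Savoy"))  # False
```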

More information: How to automate your internal linking

4: Check out our Internal Linking FAQs

We have collected a list of the most frequently asked questions (with answers) about internal linking.

Some Case studies

Finally, in case you’re still not sure if internal links are a key factor for SEO success, here are some case studies showing the impact internal links may have on Google rankings:

Next: Read Internal linking: 15 benefits for your website’s SEO

You can define your own entities on your own web pages. When a search engine such as Google comes to see your site, it will see the underlying structured data on the page, which will allow it to easily categorize the content. Webpage schema is a sensible approach to doing this. You do not need to be listed in other RDFs to have entities recorded around the content you create.

Here is full documentation on structured data from Schema.org. However, as this is the beginner’s guide, here are some quick and easy ways to understand, organize and add structured data to your pages.

Looking at your structured data

Unless you are used to working with code, it can be very difficult to understand what structured data is already on each page of your website. Fortunately, Google provides a simple to use structured data testing tool. You do not have to own a website to be able to use the tool. It works on any site.

Google’s Structured Data Tool Output

Tip: Use the structured data tools on your competition

There are many structured data tools. It is likely, though, that you have a few serious competitors in your niche. Some of these may have done a much better job than you at becoming an entity in their own right or at least becoming an expert on entities that you feel you should own. You can take two steps to view a data structure that might work for you:

1: Use Google’s Knowledge Graph Search Tool to establish whether your competitor’s brand or name is already properly defined in the Google Knowledge Graph.

2: Then use the Structured data tool on the best-represented competitors’ web pages to understand how they used structured data.

Many tools claim to automate schema, but very few beyond InLinks will create webpage schema for you automatically. This is because most schema tools create other types of schema, which is valuable of course, but does not necessarily help Google understand the content on your site. Other schema may help turn content into a recipe or an event, or describe the author or the organization behind the content. InLinks, on the other hand, describes the content meaning itself, by taking the most important entities and telling search engines through schema that the page is ABOUT these main entities, and then taking the secondary ideas and telling the search engines through schema that the page MENTIONS these secondary entities.

Webpage Schema Example
This JSON-LD Webpage Schema was automatically generated by inlinks.net
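As an illustration of the general shape of such markup (an assumption, not InLinks' exact output), a webpage schema block declaring what a page is ABOUT and what it MENTIONS might be built like this; the entity names and Wikipedia URLs are placeholders.

```python
import json

schema = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "url": "https://example.com/internal-links/",
    "about": [
        {"@type": "Thing", "name": "Internal link",
         "sameAs": "https://en.wikipedia.org/wiki/Internal_link"},
    ],
    "mentions": [
        {"@type": "Thing", "name": "PageRank",
         "sameAs": "https://en.wikipedia.org/wiki/PageRank"},
    ],
}

# Paste the output into a <script type="application/ld+json"> tag on the page.
print(json.dumps(schema, indent=2))
```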

Inlinks can automate this very effectively because it manages its own knowledge base.

  1. Add the URL to inLinks
  2. Associate the URL with one or more primary topics
  3. There is no 3 (if you have already set up inLinks code)

Using Plugins for WordPress

Yoast plugin for WordPress: Yoast’s plugin is one of several that will create your structured data for you. All you have to do is to decide whether your blog should be set up as a person or as an organisation. The plugin then uses your other setup configurations, such as your user profile and social media profiles to build out the structured data.

There are many other plugins that also allow you to manage your structured data on WordPress. A self-updating list is here. You should only integrate plugins with a large (10,000+) user base that are being regularly updated to work with your version of WordPress. It is also probably good to have only ONE plugin trying to manage your structured data at any one time. You can make plugins inactive at the click of a button in WordPress if one plugin is clashing with another.


Creating Digital Assets

Whether you have decided that your strategy is to be the entity, to be an authority on the entity or to play on the edge, your next step is to start marshalling or making your digital assets. Here the thought process is rather different to the old-school idea of "content marketing", where you just carry on writing content about a subject and hope it generates organic traffic. The best way to understand this is to return to Google and look more closely at the many different ways digital assets affect a search results page. Let's choose a very different theme this time… something that might be seen as a bit of a free-for-all "entity". I'm feeling hungry, so let's do food:

As with so many entities, Google chooses to have a snippet from Wikipedia in the knowledge box here. There is a very interesting section on the structure of a Wikipedia page in the book referenced earlier, "Entity-Oriented Search". Wikipedia is surprisingly exact and consistent, making it extremely easy for a knowledge base to create structure out of the content in Wikipedia. There are also many other RDFs (Resource Description Frameworks) based on the Wikimedia organisation. We'll talk a little about RDFs in general and Wikimedia properties in particular separately.

The point I wanted to make here is that there are many other digital assets on this page besides recipes for Coronation Chicken. There are YouTube videos, for example. YouTube is an extremely large structured data source, so why would you not try to have a video on how to make Coronation Chicken if you wanted to influence this page? Putting your brand of mayonnaise in the video is part of the optimisation.

Then there are multiple images in the knowledge box. These can come from anywhere on the web, including your website. Do you see that one for "Curry Ketchup"? Now THAT is finding a niche! My point is that you cannot optimise for entity search unless you create all the digital assets that Google is choosing to represent on the page. Images are important. There is a renowned case of one brand taking this too far, by changing all the images on Wikipedia for ones that had their brand on them. Unfortunately, Wikipedia did not see the funny side, and now the case study makes up the majority of their brand page. Ask someone on Twitter if you want to find the case study.

We now also see ratings on the search results. Ratings are another form of structured data, helping Google to assess the quality of the coronation chicken recipes that it might choose from.

Lastly – I notice that Wikipedia thinks coronation chicken was invented by Constance Spry and Rosemary Hume and links TO their entries, which in turn link back. Look at how Wikipedia continually cross-references these facts through internal links (inlinks):

Rosemary Hume’s Wikipedia entry links back to the Coronation Chicken entry

Twitter Content

Once Google has associated an entity with its Twitter profile, a direct search hit on the entity will also return live Twitter posts in the search results! It is therefore important that IF you use Twitter, you properly link to it through structured mark-up on your website (and complete the loop by linking back from your profile). On top of this, it is important to make sure the Twitter "tone of voice" is consistent with the rest of your brand story.

Video Content

Whilst posting your own videos on YouTube is a great idea, it is very possible that videos that talk about you, your product or your entity are created by other people, for example when your staff talk at events. These are also powerful assets, and you can harness them by including them in your video channel if they are on YouTube, or by embedding them in your blog content. In doing so, you help to connect the dots for the knowledge graph.


This is much harder than it sounds, mostly because businesses cannot agree succinctly enough on the message they want to portray and the niche they want to dominate. Mary Bowling, a long-time SEO from Ignitor, recommends looking at your own website as if it were your own personal knowledge graph:

Figure 2 Make your site a Knowledge Base of your brand. Reproduced with permission from Mary Bowling.

This approach was also proposed by Jarno Van Driel, known as “@SEOSkeptic” on Twitter, several years ago. However, I think we can step one level beyond this approach. In a modern marketing strategy, you need to communicate with your audience on their terms, not yours. That is to say, some will engage on Twitter, others on Instagram and others on Youtube. Increasingly few will engage directly via your website and this should be factored into your personal knowledge graph.

This means that the relationships (links) should not solely be on your website, but should connect all your digital assets. In addition, Entities in your own personal Knowledge Graph should be extended to other digital assets beyond the website.

This leads to discussing the creation of Digital Assets.


Being the Entity

If you are a business or organisation, then you ARE an entity. Google may not yet have enough confidence to know this. Every person on the planet is an entity, but Google does not yet try to distinguish between every version of "Purna Patel" or "Sally Stokes" on the planet… at least not in the search results. In the end, though, Google is collecting large amounts of this data. Very few of us in Western society can avoid having some form of Google login. Google is currently having to address privacy concerns, however. This will mean that being represented in search as an entity will increasingly require you to actively opt in and request such representation. Google+ was shut down in December 2018, no doubt largely in response to the GDPR regulations in Europe and increased concerns in the US over privacy.

This suggests that Google is being careful to ensure that if you as an individual are represented in Google's Knowledge Graph (or in the knowledge box in the SERPs), it is confident this result is not only accurate but also in the public domain and in the public interest to show. There are many ways to approach becoming a named person or entity, some of which are highlighted in this guide under "RDFs and how to find relevant ones".

Google My Business (GMB)

As an organisation, your entity can live and flourish in Google, initially through Google My Business. GMB is itself an RDF and a great place for an organisation to start. Being listed in GMB will usually give you the ability to show up as a knowledge box, but this might be only in tight searches. Nevertheless, it acts as a useful launchpad for most organisations.

Becoming connected to an Entity

If you cannot BE the entity, you can still become an entity by association. It is very possible that nobody can own the entity or thing in question. This work is an example: it hopes to show authority in the field of SEO. SEO (or, more accurately, Search Engine Optimization) is an entity that Google understands. You can see from the knowledge box that writing a book on SEO is probably a great way for Google's knowledge graph to link you closely with SEO.

Damn! My old sparring partner Rand Fishkin's excellent book (co-authored by Eric Enge, Stephan Spencer and Jessie Stricchiola) is right there: "The Art of SEO". The very fact that four authors, all known for their SEO, are listed on the cover makes them all semantically close to each other. Do you see how these close associations can easily start to create bubbles in a Knowledge Graph? You might understand entity search from the ground up and may have built your own knowledge graph as InLinks has… but unless you are associated closely with the subject matter, the bubble that already exists will cut you out. Don't get angry… it is simply Google's equivalent of the echo chambers we see in society and on social media. These echo chambers in themselves are not good or bad; they just are. You simply need to find another way in…

Other RDFs

Wikipedia is by no means the only data source that Google can extract data from…

Write a book and get it published by a reputable publisher

This will get you associated with the book ontologies. If your book has an ISBN number, then this can be independently referenced. (The USA has a similar book referencing system).

Act in a film or Direct a Play

The IMDb is a powerful RDF database that is believed to be respected by Google as an authoritative (and therefore trusted) source of information about actors and directors. If you are in a film and listed in the credits, you can get into the IMDb and then claim your listing, much like you can with Google My Business. Having this listing will either help you become an entity in your own right or give you a neutral and verifiable link for the creation of a Wikipedia entry.

Stand for something!

If you are a Congresswoman or a Member of Parliament, it is almost impossible not to be considered as an entity, because all the other people will also be considered entities.

If you are a band, get on the Festival Circuit

Your band may not be an entity, but Reading Rock Festival, Glastonbury, and Burning Man certainly are. By getting onto the bill of these established entities, you create independently verifiable information about the band.

A few music festivals likely to be listed in Google’s knowledge graph

Next: Aligning your online presence with your niche


Become an entity or an expert on an entity

Your first strategic decision is whether you want to try to BE a fully defined entity in your own right. There has been a move in recent years away from optimizing for keywords and instead simply trying to make your brand stand out from the crowd online. One reason this works well is that your brand can become an entity that you more or less can control (although not always). Once you have an entity on Google’s knowledge graph, what that entity gets up to will be continuously updated in the knowledge graph. If you are a band, for example, then marketing your new album organically becomes MUCH easier than it would be for a record store to market the same album. The knowledge base will simply update, showing the new album. This immediately creates a short vector between the album and the band. The relationship is defined… but the record shop may have a harder time and will need an edge strategy.

Strategies covered in more detail

Below are several competing ideas for semantic SEO. The SEO industry rarely agrees on anything and tends to use the phrase “it depends” way too often for C-suites to take SEOs seriously. In the end, you will need to weigh up the merits and risks associated with each approach and act accordingly.


Getting a Wikipedia entry is fraught with dangers. InLinks has chosen not to list a specific strategy. Instead, we bring in tips and ideas from well-known practitioners in online information retrieval, including InLinks users.

One of the challenges is that Wikipedia is controlled ultimately by a very small and not necessarily unbiased group of people. According to Ricardo Baeza-Yates (24 minutes into this lecture), 0.04% of the users of Wikipedia create 50% of all posts. That is considerably more extreme than Facebook or Twitter, also cited in the same lecture.

0.04% is less than 1 in 2,000 users.

I have previously discussed the bias that results from this problem over on my personal blog.

What the Experts Say

I approached Wikipedia editor, Search Engine Journal author and Webmasterradio online radio personality, Jim Hedger to get some thoughts.

“The crux to Wikipedia is to go very slowly and build personal authority. It’s a community driven legacy project with a high sense of purpose and mission. It has a hierarchy of authority but most decisions are made by regular editors who subscribe to a common set of guidelines.

Topical areas people want to edit in become little sub-tribes of networked contacts who have worked the subject material for years and newly interested people. Such communities are built on trust in long term dedication to accuracy and skill. Pretty much everything else revolves around some variation on the rules of educated and civil society.

Cite everything you can. Wikipedia is all about providing new paths for users to follow when examining and evaluating information if there’s a credible source. (There is strict criteria for what can be considered “credible”.)

Don’t try to impose your ideas on other people without first considering their backgrounds and experiences. Wikipedia isn’t social media. There is a definable right and a wrong and a great deal of proof is required to prove oneself right, even on things that are obvious to every observer on Earth.

Forgo: political bias or commercial goals; ‘I mean…’, ‘like’, ‘ok’, ‘so like’, ‘of course’, ‘but’, ‘you know…’. Polite, educated, civil society and all that. We have incredibly complex polite, educated, civil societies already made up of people who have known each other since they went to school together. We all know how things are done amongst people who have lived to learn to trust each other eh? It’s the same, ’tis the same in the whole wide world. Keep your political and / or commercial ideas at an arm’s length from your profile until you know its OK to introduce them in subtle ways.

Citations are extremely effective ways of being subtle but, of course, they’re the among most examined elements of newer editors’ work. Images are another way of introducing subversive or commercial content without being completely obvious.

Almost all Wikipedia editors, meta-editors, and admins can read (almost) but fewer will be able to visually contextualize an image unless they are extremely familiar with a topic. Know when to pick your battles. Unless you’re behind the scenes or sit on an American Parents Teachers Association, it’s hard to describe the levels of petty bullshit that fly around in discussions about ideas or controversial edits. You have a finite amount of social capital and community respect. Know how to invest it so it grows rather than spend it so it declines .”

Jim Hedger

Arnout Hellemans, a Dutch search specialist, agrees that you should take small steps and not try to dive straight in. He also recommends focussing on Wikimedia's prime data repository, Wikidata. Paraphrasing his telephone conversation:

I really became interested in Wikipedia after reading a SEMrush article [by Jacques Bouchard] on how to use Wikidata. The trick is to move slowly and connect dots. Let me start with the example of a hotel, such as the Waldorf in New York. Look up other hotels that have entries on Wikidata and look at the "identifiers" section. [This represents other URLs that represent the same physical entity.] Now make sure that you add similar identifiers to your hotel.

Wikidata is the ‘Linking pin’ between all of the trusted topics of your entity.

Take your time and do not make multiple edits on the same entity. Edit and add identifiers to many other areas and add to the collective repository and not just ones you are directly interested in.

For SMBs and people it is much harder to use Wikidata.

Arnout Hellemans

Information Retrieval expert, Dawn Anderson takes a much more direct approach:

Do something notable I would say. Getting into Wikipedia is not a given for anybody.

Dawn Anderson

This is great advice but demonstrates how challenging it is to warrant a mention in what is, after all, an encyclopedia. There is often a feeling of anguish at the personal level that you or your favourite entity does not warrant inclusion in Wikipedia, but would you have expected such an entry in previous encyclopedias, such as Encarta or the Encyclopedia Britannica? If not, then perhaps this is pause for thought.

Jason Barnard concurs, but adds a cautionary note:

When thinking about a place in the Knowledge Graph, I would say ‘find your springboard’. As Dawn says, what makes you notable (and worthy)? Wikipedia’s rules are a great guide, but are no longer the ‘law’. The opportunities have gotten MUCH wider in the last year. And will get wider still in the years to come.

If you create an entry that is not worthy of a place, or overdo editing on pages you are closely associated with, you will get a warning, or possibly removed. The job to get a page relisted is very very difficult, and the work to remove a warning is very slow and delicate. Be warned !

Jason Barnard

Greg Niland of GoodROI suggests:

Using the Help a Reporter site can help to build up enough media mentions to support a case for inclusion.

Greg Niland

This looks at solving the problem from a side-on perspective and avoids trying to manipulate or edit Wikipedia directly. The theory is that if you can be cited as an authority in a reputable source, such as the Wall Street Journal or the New York Times, then this significantly increases the odds of a third party using your citation as an independent citation to back up a Wikipedia entry. Note that this strategy is not directly aiming at BEING an entity on Wikipedia, but instead develops LINKS from Wikipedia.

Avoiding editing your own entry

I asked: "How should I suggest people deal with the thorny point that if you are connected to the entity/article, you are not supposed to edit the entity/article? This, to me, seems a little misguided, as it means by definition that the editors are NOT experts in the content they are editing… but how should a would-be notable address this?" I received this sage response:

I would suggest going to the “Talk” tab, start a thread there and just tell them that you realize you can’t edit it because you’re connected to it, but lay out the inaccuracies/corrections/additions and ask if someone would please make those changes.

Doc Sheldon

Other Resources:

  • A very good article on getting listed on Wikipedia is offered here.
  • Wikipedia also gives a guide itself here.
  • This SEMRush article from 2015 is also cited above.



When researching how the KG was being updated, it initially took me a long time to find entities that were anything other than Wikipedia listings. It turns out, though, that Google has a lot of data that it does not initially reveal in the knowledge graph answer box.

Google’s knowledge graph extrapolates insights gleaned from its data set. Here is an example:

Google made two leaps here. The first was in what I searched for: I searched for "brother" and Google returned a sister! Google knows that "brother", "sister" and "siblings" are semantically so close that it made the substitution for me (and didn't even tell me that it had). The second leap is that Google has provided details on a person without their own Wikipedia page.

In fact, there is no specific entity for Kashmira Cooke anywhere in the Wikimedia set of sites, if we use Wikidata.org as a measure:

How did Google get to this level of confidence? Google uses content to add to existing entries and, in the process, creates new relationships. Each "triple", as described in an earlier section, connects two entities. So in this case, Google felt it could trust the content on Wikipedia, which gives several triples in just this section:

(From the Wikipedia page for Freddie Mercury)

Now Google knows:

  • Freddie Mercury (is the brother of) Kashmira Bulsara
  • Kashmira Bulsara (is a type of) Person
  • Kashmira Bulsara (is the same as) Kashmira Cooke

In fact, Google can then carry on collecting information about the new entity. Put "Kashmira Cooke" into Google and you get a pretty solid-looking knowledge box.

What this teaches SEOs

You do not NEED to have a Wikipedia page to get your own entity in Google’s Knowledge Graph. Even so, it very much helps to be related (in this case quite literally) to an entity existing in Wikipedia. Have a good think about the entity you would LIKE to get listed in Google’s knowledge graph. Does it have any close relationships with any listings in Wikipedia? Does the person running that entity have a famous brother/sister/father/mother? If so, that person might get listed in Wikipedia as related to an existing entity. From there, they have their own entity. After this, you can possibly use schema to help Google understand that this entity runs the entity you wish to get listed.

Hire a Chair / Patron

Not all of us have the luxury of a famous brother or sister. But Princess Anne has nine pages of charities that she supports, and these allow Google to make the connection. It does not in any way GUARANTEE it, though. The Leuchie Forever Fund is a charity supported by the Princess Royal; as of the writing date it did not have its own entity, but it offers a potential path for the enterprising SEO to develop.

Who says that the Old School Tie network is dying out in the age of automation?

Start with a Unique Word to Brand your Entity

Google would have had a lot more difficulty in making these relationships if Freddie Mercury was not a unique name and if his surname had not been “Bulsara”. Uniqueness helps the KG reach levels of confidence faster. I am not suggesting a change of name will guarantee success, but it might be a consideration if you are just starting out and have not yet settled on a strategy.

Google is an agnostic White Man

This might be a little contentious, but “brother” and “sister” both have different meanings in black and religious communities. Google has connected these words so closely with the word “siblings” that its algorithm may have become closed to other interpretations of these words. This may emanate from the types of people involved in curating the initial seed set. This bias is a recognized problem in the building of Knowledge graphs.

There are also other databases that Google considers beyond Wikipedia… let’s look at a few approaches to getting into these…


Know what is an Entity (and what isn’t )

Just as you can type in site:example.com into Google search to see all (or most) of the web pages that Google has for any given site in its web index, they also provide a tool https://developers.google.com/apis-explorer/#p/kgsearch/v1/kgsearch.entities.search to allow you to interrogate their knowledge base. This is a very useful place to start. After all, if your brand, product, organization or person is already well defined in Google’s Knowledge Graph, then you are in a much stronger position than if it is not defined.

Here are the basic steps. We’ll go deeper in the next section.

1: Go to Google’s web based API explorer.

The page should look a bit like the image above. In the Query field, add your search term. Then click execute.

2: Scroll Down!

One of the annoying things about many of these tools is that they are meant as demos for programmers, not for SEOs who do not program day in, day out. That means there is a little laxness when it comes to the UI. If you don’t see anything happen when you press Execute, it probably did work but displayed the results below the fold. Scroll down the page to see something like this…

Do not be alarmed by the look of this! It may be long or short, but it is structured… and quite easy to read as a human if you don’t panic.

3: Search for your domain

If the output is long, simply type CTRL-F to open a search box on your browser and see if your domain is on the page.
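
If you would rather script this lookup than use the web explorer, the same Knowledge Graph Search API can be called directly over HTTP. A minimal sketch in Python follows; YOUR_API_KEY is a placeholder for a key you would create in the Google Cloud console, and the query mirrors the examples used later in this section.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: create a key in the Google Cloud console
ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"

params = {
    "query": "Bill Hartzer",  # the term you want to check for an entity
    "key": API_KEY,
    "limit": 5,               # how many candidate entities to return
    "indent": True,
}

response = requests.get(ENDPOINT, params=params)
data = response.json()

# Each candidate entity sits inside "itemListElement" in the JSON-LD response.
for element in data.get("itemListElement", []):
    print(element["result"].get("name"), "-", element.get("resultScore"))
```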

Understanding the output of Google’s Entity Search Tool

The tool described above is, in one sense, the last word on whether an entity is “recognized” by Google. Simply put, if the entity is in this list, then your strategy should be to make the record richer by helping Google add verifiable information to the Entity record. Once a record is created, then Google will be able to enrich the record with more information as it travels around the web (including your website) and reads structured markup in particular. However, whether any given structured data is taken on board by Google is far from clear. Barbara Starr talks about “Trust” and “Proof” being at the top of the Semantic Web Stack. This is worth a read to understand why you cannot just add to the record manually. Even so, there are some great nuggets for SEOs when analysing the output from this tool. We’ll discuss some now…

When ZERO entities exist for a given query

Even when no data is returned, the output still provides a few lines of text. These can be ignored; the important point is that Google definitely doesn’t consider the query to be associated with “an entity”.

When only one result exists

If you are lucky enough to be famous and have an uncommon name, you may have found the Entity SEO equivalent of what Gary Stock and later Dave Gorman termed a Googlewhack. The boilerplate text that appears when no entity is returned still shows up, but it is then followed by the output for a single entity. For the purposes of understanding the output, here are TWO query variations: “Bill Hartzer” and “Ramsey Saint Mary’s”.

Result for “Bill Hartzer” (July 2019)
Result for “Ramsey Saint Mary’s” (July 2019)

Both these queries return a single item. Going through these line by line helps us to understand what we are looking at:

@type: EntitySearchResult: Both records show EntitySearchResult because we were using the Entity Search API. Every record seen using this tool will start with this description for the @type.

@id”: “kg:/m/…”: This is the all-important record locator. If this is the record you hope to optimize, then make a note of it. You could try using it in your structured mark-up on your web pages. The “kg” means that the data comes from Google’s “Knowledge Graph”, which may suggest that there are other structured data stores at Google. There is also another nugget for SEOs here: “m” usually seems to mean the data was sourced from Google’s purchase of Freebase a number of years ago. This data was expected to be migrated over to Wikidata (part of the Wikimedia open-source data) but it is not clear whether that migration was ever completed. If there is a “g” instead, the data is sourced from Google’s own proprietary dataset.

“name”: “Bill Hartzer” or “2007-08 Isle of Man League”: Here we get the name of the thing/entity in question. This is the entity that Google has returned for the search query we entered. I find this interesting because, whilst “Bill Hartzer” is an exact match to the query, “2007-08 Isle of Man League” is not what I was expecting at all! I know “Ramsey Saint Marys” as a tiny village in the Huntingdonshire countryside in the UK. I have no idea how Google associated this query with what appears to be a sports league on the Isle of Man!

@type: now appears again on both entries. We have already seen @type higher up in the output, so why do we see it again? Note the slight indent as we head down the text: this second @type sits inside the nested result and describes the entity itself (for example, Thing or Person) rather than the search result wrapper.
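
Putting the fields together, here is a small sketch (assuming the `data` variable from the request sketch earlier) that pulls out the record locator, the name, the nested @type list and the resultScore for each entity returned.

```python
def summarise_entities(data):
    """Extract the SEO-relevant fields from a Knowledge Graph Search API response."""
    rows = []
    for element in data.get("itemListElement", []):
        result = element.get("result", {})
        rows.append({
            "id": result.get("@id"),           # e.g. "kg:/m/..." - note it down
            "name": result.get("name"),
            "types": result.get("@type", []),  # nested @type, e.g. ["Thing", "Person"]
            "score": element.get("resultScore"),
        })
    return rows

# Usage, with `data` from the earlier request:
# for row in summarise_entities(data):
#     print(row)
```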

What this Teaches SEOs

  • A search query does not need to be an exact match for Google to return an entity.
  • There could be a possibility in this ambiguity for Black Hat optimisation to try to exploit Entity Search.
  • Sometimes Google is just WRONG.

Inlinks builds out a full knowledge graph specific to your website and is able to find many stronger entities and relationships than Google does. This delta (the gap between the entities on your site and the entities that Google THINKS are on your site) represents a valuable SEO opportunity. The tool is free for the first 20 pages on any site.


This is not the first book or content you should ever read on SEO. There are several good works on Search, including:

  • The Art of SEO: Rand Fishkin, Stephan Spencer et al
  • Search Engine Visibility: Shari Thurow
  • Search Engine Optimization for Dummies: Peter Kent
  • Entity-Oriented Search: Krisztian Balog

This guide is not trying to replace these. Instead, it looks to augment traditional SEO approaches. It helps existing SEOs to understand the principles of entities and semantic search, and shows how maximizing the traffic and branding that entities can offer differs from traditional SEO.

At the same time, this guide is bite-sized, compared to other works on Entity Search. There are some complex ideas that may have been explained in terms that, to some, will be too complex and to others, too simple. This book offers up strategies for semantic SEO and does not seek to cover the complex subject of data indexing and information retrieval in Entity Search.

Contribute to further editions or revisions

Entities are a continually evolving idea. As such, this is a work that will need to be updated and improved frequently to remain relevant. Readers already noted as experts in the field are invited to add their own chapters or sections to further editions of this guide by emailing them to publications@dixonjones.com. Sending content by this route indicates that it is your own work and that you are willing for it to be incorporated into the main text.


Summary:

Modern search engines can derive insights across multiple documents instantly. Cataloguing systems over the years moved towards “10 blue links” search results and have now moved on to a more encyclopedic format. In a way, the retrieval methods have gone full circle.

Back when we all used real-world libraries more (and those libraries are still there, very peaceful places to work now, away from the children’s section), how did the librarian look up where a book was when you asked? She invariably had a cataloguing system. In my youth, this was a card-based system, based on numbers. Today you still see every book in the library with a catalogue number stuck to the back.

When the Internet started, Jerry Yang and David Filo thought that someone should start doing the same thing with websites, and they formed the Yahoo! Directory. This was a manually curated list of websites, with a small summary of each site’s purpose and a hierarchical category. By modern standards, it wasn’t sophisticated, but at one point Yahoo was the most valuable online business in the world. Two popular variations of the model were LookSmart, which was used by Microsoft, and the Open Directory Project, an open-source variation that could be used by any search engine (later including Google).

Competing with this idea of cataloguing websites was the concept of “full-text search” – which was led by AltaVista and myriad other companies (including a valiant effort by Yahoo) but ultimately won by Google in the west, Baidu in China and Yandex in Russia. Full-text search offered more promise, providing everything could fall into place. Website curation was slow and manual. All the contents of the website had to be explained in a few sentences! Much like a cataloguing system in the local library. Full-text search, on the other hand, needed no manual intervention and every PAGE could be a separate entity in the index. That allowed for a much larger index overall.

Knowledge bases are, to some extent, a swing back to the old way of doing things. We’ll return to this argument later, but first, let’s explore the differences between catalogue or directory-based indexing and text-based indexing and then delve into some of the concepts behind text-based indexing. Time-starved SEO experts who already know text-based search may choose to skip to the next section.

There were advantages to both approaches to indexing the web. There still are. Ultimately, the full-text approach won out, at least until recently. As the web continues to grow, however, Google’s mission of “organizing the world’s information” has hit several new barriers. Given that there are far more pages on the Internet about any given subject than anyone can ever read, how much point is there in Google continually collecting and ordering information if nobody ever looks past the first page of results? Even Google’s resources are finite, and even if they were not, the planet’s resources ARE finite. You may be surprised to learn that one energy website has estimated the power needed to drive Google search at about the same as powering 200,000 homes, while Statista reports Google using roughly four times that energy, so the figure could be approaching a million homes within a year if something doesn’t change! Google can sustain this by buying renewable energy, but only up to a point. Meanwhile, Moore’s law, which suggested microchips would keep getting faster and faster, has reached both a physical and an economic barrier. Quantum computers may eventually fill this void, but right now any search engine needs to make compromises.

But until this crisis point, full-text search was killing human-curated search. To achieve quality results for users, search engines needed to turn text strings (which are notoriously hard for machines to analyse) into numerical and mathematical concepts, which can then be easily ranked or scored, ready for the moment when users need answers to their search queries. The process goes something like this:

Crawl and Discover phase

Most search engines discover content by crawling it, although traditional crawling is far from the only way in which search engines can ingest content. According to Incapsula (now Imperva), most web traffic actually comes from bots, and not just Google’s and Bing’s. Majestic (where I used to be a director), a specialist search engine that analyses the links BETWEEN web pages, runs a distributed crawler that crawls faster than Bing. I discussed this once with a friend at Microsoft and he said that one of Microsoft’s objectives was to reduce the need for crawling altogether. I do not know how true this is, but certainly, at this point, web crawling is the main way in which search engines ingest text. It is also the main way in which they discover new URLs and content to feed these insatiable crawlers, because crawling a page reveals links to new pages, which can then be put into a queuing system for the next crawl. Discovery also comes in many other forms. Sitemaps are very helpful for Google, and website owners can submit them directly through Search Console (formerly Webmaster Tools). Search engines can also cut corners by watching news feeds or RSS feeds, which update as the website content updates.

Crawling at scale was relatively efficient for many years. The bot could simply grab the HTML of the page and some other metadata and process the text on the page at a later point. However, technology never stops and first frames, then iFrames, then CSS and then Javascript started to add complexity to this process. Javascript, in particular, creates a huge overhead for search engines to process. Content delivered by Javascript is rendered on the client side. That is to say, your own PC, laptop or phone uses some of its CPU to make the web page appear in the way it does. For a web crawler to read every page on the internet is one thing. For it to crawl it AND understand the Javascript at the same time would slow the crawlers down to such a pace that crawling would not scale. Google, therefore, introduced a fourth step into the process of indexing.

Javascript Challenges

Google currently looks to be leading the charge in analysing Javascript and they have certainly improved significantly over recent years. Nevertheless, the computing overhead required is immense, the processing has to take place sometimes several weeks after the initial crawl and significant compromises have to be made. Martin Splitt, from Google, runs many excellent videos around this challenge.

Turning text into mathematical concepts

Now we turn to the heart of full-text search. SEOs tend to dwell on the crawling part of search or on the retrieval part, the Search Engine Results Pages (SERPs, for short). I believe they do this because they can see these parts: they can tell whether their pages have been crawled, and whether they appear. What they tend to ignore is the black box in the middle, the part where a search engine takes all those gazillion words and puts them in an index in a way that allows for instant retrieval. At the same time, search engines are able to blend text results with videos, images and other types of data in a process known as “Universal Search”. This is the heart of the matter, and whilst this book will not attempt to cover all of this complex subject, we will go into a number of the algorithms that search engines use. I hope these explanations of sometimes complex, but mostly iterative, algorithms appeal to the marketer inside you and do not challenge your maths skills too much.

If you would like to take these ideas in video form, I highly recommend a video by Peter Norvig from Google in 2011: https://www.youtube.com/watch?v=yvDCzhbjYWs

Continuous Bag of Words (CBOW) and nGrams

This is a great algorithm to start with because it is easy to visualize. Imagine a computer reading words at breakneck speed. It reads a word on a page, then the next, then the next. For every word it reads, it initially makes a decision:

Decision: Is this word potentially important?

It makes a determination here by stripping out all those very common words like “an”, “it”, “as”. It does this by checking against a (curated) list of STOP words.

Decision: is this the right variant?

At the same time as deciding whether to drop a word, it might change the word slightly, by removing the “s” from “horseshoes” or matching capitalized words with non-capitalized variants. In short, it aggregates different variants into one form. We’ll return to this when we talk about entities because there’s not much difference between “litter”, “rubbish” and “garbage”.

Then the system simply counts words. Every time it sees the word “Horseshoe” it adds 1 to the total number of times it has seen the word horseshoe on the Internet and adds 1 to the number of times it sees it on the page it is currently looking at. Technically, Information retrieval experts call pages “documents”, mostly due to historical reasons before the Internet was a thing, but possibly in part just to make us mortals feel inferior!

Now, when a searcher looks for the word “horseshoe”, the search engine can easily find the page where the word is mentioned most densely. This is a pretty BAD way to build a search engine, because a page that just spams the word horseshoe would come to the top instead of one that genuinely talks about horseshoes, but we will come to dealing with this kind of spam when we talk about PageRank and other ranking tools. It is a GREAT way, however, of storing all the words on the Internet efficiently. Whether a word is used once or a million times, the amount of storage needed is about the same and only increases with the number of pages on the Internet. (Information retrieval experts call the Internet the “corpus” of “documents” here… partly due to historical reasons, but now I am beginning to think they do it through some sense of passive-aggressive intellectualism. You judge for yourselves.)
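
As a toy illustration of the counting just described (and nothing like the scale or sophistication of a real index), here is a short Python sketch that strips stop words, folds simple plural and case variants together, and counts terms per page and across a tiny two-page corpus.

```python
from collections import Counter

STOP_WORDS = {"a", "an", "as", "is", "it", "of", "on", "the", "from"}

def normalise(word):
    """Lower-case, strip punctuation and crudely fold plurals (horseshoes -> horseshoe)."""
    word = word.lower().strip(".,!?\"'")
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]
    return word

def count_terms(text):
    words = (normalise(w) for w in text.split())
    return Counter(w for w in words if w and w not in STOP_WORDS)

documents = {
    "example.com/page-1": "The horseshoe is an old symbol of luck. Horseshoes hang on doors.",
    "example.com/page-2": "A blacksmith makes horseshoes from iron.",
}

corpus_counts = Counter()
for url, text in documents.items():
    page_counts = count_terms(text)
    corpus_counts.update(page_counts)
    print(url, page_counts.most_common(2))

print("Total 'horseshoe' mentions in the corpus:", corpus_counts["horseshoe"])
```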

This system gets much more useful when the crawler starts counting words that appear next to each other, called n-grams. The crawler can then count the number of occurrences of phrases several words long, after first stripping out the stop words and choosing the dominant variant of each word. Google even went so far as to publish, in 2006, a data set of n-grams built from around 13 million unique words, which is shown in Peter Norvig’s lecture and remains available for download.

  • Number of sentences:    95,119,665,584
  • Number of unigrams:         13,588,391
  • Number of bigrams:         314,843,401
  • Number of trigrams:        977,069,902
  • Number of fourgrams:     1,313,818,354
  • Number of fivegrams:     1,176,470,663

We can glean huge amounts of insight from this data. Google knows that the phrase “the quick fox” is much more common than “the clever fox” on the internet. It doesn’t know why, but it does not need to; it only needs to return relevant pages for “the quick fox” when a person searches for it. (If you are not sure why a fox is more likely to be “quick” than “clever”, it is because the phrase forms part of a famous sentence that uses every letter of the alphabet, making it ideal for teaching people to type on a QWERTY keyboard.)

Figure 1: You can also check search usage. Blue is “the quick fox” while red is “the clever fox”

 A search engine can look at the number of times the words in the search – both individually and as a group – appear on a page. Spamming aside, there are myriad ways to then score each document for this phrase. A search engine is born!
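
Counting n-grams works the same way; the short sketch below (with made-up sentences standing in for a web corpus) counts trigrams so that the frequency of “the quick fox” can be compared directly with “the clever fox”.

```python
from collections import Counter

def ngrams(text, n):
    """Yield the n-grams of a sentence as tuples of consecutive words."""
    words = text.lower().split()
    return zip(*(words[i:] for i in range(n)))

# A made-up micro-corpus; stop words are kept here for simplicity.
corpus = [
    "the quick fox jumps over the lazy dog",
    "the quick fox is a typing exercise",
    "the clever fox appears far less often",
]

trigram_counts = Counter()
for sentence in corpus:
    trigram_counts.update(ngrams(sentence, 3))

print(trigram_counts[("the", "quick", "fox")])   # 2
print(trigram_counts[("the", "clever", "fox")])  # 1
```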

Vectors

There is another revelation here. Having seen that “the quick fox” is much more popular as a phrase on the Internet than “the clever fox”, we can also deduce that the word “quick” is semantically closer to the word “fox” than “clever” is. There are many algorithms, such as Word2Vec, that use this kind of intuition to map words based on their “proximity”: “king” and “queen” end up close together, whilst “king” and “fox” end up far apart. For further reading on this, look up “Vector Space Models“.
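
The idea is easy to demonstrate with toy numbers. In the sketch below, the three-dimensional vectors are invented purely for illustration (real models such as Word2Vec learn vectors with hundreds of dimensions from huge corpora); closeness is measured as cosine similarity.

```python
import numpy as np

# Invented toy vectors: real embeddings are learned, not hand-written.
vectors = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.90, 0.15]),
    "fox":   np.array([0.10, 0.20, 0.95]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction, near 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("king ~ queen:", round(cosine(vectors["king"], vectors["queen"]), 3))
print("king ~ fox:  ", round(cosine(vectors["king"], vectors["fox"]), 3))
```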

The move to Semantic Markup

By adding Semantic Markup to pages, you help Google and other search engines shortcut the algorithms they need to turn words into concepts: you explain the content in a way that machines can digest and read. However, on its own, this system would be very easy for web content to abuse. The knowledge graph should only augment the information it already has when it is confident that the claims in the semantic markup are valid. If search engines get this wrong, then Semantic Markup would be little more effective than the keyword stuffing of the “olden days of SEO”.

To do this, search engines still need to trust humans! The Knowledge Graph started with a human-curated set of data.

Trusted Seed Sets: A glorified directory!

We started the journey of search by discussing how human-led web directories like the Yahoo Directory and the Open Directory Project were surpassed by full-text search. The move to Semantic search, though, is a blending of the two ideas. At its heart, Google’s knowledge base extrapolates ideas from web pages and augments its database. However, the initial data set is trained using “trusted seed sets”. The most visible of these is Wikipedia, run by the Wikimedia Foundation. Wikipedia is curated by humans, and if something is listed in Wikipedia, it is almost always listed as an entity in Google’s Knowledge Graph.

This means that the whole integrity of the Entity based approach to search depends on the integrity and authenticity of those (usually unpaid) volunteers curating Wikipedia content. This produces challenges of both scale and ethics, which are discussed by the author here.

So, in many regards, the Knowledge Graph is the old web directory come full circle. The original directories used a tree-like structure to give the directory an ontology, whilst the Knowledge Graph is more fluid in its ontology. In addition, the smallest unit of a directory structure was really a web page (or, more often, a website), whilst the smallest unit of a knowledge graph is an entity that can appear on many pages. Both ideas, though, stem from humans making the initial decisions.

This leads us on to what Google considers an entity and what it doesn’t. Clearly, knowing this is important if we are to start “optimising” for Semantic SEO.


What (exactly) IS Google Knowledge Graph?

The Google Knowledge Graph is technically a knowledge base of information acquired from several sources, together with the relationships between those pieces of information, used to enhance search results. It was introduced in 2012 as a way of providing more relevant, accurate and helpful information for the things users search for on the web. The knowledge graph presents information to users in several ways, most notably via an infobox or knowledge panel usually placed next to the results.

The knowledge panel presents a wide variety of information concerning a subject or entity. For example, when a user types the name of a famous musician, the knowledge panel displays the musician’s full name, images, a list of songs, recent tracks, upcoming events, partners and more. This is made possible by the knowledge graph, which uses the data available about the entity to build meaningful relationships.

The knowledge graph greatly improves the user experience, because searchers are presented with an extensive range of information on a concept without needing to keep searching for a specific topic. This results in fewer clicks and reduces the time required to locate matching content.

The knowledge base is created by forming relationships between entities. Entities in this case are distinguishable concepts or things: colours, people, places, feelings and organizations, among others. Machine learning and other algorithms are deployed to surface the most relevant and useful information for searchers. By interlinking data from millions of sources, the knowledge graph builds up helpful and accurate information about each entity. When a user searches the web, semantic search methods are used to return the most relevant results: the system analyses the relationships between keywords and phrases to better understand what the user is interested in and the context of the search.

Edges connect the various entities and describe the nature of the relationship between them. Through the knowledge graph, Google is able to present searchers with more information that is relevant to the specific search, which in turn affects organic traffic and search engine optimization (SEO). The knowledge graph also helps enhance voice search by identifying the entities in natural-language queries. A business can benefit because the knowledge graph normally presents detailed information about the business after a search, including, for example, future events the business has planned.

Some Knowledge Graph Detail

For example, consider the following sentence:

“Queen is a rock band”

An example of a “Semantic triple”
Visual display of a Knowledge Graph

Here are some things that the Knowledge Graph might store for “Queen, the band”:

  • Freddie Mercury (is a member of) Queen, (which is a) rock band,
  • Bohemian Rhapsody (is a) song (written by) Queen, (which is a) rock band
  • Innuendo (is an) album (written by) Queen, (which is a) rock band
  • Live Aid (is a) concert (featuring) Queen, (which is a) band.

The items in bold are all Entities. They all connect with a relationship (shown in brackets). That’s really all there is to it! “Person”, “band” and “concert” are really classifications of things, or @types of things, rather than things in their own right. That is to say, many people can be classified as a “Person” entity; Queen in this record is classified as a band. In another record, Queen may also exist, classified as (say) a monarch. Some common classifier @types are:

  • Person
  • Place
  • Date
  • Organisation
  • Review
  • Recipe
  • Event

By populating the Queen (a band) record with these lines of relationships to other records, the table that is produced acts as a semi-structured dataset about the entity. So when you type in “Queen, band” into Google, not only do you get the official website of the band as you have always done, you also get a “Knowledge Box” which is really just all these relationships laid out in a pretty manner. Their YouTube channel, their Spotify channel, their albums, members and more.

Note the importance of Wikipedia in much of Google’s Knowledge Panel. Google has noted in presentations that it uses the data from the Wikimedia foundation as a primary dataset for training its own systems when building the knowledge graph. IMDB also looks prevalent in this example.

Semantic Triples

A “Triple” in the context of semantic search, is a relationship between two entities or entity @types. (The @ sign that we keep using will start making sense elsewhere in this guide).

If you understand the concept above, you’ll be delighted to learn that Semantic Triples are even simpler. We used the example: “Freddie Mercury (is a member of) Queen, (which is a) rock band“. This is actually more complicated than a “triple”. In fact, THREE triples are contained in that statement:

  1. Freddie Mercury (is a member of) Queen
  2. Queen (is a) rock band
  3. Therefore we can DEDUCE a third triple: “Freddie Mercury” (is in a) “rock band”

Triples make up the core of the knowledge graph. Interestingly, though, we see here that incomplete data can create errors. It would be more correct to have said Freddie Mercury WAS in a rock band. Without the date that he died as another triple, the deduction is in fact false, because the INPUT data, saying Freddie IS a member of Queen, was also false.
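
The deduction above can be sketched as a trivial inference rule over a set of triples. This is purely illustrative (Google’s actual reasoning is far more sophisticated), but it shows how a missing triple, such as the date Freddie died, lets a false present-tense fact propagate.

```python
triples = {
    ("Freddie Mercury", "is a member of", "Queen"),
    ("Queen", "is a", "rock band"),
}

# Rule: if X is a member of Y, and Y is a Z, then deduce that X is in a Z.
deduced = set()
for s1, p1, o1 in triples:
    for s2, p2, o2 in triples:
        if p1 == "is a member of" and p2 == "is a" and o1 == s2:
            deduced.add((s1, "is in a", o2))

print(deduced)  # {('Freddie Mercury', 'is in a', 'rock band')}
# Nothing in the data records that this should be past tense, so the engine
# happily asserts the deduction in the present tense.
```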

Now that Google understands Queen as an ENTITY, it can go much further by enriching the traditional search results: because Google now knows the YouTube channel, it can easily show a few videos in the results, for example.

Notice that in the knowledge box for Queen, one of the first entities listed is “queenonline.com”? The official website is itself an entity related to Queen, the band. It is not surprising, then, that Google also lists that website as the first traditional organic result.

Vectors

Just from that knowledge box for Queen, you can probably see that Google is likely to see that John Deacon and Freddie Mercury are “semantically close” to each other. Similarly, Google thinks the Beatles and Pink Floyd are probably semantically close bands. This is an extremely important concept for SEOs to understand. If you want to write about Queen, the band, then you had better also write about John Deacon and Freddie Mercury. Talk about London in the 1970s and the amazing video techniques used in Bohemian Rhapsody. Of course, this will not help you rank anymore for the term “Queen, band” because that entity is already fully defined and you don’t own the official website (unless you do, in which case, can you please link to this article?). However, you CAN still generate organic traffic relating to Queen, the band. We describe this in the section “optimizing for the edge”.

That didn’t sound TOO scary, I hope. If you want to be a good SEO, though, you’ll need to know more about how machines can take text and convert text into hierarchies and numbers in a way that they can use to provide us, humans, with useful search results.

You can analyze any web page semantically, the way Google does, for free at Inlinks.net.


Semantic SEO Guide

This Semantic SEO guide provides an in-depth foundational understanding of Semantic SEO. Web search engines, and particularly Google, increasingly focus on Semantic indexing and retrieval of content based on the concept of “entities” (or “things”) rather than the concept of “words” (or “strings”). On the upside, this is a much more efficient way of storing the world’s information. On the downside, it occasionally loses colour and variety in results. This guide will give you a solid grounding in both the concepts behind Semantic search and the SEO strategies you might adopt to leverage Semantic or Entity orientated search.

This guide now forms the basis of the book “Entity SEO” on Amazon.

Semantic SEO helps to inform a Knowledge Graph

What is Semantic SEO?

There are a number of facets to semantic search and semantic SEO is the art of optimizing the underlying data. In one context, a semantic search is an information search that uses entities, rather than web page URLs as the primary record structure in an information retrieval system. Semantic search is efficient for machines due to structured formats and relatively specific vocabulary. SEOs are starting to wake up to the profound difference between an entity vs. keyword approach to content strategies. Google now has a large and sophisticated “Knowledge Graph” which helps it to understand the relationships between concepts.

The task, now, for your online marketing strategy is to understand how to optimize an organization’s online presence in such a way that the organization’s core competencies are expressed within that knowledge graph. This is “Semantic Search” or “Semantic SEO” and this series of posts is a comprehensive training guide that will help digital marketing experts incorporate this Artificial Intelligence aspect into search marketing.

Another way to describe Semantic search is that a search engine is trying to derive meaning from the content it crawls on a page, not simply by counting words, but also by considering the markup that makes up how the page is presented. A web search engine may still relate this content to meaning, but there are several mark-up attributes that a crawler can readily use to help with meaning, such as:

  • Header tags (H1 to H3) to identify important concepts
  • Bullet points to help group concepts
  • Tables to help organize data

More recently, JSON-LD has become a prevalent way to add structured data to content.
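
As a minimal sketch (the values are illustrative only), the snippet below generates JSON-LD for the running “Queen” example using schema.org’s MusicGroup type and prints the script tag a page would carry in its head.

```python
import json

# Illustrative values only; a real page would use its own verified details.
entity = {
    "@context": "https://schema.org",
    "@type": "MusicGroup",
    "name": "Queen",
    "url": "https://queenonline.com",
    "member": [{"@type": "Person", "name": "Freddie Mercury"}],
    "sameAs": ["https://en.wikipedia.org/wiki/Queen_(band)"],
}

# Embed this in the page <head> so crawlers can read it without rendering the page.
print('<script type="application/ld+json">')
print(json.dumps(entity, indent=2))
print("</script>")
```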

Gone are the days of keyword stuffing. Semantic SEO leverages deep learning concepts. The search algorithms known as Hummingbird and RankBrain are part of a set of tools that have changed the whole meaning of “ranking in the SERPs”. This Semantic search guide is a regularly updated resource that we trust will prove invaluable for SEOs.

How Semantic SEO improves the User Experience

Semantic SEO improves the user experience by providing users with new concepts that are closely related to the original query. By answering the search intent, you show Google’s algorithm that you have provided a great user experience, and it thus tends to rank your content higher in the search results. This approach also allows your content to be seen in Google Discover and generally increases discoverability in context.

Semantic SEO Guide Contents

Jump into a section, or read like a book by clicking <Next> on the bottom of each page.

Section 1: Target Audience and Contributions

Who this guide is designed for.

Section 2: Google’s Knowledge Graph Explained

Understand what the Knowledge Graph is. Gain a grounded understanding of what Google’s Knowledge graph is, without needing a Ph.D. in Information Retrieval systems. Read More…

The first Search Engine was really just a list of websites organized into Topic Trees by humans. Now in a world of Knowledge graphs and Semantic Markup, are things really that different? Read More…

Section 4: Using Google’s Entity Search Tool

Google has an Entity lookup tool. If you cannot use APIs, they also built a web interface for you to use. A practical way to find out EXACTLY what Google considers an entity. Read More...

Section 5: Semantic SEO Strategies

How do you create a Semantic SEO Strategy? Knowing about the knowledge graph and entities in itself does not improve your online presence. You need to approach SEO differently from the keyword days. At a strategic level, there are a number of approaches to take. Read More…

Sub Section ( a ): Getting a Wikipedia Listing

Becoming an entity in Wikipedia is the most well-known way to become listed in Google’s Knowledge Graph and pretty well defines you as an entity. Wikipedia, however, is full of pitfalls for the unwary. We asked a bunch of experts how they believe you should approach getting a Wikipedia listing. Read More…

Sub Section ( b ): Expand the Knowledge Graph to get listed

If you are not directly able to have an entry in Wikipedia, you may be able to become an entity within the Knowledge graph through association with an existing entity in Wikipedia. Read More…

Sub Section ( c ): Other ways to become an Entity

Beyond Wikipedia, other databases exist that Google accesses to help build out its knowledge graph. Here are several ideas as to other approaches to becoming an Entity. Read More…

Sub Section ( d ): Align your online presence with your Niche

Businesses rarely define succinctly enough the message they want to portray and the niche they want to dominate. By defining your own brand well, you tie it to a semantically close entity: Nike is associated with shoes, for example. Read More…

Section 6: Creating Digital Assets

How do you go about creating Digital Assets beyond your web pages? Read More…

Section 7: How to add Structured Markup

Adding Structured Markup to your web pages is easily done with the inLinks tools, but it helps to also understand the underlying ideas. Getting the code correct is important because Google is known to penalize sites that misuse structured markup. Read More…

Semantic SEO Writing and the Flesch-Kincaid Algorithm

The readability of semantic SEO writing can be measured with the Flesch-Kincaid family of formulas, which help you determine at what level a reader needs to be to understand your content. The underlying Flesch Reading Ease formula is driven by the length of your sentences and the number of syllables per word: shorter sentences and shorter words score as easier to read.

InLinks uses the Flesch scale of 0 to 100 to provide you with a readability score for your content; the higher this score, the easier it is to read.
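
For the curious, the underlying Flesch Reading Ease calculation is simple enough to sketch in a few lines. The syllable counter below is a naive vowel-group heuristic (real implementations use dictionaries), but it illustrates why shorter sentences and shorter words push the score up.

```python
import re

def count_syllables(word):
    """Naive heuristic: count groups of consecutive vowels (minimum one)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

simple = "The cat sat on the mat. It was happy."
complex_text = "Comprehensive readability assessments necessitate multifactorial linguistic analysis."
print(round(flesch_reading_ease(simple), 1))        # high score: easy to read
print(round(flesch_reading_ease(complex_text), 1))  # low (even negative) score: hard to read
```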

More Resources

Here are our FREE Training Courses on Semantic SEO for Content Writers.

Here is the Inlinks Youtube Channel.

This guide now forms the basis of the book “Entity SEO” on Amazon.
