October 31, 2023 (Updated: January 19, 2024)
Most folks who have been around SEO in the last few years have likely heard the terms “knowledge graph” and “entity SEO” batted around a fair bit.
But what are entities? What is the knowledge graph? And most importantly, how can you leverage them to improve a website’s performance in organic search?
These are some of the questions I have tried to answer in this article, with a specific focus on optimizing your website for branded searches and possibly improving Google’s approximation of a brand’s “E-E-A-T”. If you’re not familiar with that term, we will discuss it more in the article, but it stands for “Experience, Expertise, Authoritativeness, and Trustworthiness”.
There has been a lot of discussion recently about the introduction of AI systems within search, especially with the advent of SGE (Search Generative Experience), which adds AI-generated answers to Google’s SERPs. However, this is part of a much longer shift that Google has been undertaking to move from being a lexical search engine to a semantic-first one.
You might be asking, “What is the difference between lexical and semantic search?”
Traditionally, Google has relied mostly on signals such as links, alt attributes, page speed, and most notably keywords on the page to understand how relevant a piece of content is to a specific query.
This is broadly referred to as lexical search: essentially, search that doesn’t seek to understand the meaning of the query. This is the type of search that has been dominant for most of the history of search. It is “easy” to game, and the results it produces are often lacking.
Since the early days, Google has been striving to deliver more accurate and relevant results based on what users are looking for. A big part of this involves being able to actually understand what the searcher is asking. As Google puts it, understanding “things, not strings.”
A knowledge graph is a knowledge representation model that displays the relationship between different data points or entities, via vectors and relationships.
But what does that actually mean? What even is an entity?
Google defines an entity as being:
“A thing or concept that is singular, unique, well-defined and distinguishable.”
In theory, absolutely everything in the known universe is an entity. Whether it be a place, a city, a person, or a concept. For example – your site, your company, your authors, your products, your reviews, and your mentions – are all entities related to the thing you are trying to rank, i.e. your content. (Which is also an entity!)
Google’s knowledge graph is a collection of billions of data points about all those different entities and how they relate to each other.
When Amit Singhal introduced the Knowledge Graph in 2012, he explained that Google had been:
“Working on an intelligent model — in geek-speak, a “graph” — that understands real-world entities and their relationships to one another: things, not strings.”
So in short, Google has been building a massive database that is able to identify entities and then understand how they relate to each other.
Related reading: Demystifying Authorship and Authority
Knowing how Google builds the knowledge graph helps us to better understand what they can do with it and how we can utilize that for search.
First, they try to determine what the different entities on the web are. This can be done in a few ways, but primarily Google has the option to either parse through richly structured data like schema, or it can try to recognize entities within unstructured data such as web pages or other documents.
They then assess how entities may be related to each other, and how confident they are in that relationship or vector. This is how semantic systems are able to get an approximate “meaning” from text. And the stronger and clearer the connections they find, the more confidence the system can have in that information or meaning being factual.
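To make the idea of entities, relationships, and confidence more concrete, here is a minimal Python sketch of a toy knowledge graph. It is only an illustration: the entity names, the predicates, and the confidence values are all made up, and nothing here reflects how Google actually stores or scores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One edge in a toy knowledge graph: subject -> predicate -> object."""
    subject: str
    predicate: str
    obj: str
    confidence: float  # 0.0 to 1.0, invented for illustration


# Hypothetical facts about a hypothetical brand.
GRAPH = [
    Relation("Example Co", "hasWebsite", "example.com", 0.95),
    Relation("Example Co", "foundedIn", "2012", 0.80),
    Relation("Example Co", "specializesIn", "Technical SEO", 0.60),
    Relation("Jane Doe", "worksFor", "Example Co", 0.40),
]


def facts_about(entity, min_confidence=0.0):
    """Return relations that mention the entity, strongest first."""
    hits = [
        r for r in GRAPH
        if entity in (r.subject, r.obj) and r.confidence >= min_confidence
    ]
    return sorted(hits, key=lambda r: r.confidence, reverse=True)


for relation in facts_about("Example Co", min_confidence=0.5):
    print(relation.subject, relation.predicate, relation.obj, relation.confidence)
```

Raising `min_confidence` drops the weaker connections, which is a rough way to picture why clearer and more strongly corroborated relationships are the ones a system can treat as factual.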
All of this may sound familiar to those who have been looking into how LLMs and some other machine learning systems work, but Google has been doing this for years, well before the world met ChatGPT!
The most common way searchers will see this information used is when it is pulled as content from SERP features such as knowledge panels and answer boxes:
Everything highlighted above is drawn from the Knowledge Graph. For all the important entities related to your site and content, having a clear entry in the Knowledge Graph can be advantageous. For example, once in the knowledge graph, Google can better understand:
This is direct knowledge that Google can keep about your site: where you are based, what you specialize in, what you sell, who works for you, and how long you have been in business. This can manifest itself within your branded search (such as the presence of a knowledge panel) or it could potentially be used to determine topical authority.
Savvy readers have probably noticed that a lot of this information crosses over with things typically associated with the conceptual framework Google refers to for assessing site quality: E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness).
For example, say a company is mentioned by name in a niche’s most authoritative publication. The more confident Google is that the mention refers to a specific brand, and the more confident it is about which website belongs to that brand, the more confidence it may have in serving that site for queries related to that niche, or even to the topic of the article that contains the mention.
If at some point along that chain Google is less confident in the connections between those entities (the mention, the brand, and the website), then the signal is weaker. If your competitor has as many signals, and they are clearer all the way along the chain – Google may have more confidence in serving their site.
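One way to picture the "chain" idea is with a little arithmetic. The sketch below is purely illustrative: the numbers are invented, and multiplying the links together is my own simplifying assumption, not a description of how Google combines signals. It just shows how one weak link can drag down the whole chain.

```python
from math import prod


def chain_confidence(links):
    """Combine per-link confidences by multiplying them (an assumed toy model)."""
    return prod(links.values())


# Invented numbers: a strong mention, but a shaky brand-to-website connection.
your_site = {"mention_to_brand": 0.9, "brand_to_website": 0.5}

# A competitor with a slightly weaker mention but a clearer chain overall.
competitor = {"mention_to_brand": 0.8, "brand_to_website": 0.8}

print(round(chain_confidence(your_site), 2))
print(round(chain_confidence(competitor), 2))
```

Under this toy model the competitor comes out ahead even though its mention is weaker, because every connection along its chain is clear.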
This last part crosses over with older algorithms such as Google RankBrain, or more recently BERT and MUM. Dixon Jones has an excellent breakdown on how RankBrain and the Knowledge Graph likely work which you can find here.
Essentially, we know that Google is getting better at recognizing entities within queries and what entity the user is actually referring to in their query (like semantic search!).
We don’t know specifically how Google is approaching these tasks – but a lot of semantic search and machine learning driven models require a knowledge graph/base to help determine topical relevance.
So, while it is not possible to directly optimize for RankBrain and the other machine learning algorithms that may use the Knowledge Graph(s), it is possible to ensure that the clearest possible picture of your brand, website, and all related entities is included in the Knowledge Graph.
Traditionally, it seemed that Google gathered most of this information from sources like Wikipedia, Wikimedia, Freebase, etc., but as Gary Illyes stated in a Reddit AMA, they may also use unstructured data such as unlinked mentions:
Most interestingly, by around 2019, Google appeared to have started to expand the number of places it draws this information from, and the number of clearly recognized entities has been increasing ever since. The first major update to the Knowledge Graph noticed by SEOs was Budapest, and since then updates have become much more frequent. The most recent update, dubbed “Killer Whale”, saw over 7 million data points added! Jason Barnard’s company, Kalicube, tracks these.
We have also seen Google draw information for knowledge panels from lesser-known websites. Here is an interesting example from Bill Hartzer from a few years ago. This kind of thing has become more and more common in recent years. In May 2020, Danny Sullivan published an updated blog post regarding how the Knowledge Graph and knowledge panels work. A notable change was an acknowledgment that Google uses multiple sources besides Wikipedia for the Knowledge Graph:
Ultimately, this means there are loads more opportunities for brands and websites to take proactive steps to get themselves into the knowledge graph with a high degree of confidence.
Related reading: Expanding Authority Past Your Website
One of the most direct and known benefits of being recognized with a lot of confidence within the Knowledge Graph is the search real estate it affords on branded searches.
Branded search is an often-overlooked aspect of SEO, especially for enterprise-level sites.
When you think about it, users who are Googling your name directly are very likely to be existing readers. As such, you want to give them as many indicators as possible that you are who they want to get their information from!
For example, if a user finds your company’s content through search, social, or somewhere else and they want to find out who is responsible for the content, they will likely search for your brand name.
This is the same even if they come across you at an IRL event or through more traditional marketing methods.
If they come to a branded SERP that contains very little to confirm you are a good source of information, this may lower their trust in your content, and they may not return to convert. On the other hand, if you have a clear knowledge panel and a well-maintained branded search presence this can really help with conversions and improve consumer confidence in the brand.
On top of this, you want to ensure that there is no negative content about the site on the branded search. It is not helpful at all for someone to Google your brand name and then see multiple articles or sites calling into question your integrity!
Equally, if you are doing PR outreach to help acquire authoritative mentions, someone considering whether to link to your site will likely search for your brand too, and a branded SERP filled with negativity could affect whether they link to you or to another site in their content.
Lastly, searches for YMYL (Your Money or Your Life) topics often have a “more about this site” feature now. For example:
The blue link takes us to a page like this, which is all populated with content from the knowledge graph:
For all these reasons, and more, at Blue Orchid Digital, we refer to the SERP that appears for a brand name search as a “Digital Business Card.”
E-E-A-T is one of the most hotly debated and most misunderstood aspects of Google search.
It’s a framework that Google crafted to describe what they believe constitutes a high-quality search result, and in particular one that is appropriate for YMYL queries. That’s it. Google believes that a high-quality result is one that comes from:
Google is not assessing these things like a human would. They are building systems that deliver search results that (when tested) have these qualities as much of the time as possible. They have a number of methods to do this, most notably links, machine learning systems, and possibly the knowledge graph.
Why do I say possibly? Well, because it seems very probable that they do, considering the type of information they have in the KG (Knowledge Graph).
This is why at Blue Orchid we always say that improving your knowledge graph presence should be focused on improving your branded search presence, and it is a happy bonus if you get a positive signal for organic search from it as well.
Becoming a known entity in the Knowledge Graph, and having that entity related to other entities (such as subject matters) boils down to three major signals or steps.
Below I am going to run through each step. If this is all a little overwhelming, or you don’t have time to do it yourself, Blue Orchid Digital offers an affordable program to do this for you.
This guide is geared towards building a brand’s entity recognition, but you can follow the same steps with most other entities, such as your site’s authors.
We recommend gathering as much information as you can about the entity (usually a brand) you want to establish.
We want information such as when the company was founded, where it is based, what service it provides, what it specializes in, awards it may have won, places you have been mentioned, etc.
You then want to put that all in one place that Google considers the ultimate source of information about the entity.
I always recommend using a brand “About” page, since this is a page you have complete control over. Google will often select a Wikipedia page as the entity home, but the downside of this is that you can’t control the information on there.
If Google has already selected a page as the entity home (which you will see as the source in an existing knowledge panel) it can be hard to change but is doable with time and patience.
This relates to all the off-site mentions and references about the brand. You want to ensure that this is as consistent across the board as possible.
As noted, there are other places you should consider placing consistent off-site information about your brand.
Kalicube.pro currently counts over 63k different sources that are being used to pull information for the Knowledge Graph (as demonstrated in brands’ knowledge panels).
Some do appear to be more important than others, but again, the key is consistency across the board.
Go through the kalicube.pro list and focus on getting citations and mentions specifically on these sites, ensuring that the information is as consistent with that on your entity home as possible. Use the same main description if you can – and at the very least, have a link back to the entity home.
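A consistency check like this is easy to script once you have gathered your citations. The following is a minimal sketch, assuming you have already collected each source's description and outbound links by hand or with your own crawler. The source names, URLs, and the 0.9 similarity threshold are all hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def consistency_report(entity_home, citations, threshold=0.9):
    """Flag citations whose description drifts from the entity home or that lack a link back."""
    home_description = normalize(entity_home["description"])
    report = {}
    for source, data in citations.items():
        similarity = SequenceMatcher(
            None, home_description, normalize(data["description"])
        ).ratio()
        report[source] = {
            "similarity": round(similarity, 2),
            "matches_description": similarity >= threshold,
            "links_home": entity_home["url"] in data.get("links", []),
        }
    return report


# Hypothetical entity home and two hypothetical citations.
entity_home = {
    "url": "https://example.com/about/",
    "description": "Example Co is a technical SEO consultancy founded in 2012.",
}
citations = {
    "directory-a": {
        "description": "Example Co is a technical SEO consultancy founded in 2012.",
        "links": ["https://example.com/about/"],
    },
    "directory-b": {
        "description": "A marketing agency.",
        "links": [],
    },
}

for source, result in consistency_report(entity_home, citations).items():
    print(source, result)
```

Anything flagged here is a candidate for an update request: bring the description in line with the entity home and add the link back where the site allows it.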
The final step is to ensure that this is all consistently signaled to Google through detailed and relevant Schema.org Structured Markup.
Related reading: How To Create Your Authority-Building Machine
Schema.org is a collaborative community project, founded by Google, Microsoft, Yahoo, and Yandex, that provides a standardized vocabulary for “structured data markup”. This is used by most major search engines and platforms including Google, Microsoft (Bing!), Pinterest, and Yandex. SEOs will use the terms “Schema,” “Structured Markup,” or “Structured Data” to refer to this vocabulary set.
Structured Markup serves a few functions on the internet. One is to improve the accessibility of websites for people with impairments that prevent them from having a traditional in-browser experience of the web. Another is to provide web-crawlers with detailed information about the site or page they are visiting.
Schema.org Structured Markup is one of the clearest signals you can send to a search engine, like Google, regarding what the page is about. The reason most webmasters use this is to mark up content for “rich results” in the SERPs, such as review snippets, FAQ snippets, and recipe cards.
However, it is also incredibly useful to mark up your site to tell search engines information about your site/brand/authors (as entities); how that relates to what the content is about; and how that is connected to other entities.
The late Bill Slawski put it well when he observed this all the way back in 2016:
“One way to think of schema is as a way of describing and organizing information about entities in a machine-readable way.” Bill Slawski, April 4th, 2016
And in 2017, Gary Illyes spoke at PubCon regarding the importance of structured markup for Google to understand a site:
“Structured data, This is one of those things that I want you to pay lots of attention to this year… Add structured data to your pages because during indexing, we will be able to better understand what your site is about”– Gary Illyes, November 10, 2017
You can view the structured data on your site using the Schema Markup Validator, which replaced Google’s Structured Data Testing Tool, or using Google’s Rich Results Test.
Google recommends implementing structured data using JSON-LD, but you can also use Microdata and RDFa. Google also provides detailed information on what to do and what not to do with schema implementation.
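If you want a quick look at the JSON-LD on a page without leaving your own machine, you can pull it out of the HTML yourself. Here is a minimal sketch using only the Python standard library. The sample HTML and the company details in it are hypothetical, and this is a convenience for inspection, not a replacement for a proper validator.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_jsonld(html):
    """Return a list of JSON-LD objects found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page source for illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Example Co", "url": "https://example.com/"}
</script>
</head><body><p>About us</p></body></html>
"""

for item in extract_jsonld(SAMPLE_HTML):
    print(item["@type"], item["name"])
```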
In this final section, we are going to go over the best practices for implementing structured data for entity recognition to help Google better “connect the dots” between on-site and off-site information about your brand.
It is important to ensure your schema is organized properly. The best practice for this is to use a “nested schema” approach. Alexis Sanders has a good resource about how to do this with JSON-LD.
Lily Ray provides a good example of how this looks. This is from the homepage of her personal site:
You can see how each piece of information is nested clearly under each appropriate category in order of relevance to the page.
You should be able to read the information straight from the structured data testing tool in a way that tells you the most about the site. This helps to organize the relationships between the entities on the page.
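As a rough sketch of what nesting looks like in practice, the Python below builds a small nested structure and serializes it to JSON-LD. The page, the person, and the company are all hypothetical, and the choice of `mainEntity` and `worksFor` is just one reasonable way to nest a person under a page and an organization under that person.

```python
import json

# Hypothetical "About" page: the page contains a person, who works for an organization.
nested_markup = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "url": "https://example.com/about/",
    "mainEntity": {
        "@type": "Person",
        "name": "Jane Doe",
        "jobTitle": "Founder",
        "worksFor": {
            "@type": "Organization",
            "name": "Example Co",
            "url": "https://example.com/",
        },
    },
}


def to_script_tag(data):
    """Wrap a JSON-LD object in the script tag used to embed it in a page."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(to_script_tag(nested_markup))
```

Reading the output top to bottom tells you what the page is, who it is mainly about, and who that person works for, which is exactly the kind of readable hierarchy you are aiming for.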
The type of schema you implement will depend on the type of entity you are trying to establish in the knowledge graph. Here is an example of the type of markup you could use for an organization.
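A minimal sketch of organization markup might look like the following, built as a Python dictionary and serialized to JSON-LD. Every value is a placeholder for a hypothetical company, and the set of properties is my own selection from the Schema.org Organization type, so treat it as a starting point and not a definitive template.

```python
import json

# Every value below is a placeholder for a hypothetical company.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://example.com/about/#organization",
    "name": "Example Co",
    "url": "https://example.com/",
    "description": "Example Co is a technical SEO consultancy founded in 2012.",
    "foundingDate": "2012-05-16",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Austin",
        "addressRegion": "TX",
        "addressCountry": "US",
    },
    "numberOfEmployees": {"@type": "QuantitativeValue", "value": 25},
    "knowsAbout": ["Technical SEO", "Structured data"],
    "award": "Hypothetical Agency of the Year 2023",
    "memberOf": {
        "@type": "Organization",
        "name": "Hypothetical Marketing Association",
    },
}

print(json.dumps(organization, indent=2))
```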
Some of these may appear to be over the top, such as numberOfEmployees, but this kind of information is listed clearly on sites such as Glassdoor, LinkedIn, Bloomberg, and others, which Google uses to populate the knowledge graph.
So, having this in your schema is one extra little signal to help corroborate information about the site to Google. If you are a member of any professional organizations where this content is also listed (which would be another corroborating citation), then you could include memberOf schema also.
One of the most important properties is sameAs. While most sites only add in social media information, you can actually include anything that “unambiguously indicates the item [or entity’s] identity.”
This would extend past a LinkedIn profile to Wikipedia pages, or any of the sources referred to in the corroboration step. This can also help to disambiguate entities that are named the same as another entity in the Knowledge Graph.
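Here is a small sketch of how you might assemble a sameAs list from the profiles and citations you gathered during corroboration. The URLs are hypothetical, and the cleanup rules (keep only absolute http or https URLs, and treat a trailing slash as a duplicate) are my own assumptions about what counts as tidy input.

```python
import json
from urllib.parse import urlparse


def build_same_as(urls):
    """Keep absolute http(s) URLs, dropping duplicates that differ only by a trailing slash."""
    seen = set()
    result = []
    for url in urls:
        cleaned = url.strip()
        parsed = urlparse(cleaned)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # not an absolute web URL, so skip it
        key = cleaned.rstrip("/")
        if key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


# Hypothetical profile and citation URLs for a hypothetical brand.
candidates = [
    "https://www.linkedin.com/company/example-co",
    "https://www.linkedin.com/company/example-co/",
    "not a url",
    "https://en.wikipedia.org/wiki/Example_Co",
]

markup = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com/",
    "sameAs": build_same_as(candidates),
}

print(json.dumps(markup, indent=2))
```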
I hope that this breakdown has helped you to better understand the knowledge graph, how it works, what it may be used for, and how it can be utilized as part of your SEO strategy.
If you follow all of the steps laid out, you have a much higher chance of getting your brand into the knowledge graph. In general, we have found this takes a minimum of 12 months.
You can research entity optimization and try your best to follow best practices, or find an entity and branded search management service to help you through the process.