How to get the most out of the Google Search Console API using regex

Google Search Console is an amazing tool that provides invaluable search data from real users, directly from Google. While the charts and tables are nice to work with, a large part of the data is not accessible from the UI. 

The only way to get to this hidden data is to use the API and extract all the valuable search data that’s available to you, if you know how. This is possible with regular expressions.

Here’s how you can maximize the Google Search Console API using regular expressions, according to Eric Wu, VP of Product Growth at Honey, a PayPal Company, who spoke at SMX Advanced.

Diagnosing SEO issues with GSC

Working on a website experiencing stagnant or declining growth or a core update drop?

Most SEO professionals turn to Google Search Console (GSC) to diagnose such issues.

(Or if resources permit, you may even use a paid tool like Ryte or build your own platform.)

Fortunately for the SEO community, there’s no shortage of Looker Studio dashboards (formerly Google Data Studio) useful for GSC analysis.

Dashboards allow SEOs to look at an overview of different trends instead of using GSC and clicking multiple times to get to the data they need.

But if you’re analyzing enterprise sites, you can run into some roadblocks.

  • Looker Studio and Google Sheets both load slowly, especially when you’re dealing with large sites. 
  • GSC’s interface has a 1,000-row export limit.
  • GSC has a huge sampling problem. Enterprise SEO teams reportedly miss 90% of their GSC keywords. If you know how to extract the data, you can actually get 14x the keywords. 

Overcoming GSC’s sampling problem

Explorer for Search is another tool that you can use for GSC analysis. From Noah Learner and the team at Two Octobers, it’s built with data pipelines that use GSC’s API to output data to BigQuery (basically bypassing Google Sheets and CSV downloads) and then visualize the information with Data Studio.

With this, you can be confident that you’re getting to almost all the data. 

There’s still a caveat because of GSC’s sampling problem, especially for large ecommerce sites with lots of different categories. GSC won’t necessarily show all the data that’s coming in from those directories.

After conducting various tests to get the most data out of the GSC API, the team discovered a way to close the GSC sampling gap.

They found that by adding more subdirectories as different profiles within your GSC dashboard, you can extract a lot more data, as Google gives you more information at that lower level. 


For example, if you add “televisions” as a subdirectory in your GSC profile, Google will give you only the keywords and the click information for that subdirectory and down.

And by adding a lot of these different subdirectories, you can extract a lot more information.
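As a rough sketch of that idea, assuming each subdirectory has already been verified as its own URL-prefix property, the snippet below builds one Search Analytics query endpoint per property. The domain, the subdirectory names and the helper name are hypothetical, and no request is actually sent.

```python
from urllib.parse import quote

# Hypothetical site and subdirectories; replace with your own verified properties.
BASE_URL = "https://www.example.com/"
SUBDIRECTORIES = ["televisions", "phones", "appliances"]

# Search Analytics query endpoint of the Search Console API (v3).
ENDPOINT = "https://www.googleapis.com/webmasters/v3/sites/{site}/searchAnalytics/query"


def property_endpoints(base_url, subdirectories):
    """Return {property_url: query_endpoint} for the root and each subdirectory."""
    properties = [base_url] + [f"{base_url}{name}/" for name in subdirectories]
    # The property URL is part of the path, so it has to be fully URL-encoded.
    return {prop: ENDPOINT.format(site=quote(prop, safe="")) for prop in properties}


endpoints = property_endpoints(BASE_URL, SUBDIRECTORIES)
for prop, url in endpoints.items():
    print(prop, "->", url)
```

Running the same query against each endpoint and merging the rows is one way to collect the extra keywords that the lower-level properties expose.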

That solves the sampling problem, but you can get even more data by using regular expressions.

Getting more GSC data with regular expressions

Regular expression, or regex, is a powerful tool to understand your data. 

In April 2021, Google added regex support to GSC, giving SEOs more ways to slice and dice organic search data. 

A lot of times, data is not useful unless you can make sense of it. And regex helps to extract actionable insights from GSC’s rich data.

But as powerful as it may be, regex can be difficult to learn. 

The best place to understand and dive deep into regular expressions is Google’s official documentation on GitHub. (Google uses RE2 in its products, which is a flavor of regular expression.)

Regex is available in all kinds of programming languages, and you’ll find it almost everywhere, even if you’re only editing .htaccess files.

The next few sections cover use cases for leveraging regex for GSC. 

Regex informational queries

When looking at informational search queries in GSC, you often want to understand:

  • How are people actually coming to your site?
  • What questions are they asking?

Looking at these things on a one-off basis inside GSC can be difficult. 

You’re always searching for the words “what,” “how,” “why” and then “when.”

There are a couple of ways to make extracting informational queries less tedious with regex.

Daniel K. Cheung shared a regex string that will show you all queries containing “what,” “how,” “why” and “when” that either got a click or an impression:

And this regex string shared by Steve Toth takes the previous example up a notch:

  • ^(who|what|where|when|why|how)[" "]

You can use this string if you want to capture question-based queries that start with “who,” “what,” “where,” “when,” “why” or “how,” followed by a space. 

This is a great list to use when you’re looking for any type of word that can start a question:

  • are, can, can’t, could, couldn’t, did, didn’t, do, does, doesn’t, how, if, is, isn’t, should, shouldn’t, was, wasn’t, were, weren’t, what, when, where, who, whom, whose, why, will, won’t, would, wouldn’t

Putting all this into regex form would look something like this: 

  • ^(are|can|can't|could|couldn't|did|didn't|do|does|doesn't|how|if|is|isn't|should|shouldn't|was|wasn't|were|weren't|what|when|where|who|whom|whose|why|will|won't|would|wouldn't)\s

In this 178-character string:

  • You have the caret (^), which tells you the query needs to begin with one of these words.
  • The words are separated with pipes (|) instead of commas. 
  • All the words are wrapped in parentheses. 
  • There’s a backslash and the “s” (\s), which denotes a space after the word. 
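To see those pieces working together, here is a minimal Python sketch that builds the string from the word list and tries it on a few made-up queries. Python’s re module is used only as a stand-in for Google’s RE2 engine (the two agree on the syntax used here), and the sample queries are hypothetical.

```python
import re

QUESTION_WORDS = [
    "are", "can", "can't", "could", "couldn't", "did", "didn't", "do", "does",
    "doesn't", "how", "if", "is", "isn't", "should", "shouldn't", "was",
    "wasn't", "were", "weren't", "what", "when", "where", "who", "whom",
    "whose", "why", "will", "won't", "would", "wouldn't",
]

# ^ anchors to the start, | separates the words, \s requires whitespace after.
question_regex = "^(" + "|".join(QUESTION_WORDS) + r")\s"
pattern = re.compile(question_regex)

# Hypothetical queries, for illustration only.
queries = ["how to mount a tv", "can't connect to wifi", "best oled tv", "howto guide"]
matches = [q for q in queries if pattern.search(q)]

print(len(question_regex), "characters")
print(matches)
```

The trailing \s is what keeps a query such as “howto guide” from being counted as a question.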

This is good, but can also get tedious to do.  

Below, Wu simplified the previous list of words to be shorter and more regex-friendly, which is ideal for copying and pasting. Maintaining it this way also helps with efficiency. 


In the first column are the normal words and in the second column, the compressed regex. 

For instance, the word “can” uses the compressed version can('t)?.

The question mark signifies that anything within the parentheses is optional. The compressed syntax allows you to cover both the word “can” and “can’t.” 

More interestingly, you can do this with could/couldn’t, should/shouldn’t, and would/wouldn’t, where the -ould part of the words is the common base, like (c|sh|w)ould(n't)?. This short string covers all six of those cases.

While simplifying that long list of words made the string less readable, what’s great is that more fits into the regex field and it’s easier to copy and paste.

  • ^(are|can('t)?|(c|sh|w)ould(n't)?|did(n't)?|do(es)?(n't)?|how|if|is(n't)?|was(n't)?|were(n't)?|wh(at|en|ere|y)|who(m|se)?|will|won't)\s

If you go a step further, you can compress it even more. In this case, Wu reduced the character count from 135 to 113 characters. 

  • ^(are|can('t)?|how|if|wh(at|en|ere|y)|who(m|se)?|will|won't|((c|sh|w)ould|did(n't)?|do(es)?|was|is|were)(n't)?)\s
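A quick way to check that a compressed string still covers the original word list is to test every word against each version. The sketch below does that in Python, again using the re module as a stand-in for RE2; the helper name is hypothetical.

```python
import re

VERBOSE = r"^(are|can|can't|could|couldn't|did|didn't|do|does|doesn't|how|if|is|isn't|should|shouldn't|was|wasn't|were|weren't|what|when|where|who|whom|whose|why|will|won't|would|wouldn't)\s"
COMPRESSED = r"^(are|can('t)?|(c|sh|w)ould(n't)?|did(n't)?|do(es)?(n't)?|how|if|is(n't)?|was(n't)?|were(n't)?|wh(at|en|ere|y)|who(m|se)?|will|won't)\s"
SHORTEST = r"^(are|can('t)?|how|if|wh(at|en|ere|y)|who(m|se)?|will|won't|((c|sh|w)ould|did(n't)?|do(es)?|was|is|were)(n't)?)\s"

# Recover the plain word list from the verbose string by stripping "^(" and ")\s".
words = VERBOSE[2:-3].split("|")


def covers_all(regex, words):
    """True if every word, followed by a space, matches at the start of a query."""
    compiled = re.compile(regex)
    return all(compiled.search(word + " example") for word in words)


for name, regex in [("verbose", VERBOSE), ("compressed", COMPRESSED), ("shortest", SHORTEST)]:
    print(name, len(regex), covers_all(regex, words))
```

One side effect worth knowing: both compressed versions also accept “don’t,” which the original word list does not contain.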

Regular expressions can get really complicated. If you’re getting a regex string from someone else and want to work out what each part is doing, you can use Regexper to help you visualize it. 

Below you’ll see a comparison of the different regex string versions. It’s easier to maintain the first one, and obviously harder to maintain and read the last one. 

But sometimes character count really does matter, especially if you have longer regular expressions.

The regex filter limit for GSC is 4,096 characters, according to Google Search Advocate Daniel Waisberg. 

That would seem like quite a bit. However, if you have an ecommerce site and have to add domains, subdomains or longer directories, you’ll most likely hit that limit.

Regex branded queries

Another instance where you might start hitting the regex character limit in GSC is when you use it for branded queries.

When you think about all the different misspellings of a brand name that a person might type, you’ll quickly run into that 4,096-character limit. For instance:

  • aamaung, damsung, mamsang, sam sung, samaung, samdung, samesung, sameung, samgsung, samgung, samsang, samsaung, samsgu, samshgg, samshng, samsing, samsnug, samssung, samsu, samsuag, samsubg, samsubng, samsug, samsumg, samsumng, samsun g, samsunb, samsund, samsund, samsunh, samsunt …

This is where understanding regex helps. With this string, you can capture the brand name “samsung” along with misspellings:

  • (s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)

A lot of times, people will misspell the middle parts of the word. But generally, they get the format and length right, and you can approach your syntax this way.

For brand query misspellings, consider the following:

  • Main letters that make up the brand query.
  • Consonants.
  • Letters surrounding hard consonants.

In pink are the hard consonants that people usually don’t miss when they’re typing in a brand name. These are the main letters that make up that particular brand. For “samsung”, that’s the “s” at the beginning, the “m” in the middle, and then “n” and “g” at the end.

The blue letters surrounding these main consonants on the keyboard are the ones people usually mistype. In the example, around “s”, you see the “a”, “d” and “z”. (While the layout is different for international keyboards, the concept is still the same.) 

The regex string above captures all the possible variants of “samsung.”

The other major trick here is in [a-z\s]{1,4}.

In regex form, this basically says, “I want to match any letter ‘a’ to ‘z’, or a space, one to four times.” 

This captures all those weird misspellings that can happen in the middle of a brand query, where a person can potentially hit the same key multiple times or accidentally press space.

Additionally, the brand name is a certain length (“samsung” has seven characters). People likely won’t end up typing 20–50 characters. 

So in this regular expression, we’re guessing that between “s” and “m” in “samsung,” someone’s going to mistype 1–4 characters. And then from “m” to “g” at the end, they’ll mistype 1–6 characters, with spaces included. 

Adding all this allows you to capture the many variations of a branded query comprehensively.

The other thing to note is that the brand name might appear in different parts of the query.

So we need to make sure that the brand name itself is captured. It should either be:

  • At the beginning of the query.
  • In the middle of the query (thus surrounded by spaces).
  • Or at the end of the query.

The regular expression for this is as follows:

  • (^|\s)(s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)(\s|$)

This captures all queries where the brand name “samsung” is either at the start, middle or end.

  • Start of string = ^ 
  • Surrounded by spaces = \s
  • End of string = $
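Before pasting a brand pattern like this into GSC, it can help to try it on a handful of queries. The Python sketch below does that with the anchored string; the re module stands in for RE2, and the sample queries are hypothetical.

```python
import re

# Brand pattern with position handling:
# (^|\s) = start of the query or a space before the brand,
# (\s|$) = a space after the brand or the end of the query.
BRAND_REGEX = r"(^|\s)(s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)(\s|$)"
brand = re.compile(BRAND_REGEX)

# Hypothetical queries, for illustration only.
queries = [
    "samsung tv",
    "buy samsung",
    "samsumg galaxy",
    "sam sung phone",
    "apple iphone",
    "smart tv",
]

branded = [q for q in queries if brand.search(q)]
non_branded = [q for q in queries if not brand.search(q)]
print("branded:", branded)
print("non-branded:", non_branded)
```

Because the pattern is deliberately loose, it also flags “smart tv” in this sample, so it is worth reviewing what a brand regex catches before relying on the numbers.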

JC Chouinard’s post, Regular Expressions (RegEx) in Google Search Console, dives even deeper into regex examples. 

Regex and the GSC API in action

Regular expressions came in handy for Wu and his team when they worked with a client that encountered traffic drops following a core update.

After looking at the ecommerce site’s different issues, they discovered that the problem resided in some product detail pages. 

They needed to segment page types for analysis in GSC. But this was a complex task because of the different URL structures for U.S. and international products.  

The site’s international product URLs included language and country codes, while U.S. product URLs didn’t. 

Even using regex syntax was tricky because letters and dashes exist in the product slug, categories and subcategories. Additionally, they needed to filter out the international product URLs to capture only U.S. pages.

To get all U.S. product landing + detail pages (not i18n pages), they came up with the following regex strings:

Include: /([^/]+/){1,2}p? 

Exclude: /[a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2}/ 

Here’s a breakdown:

The team wanted to match the category, the subcategory and all the products, so they included:

  • Any character that’s not a slash = [^/]+
  • 1 or 2 directories = /){1,2}
  • Sometimes followed by a product slug = p?

A caret (^) usually means the start of the string. But when it’s inside brackets (as in [^/]), it indicates a negation (i.e., “not anything inside this box”). 

So this string /([^/]+/){1,2}p? means “I want any number of characters that aren’t a slash, leading up to a slash (which denotes the directory), and sometimes followed by the letter ‘p’ (the prefix for product slugs).” 

At the same time, the team didn’t want to match the country and language combination, which also contained letters and dashes, so they excluded:

  • Any 2 letter directory = [a-zA-Z]{2}
  • 2 letter + 2 letter lang-country combo = [a-zA-Z]{2}-[a-zA-Z]{2}

Creating a regular expression to match all the language and country codes individually would be tedious because of all the possible combinations, so they were unable to approach this the way they did for informational queries (where every single combination was enumerated). 
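The include and exclude logic can be tried out locally before it goes anywhere near GSC. In this Python sketch the include pattern is used as given, while the exclude pattern is grouped between the slashes, which is an assumption made for the example: read literally, its first alternative would match any slash followed by two letters. The URLs and the helper name are hypothetical.

```python
import re

# Include pattern as given above.
INCLUDE = re.compile(r"/([^/]+/){1,2}p?")
# Assumption: the two exclude alternatives are grouped between the slashes,
# so only a whole directory such as /fr/ or /en-gb/ counts as a locale.
EXCLUDE = re.compile(r"/([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2})/")

# Hypothetical URLs, for illustration only.
urls = [
    "https://www.example.com/televisions/oled/p-12345",
    "https://www.example.com/fr/televisions/oled/p-12345",
    "https://www.example.com/en-gb/televisions/p-67890",
]


def is_us_page(url):
    """Keep URLs that match the include pattern and not the exclude pattern."""
    return bool(INCLUDE.search(url)) and not EXCLUDE.search(url)


us_pages = [u for u in urls if is_us_page(u)]
print(us_pages)
```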

But even after creating these regex strings, they had a problem. 

In Google Search Console, there’s only one field to paste a regex string. You’ll have to choose either Matches regex or Doesn’t match regex; you can’t use both at the same time.  

This is where the GSC API came in handy, as it allows joining regex strings.
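As an illustration of what joining the two strings can look like, the sketch below builds a Search Analytics query body with both an includingRegex and an excludingRegex filter on the page dimension. The date range and dimensions are hypothetical, and the body is only printed, not sent.

```python
import json

# Hypothetical date range; adjust the expressions to your site's URL structure.
request_body = {
    "startDate": "2022-01-01",
    "endDate": "2022-03-31",
    "dimensions": ["date", "page"],
    "dimensionFilterGroups": [
        {
            # Both filters in the group must hold for a row to be returned.
            "groupType": "and",
            "filters": [
                {
                    "dimension": "page",
                    "operator": "includingRegex",
                    "expression": r"/([^/]+/){1,2}p?",
                },
                {
                    "dimension": "page",
                    "operator": "excludingRegex",
                    "expression": r"/[a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2}/",
                },
            ],
        }
    ],
    "rowLimit": 25000,
}

print(json.dumps(request_body, indent=2))
```

The printed JSON can be pasted into the request body of an API client as a starting point.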

In the Google Search Console API documentation, there’s a Try it now link.  

Once clicked, it will open up a console that lets you select a site and make your API request through the web view.

But to better manage API queries, Wu recommends using Postman on the desktop or Paw (which is native to Mac).

Postman allows you to create queries and save them for later. And if you have access to other sites, you don’t have to create a new query each time. You simply swap out the site name with a variable and then make multiple requests.

Paw, on the other hand, is much easier to look through and use.

To access the API, you’ll need to get your API keys. (Here’s a helpful tutorial from Chouinard.) 

Once you get this information, you’ll have your client ID and client secret, which you’ll add to your OAuth 2.0 authentication within either Postman or Paw.

From there, you’ll be able to sign in with your normal account.  

Wu primarily made GSC API requests using the regex strings in Paw. The query is entered in the middle of the interface.

The response from Google is similar to that of the GSC API web view. The data can then be exported for processing.

Since the data is in JSON, the information can be messy and hard to read. 

For this, you can use a free and open-source command-line JSON processor called JQ to pretty-print the information.

The data is not that useful until you get it into a spreadsheet. Pipe the file you’ve exported from Paw into JQ. Open it and then iterate over each row, saving each element so you can output them to a CSV.

Here, you’ll need to convert clicks and impressions, which are floats (numbers that have a decimal place). Both need to be converted into strings compatible with a CSV. 

JQ will then output the following much-simpler format. 

Next, you’ll use Dasel to take this format and then make it into a CSV. 

And here’s the end result. 
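For readers who would rather stay in one language, the same JSON-to-CSV step can be approximated in Python. This is a stand-in for the JQ and Dasel steps described above, not Wu’s actual pipeline; the sample response is hypothetical (trimmed to two rows) and the function name is made up for the example.

```python
import csv
import io
import json

# Hypothetical API response trimmed to two rows, grouped by date.
response_json = """
{"rows": [
  {"keys": ["2022-03-01"], "clicks": 120.0, "impressions": 3400.0, "ctr": 0.0353, "position": 8.2},
  {"keys": ["2022-03-02"], "clicks": 95.0, "impressions": 3100.0, "ctr": 0.0306, "position": 8.9}
]}
"""


def rows_to_csv(payload, dimension_names):
    """Flatten Search Analytics rows into CSV text with one column per dimension."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(dimension_names + ["clicks", "impressions", "ctr", "position"])
    for row in payload.get("rows", []):
        # Clicks and impressions arrive as floats; write them as whole numbers.
        writer.writerow(
            row["keys"]
            + [int(row["clicks"]), int(row["impressions"]), row["ctr"], row["position"]]
        )
    return out.getvalue()


csv_text = rows_to_csv(json.loads(response_json), ["date"])
print(csv_text)
```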

What’s great for Wu’s team is that they were able to use the Google Search Console API and regular expressions to:

  • Filter out all the international queries and look at just the U.S., where they were having the main issues.
  • Identify the days the site was having issues. 

Watch: Getting the most out of the Google Search Console API

Below is the complete video of Wu’s SMX Advanced presentation.


About The Author

Angel Niñofranco

Angel Niñofranco is Senior Content Editor at Third Door Media, specializing in editing content from Search Engine Land and MarTech’s rosters of subject-matter experts. She has over five years of combined editorial and marketing experience in the digital publishing industry, specializing in content editing, copywriting and email marketing. Prior to joining Third Door Media, Angel worked with the editorial and marketing teams of Search Engine Journal in various roles, most notably as project editor and email marketing manager.
