
How To Use Google’s Indexing API in Bulk With Python?

Learn how to use Google's Indexing API in bulk with Python. Discover how this application programming interface can be used to add or remove pages from Google's index, and how to check the status of your most recent requests. This article walks you through the steps needed: transferring your URLs to a text file, building a request dictionary, publishing updated URLs, and requesting removals, among other things. Get started with the Indexing API today and streamline your web page indexing process.
Written by Guillaume
Published on May 8, 2023
Updated on November 21, 2023

What is the Indexing API?

The Indexing API is an application programming interface that lets website owners notify Google when pages are added or removed, so Google can index new or changed pages almost instantly. It is primarily intended for short-lived content such as job postings and livestream videos.

The Indexing API enables you to:

  • Update URLs in the index.
  • Remove URLs from the index.
  • Get the status of the most recent request.
  • Send batch requests to reduce the number of API calls.

The Indexing API can be used to instruct Google to add or remove pages from its index. Each request must specify the location of a web page, and you can also check the status of the notifications you have already sent. For now, the Indexing API can only be used for pages that contain either a JobPosting or a BroadcastEvent embedded in a VideoObject.

Requirements:

  • Python and Anaconda installed on your machine (Windows or Mac).
  • A Google Developer Console account.
  • A billing account set up in the Google Developer Console settings (optional).

Libraries we’ll need:

To build this script in Python we will use Google Colab, and we will also need the following libraries:

  • oauth2client
  • google-api-python-client
  • httplib2
  • json (built into Python)
  • google.colab (files)
  • os (built into Python)

Use the command below to install these libraries on Google Colab:

!pip install oauth2client google-api-python-client httplib2

Alternatively, you can run the following command in Windows' "Command Prompt" or in Mac's "Terminal" to install the libraries locally:

pip install oauth2client google-api-python-client httplib2

Making Use of Libraries:

After installation, use the following code to import the required libraries:

from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build
from googleapiclient.http import BatchHttpRequest
from google.colab import files

import httplib2
import json
import os

Getting the URLs Ready:

The next step is to put our URLs into a text file so that we can tell Google about changed, newly published, or deleted pages. Remember that the Indexing API has a daily limit of 100 URLs or fewer.

In Google Colab, you can upload the text file containing your URLs with the following code:

uploaded_file = files.upload()

The next step is to build a dictionary that holds the URLs ready for sending requests. The following code does that:

urls = []

for filename in uploaded_file.keys():
    lines = uploaded_file[filename].splitlines()
    for line in lines:
        urls.append(line.decode('utf-8'))

requests = {}
for url in urls:
    requests[url] = "URL_UPDATED"

print(requests)

Note that this code builds the dictionary needed for publishing new or updated content. If you need to remove URLs instead, use "URL_DELETED" in place of "URL_UPDATED", as shown in the sketch below.
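
For example, here is a minimal sketch that reuses the urls list from the code above to build the removal dictionary instead:

requests = {}
for url in urls:
    requests[url] = "URL_DELETED"
print(requests)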

Update a URL: Follow the steps below to inform Google that a new URL has been submitted or that content at an existing URL has been updated (a minimal request sketch follows the list):

  1. Send an HTTP POST request to the following endpoint: https://indexing.googleapis.com/v3/urlNotifications:publish
  2. Use the following syntax to specify the location of the page in the body of the request:
{
	"url": "content_location",
	"type": "URL_UPDATED"
}
  3. Google returns an HTTP 200 response to successful Indexing API requests. A 200 response code indicates that Google may attempt to crawl the URL again soon. The body of the response contains a UrlNotificationMetadata object, whose fields match those returned by a notification status request.
  4. If an HTTP 200 response isn't returned, see the Indexing API errors.
  5. If the content of the page changes, send another update notification and Google should re-crawl the page.
  6. You may need more quota than the default. Visit Quota to view your current quota and to request more.
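
As a minimal Python sketch of steps 1 and 2, assuming the authorized httplib2 client named http that we build later in this article (in the "Authorizing Requests" section) and a purely hypothetical example URL:

import json

# Hypothetical URL used only for illustration.
content = {
    "url": "https://example.com/new-page",
    "type": "URL_UPDATED"
}

# `http` is the httplib2 client authorized with the service account credentials.
response, body = http.request(
    "https://indexing.googleapis.com/v3/urlNotifications:publish",
    method="POST",
    body=json.dumps(content)
)
print(response.status, body)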

Remove a URL:

When you delete a page from your server or add a <meta name="robots" content="noindex" /> tag to the <head> section of a page, notify Google so that it can remove the page from its index and stop trying to crawl and index it again. Before you send a removal request, the URL must return a 404 or 410 status code, or the page must contain the <meta name="robots" content="noindex" /> tag.

Follow the steps below to request removal from Google's index (a removal and status-check sketch follows the list):

  1. Send an HTTP POST request to the following endpoint: https://indexing.googleapis.com/v3/urlNotifications:publish
  2. Use the following syntax to specify the URL that you wish to remove in the body of the request:

{
	"url": "content_location",
	"type": "URL_DELETED"
}

For example:

{
	"url": "https://careers.google.com/jobs/google/technical-writer",
	"type": "URL_DELETED"
}
  3. Google returns an HTTP 200 response to successful Indexing API requests. An HTTP 200 response indicates that Google may remove this page from its index. The body of the response contains a UrlNotificationMetadata object, whose fields match those returned by a notification status request.
  4. If an HTTP 200 response isn't returned, see the Indexing API errors.
  5. You may need more quota than the default. Visit Quota to view your current quota and to request more.
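
Here is a similar hedged sketch for steps 1 and 2, followed by a status check against the notification metadata endpoint, again assuming the authorized httplib2 client named http built later in this article and a hypothetical URL:

import json
from urllib.parse import quote

url = "https://example.com/removed-page"  # hypothetical URL

# Ask Google to drop the URL from its index.
response, body = http.request(
    "https://indexing.googleapis.com/v3/urlNotifications:publish",
    method="POST",
    body=json.dumps({"url": url, "type": "URL_DELETED"})
)
print(response.status, body)

# Fetch the most recent notification status for the same URL.
response, body = http.request(
    "https://indexing.googleapis.com/v3/urlNotifications/metadata?url=" + quote(url, safe=""),
    method="GET"
)
print(response.status, body)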

Indexing API Setup and Activation

To create and activate the API, go to the Google Developer Console, click on "Choose a project," and then select "New project" to create a brand-new project.

After that, you must choose a name for your project and click "Create."

Once the project is created, select it from the project menu, open "IAM & Admin" in the left menu, and then choose "Service Accounts."

Following this, click "Create service account" to continue with account creation. Choose a name for your account in the first field, then click "Create and continue" to move on to step two. In the "Grant this service account access to project" section, choose a role for your account: select "Owner," which you'll find under "Basic" in the Quick access menu. After that, simply click "Done" to complete the process without making any further changes.

click "Create service account" to continue with account creation

Save the email address that is in the "Email" field on the newly opened page; we'll need it later. Choose "Actions," then select "Manage Keys."

Click on "Create new key" in the "Add Key" area of the newly opened page, then build a JSON file, and save the downloaded file.

Click on "Create new key" in the "Add Key" area

It's time to turn on the Indexing API; to do this, click "Enable APIs & Services" under the "APIs & Services" section.

Search for "Indexing API" on the following page. After choosing it, press "Enable" to start using your API.

You must grant the service account access in Google Search Console before you can use the Indexing API. Open your Google Search Console, go to the "Settings" tab, click on "Users and permissions," and then select "Add user" to add a new user. When the new page appears, enter the email you saved previously and set its permission to "Owner."

If you forgot to save the email, simply return to your service accounts and copy it from there. You can also find it by opening the JSON file you downloaded earlier and searching for "client_email".
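
For example, once the key file has been uploaded to Colab (see the next section), you can print that address with a quick sketch like this one (the file name is hypothetical):

import json

# Hypothetical file name; use the name of your own downloaded key file.
with open('/content/your-key-file.json') as key_file:
    print(json.load(key_file)['client_email'])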

Uploading the JSON File

Use the following code to upload the JSON file to Google Colab:

json_key = files.upload()

The next step is to find out where the uploaded file is located, which we can do with the os library. To verify that the file has been uploaded, wrap the following code in an "if" statement:

if json_key:
  path_to_json = '/content'
  json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
  path = "/content/" + json_files[0]

Authorizing Requests

As Google explains, our software and scripts must use OAuth 2.0 to authorize requests to the Indexing API. The scope below, taken from the Indexing API documentation, is what we need in order to authorize and send requests. See the Authorize Requests page for further details on this subject.

SCOPES = ["https://www.googleapis.com/auth/indexing"]

Sending Requests

According to Google's documentation on using the Indexing API, requests must be sent with the POST method to one of the following endpoints:

If you only want to make one request:

ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

To send many requests at once:

ENDPOINT = "https://indexing.googleapis.com/batch"

This script sends the notifications one at a time, with the difference that we have already built a dictionary of up to 100 URLs. We now need to create the ServiceAccountCredentials and the authorized HTTP client; to do this, we use the oauth2client library:

credentials = ServiceAccountCredentials.from_json_keyfile_name(path,scopes=SCOPES)
http = credentials.authorize(httplib2.Http())
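
If you would rather send every notification in a single batch call, here is a minimal sketch that combines the requests dictionary built earlier with the authorized http client above; the callback name insert_event is just an illustrative choice:

# Build the Indexing API service with the authorized http client.
service = build('indexing', 'v3', http=http)

def insert_event(request_id, response, exception):
    # Called once for every notification in the batch.
    if exception is not None:
        print('Error:', exception)
    else:
        print('Notified:', response)

# Queue one publish call per URL from the `requests` dictionary, then send them together.
batch = service.new_batch_http_request(callback=insert_event)
for url, api_type in requests.items():
    batch.add(service.urlNotifications().publish(body={"url": url, "type": api_type}))
batch.execute()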

The next stage is to build the service object and then a function that handles the final requests; the updated, standalone version of the script below does exactly that. You can find more details about how batched scripts work on the BatchHttpRequest class page. The final code for this section looks like this:

import os

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# The Indexing API scope.
SCOPES = ['https://www.googleapis.com/auth/indexing']

# Function to create or load credentials
def get_credentials():
    creds = None
    # The token.json file contains previously obtained user credentials.
    # Make sure to create or rename it accordingly.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If credentials are not valid or do not exist, prompt the user to authenticate.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            # Replace 'YOUR_CLIENT_SECRET_FILE.json' with the path to your
            # client_secret.json file downloaded from the Google Developer Console.
            flow = InstalledAppFlow.from_client_secrets_file(
                'YOUR_CLIENT_SECRET_FILE.json',
                SCOPES
            )
            creds = flow.run_local_server(port=0)
        # Save credentials for future use.
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    return creds

# Function to notify the Indexing API about a list of URLs
def index_urls(credentials, site_url, urls):
    service = build('indexing', 'v3', credentials=credentials)

    for url in urls:
        request_body = {
            'url': site_url + url,
            'type': 'URL_UPDATED'
        }
        try:
            service.urlNotifications().publish(body=request_body).execute()
            print('URL successfully submitted:', site_url + url)
        except Exception as e:
            print('Error submitting URL', site_url + url, ':', str(e))

if __name__ == '__main__':
    # Replace 'YOUR_SITE_URL' with your website URL (e.g., 'https://example.com')
    site_url = 'YOUR_SITE_URL'

    # Replace the placeholder paths ('url1', 'url2', 'url3') with the paths you want to submit.
    index_urls(get_credentials(), site_url, ['url1', 'url2', 'url3'])
    
Code updated on November 21st, 2023.

If everything was done correctly, requests will be printed after being sent.

Steps of Google’s Indexing API with Python

Why use Google's indexing?

Google scans a page's text, images, and video files and stores the information in its massive database, the Google Index. When serving search results, Google only displays pages that are pertinent to the user's query. (This should not be confused with Python's index() method, which returns the position of the first occurrence of an element in a string or list and raises a ValueError if the item isn't present.)

You are permitted to do the following with Google's indexing API:

  • Post updates for particular URLs.
  • Find out the most recent changes to a URL.
  • Run a batch of requests.

Why do we use Google’s Indexing API in bulk with Python?

Google's Indexing API is a tool that allows website owners to directly notify Google about updates or changes made to their website's content. This is done by sending a request to the API, which will then trigger Google's indexing process for that particular page.

Using the Indexing API in bulk with Python allows website owners to submit multiple URLs at once, which can save a lot of time and effort compared to submitting each URL individually. Python is a popular programming language for working with APIs and web scraping, making it a natural choice for automating the process of submitting URLs to the Indexing API.

Submitting URLs in bulk can be particularly useful for large websites or websites that frequently update their content, as it ensures that Google is aware of all changes promptly. It can also be helpful for websites that are newly launched or have undergone significant updates, as it can help to get the site indexed and ranked more quickly.

Overall, using Google's Indexing API in bulk with Python can streamline the process of keeping a website's content up-to-date and ensuring that it is properly indexed by Google.

Benefits of Using Google’s Indexing API in Bulk with Python

There are several benefits to using Google's Indexing API in bulk with Python:

  • Efficient indexing: Using the Indexing API in bulk allows for the efficient and timely indexing of a large number of URLs, which can be particularly beneficial for websites with a lot of content or those that update frequently.
  • Time-saving: Submitting URLs in bulk with Python can save a lot of time compared to submitting each URL individually, as it can automate the process and reduce manual effort.
  • Improved SEO: Ensuring that Google is aware of all updates to a website's content promptly can help improve search engine optimization (SEO), as it can lead to better indexing and ranking of the website's pages.
  • Automated monitoring: Automating the process of submitting URLs to the Indexing API can also enable website owners to monitor their website's indexing status more easily, allowing them to quickly identify and address any issues that may arise.
  • Flexibility: Python is a versatile programming language that can be used for a wide range of tasks, including working with APIs, web scraping, and data analysis. This makes it a flexible tool for working with the Indexing API and customizing the process to meet specific needs.
Want to use something easier than Python?
Discover FlashSERP, the all-in-one API indexing tool. Sign up now and enjoy an exclusive 7-day free trial, giving you full access to our premium features and tools.⚡
