Thursday, September 28, 2023

Fetching Data from an HTTP API with Python

In this quick tip, excerpted from Useful Python, Stuart shows you how easy it is to use an HTTP API from Python using a couple of third-party modules.

Most of the time when working with third-party data we’ll be accessing an HTTP API. That is, we’ll be making an HTTP call to a web page designed to be read by machines rather than by people. API data is normally in a machine-readable format, usually either JSON or XML. (If we come across data in another format, we can use the techniques described elsewhere in this book to convert it to JSON, of course!) Let’s look at how to use an HTTP API from Python.

The general principles of using an HTTP API are simple:

  1. Make an HTTP call to the URLs for the API, possibly including some authentication information (such as an API key) to show that we’re authorized.
  2. Get back the data.
  3. Do something useful with it.

Python provides enough functionality in its standard library to do all this without any additional modules, but it’ll make our life a lot easier if we pick up a couple of third-party modules to smooth over the process. The first is the requests module. This is an HTTP library for Python that makes fetching HTTP data more pleasant than Python’s built-in urllib.request, and it can be installed with python -m pip install requests.
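
For comparison, here is a minimal sketch of the three steps using only the standard library. The endpoint https://api.example.com/search, the parameter values, and the helper names build_url and fetch_json are placeholders invented for illustration. They don’t belong to any real API.

```python
import json
import urllib.parse
import urllib.request


def build_url(base_url, params):
    """Append URL-encoded query parameters to base_url."""
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def fetch_json(base_url, params, timeout=10):
    """Make the HTTP call and decode the JSON body into Python objects."""
    url = build_url(base_url, params)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameters, used only to show the URL that gets built.
print(build_url("https://api.example.com/search", {"key": "SECRET", "q": "ripe fruit"}))
# https://api.example.com/search?key=SECRET&q=ripe+fruit
```

With requests, the URL encoding and the JSON decoding are both handled for us, which is most of the convenience it offers over this version.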

To show how easy it is to use, we’ll use Pixabay’s API (described in Pixabay’s API documentation). Pixabay is a stock photo site where the images are all available for reuse, which makes it a very handy destination. What we’ll focus on here is fruit. We’ll use the fruit pictures we gather later on, when manipulating files, but for now we just want to find pictures of fruit, because it’s tasty and good for us.

To start, we’ll take a quick look at what pictures are available from Pixabay. We’ll grab 100 images, quickly look through them, and choose the ones we want. For this, we’ll need a Pixabay API key, so we need to create an account and then grab the key shown in the API documentation under “Search Images”.

The requests Module

The basic version of making an HTTP request to an API with the requests module involves constructing an HTTP URL, requesting it, and then reading the response. Here, that response is in JSON format. The requests module makes each of these steps easy. The API parameters are a Python dictionary, a get() function makes the call, and if the API returns JSON, requests makes that available as .json() on the response. So a simple call will look like this:

import requests

PIXABAY_API_KEY = "11111111-7777777777777777777777777"

# Set this to the search endpoint given in Pixabay's API documentation.
base_url = ""
base_params = {
    "key": PIXABAY_API_KEY,
    "q": "fruit",
    "image_type": "photo",
    "category": "food",
    "safesearch": "true"
}

response = requests.get(base_url, params=base_params)
results = response.json()

This will return a Python object, as the API documentation suggests, and we can look at its parts:

>>> print(len(results["hits"]))
20
>>> print(results["hits"][0])
{'id': 2277, 'pageURL': '', 'type': 'photo', 'tags': 'berries, fruits, food', 'previewURL': '', 'previewWidth': 150, 'previewHeight': 99, 'webformatURL': '', 'webformatWidth': 640, 'webformatHeight': 426, 'largeImageURL': '', 'imageWidth': 4752, 'imageHeight': 3168, 'imageSize': 2113812, 'views': 866775, 'downloads': 445664, 'collections': 1688, 'likes': 1795, 'comments': 366, 'user_id': 14, 'user': 'PublicDomainPictures', 'userImageURL': ''}

The API returns 20 hits per page, and we’d like 100 results. To do this, we add a page parameter to our list of params. However, we don’t want to alter our base_params every time, so the way to approach this is to create a loop and then make a copy of the base_params for each request. The built-in copy module does exactly this, so we can call the API five times in a loop:

import copy

for page in range(1, 6):
    # Copy the shared parameters so base_params itself is never modified.
    this_params = copy.copy(base_params)
    this_params["page"] = page
    response = requests.get(base_url, params=this_params)

This will make five separate requests to the API, one with page=1, the next with page=2, and so on, getting different sets of image results with each call. This is a convenient way to walk through a large set of API results. Most APIs implement pagination, where a single call to the API only returns a limited set of results. We then ask for more pages of results, much like looking through query results from a search engine.
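
The general pattern of walking through pages can be sketched without touching the network. In this illustration, iter_hits, fake_fetch_page, and FAKE_RESULTS are hypothetical names, and the fake fetcher simply slices a list. The sketch also assumes that the API answers with an empty "hits" list once we run past the last page, which a real API may not do.

```python
def iter_hits(fetch_page):
    """Yield hits page by page until a page comes back empty.

    fetch_page is any callable that takes a page number and returns the
    decoded JSON for that page as a dict with a "hits" list.
    """
    page = 1
    while True:
        hits = fetch_page(page)["hits"]
        if not hits:
            return
        yield from hits
        page += 1


# Stand-in for a real API: 45 fake results served 20 per page.
FAKE_RESULTS = [{"id": n} for n in range(45)]


def fake_fetch_page(page, per_page=20):
    """Return one page of the fake results, in the same shape as the API."""
    start = (page - 1) * per_page
    return {"hits": FAKE_RESULTS[start:start + per_page]}


all_hits = list(iter_hits(fake_fetch_page))
print(len(all_hits))  # 45
```

With requests, fetch_page could be a small function that copies base_params, sets "page", and returns the decoded JSON of the response.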

Since we want 100 results, we could simply decide that this is five calls of 20 results each, but it would be more robust to keep requesting pages until we have the hundred results we need and then stop. This protects the calls in case Pixabay changes the default number of results to 15 or similar. It also lets us handle the situation where there aren’t 100 images for our search terms. So we have a while loop and increment the page number every time, and then, if we’ve reached 100 images, or if there are no images to retrieve, we break out of the loop:

photos = []
page = 1
while len(photos) < 100:
    this_params = copy.copy(base_params)
    this_params["page"] = page
    response = requests.get(base_url, params=this_params)
    hits = response.json()["hits"]
    if not hits:
        break  # no more results to fetch
    for result in hits:
        # Keep only the fields we need for each image.
        photos.append({
            "pageURL": result["pageURL"],
            "thumbnail": result["previewURL"],
            "tags": result["tags"],
        })
    page += 1

This way, when we finish, we’ll have at least 100 images, or we’ll have all the images if there are fewer than 100, stored in the photos list. We can then go on to do something useful with them. But before we do that, let’s talk about caching.

Caching HTTP Requests

It’s a good idea to avoid making the same request to an HTTP API more than once. Many APIs have usage limits in order to avoid being overtaxed by requesters, and a request takes time and effort on their part and on ours. We should try not to make wasteful requests that we’ve made before. Fortunately, there’s a useful way to do this when using Python’s requests module: install requests-cache with python -m pip install requests-cache. This will seamlessly record any HTTP calls we make and save the results. Then, later, if we make the same call again, we’ll get back the locally saved result without going to the API for it at all. This saves both time and bandwidth. To use requests_cache, import it and create a CachedSession, and then instead of requests.get use session.get to fetch URLs, and we’ll get the benefit of caching with no extra effort:

import requests_cache
session = requests_cache.CachedSession('fruit_cache')
response = session.get(base_url, params=this_params)
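
To illustrate the idea behind caching (this is not how requests-cache is implemented), here is a toy in-memory cache keyed on the URL and parameters. The names CachingFetcher and fake_fetch and the example.com URL are invented for this sketch, and the fetch function is a stand-in that makes no network calls.

```python
class CachingFetcher:
    """Remember the result of each (url, params) request in a dictionary."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable that performs the real request
        self._cache = {}       # maps request key -> stored result
        self.real_calls = 0    # how many times fetch was actually called

    def get(self, url, params):
        # Sort the params so the same request always produces the same key.
        key = (url, tuple(sorted(params.items())))
        if key not in self._cache:
            self.real_calls += 1
            self._cache[key] = self._fetch(url, params)
        return self._cache[key]


def fake_fetch(url, params):
    """Stand-in for a network call."""
    return {"hits": [f"result for {params['q']}"]}


fetcher = CachingFetcher(fake_fetch)
first = fetcher.get("https://api.example.com/search", {"q": "fruit", "page": 1})
second = fetcher.get("https://api.example.com/search", {"page": 1, "q": "fruit"})
print(fetcher.real_calls)  # 1
```

The second call is answered from the dictionary, so the stand-in fetch function runs only once. requests-cache does the same job for real HTTP responses and, by default, stores them in a local SQLite database so that the cache survives between runs.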

Making Some Output

To see the results of our query, we need to display the images somewhere. A convenient way to do this is to create a simple HTML page that shows each of the images. Pixabay provides a small thumbnail of each image, which it calls previewURL in the API response, so we could put together an HTML page that shows all of these thumbnails and links them to the main Pixabay page, from which we could choose to download the images we want and credit the photographer. So each image in the page might look like this:

<li>
    <a href="">
        <img src="" alt="berries, fruits, food">
    </a>
</li>

We can construct that from our photos list using a list comprehension, and then join together all the results into one big string with "\n".join():

html_image_list = [
    f"""<li>
            <a href="{image['pageURL']}">
                <img src="{image['thumbnail']}" alt="{image['tags']}">
            </a>
        </li>"""
    for image in photos
]
html_image_list = "\n".join(html_image_list)
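
One caveat: values that come back from an API are being dropped straight into HTML, so a tag containing a quote or an ampersand could break the markup. A minimal sketch of a safer variant, using html.escape from the standard library, might look like this. The render_item helper and the sample data are made up for illustration.

```python
import html


def render_item(image):
    """Build one <li> for an image dict, escaping values for use in HTML."""
    page_url = html.escape(image["pageURL"], quote=True)
    thumbnail = html.escape(image["thumbnail"], quote=True)
    tags = html.escape(image["tags"], quote=True)
    return f'<li><a href="{page_url}"><img src="{thumbnail}" alt="{tags}"></a></li>'


# Made-up sample data with characters that would otherwise break the markup.
sample = {
    "pageURL": "https://example.com/photo?id=1&size=small",
    "thumbnail": "https://example.com/thumb.jpg",
    "tags": 'fruit, "fresh" & tasty',
}
print(render_item(sample))
```

The same helper could be used inside the list comprehension in place of the inline f-string.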

At that point, if we write out a very plain HTML page containing that list, it’s easy to open that in a web browser for a quick overview of all the search results we got from the API, and click any one of them to jump to the full Pixabay page for downloads:

html = f"""<!doctype html>
<html><head><meta charset="utf-8">
<title>Pixabay search for {base_params['q']}</title>
<style>
ul {{
    list-style: none;
    line-height: 0;
    column-count: 5;
    column-gap: 5px;
}}
li {{
    margin-bottom: 5px;
}}
</style></head>
<body><ul>
{html_image_list}
</ul></body></html>
"""
output_file = f"searchresults-{base_params['q']}.html"
with open(output_file, mode="w", encoding="utf-8") as fp:
    fp.write(html)
print(f"Search results summary written as {output_file}")

The search results page, showing many fruits
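
If we’d rather not open the file by hand, the standard library’s webbrowser module can do it for us. This is a small optional sketch. The helper names file_uri and open_in_browser are invented here, and the filename assumes the search term was "fruit".

```python
import webbrowser
from pathlib import Path


def file_uri(path):
    """Return a file:// URI for a local path, suitable for a browser."""
    return Path(path).resolve().as_uri()


def open_in_browser(path):
    """Open a local file in the default web browser."""
    return webbrowser.open(file_uri(path))


# Show the URI that would be opened, without launching a browser.
print(file_uri("searchresults-fruit.html"))
```

Calling open_in_browser(output_file) after writing the page should then bring it up in the default browser.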

This article is excerpted from Useful Python, available on SitePoint Premium and from ebook retailers.

