Last edit: Dec 18, 2024

OCR Subnet Tutorial

In this tutorial you will learn how to quickly convert your validated idea into a functional Bittensor subnet. This tutorial begins with a Python notebook that contains the already validated code for optical character recognition (OCR). We demonstrate how straightforward it is to start with such notebooks and produce a working subnet.

Motivation

Bittensor subnets are:

  • Naturally suitable for continuous improvement of the subnet miners.
  • High throughput environments to accomplish such improvements.

This is the motivation for creating an OCR subnet for this tutorial. With the OCR subnet, one can extract the text from an entire library of books in a matter of hours or days. Moreover, by exposing the subnet miners during training to examples of real-world use cases, the OCR subnet can be fine-tuned to be maximally effective.

Takeaway lessons

When you complete this tutorial, you will know the following:

  • How to convert your Python notebook containing the validated idea into a working Bittensor subnet.
  • How to use the Bittensor Subnet Template to accomplish this goal.
  • How to perform subnet validation and subnet mining.
  • How to design your own subnet incentive mechanism.

Tutorial code

Python notebook

The Python notebook we use in this tutorial contains all three essential components of the OCR subnet: the validation flow, the baseline miner implementation, and the reward model (incentive mechanism).

OCR subnet repository

  • We will use the OCR subnet repository as our starting point and then incorporate the notebook code to build the OCR subnet.

Tutorial method

In the rest of this tutorial, we show which blocks of the Python notebook code are copied into which sections of the OCR subnet repository.

Prerequisites

Required reading

If you are new to Bittensor, read the following sections before you proceed:

  1. Introduction, which describes how subnets form the heartbeat of the Bittensor network.
  2. Bittensor Building Blocks, which presents the basic building blocks you use to develop your subnet incentive mechanism.
  3. Anatomy of Incentive Mechanism, which introduces the general concept of a subnet incentive mechanism.

OCR subnet summary

The OCR subnet in this tutorial works as follows. The numbered items below correspond to the numbers in the diagram:

[Figure: Incentive Mechanism Big Picture]
  1. The subnet validator sends a challenge simultaneously to multiple subnet miners. In this tutorial the challenge consists of an image file of a synthetic invoice document. The serialized image file is attached to a synapse object called OCRSynapse. This step constitutes the query from the subnet validator to subnet miners.
  2. The subnet miners respond after performing the challenge task. After receiving the synapse object containing the image data, each miner extracts the contents of the image: the text content, the position of the text, the font used in the text and the font size.
  3. The subnet validator then scores each subnet miner based on the quality of the response and how quickly the miner completed the task. The subnet validator uses the original synthetic invoice document as the ground truth for this step.
  4. Finally, the subnet validator sets the weights for the subnet miners by sending the weights to the blockchain.

Step 1: Generate challenge and query the miners

Step 1.1: Synthetic PDF as challenge

In this tutorial, the subnet validator generates synthetic data: a PDF document containing an invoice. The subnet validator uses this synthetic PDF as the basis for assessing subnet miner performance. Synthetic data is an appropriate choice because it provides an unlimited source of customizable validation data. It also enables the subnet validators to gradually increase the difficulty of the task so that the miners are required to continuously improve. This is in contrast to using a pre-existing dataset from the web, where subnet miners can "look up" the answers on the web.

The contents of the PDF document are the ground truth labels. The subnet validator uses them to score the miner responses. The synthetic PDF document is corrupted with different types of noise to mimic poorly scanned documents. The amount of noise can also be gradually increased to make the task more challenging.

To generate this challenge, the subnet validator applies the following steps:

  • Creates a synthetic invoice document using the Python Faker library.
  • Converts this synthetic data into a PDF using the ReportLab Python library.
  • Finally, the validator creates the challenge by converting this PDF into a corrupted image, called noisy_image.
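Because the validator generates the invoice itself, the raw field values double as ground-truth labels. The stdlib-only sketch below illustrates that idea; it is not the notebook's code. `random` with hard-coded lists is a hypothetical stand-in for Faker, the fixed left-aligned layout and the 0.6-per-character width estimate stand in for ReportLab's real rendering, and the label keys `text`, `position` and `font` are assumed from the ones `section_reward` reads later in this tutorial (the `family`/`size` font keys are also assumptions).

```python
import random
from typing import Dict, List

# Hypothetical stand-ins for Faker output; the notebook itself calls fake.company(), fake.city(), etc.
COMPANIES = ["Acme Corp", "Globex Ltd", "Initech LLC"]
CITIES = ["Springfield", "Riverton", "Lakeside"]


def make_invoice_labels(seed: int, line_height: float = 20.0) -> List[Dict]:
    """Generate invoice fields together with a simple layout.

    The returned list doubles as the ground truth: each entry records the text,
    an [x0, y0, x1, y1] bounding box, and the font it would be drawn with.
    """
    rng = random.Random(seed)  # seeded, so the same challenge can be regenerated
    fields = [
        rng.choice(COMPANIES),
        f"{rng.choice(CITIES)}, {rng.randint(10000, 99999)}",
        f"Invoice #{rng.randint(1000, 9999)}",
        f"Total: ${rng.randint(100, 10000)}.00",
    ]
    labels = []
    for row, text in enumerate(fields):
        font = {"family": "Helvetica", "size": 12}
        x0, y0 = 50.0, 50.0 + row * line_height
        # Crude width estimate of 0.6 * font size per character (not real font metrics).
        x1 = x0 + 0.6 * font["size"] * len(text)
        labels.append({"text": text, "position": [x0, y0, x1, y0 + font["size"]], "font": font})
    return labels


for label in make_invoice_labels(seed=7):
    print(label)
```

Seeding the generator means a validator could regenerate an identical challenge later, for example to audit a score.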

Code snapshot

See below for a snapshot view of the code.

from faker import Faker

# Faker instance used to generate the raw invoice data
fake = Faker()

# Generates a PDF invoice from the raw data passed in as "invoice_data" dictionary
# and saves the PDF with "filename"
def create_invoice(invoice_data, filename):
    ...

# Using Faker, generate sample data for the invoice
invoice_info = {
    "company_name": fake.company(),
    "company_address": fake.address(),
    "company_city_zip": f'{fake.city()}, {fake.zipcode()}',
    ...
}
...

# Pass the "invoice_info" containing the Faker-generated raw data
# to create_invoice() method and generate the synthetic invoice PDF
pdf_filename = "sample_invoice.pdf"
data = create_invoice(invoice_info, pdf_filename)
...

# Loads PDF and converts it into usable PIL image using Pillow library
# Used by the corrupt_image() method
def load_image(pdf_path, page=0, zoom_x=1.0, zoom_y=1.0):
    ...

# Accepts a PDF, uses load_image() method to convert to image
# and adds noise, blur, spots, rotates the page, curls corners, darkens edges so
# that the overall result is noisy. Saves back in PDF format.
# This is our corrupted synthetic PDF document.
def corrupt_image(input_pdf_path, output_pdf_path, border=50, noise=0.1, spot=(100,100), scale=0.95, theta=0.2, blur=0.5):
    ...

Colab Notebook source: The validated code for the above synthetic PDF generation logic is in the Validation flow cell.

All we have to do is copy the above notebook code into the appropriate place in the OCR subnet repo.

...
├── ocr_subnet
│   ├── __init__.py
│   ...
│   └── validator
│       ├── __init__.py
│       ├── corrupt.py
│       ├── forward.py
│       ├── generate.py
│       ├── reward.py
│       └── utils.py
...

We copy the above notebook code into the following files of the OCR repo:

Python Notebook source | OCR repo destination
Methods: create_invoice, random_items, load_image, and lists items_list and invoice_info and all the import statements in cell 34 | ocr_subnet/validator/generate.py
Method: corrupt_image | ocr_subnet/validator/corrupt.py
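The corrupt_image() method combines several distortions (noise, blur, spots, rotation, curled corners, darkened edges). As a minimal illustration of just one of them, the stdlib-only sketch below adds salt-and-pepper noise to a grayscale image held as a list of pixel rows, with a tunable `noise` fraction. The `noise_for_round()` schedule is hypothetical; it shows one way the noise level could be raised gradually to make the task harder, as Step 1.1 describes. The real method works on PIL images, not lists.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale pixels: 0 is black, 255 is white


def add_salt_and_pepper(image: Image, noise: float, seed: int = 0) -> Image:
    """Return a copy of `image` in which roughly a `noise` fraction of pixels is set to 0 or 255."""
    if not 0.0 <= noise <= 1.0:
        raise ValueError("noise must be in [0, 1]")
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # keep the clean original intact
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < noise:
                row[x] = rng.choice((0, 255))
    return noisy


def noise_for_round(round_index: int, start: float = 0.05, step: float = 0.01, cap: float = 0.3) -> float:
    """Hypothetical difficulty schedule: raise the noise level each round, up to a cap."""
    return min(start + step * round_index, cap)


clean = [[255] * 100 for _ in range(100)]  # a blank white page
for r in (0, 10, 50):
    level = noise_for_round(r)
    noisy = add_salt_and_pepper(clean, level, seed=r)
    black = sum(px == 0 for row in noisy for px in row)
    print(f"round {r}: noise={level:.2f}, black pixels={black}")
```

Copying the image before corrupting it matters: the clean version is what the labels describe.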

Step 1.2: Query miners

Next, the subnet validator sends this noisy_image to the miners, tasking them to perform OCR and content extraction.

Colab Notebook source: In the validated Colab Notebook code, this step is accomplished by passing the path of the noisy_image directly from the Validator cell to the miner.

Define OCRSynapse class

However, in a Bittensor subnet, all communication between a subnet validator and a subnet miner must use an object of type Synapse. Hence, the subnet validator must attach the corrupted image to a Synapse object and send this object to the miners. The miners then update the synapse by attaching their responses to this same object and send it back to the subnet validator.

Code snapshot

import typing

import bittensor as bt

# OCRSynapse class, using bt.Synapse as its base.
# This protocol enables communication between the miner and the validator.
# Attributes:
# - base64_image: A base64-encoded image of the PDF to be processed by the miner.
# - response: List[dict] containing data extracted from the image.
class OCRSynapse(bt.Synapse):
    """
    A simple OCR synapse protocol representation which uses bt.Synapse as its base.
    This protocol enables communication between the miner and the validator.

    Attributes:
    - base64_image: A base64-encoded image of the PDF to be processed by the miner.
    - response: List[dict] containing data extracted from the image.
    """

    # Required request input, filled by sending dendrite caller. It is a base64 encoded string.
    base64_image: str

    # Optional request output, filled by receiving axon.
    response: typing.Optional[typing.List[dict]] = None

Important: The OCRSynapse object can only contain serializable objects, because both the subnet validators and the subnet miners must be able to deserialize the object after receiving it.

See the OCRSynapse class definition in ocr_subnet/protocol.py.

...
├── ocr_subnet
│   ├── __init__.py
│   ├── base
│   │   ├── __init__.py
│   │   ...
│   ├── protocol.py
...
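The serialization requirement is easiest to see with a plain JSON round trip. In the sketch below, `OCRSynapseSketch` is a hypothetical dataclass that only mirrors the two OCRSynapse fields; it is not `bt.Synapse`, and `encode_image`/`decode_image` are illustrative helpers rather than the repo's `serialize_image`. Raw image bytes cannot go into JSON, but a base64 string can, which is why `base64_image` is a `str`.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class OCRSynapseSketch:
    """Hypothetical stand-in that mirrors only the two OCRSynapse fields."""
    base64_image: str                       # filled by the validator (request)
    response: Optional[List[dict]] = None   # filled by the miner (response)


def encode_image(image_bytes: bytes) -> str:
    """Raw bytes are not JSON serializable, but their base64 text form is."""
    return base64.b64encode(image_bytes).decode("ascii")


def decode_image(b64: str) -> bytes:
    return base64.b64decode(b64)


# Validator side: attach the (placeholder) image bytes and serialize for the wire.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"placeholder payload"
wire = json.dumps(asdict(OCRSynapseSketch(base64_image=encode_image(png_bytes))))

# Miner side: deserialize, run OCR (omitted), attach the response, serialize the reply.
received = OCRSynapseSketch(**json.loads(wire))
assert decode_image(received.base64_image) == png_bytes
received.response = [
    {"text": "Invoice #1234", "position": [50, 90, 150, 102], "font": {"family": "Helvetica", "size": 12}}
]
reply = json.dumps(asdict(received))

# Trying to send raw bytes instead fails: json.dumps raises TypeError.
try:
    json.dumps({"image": png_bytes})
except TypeError as err:
    print("not serializable:", err)
```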

Send OCRSynapse to miners

With the OCRSynapse class defined, the subnet validator next uses its network client, the dendrite, to send queries to the axon servers of the subnet miners.

Code snapshot

# Create synapse object to send to the miner and attach the image.
# Convert the PIL image into a JSON-serializable format.
synapse = OCRSynapse(base64_image=serialize_image(image))
# The dendrite client of the validator queries the miners in the subnet
responses = self.dendrite.query(
    # Send the query to selected miner axons in the network.
    axons=[self.metagraph.axons[uid] for uid in miner_uids],
    # Pass the synapse to the miner.
    synapse=synapse,
    ...
)

See ocr_subnet/validator/forward.py which contains all this communication logic.

Also note that the scripts/ directory contains the sample invoice document and its noisy version. The subnet validator uses these as ground truth labels to score the miner responses.

...
├── ocr_subnet
│   ├── __init__.py
│   ...
│   └── validator
│       ├── __init__.py
│       ├── corrupt.py
│       ├── forward.py
│       ├── generate.py
│       ├── reward.py
│       └── utils.py
...
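Conceptually, `dendrite.query` sends the same synapse to all selected axons concurrently and waits for each reply up to a timeout. The asyncio sketch below imitates that fan-out with hypothetical in-process miners, so it runs without a network; `query_one`, `query_all` and `make_miner` are illustrative names, not Bittensor APIs. It also records each miner's elapsed time, the quantity the scoring step uses for the response time penalty. A miner that misses the timeout yields `None`.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Optional, Tuple

Miner = Callable[[str], Awaitable[List[dict]]]


async def query_one(miner: Miner, payload: str, timeout: float) -> Tuple[Optional[List[dict]], float]:
    """Query one miner; return (response, or None on timeout, and elapsed seconds)."""
    start = time.perf_counter()
    try:
        response = await asyncio.wait_for(miner(payload), timeout)
    except asyncio.TimeoutError:
        response = None
    return response, time.perf_counter() - start


async def query_all(miners: List[Miner], payload: str, timeout: float) -> List[Tuple[Optional[List[dict]], float]]:
    """Send the same payload to every miner at once, like dendrite.query(axons=[...])."""
    return await asyncio.gather(*(query_one(m, payload, timeout) for m in miners))


def make_miner(delay: float) -> Miner:
    """Hypothetical miner that 'works' for `delay` seconds, then answers."""
    async def miner(payload: str) -> List[dict]:
        await asyncio.sleep(delay)  # stands in for the OCR work
        return [{"text": f"decoded {len(payload)} chars"}]
    return miner


results = asyncio.run(query_all([make_miner(0.01), make_miner(0.05), make_miner(5.0)], "aGVsbG8=", timeout=0.5))
for uid, (response, elapsed) in enumerate(results):
    print(uid, response, f"{elapsed:.2f}s")
```

Because the queries run concurrently, the whole round takes about as long as the timeout, not the sum of all miner delays.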

Step 2: Miner response

Having received the OCRSynapse object with the corrupted image data in it, the miners will next perform the data extraction.

Base miner

The Python notebook contains an implementation of the base miner, which uses pytesseract, a popular open-source OCR tool, to extract data from the image sent by the subnet validator.

Colab Notebook source: See the miner method in the Miner cell of the Colab Notebook.

Code snapshot

import pytesseract

# Extracts text data from image using pytesseract. This is the baseline miner.
def miner(image, merge=True, sort=True):
    ...

response = miner(noisy_image, merge=True)

We copy the above miner code from the notebook into the following files of the OCR repo:

...
├── neurons
│   ├── __init__.py
│   ├── miner.py
│   └── validator.py
...
Python Notebook source | OCR repo destination
Methods: group_and_merge_boxes and miner and all the import statements in the Miner cell of the Colab Notebook | neurons/miner.py
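The validator scores miner output section by section, so the miner returns a list of dicts, one per text section. The sketch below shows one hypothetical way to go from word-level OCR detections to such sections: words whose vertical spans overlap are grouped into a line, ordered left to right, and their boxes merged. This is not the notebook's group_and_merge_boxes; the input format (`text`, `position` as `[x0, y0, x1, y1]`, `font`) and the grouping rule are assumptions.

```python
from typing import Dict, List


def merge_words_into_lines(words: List[Dict]) -> List[Dict]:
    """Group word detections into lines and merge each line into one section.

    Each word is {"text": str, "position": [x0, y0, x1, y1], "font": dict}.
    Two words share a line when their vertical spans overlap.
    """
    lines: List[List[Dict]] = []
    for word in sorted(words, key=lambda w: (w["position"][1], w["position"][0])):
        _, y0, _, y1 = word["position"]
        for line in lines:
            top = min(w["position"][1] for w in line)
            bottom = max(w["position"][3] for w in line)
            if y0 < bottom and y1 > top:  # vertical overlap: same line
                line.append(word)
                break
        else:  # no existing line overlaps: start a new one
            lines.append([word])

    sections = []
    for line in lines:
        line.sort(key=lambda w: w["position"][0])  # left to right
        boxes = [w["position"] for w in line]
        sections.append({
            "text": " ".join(w["text"] for w in line),
            "position": [min(b[0] for b in boxes), min(b[1] for b in boxes),
                         max(b[2] for b in boxes), max(b[3] for b in boxes)],
            "font": line[0]["font"],  # assumption: one font per line
        })
    return sections


font = {"family": "Helvetica", "size": 12}
words = [
    {"text": "#1234", "position": [110, 90, 150, 102], "font": font},
    {"text": "Invoice", "position": [50, 90, 105, 102], "font": font},
    {"text": "Total:", "position": [50, 130, 90, 142], "font": font},
]
for section in merge_words_into_lines(words):
    print(section)
```

How a miner splits the page into sections directly affects its score, since each predicted section is compared against a ground-truth section.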
Student exercise: pytesseract is well suited to this OCR problem, but a subnet miner using a more sophisticated approach, such as deep-learning-based OCR, can outperform it.

Step 3: Scoring miner responses

When a miner sends its response, the subnet validator scores the quality of the response in the following way:

Prediction reward: Compute the similarity between the ground truth and the miner's prediction for the text content, the text position and the font. This is conceptually equivalent to a loss function in machine learning, the only difference being that rewards are maximized rather than minimized. The total prediction reward is calculated as follows:

  • For each section of the synthetic invoice document, compute the three partial reward quantities:
    • text reward.
    • position reward.
    • font reward.
  • This is done by comparing a section in the miner response to the corresponding section in the ground truth synthetic invoice document.
  • Combine the three partial rewards as a weighted average to compute the total reward for that section.
  • Take the mean of these total rewards over all sections of the invoice document.

Response time penalty: Calculate the response time penalty for the miner for these predictions. The goal is to assign higher rewards to faster miners.
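Written as formulas, and assuming the section totals are combined by a plain mean as described above, the reward for a response with N sections is given below. Here r_t^(i), r_p^(i) and r_f^(i) are the text, position and font rewards of section i, and the α weights are the alpha_t, alpha_p, alpha_f, alpha_prediction and alpha_time parameters. The first, third and fourth lines mirror the reward['total'], time_reward and total_reward expressions in the code snapshot that follows.

```latex
\begin{aligned}
R^{(i)} &= \frac{\alpha_t\, r_t^{(i)} + \alpha_p\, r_p^{(i)} + \alpha_f\, r_f^{(i)}}{\alpha_p + \alpha_f + \alpha_t} \\
R_{\text{prediction}} &= \frac{1}{N} \sum_{i=1}^{N} R^{(i)} \\
R_{\text{time}} &= \max\!\left(1 - \frac{t_{\text{elapsed}}}{t_{\max}},\ 0\right) \\
R &= \frac{\alpha_{\text{prediction}}\, R_{\text{prediction}} + \alpha_{\text{time}}\, R_{\text{time}}}{\alpha_{\text{prediction}} + \alpha_{\text{time}}}
\end{aligned}
```

Since every term lies in [0, 1] and each line is a weighted average, the final reward R also lies in [0, 1].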

Code snapshot

from typing import List

# Calculate the edit distance between two strings.
def get_text_reward(text1: str, text2: str = None):
    ...

# Calculate the intersection over union (IoU) of two bounding boxes.
def get_position_reward(boxA: List[float], boxB: List[float] = None):
    ...

# Calculate the distance between two fonts, based on the font size and font family.
def get_font_reward(font1: dict, font2: dict = None, alpha_size=1.0, alpha_family=1.0):
    ...

# Score a section of the image based on the section's correctness.
# Correctness is defined as:
# - the intersection over union of the bounding boxes,
# - the delta between the predicted font and the ground truth font,
# - and the edit distance between the predicted text and the ground truth text.
def section_reward(label: dict, pred: dict, alpha_p=1.0, alpha_f=1.0, alpha_t=1.0, verbose=False):
    ...
    reward = {
        'text': get_text_reward(label['text'], pred.get('text')),
        'position': get_position_reward(label['position'], pred.get('position')),
        'font': get_font_reward(label['font'], pred.get('font')),
    }

    reward['total'] = (alpha_t * reward['text'] + alpha_p * reward['position'] + alpha_f * reward['font']) / (alpha_p + alpha_f + alpha_t)
    ...

# Reward the miner response.
def reward(image_data: List[dict], predictions: List[dict], time_elapsed: float) -> float:
    time_reward = max(1 - time_elapsed / max_time, 0)
    total_reward = (alpha_prediction * prediction_reward + alpha_time * time_reward) / (alpha_prediction + alpha_time)
    ...
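The snapshot elides the helper bodies. For intuition, here is a runnable sketch of two of them plus the time term, under stated assumptions: the text reward is taken as 1 minus the Levenshtein distance normalized by the longer string, the position reward is plain IoU of `[x0, y0, x1, y1]` boxes, a missing prediction scores 0, and `time_reward` copies the expression in reward(). The repo's get_edit_distance, get_iou and get_font_distance may normalize differently, and the font reward is not sketched here.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def text_reward(label: str, pred: Optional[str]) -> float:
    """1.0 for an exact match, falling toward 0.0 as the edit distance grows."""
    if not pred:
        return 0.0
    return 1.0 - levenshtein(label, pred) / max(len(label), len(pred))


def position_reward(box_a: List[float], box_b: Optional[List[float]]) -> float:
    """Intersection over union of two [x0, y0, x1, y1] boxes."""
    if box_b is None:
        return 0.0
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix1 - ix0, 0) * max(iy1 - iy0, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def time_reward(time_elapsed: float, max_time: float) -> float:
    # Same expression as in reward(): linear decay, clipped at 0.
    return max(1 - time_elapsed / max_time, 0)


print(text_reward("Invoice #1234", "Invoice #1284"))
print(position_reward([0, 0, 2, 2], [1, 1, 3, 3]))
print(time_reward(2.0, 10.0))
```

Both rewards land in [0, 1], which keeps the weighted averages in section_reward and reward() on a common scale.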
Rewards are smoothed with an exponential moving average

The rewards attained by miners are averaged over many turns using an exponential moving average (EMA). This is done to obtain a more reliable estimate of the overall performance on the task. We often refer to these smoothed rewards as EMA scores.

Colab Notebook source: See the Incentive mechanism cell.

We copy the incentive mechanism code from the snapshot above into the following files of the OCR repo:

...
├── ocr_subnet
│   ├── __init__.py
│   ...
│   └── validator
│       ├── __init__.py
│       ├── corrupt.py
│       ├── forward.py
│       ├── generate.py
│       ├── reward.py
│       └── utils.py
...
Python Notebook source | OCR repo destination
Methods: reward and miner and all the import statements in the Miner cell of the Colab Notebook | ocr_subnet/validator/reward.py
Method: section_reward | Method loss in reward.py
Methods: get_position_reward, get_text_reward and get_font_reward | Methods: get_iou, get_edit_distance and get_font_distance, respectively, in ocr_subnet/validator/utils.py
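The EMA smoothing described earlier in this step can be sketched in a few lines. This assumes scores are kept per miner UID in a dict and uses a hypothetical smoothing factor alpha = 0.1: a larger alpha reacts faster to recent rewards, a smaller one gives a steadier estimate. In this sketch, miners not queried in a round simply keep their previous score; the repo may treat them differently.

```python
from typing import Dict


def update_ema(scores: Dict[int, float], rewards: Dict[int, float], alpha: float = 0.1) -> Dict[int, float]:
    """Blend this round's rewards into the running scores: s <- alpha * r + (1 - alpha) * s."""
    updated = dict(scores)
    for uid, r in rewards.items():
        updated[uid] = alpha * r + (1 - alpha) * scores.get(uid, 0.0)  # new miners start at 0
    return updated


scores: Dict[int, float] = {}
for round_rewards in ({0: 1.0, 1: 0.2}, {0: 1.0, 1: 0.9}, {0: 0.0}):
    scores = update_ema(scores, round_rewards)
print(scores)
```

A single bad round (miner 0 in the third round) lowers the score without wiping out the history, which is the point of smoothing.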

Step 4: Set weights

Finally, as shown in the above OCR subnet summary, the subnet validator normalizes the EMA scores and sets the weights of the subnet miners on the blockchain. This step is not in the Python notebook. It is performed by the set_weights function in ocr_subnet/base/validator.py, which is already fully implemented in the OCR subnet repo.
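The normalization step can be sketched as L1 normalization, so that the weights are non-negative and sum to 1. This is an assumption about the scheme, the all-zero fallback is a choice made for this sketch only, and the on-chain submission done by set_weights is not reproduced here.

```python
from typing import List


def normalize_scores(scores: List[float]) -> List[float]:
    """L1-normalize non-negative, UID-indexed scores so they sum to 1."""
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    total = sum(scores)
    if total == 0:
        return [0.0 for _ in scores]  # sketch's choice: nobody has earned weight yet
    return [s / total for s in scores]


weights = normalize_scores([0.171, 0.108, 0.0])
print([round(w, 3) for w in weights])
```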

Next steps

Congratulations, you have successfully transformed your Python notebook into a working Bittensor subnet. See the tips below for your next steps.

Can you think of ways your incentive mechanism would lead to undesirable behavior? For example:

  • The positional structure of the invoice, i.e., how sections are positioned in the invoice, is mostly static and thus easily predictable. Hence all subnet miners may predict the positions correctly without doing much work, which renders the position reward ineffective. How can you avoid this?
  • Experiment with the α hyperparameters (such as alpha_t, alpha_p and alpha_f) to make the subnet miners compete more effectively. See Reward model (incentive mechanism).