tl;dr: Numerai offers the numerai-cli, a tool to help Data Scientists automate their weekly submission pipeline for the Numerai tournament. This CLI configures a Numerai Prediction Node in Amazon Web Services (AWS) and automatically deploys your model to it.

This guide describes how I set up my own weekly submission pipeline from scratch, using Microsoft Azure and python for free. πŸš€

πŸ€– Just give me the code: https://github.com/papaemman/azure-functions-with-python

What is Numerai?

Numerai is a quant hedge founded in San Francisco, CA, in 2015, built on thousands of crowdsourced machine learning models. Every week, Numerai gives away data so that Data Scientists worldwide have free, hedge-fund quality data to develop their Machine Learning (ML) / Deep Learning (DL) models β€” data they would otherwise not have access to.

At the beginning of each round, competitors can stake NMR tokens (a cryptocurrency issued by Numerai) on the predictions they submit if they want to get paid (or "burnt") based on the predictive accuracy of their models. Staking on a model is fundamentally a type of "bet" competitors take towards the performance of their own model's predictions.

Numerai Compute CLI

Competing in Numerai is an ongoing, never-ending challenge because competitors need to download the new data every week and submit their predictions based on them.

Numerai Compute is a framework aiming to help participants automate their weekly submission workflow. Using the numerai-cli, participants can "provision" their cloud infrastructure in Amazon Web Services (AWS) and deploy their pre-trained models as a Prediction Node that can

The numerai-cli configures a Numerai Prediction Node in Amazon Web Services (AWS). This solution is architected to cost less than $5/month on average, but actual costs may vary.

You need 4 things to use Numerai CLI: Docker, Python, Numerai API Keys, and AWS API Keys.

Why not just use Numerai Compute?

1. Cost

Last week was tough for the crypto market. The massive crash of LUNA carried away the whole crypto market. Currently (May 2022), the NMR token has a price of around 13$. Therefore, saving 5$ per month and investing them in buying more NMR to stake in your models is an appealing idea.

2. Learn something new

When building something from scratch, you have the opportunity to understand how it works under the hood. Last year, I deep-dived into Microsoft Azure cloud technologies, and I wanted to test if my skills were enough to build a real-world solution.

4. Don't have an AWS account

Many professionals prefer to work with other Cloud providers than AWS, such as Microsoft Azure or Google Cloud Platform. Setting up a new AWS account, adding billing information, and finding your way into the AWS console might seem like a lot of work. If you prefer to use Azure*, this guide is for you.*

Goal

Automate my weekly submission pipeline, for the Numerai Tournament, using Azure and python, with zero cost.

Azure offers many services adequate to solve this task, but I decided to use Azure Functions after some research.

Azure Functions is a cloud service available on-demand that provides all the continually updated infrastructure and resources needed to run your applications. You focus on the pieces of code that matter most to you, and Functions handles the rest. Functions provides serverless compute for Azure.

Going for a serverless architecture is a great way to save money and time because you don't need to pay for the infrastructure you don't use. And if you already have the code you want to run, setting up an Azure function is easy. Also, Azure offers a free tier, which is more than enough for this project's scope.

Thus my goal was to create an Azure Function that will:

MS Azure Architecture Design

Technical Implementation

Prerequisites

  1. Azure Account
  2. VSCode and Azure Extension
  3. Python 3.8
  4. Azure Functions Core Tools
  5. Azure Storage Explorer
  6. Numerai API Keys
  7. Trained ML/DL models and inference python code

Setup

1. Setup all prerequisites

2. Open VSCode and create a new local Azure function project

While you don't have any workspace opened in VSCode, go to the Azure extension, find FUNCTIONS and select _Create New Project…- S_elect Directory, Language (Python), and a python interpreter to create a virtual environment- Select the Time Trigger template, give a name to the function, and set up the timer to run every Sunday [* 0 0 * * SUN]. You don't need to set up the function to run in short time intervals. Azure functions support "Execute Function now…" to trigger it manually within VSCode for testing purposes.

3. Then, the VSCode will automatically create a Local project with a time-trigger function sample code.

## Azure function project codebase structure
.
β”œβ”€β”€ .venv                     # Python Virtual environment
β”œβ”€β”€ .vscode                   # Configuration options for VSCode
β”œβ”€β”€ host.json                 # Configuration options that affect all functions in a function app instance
β”œβ”€β”€ local.settings.json       # Maintains settings used when running functions locally.These settings aren't used when running in Azure
|
β”œβ”€β”€ AzureFunction_1
β”‚ β”œβ”€β”€ function.json           # Azure Function Settings
β”‚ β”œβ”€β”€ __init__.py             # Python Code
β”‚ └── readme.md               # Documentation
|
β”œβ”€β”€ AzureFunction_2
β”‚ β”œβ”€β”€ function.json           
β”‚ β”œβ”€β”€ __init__.py             
β”‚ └── readme.md               
|
└── requirements.txt          # Package dependencies

4. Run the Function locally (Open VSCode and Press F5 or Run β†’Start Debugging)

To debug, you must select a storage account for internal use by the Azure Functions runtime: Select a subscription, create a new storage account, create a new resource group, and select a location for resources. Keep in mind that this is a temporary resource group.

Important note ⚠️

If you get an error β€œconnect ECONNREFUSED 127.0.0.1:9091”, update the extensionBundle to version β€œversion”: β€œ[2.*, 4.0.0)”, in host.json file.

5. Publish the project to Azure

This will create a new resource group with an App Service Plan, a Function App, a Storage Account, and an Application Insights service.

6. Run the Function in Azure

7. Download the function app settings

8. Setup the Azure Storage account.

9. Purchase the Twilio SendGrid SaaS offering from Microsoft Azure Marketplace

To send a notification email every time the function triggered and submitted predictions, I decided to use the Twilio SendGrid's Email API

10. Configure Function App

Having the Numerai API Keys and Twillio Seng Grid API keys hardcoded in the source code is a terrible practice. Azure Functions offers a dedicated Function's App configuration to store environmental variables and secrets.

NumeaiAPIKeys β†’secret_key=***;public_id=***”

SendGridKeyss β†’***

11. Write the Python code

Now it's time to write (or copy-paste) the python code responsible for reading the numerai live data & pre-trained ML/DL models and submitting predictions using Numerai API.

The input and output configurations (configurations and paths to load and store files) are the only thing you have to modify in your previous python code (running locally) to run it as an Azure Function. Instead of local paths, you should use the newly created Storage Account. For this, you need to use the Azure Storage Python SDK.

12. Run the Function locally and verify that it is working

13. Redeploy and verify the updated App

Conclusions

Last Sunday, I woke up to this fun email from myself, letting me know that everything went as planned. The Azure Function was triggered on Sunday at 00:00:00 UTC, the Numerai submissions were successful, and I was able to enjoy my day! 🌞

Email Notification for successful submission

PS. Check the awesome quote at the end. I added a random quote generator inside the function creating the email notification just to spark things up ✨

Get in touch

I hope you liked this guide. If you have any questions or just want to introduce yourself, don't hesitate to reach out!

Panagiotis Papaemmanouil