When working with Databricks pipelines and workflows, failures are inevitable. While Databricks provides built-in notifications for job failures, these alerts are often not customizable and may not fit specific reporting or formatting needs. A more flexible and cost-effective approach is to set up custom email notifications that include pipeline details and error messages in a structured format, such as an Excel attachment.

This blog walks through approaches to implementing custom notifications using SMTP, SendGrid, Azure Logic Apps, and Microsoft Graph API.

Why Custom Notifications?

- Built-in Databricks alerts offer limited control over content and formatting.
- Custom emails can include pipeline run details and error messages in a structured attachment, such as an Excel report.
- The approach relies on lightweight, low-cost services (SMTP, Logic Apps) rather than additional monitoring tools.

Approach 1: SMTP-Based Notifications

Using SMTP with Python inside a Databricks notebook, you can generate an Excel report and send it via email whenever a pipeline fails.

Example Implementation

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders
from io import BytesIO
import pandas as pd

# Sample pipeline run history
df = spark.createDataFrame(
    [
        ("pipeline1", "success", "7min"),
        ("pipeline1", "fail", "3min"),
        ("pipeline1", "success", "10min"),
    ],
    ["PipelineName", "Status", "Duration"],
)

# Convert the DataFrame to a formatted Excel file in memory
output = BytesIO()
df_pd = df.toPandas()
with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
    df_pd.to_excel(writer, index=False, sheet_name='Sheet1')
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    # Bold, highlighted header row
    header_format = workbook.add_format({
        'bold': True,
        'bg_color': '#FFFF00',
        'border': 1,
        'align': 'center',
        'valign': 'vcenter'
    })
    for col_num, value in enumerate(df_pd.columns):
        worksheet.write(0, col_num, value.upper(), header_format)

    # Bordered data cells
    cell_format = workbook.add_format({'border': 1})
    for row in range(1, len(df_pd) + 1):
        for col in range(len(df_pd.columns)):
            worksheet.write(row, col, df_pd.iloc[row - 1, col], cell_format)

    # Widen all columns for readability
    for i, col in enumerate(df_pd.columns):
        worksheet.set_column(i, i, 20)

output.seek(0)

# Email configuration
sender = "[email protected]"
receiver = "[email protected]"
subject = "Pipeline Execution Report"
body = """Hello Team,

Please find attached the latest pipeline report.

Thanks,
Pipeline Team"""

msg = MIMEMultipart()
msg['From'] = sender
msg['To'] = receiver
msg['Subject'] = subject
msg.attach(MIMEText(body, 'plain'))

part = MIMEBase('application', 'vnd.openxmlformats-officedocument.spreadsheetml.sheet')
part.set_payload(output.read())
encoders.encode_base64(part)
part.add_header('Content-Disposition', 'attachment; filename="pipeline_report.xlsx"')
msg.attach(part)

smtp_server = "smtp.office365.com"
smtp_port = 587

with smtplib.SMTP(smtp_server, smtp_port) as server:
    server.starttls()
    # Avoid hardcoding credentials; in Databricks, store the password in a
    # secret scope and read it with dbutils.secrets.get(scope=..., key=...)
    server.login(sender, "sender_password")
    server.send_message(msg)

print("Email sent successfully with Excel attachment")

Scheduling Notifications

You can automate the notification trigger by scheduling the notebook:

Option 1: Databricks Jobs

Schedule the notification notebook as a Databricks job, or attach it as a downstream task that runs when an upstream task fails.
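
A minimal sketch of creating such a scheduled job through the Jobs API 2.1; the workspace URL, secret scope and key, notebook path, and cluster settings are illustrative placeholders:

import requests

DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"
TOKEN = dbutils.secrets.get(scope="notifications", key="databricks-pat")

job_config = {
    "name": "pipeline-failure-report",
    "tasks": [
        {
            "task_key": "send_report",
            "notebook_task": {"notebook_path": "/Notebooks/send_pipeline_report"},
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 1,
            },
        }
    ],
    # Run every morning at 07:00 UTC
    "schedule": {"quartz_cron_expression": "0 0 7 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_config,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])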

Option 2: Azure Logic Apps

Expose a Logic App with an HTTP request trigger and call it from the notebook on failure; the Logic App then handles the email delivery.
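
A minimal sketch of calling the Logic App from Python, assuming it uses a "When an HTTP request is received" trigger (the callback URL shown is a placeholder generated by Azure, and the payload fields are whatever your Logic App expects):

import requests

logic_app_url = "https://prod-00.eastus.logic.azure.com/workflows/<id>/triggers/manual/paths/invoke?..."

payload = {
    "pipeline": "pipeline1",
    "status": "fail",
    "error": "Sample error message",
}

resp = requests.post(logic_app_url, json=payload, timeout=30)
resp.raise_for_status()
print("Logic App triggered:", resp.status_code)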

Conclusion

While Databricks provides basic failure notifications, extending them with custom SMTP or Logic App workflows ensures:

- Full control over email content, formatting, and recipients
- Structured reports, such as Excel attachments, with pipeline details and error messages
- Flexibility to plug into existing scheduling and alerting setups

This approach is cost-effective, scalable, and easily adaptable for large-scale pipeline monitoring.