When working with Databricks pipelines and workflows, failures are inevitable. While Databricks provides built-in notifications for job failures, these alerts are often not customizable and may not fit specific reporting or formatting needs. A more flexible and cost-effective approach is to set up custom email notifications that include pipeline details and error messages in a structured format, such as an Excel attachment.
This blog walks through approaches to implement custom notifications using SMTP, SendGrid, Azure Logic Apps, and Microsoft Graph API.
Why Custom Notifications?
- Flexible formatting: Include pipeline metadata, error messages, and runtime details.
- Attachments: Share structured reports (Excel, CSV, etc.) instead of plain text.
- Cost efficiency: Avoid additional third-party monitoring solutions.
- Integration options: Easily plug into existing email infrastructure.
Approach 1: SMTP-Based Notifications
Using SMTP with Python inside a Databricks notebook, you can generate an Excel report and send it via email whenever a pipeline fails.
Example Implementation
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders
from io import BytesIO
import pandas as pd
# Sample pipeline history
df = spark.createDataFrame(
    [
        ('pipeline1', 'success', '7min'),
        ('pipeline1', 'fail', '3min'),
        ('pipeline1', 'success', '10min'),
    ],
    ["PipelineName", "Status", "Duration"],
)

# Convert the DataFrame to a formatted Excel workbook in memory
output = BytesIO()
with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
    df_pd = df.toPandas()
    df_pd.to_excel(writer, index=False, sheet_name='Sheet1')
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    # Apply header formatting
    header_format = workbook.add_format({
        'bold': True,
        'bg_color': '#FFFF00',
        'border': 1,
        'align': 'center',
        'valign': 'vcenter'
    })
    for col_num, value in enumerate(df_pd.columns):
        worksheet.write(0, col_num, value.upper(), header_format)

    # Add borders to the data cells
    cell_format = workbook.add_format({'border': 1})
    for row in range(1, len(df_pd) + 1):
        for col in range(len(df_pd.columns)):
            worksheet.write(row, col, df_pd.iloc[row - 1, col], cell_format)

    # Widen all columns for readability
    for i, col in enumerate(df_pd.columns):
        worksheet.set_column(i, i, 20)

# Rewind the buffer after the writer has saved the workbook
output.seek(0)
# Email configuration
sender = "[email protected]"
receiver = "[email protected]"
subject = "Pipeline Execution Report"
body = """Hello Team,
Please find the attachment of the latest pipeline report.
Thanks,
Pipeline Team"""
msg = MIMEMultipart()
msg['From'] = sender
msg['To'] = receiver
msg['Subject'] = subject
msg.attach(MIMEText(body, 'plain'))
# Attach the in-memory Excel report
part = MIMEBase('application', 'vnd.openxmlformats-officedocument.spreadsheetml.sheet')
part.set_payload(output.read())
encoders.encode_base64(part)
part.add_header('Content-Disposition', 'attachment; filename="pipeline_report.xlsx"')
msg.attach(part)
# SMTP server details (Office 365 endpoint shown; use your provider's host)
smtp_server = "smtp.office365.com"
smtp_port = 587

with smtplib.SMTP(smtp_server, smtp_port) as server:
    server.starttls()
    server.login(sender, "sender_password")  # avoid hardcoding; see the secret-scope note below
    server.send_message(msg)

print("Email sent successfully with Excel attachment")
Scheduling Notifications
You can trigger the notification notebook automatically in either of two ways:
Option 1: Databricks Jobs
- Create or edit a Databricks job.
- Add a task dependency so the notification script runs only if the previous task fails.
- This ensures error details are captured and reported immediately; a sketch of such a task definition follows this list.
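As a rough illustration, the job definition below shows this dependency pattern submitted to the Jobs 2.1 API. The workspace URL, token, notebook paths, and cluster ID are placeholders; the key detail is the "run_if": "AT_LEAST_ONE_FAILED" setting on the notification task, which is also available as "Run if dependencies" in the Jobs UI.

import requests

# Hypothetical job payload: the notification task runs only when an upstream task fails
job_payload = {
    "name": "daily_pipeline_with_failure_alerts",
    "tasks": [
        {
            "task_key": "run_pipeline",
            "notebook_task": {"notebook_path": "/Pipelines/run_pipeline"},
            "existing_cluster_id": "<cluster-id>",
        },
        {
            "task_key": "send_failure_report",
            "depends_on": [{"task_key": "run_pipeline"}],
            "run_if": "AT_LEAST_ONE_FAILED",
            "notebook_task": {"notebook_path": "/Alerts/send_failure_report"},
            "existing_cluster_id": "<cluster-id>",
        },
    ],
}

resp = requests.post(
    "https://<your-workspace>.azuredatabricks.net/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=job_payload,
)
print(resp.json())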
Option 2: Azure Logic Apps
- Configure a Logic App that listens for pipeline failures.
- Pass pipeline details and attachments via an API call in JSON format.
- The Logic App then handles email delivery and retry logic; a sample call from the notebook is sketched after this list.
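A minimal sketch of such a call from the failure-handling notebook is shown below, assuming a Logic App with a "When a HTTP request is received" trigger. The trigger URL and payload fields are placeholders, and output is the in-memory Excel buffer built in the SMTP example.

import base64
import requests

# Hypothetical HTTP trigger URL copied from the Logic App designer
logic_app_url = "https://<logic-app-http-trigger-url>"

payload = {
    "pipeline_name": "pipeline1",
    "status": "fail",
    "error_message": "<error text captured from the failed task>",
    # Base64-encode the Excel report so it can travel inside the JSON body
    "attachment_name": "pipeline_report.xlsx",
    "attachment_content": base64.b64encode(output.getvalue()).decode("utf-8"),
}

response = requests.post(logic_app_url, json=payload, timeout=30)
response.raise_for_status()
print("Logic App notified:", response.status_code)

Inside the Logic App, the base64 content can typically be mapped to an email attachment in the connector's send-email action, so the recipient receives the same Excel report.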
Conclusion
While Databricks provides basic failure notifications, extending them with custom SMTP or Logic App workflows ensures:
- Rich, formatted reports.
- Team visibility with detailed context.
- Seamless integration with enterprise communication tools.
This approach is cost-effective, scalable, and easily adaptable for large-scale pipeline monitoring.