Overview
This cookbook demonstrates automated product datasheet analysis using Mistral AI's Document AI.
Use Case: Battery Procurement & Vendor Validation
You're sourcing lithium-ion batteries for a portable device. Vendors send PDF datasheets with hundreds of specifications. Manually comparing each against your design requirements is time-consuming and error-prone.
This cookbook automates the process:
- Extract structured data from lithium battery PDF datasheets using Document AI (Mistral OCR with Document Annotations)
- Compare specifications against design requirements
- Generate detailed technical reports with comprehensive analysis for each parameter
Input Files Required:
- 📄 Product Datasheet PDF (lithium_iron_cell_datasheet.pdf) - Vendor-provided specification document containing technical specs, safety info, and performance data
- 📋 Design Requirements (battery_requirements.txt) - Your project's specification criteria defining acceptable ranges for capacity, voltage, temperature, safety, etc.
Technology Stack:
- ✅ Mistral OCR (mistral-ocr-latest) - PDF parsing with document annotations
- ✅ Mistral Medium (mistral-medium-latest) - Technical report generation
Key Features:
- Document AI for OCR + structured extraction
- Direct Pydantic schema extraction
- Comprehensive battery specification coverage
- Safety-focused validation
- Professional technical report generation
Benefits: Fast, accurate, and generates professional documentation for procurement decisions.
1. Setup and Imports
# Install required packages (uncomment if needed)
# !pip install mistralai
import base64
import os
import json
from mistralai import Mistral
from mistralai.extra import response_format_from_pydantic_model
from pydantic import BaseModel, Field
from typing import List, Optional
print("✓ All imports successful")
# Initialize Mistral client
api_key = os.getenv("MISTRAL_API_KEY")
if not api_key:
    raise ValueError("MISTRAL_API_KEY environment variable not set. Please set it before running.")
client = Mistral(api_key=api_key)
print("✓ Mistral client initialized")
2. Define Data Schemas
We define comprehensive Pydantic schemas for lithium battery specifications including capacity, voltage, current, temperature, dimensions, and safety features.
# Schema for lithium battery specifications
class CapacitySpec(BaseModel):
    """Battery capacity specifications."""
    normal_capacity: float = Field(..., description="Normal capacity in mAh")
    minimum_capacity: float = Field(..., description="Minimum capacity in mAh")
    unit: str = Field("mAh", description="Capacity unit")

class VoltageSpec(BaseModel):
    """Voltage specifications."""
    nominal_voltage: float = Field(..., description="Nominal voltage in Volts")
    charge_voltage: float = Field(..., description="Charge voltage in Volts")
    discharge_cutoff_voltage: float = Field(..., description="Discharge cut-off voltage in Volts")

class CurrentSpec(BaseModel):
    """Current specifications."""
    standard_charge_current: float = Field(..., description="Standard charge current in mA")
    maximum_charge_current: float = Field(..., description="Maximum charge current in mA")
    standard_discharge_current: float = Field(..., description="Standard discharge current in mA")
    maximum_discharge_current: float = Field(..., description="Maximum discharge current in mA")
    max_instantaneous_discharge: float = Field(..., description="Maximum instantaneous discharge current in mA")

class TemperatureRange(BaseModel):
    """Temperature range specifications."""
    min_temp: float = Field(..., description="Minimum temperature in °C")
    max_temp: float = Field(..., description="Maximum temperature in °C")
    condition: str = Field(..., description="Condition (e.g., 'Charge', 'Discharge', 'Storage')")

class DimensionsSpec(BaseModel):
    """Physical dimensions specifications."""
    height: float = Field(..., description="Cell height in mm")
    diameter: float = Field(..., description="Diameter in mm")
    weight: float = Field(..., description="Weight in grams")

class PerformanceSpec(BaseModel):
    """Performance test results."""
    test_name: str = Field(..., description="Name of the performance test")
    criteria: str = Field(..., description="Performance criteria/requirement")
    result: str = Field(..., description="Test result status")

class LithiumBatterySpec(BaseModel):
    """Complete specification for a lithium battery cell."""
    model_name: str = Field(..., description="Model name or number")
    product_type: str = Field(..., description="Product type (e.g., 'Lithium-ion Cell Battery')")
    capacity: CapacitySpec = Field(..., description="Capacity specifications")
    voltage: VoltageSpec = Field(..., description="Voltage specifications")
    current: CurrentSpec = Field(..., description="Current specifications")
    internal_impedance: str = Field(..., description="Internal impedance specification")
    dimensions: DimensionsSpec = Field(..., description="Physical dimensions")
    cycle_life: int = Field(..., description="Cycle life (number of cycles)")
    operating_temperatures: List[TemperatureRange] = Field(..., description="Operating temperature ranges")
    storage_temperatures: List[TemperatureRange] = Field(..., description="Storage temperature ranges")
    performance_tests: List[PerformanceSpec] = Field(default=[], description="Performance test results")
    certifications: List[str] = Field(default=[], description="Certifications and standards")
    manufacturer: str = Field(..., description="Manufacturer company name")
    distributor: str = Field(..., description="Distributor/vendor information")
    warnings: List[str] = Field(default=[], description="Key safety warnings and precautions")

class LithiumBatterySchema(BaseModel):
    """Wrapper for extracted lithium battery specifications."""
    specs: List[LithiumBatterySpec] = Field(
        ..., description="List of extracted lithium battery specifications"
    )

print("✓ Pydantic schemas for lithium battery defined")
3. Helper Functions
def encode_pdf(pdf_path: str) -> str:
    """Encode PDF file to base64 string.

    Args:
        pdf_path: Path to the PDF file
    Returns:
        Base64 encoded string of the PDF
    """
    try:
        with open(pdf_path, "rb") as pdf_file:
            return base64.b64encode(pdf_file.read()).decode('utf-8')
    except FileNotFoundError:
        raise FileNotFoundError(f"PDF file not found: {pdf_path}")
    except Exception as e:
        raise Exception(f"Error encoding PDF: {str(e)}")

print("✓ Helper functions defined")
4. File Setup
Verify that the required files exist.
# Define file paths
PDF_PATH = "lithium_iron_cell_datasheet.pdf"
REQUIREMENTS_PATH = "battery_requirements.txt"
# Verify files exist
if os.path.exists(PDF_PATH):
    print(f"✓ Found PDF: {PDF_PATH}")
else:
    raise FileNotFoundError(f"❌ PDF not found: {PDF_PATH}")
if os.path.exists(REQUIREMENTS_PATH):
    print(f"✓ Found requirements: {REQUIREMENTS_PATH}")
else:
    raise FileNotFoundError(f"❌ Requirements not found: {REQUIREMENTS_PATH}")
5. Extract Structured Data with Document Annotations
This is the key feature of this cookbook. We use Mistral OCR's document_annotation_format parameter to extract structured battery specifications directly from the PDF in a single API call.
How it works:
- The PDF is encoded to base64
- Mistral OCR processes the document
- The document_annotation_format parameter tells the OCR to extract data matching our comprehensive battery schema
- We get back structured data including capacity, voltage, current, temperatures, dimensions, and safety specs
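The first step in the list above can be sketched in isolation. The helper name `to_pdf_data_url` is illustrative and not part of the Mistral SDK; it only shows how raw PDF bytes might be turned into the `data:application/pdf;base64,...` string that is passed as `document_url`. The sample bytes are a stand-in, not a real datasheet.

```python
import base64


def to_pdf_data_url(pdf_bytes: bytes) -> str:
    """Wrap raw PDF bytes in a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(pdf_bytes).decode("utf-8")
    return f"data:application/pdf;base64,{encoded}"


# Round-trip check on a tiny stand-in payload (not a real datasheet)
sample_bytes = b"%PDF-1.4 sample"
data_url = to_pdf_data_url(sample_bytes)
prefix, _, payload = data_url.partition(",")
print(prefix)
print(base64.b64decode(payload) == sample_bytes)
```

Decoding the payload back to the original bytes is a cheap sanity check that the file was read in binary mode and not altered before upload.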
Benefits:
- ✅ Single API call (no separate LLM call needed)
- ✅ Direct schema extraction during OCR
- ✅ More accurate (extraction happens with full document context)
- ✅ Captures complex nested specifications
- ✅ Safety-critical validation
Note:
Document annotations are limited to 8 pages. For larger documents, split them into chunks.
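One way to plan the chunks is to precompute groups of page indices, each no longer than the 8-page limit. This is a minimal sketch, assuming zero-based page indices (as in `pages=list(range(8))`); the function name `chunk_page_indices` is hypothetical and not part of the Mistral SDK.

```python
from typing import List


def chunk_page_indices(total_pages: int, chunk_size: int = 8) -> List[List[int]]:
    """Split zero-based page indices into groups of at most chunk_size pages."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size must be >= 1")
    return [
        list(range(start, min(start + chunk_size, total_pages)))
        for start in range(0, total_pages, chunk_size)
    ]


# Example: a 20-page document yields chunks of 8, 8, and 4 pages
page_chunks = chunk_page_indices(20)
for chunk in page_chunks:
    print(chunk[0], "-", chunk[-1])
```

Each inner list could then be supplied as the `pages` argument of a separate OCR request, with the per-chunk annotations combined afterwards.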
print("📄 Extracting structured data from battery datasheet...")
print(f" Processing: {PDF_PATH}")
# Encode PDF to base64
base64_pdf = encode_pdf(PDF_PATH)
print(" ✓ PDF encoded to base64")
# Extract structured data using Mistral OCR with document annotations
print(" 🔍 Running Mistral OCR with document annotations...")
annotations_response = client.ocr.process(
    model="mistral-ocr-latest",
    pages=list(range(8)),  # Document Annotations limited to 8 pages
    document={
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{base64_pdf}"
    },
    document_annotation_format=response_format_from_pydantic_model(LithiumBatterySchema),
    include_image_base64=True
)
print(f" ✓ OCR completed - {len(annotations_response.pages)} pages processed")
print(" ✓ Structured data extracted successfully")
# Parse the extracted data into our Pydantic model
extracted_data = LithiumBatterySchema(**json.loads(annotations_response.document_annotation))
print("\n" + "="*60)
print("🔋 EXTRACTED BATTERY SPECIFICATIONS")
print("="*60)
print(json.dumps(extracted_data.model_dump(), indent=2))
6. Generate Comparison Report
Now that we have structured battery data, we use Mistral LLM to compare it against design requirements and generate a detailed safety and performance report.
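Because the LLM report is free text, a small deterministic check of numeric limits can serve as a complement to it. The sketch below is not part of the original workflow: the `check_limits` helper, the flat dictionary layout, and all numbers are assumptions made for illustration, and the limits are placeholders and not values taken from battery_requirements.txt. The keys reuse field names from the schema defined earlier.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical limits: (minimum, maximum); None means no bound on that side.
Limits = Dict[str, Tuple[Optional[float], Optional[float]]]


def check_limits(values: Dict[str, float], limits: Limits) -> List[str]:
    """Return one human-readable finding per parameter listed in limits."""
    findings = []
    for name, (low, high) in limits.items():
        if name not in values:
            findings.append(f"{name}: MISSING")
            continue
        value = values[name]
        too_low = low is not None and value < low
        too_high = high is not None and value > high
        status = "FAIL" if (too_low or too_high) else "PASS"
        findings.append(f"{name}: {status} (value={value}, allowed={low}..{high})")
    return findings


# Placeholder limits and values, for illustration only
example_limits: Limits = {
    "normal_capacity": (2500.0, None),
    "charge_voltage": (None, 4.2),
    "cycle_life": (500.0, None),
}
example_values = {"normal_capacity": 2600.0, "charge_voltage": 4.25}

findings = check_limits(example_values, example_limits)
for line in findings:
    print(line)
```

Findings like these could be reviewed alongside the narrative report, so that a numeric pass/fail does not depend solely on how the model phrases its analysis.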
# Load design requirements
print("📋 Loading battery design requirements...")
with open(REQUIREMENTS_PATH, 'r') as f:
    requirements = f.read()
print(f" ✓ Requirements loaded from {REQUIREMENTS_PATH}")
print("\nDesign Requirements:")
print(requirements)
print("\n📊 Generating detailed technical report with Mistral LLM...")
# Prepare the comparison prompt for narrative report generation
comparison_prompt = f"""You are an expert battery engineer specializing in lithium-ion battery safety, performance validation, and technical documentation.
I need you to write a comprehensive technical evaluation report comparing a lithium battery's specifications against design requirements.
Design Requirements:
{requirements}
Extracted Battery Specifications:
{json.dumps(extracted_data.model_dump(), indent=2)}
Please write a detailed technical report with the following sections:
# BATTERY VALIDATION REPORT
## 1. EXECUTIVE SUMMARY
Provide a 2-3 paragraph summary of the battery model, manufacturer, and overall compliance status. Include the final recommendation (APPROVED/REJECTED/CONDITIONAL APPROVAL).
## 2. BATTERY IDENTIFICATION
- Model Number
- Manufacturer
- Product Type
- Distributor
## 3. SPECIFICATION ANALYSIS
### 3.1 Capacity Analysis
Compare the normal and minimum capacity against requirements. Explain if it meets or fails the criteria with actual values.
### 3.2 Voltage Characteristics
Analyze nominal voltage, charge voltage, and discharge cut-off voltage. Discuss compliance with safety margins.
### 3.3 Current Capabilities
Evaluate standard and maximum charge/discharge currents. Discuss whether the battery can handle the required load profiles.
### 3.4 Physical Specifications
Verify dimensional compliance (diameter, height, weight) for 18650 standard format.
### 3.5 Performance Characteristics
Assess cycle life and internal impedance against requirements. Discuss implications for product lifetime.
### 3.6 Operating Conditions
Evaluate temperature ranges for charging, discharging, and storage. Identify any limitations or concerns.
## 4. SAFETY EVALUATION
Review safety certifications, protection features (over-charge, over-discharge, short-circuit), and compliance with standards (UN38.3, IEC62133).
## 5. QUALITY ASSESSMENT
Evaluate manufacturing facility certification and performance test results.
## 6. RISK ASSESSMENT
Identify any specification gaps, safety concerns, or operational limitations. Discuss potential risks and mitigation strategies.
## 7. FINAL RECOMMENDATION
Provide clear recommendation: APPROVED, REJECTED, or CONDITIONAL APPROVAL with specific conditions.
Write the report in professional technical language suitable for engineering documentation and procurement decisions. Be thorough, objective, and include specific values and comparisons throughout."""
# Generate narrative report using Mistral LLM (NO response_format - free text)
comparison_response = client.chat.complete(
    model="mistral-medium-latest",
    messages=[
        {"role": "user", "content": comparison_prompt}
    ],
    temperature=0.3  # Slightly higher for more natural writing
)
# Extract the narrative report
narrative_report = comparison_response.choices[0].message.content
print(" ✓ Technical report generated successfully")
7. Display Results
print("\n" + "="*80)
print("📋 BATTERY TECHNICAL EVALUATION REPORT")
print("="*80)
print(narrative_report)
8. Export Results
Save the complete battery analysis to a JSON file and the narrative report to a Markdown file for future reference and compliance records.
# Export complete results including narrative report
output_json = "battery_analysis_results.json"
output_report = "battery_technical_report.md"
# Save JSON results
results = {
    "extracted_data": extracted_data.model_dump(),
    "narrative_report": narrative_report,
    "requirements": requirements
}
with open(output_json, 'w') as f:
    json.dump(results, f, indent=2)
# Save narrative report as markdown
with open(output_report, 'w') as f:
    f.write(narrative_report)
print(f"\n💾 Complete analysis saved to: {output_json}")
print(f"📄 Technical report saved to: {output_report}")
Conclusion
What We Built:
This cookbook demonstrated an end-to-end workflow for analyzing lithium battery datasheets using only Mistral AI models:
- ✅ Comprehensive Data Extraction - Capacity, voltage, current, temperature, dimensions, safety specs
- ✅ Safety-Focused Validation - Protection features, certifications, operating limits
- ✅ Automated Compliance - Compare against industry standards and design requirements
- ✅ Detailed Reporting - Pass/fail analysis for each specification category
Key Advantages:
- Document AI for extraction (no separate OCR and LLM extraction steps needed)
- Comprehensive schema covering all critical battery specifications
- Safety-critical validation for charge/discharge limits and temperature ranges
Technology Stack:
- Mistral OCR (mistral-ocr-latest) with document annotations
- Mistral Medium (mistral-medium-latest) for comparison reports
- Pydantic for comprehensive schema validation
Use Cases:
This workflow is perfect for:
- Battery procurement - Validate vendor specifications
- Quality control - Ensure compliance with design requirements
- Safety validation - Check protection features and operating limits
- Product development - Compare multiple battery options
- Compliance reporting - Generate validation records
Extending This Cookbook:
You can easily adapt this workflow for:
- Other electronic components (capacitors, resistors, ICs)
- Different battery chemistries (LiFePO4, NiMH, etc.)
- Resume matching against job descriptions
Limitations:
- Document annotations are limited to 8 pages
- For larger documents, split them into chunks and process separately
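When a long datasheet is processed in chunks, each chunk returns its own annotation JSON, and the results have to be combined. The sketch below shows one simple option, concatenating the `specs` lists; the helper name `merge_chunk_annotations` and the sample payloads are hypothetical, and deduplication or field-level merging per model is deliberately left out.

```python
import json
from typing import Any, Dict, List


def merge_chunk_annotations(chunk_json: List[str]) -> Dict[str, Any]:
    """Concatenate the `specs` lists from several annotation JSON strings."""
    merged: Dict[str, Any] = {"specs": []}
    for raw in chunk_json:
        if not raw:  # skip chunks that returned no annotation
            continue
        merged["specs"].extend(json.loads(raw).get("specs", []))
    return merged


# Stand-in payloads with made-up model names, for illustration only
chunks = [
    '{"specs": [{"model_name": "CELL-A"}]}',
    "",
    '{"specs": [{"model_name": "CELL-B"}]}',
]
merged = merge_chunk_annotations(chunks)
print([spec["model_name"] for spec in merged["specs"]])
```

A single chunk may not contain every required field, so validating the merged result against `LithiumBatterySchema` may require optional fields or a field-level merge keyed on `model_name`.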