Guide for implementing Google Gemini API document processing - analyze PDFs with native vision to extract text, images, diagrams, charts, and tables. Use when processing documents, extracting structured data, summarizing PDFs, answering questions about document content, or converting documents to structured formats. (project)
npx skill4agent add aia-11-hn-mib/mib-mockinterviewaibot gemini-document-processing

GEMINI_API_KEY can be provided through the environment or through a .env file. The .env locations referenced are:
.env
.claude/.env
.claude/skills/.env
.claude/skills/gemini-document-processing/.env

export GEMINI_API_KEY="your-api-key-here"

echo "GEMINI_API_KEY=your-api-key-here" > .env

# Enable Vertex AI
export GEMINI_USE_VERTEX=true
export VERTEX_PROJECT_ID=your-gcp-project-id
export VERTEX_LOCATION=us-central1 # Optional, defaults to us-central1

Or in .env:
GEMINI_USE_VERTEX=true
VERTEX_PROJECT_ID=your-gcp-project-id
VERTEX_LOCATION=us-central1

pip install google-genai python-dotenv

# Use the provided script
python .claude/skills/gemini-document-processing/scripts/process-document.py \
--file invoice.pdf \
--prompt "Extract invoice details as JSON" \
--format json

# Process and summarize
python .claude/skills/gemini-document-processing/scripts/process-document.py \
--file report.pdf \
--prompt "Provide a concise executive summary"

# Q&A on document content
python .claude/skills/gemini-document-processing/scripts/process-document.py \
--file contract.pdf \
--prompt "What are the key terms and conditions?"

from google import genai
client = genai.Client()

# Read PDF
with open('document.pdf', 'rb') as f:
    pdf_data = f.read()

# Process document
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Extract key information from this document',
        genai.types.Part.from_bytes(
            data=pdf_data,
            mime_type='application/pdf'
        )
    ]
)
print(response.text)

from google import genai
from pydantic import BaseModel

# Schema the model's JSON response must follow
class InvoiceData(BaseModel):
    invoice_number: str
    date: str
    total: float
    vendor: str

client = genai.Client()

# Read the PDF and close the file handle promptly
with open('invoice.pdf', 'rb') as f:
    pdf_data = f.read()

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Extract invoice details',
        genai.types.Part.from_bytes(
            data=pdf_data,
            mime_type='application/pdf'
        )
    ],
    config=genai.types.GenerateContentConfig(
        response_mime_type='application/json',
        response_schema=InvoiceData
    )
)

# Validate the returned JSON against the schema
invoice_data = InvoiceData.model_validate_json(response.text)

PDF < 20MB?
├─ Yes → Use inline base64 encoding
└─ No → Use File API

Need structured JSON output?
├─ Yes → Define response_schema with Pydantic
└─ No → Get text response

Multiple queries on same PDF?
├─ Yes → Use File API + Context Caching
└─ No → Inline encoding is sufficient

# Basic usage
python scripts/process-document.py --file document.pdf --prompt "Your prompt"
# With JSON output
python scripts/process-document.py --file document.pdf --prompt "Extract data" --format json
# With File API (for large files)
python scripts/process-document.py --file large-document.pdf --prompt "Summarize" --use-file-api
# Multiple prompts
python scripts/process-document.py --file document.pdf --prompt "Question 1" --prompt "Question 2"

references/gemini-document-processing-report.md
references/quick-reference.md
references/code-examples.md

# Check API key is set
./scripts/check-api-key.sh

For large files, pass --use-file-api.
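
The size rule from the decision guide (inline below 20 MB, File API otherwise or when the same PDF is queried repeatedly) can be sketched in Python roughly as follows. This is a minimal illustration, not the code of the bundled script: `choose_upload_method`, `process_pdf`, and `INLINE_LIMIT_BYTES` are hypothetical names, and the sketch assumes a recent `google-genai` release in which `client.files.upload` accepts a `file=` path. Context caching is not shown.

```python
import os

# 20 MB threshold taken from the decision guide; treated here as an assumption.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def choose_upload_method(size_bytes: int, reuse: bool = False) -> str:
    """Return 'file_api' for large or repeatedly queried PDFs, else 'inline'."""
    if reuse or size_bytes >= INLINE_LIMIT_BYTES:
        return "file_api"
    return "inline"


def process_pdf(path: str, prompt: str,
                model: str = "gemini-2.5-flash", reuse: bool = False) -> str:
    """Send one prompt about a PDF, choosing inline bytes or the File API."""
    # Imported lazily so choose_upload_method works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # expects the API key in the environment
    if choose_upload_method(os.path.getsize(path), reuse) == "file_api":
        # Upload once; the returned file object can be passed in contents.
        document = client.files.upload(file=path)
    else:
        with open(path, "rb") as f:
            document = types.Part.from_bytes(
                data=f.read(), mime_type="application/pdf"
            )
    response = client.models.generate_content(
        model=model, contents=[prompt, document]
    )
    return response.text
```

Calling `process_pdf("large-document.pdf", "Summarize")` would take the File API path only when the file is at or above the threshold; passing `reuse=True` forces it for smaller files that will be queried several times.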