---
name: ai-image-generation
description: Integrate AI image generation APIs (DALLE-3, Stability AI, Replicate) into applications. Includes API setup, prompt engineering for renovation/interior design, cost comparison, and fallback strategies.
trigger:
  - "integrate DALLE"
  - "add image generation"
  - "Stability AI API"
  - "Replicate image generation"
  - "生成装修效果图"
  - "AI 图像生成"
---

# AI Image Generation Integration

This skill covers integrating AI image generation services into your application, with a focus on **interior design/renovation use cases**.

## Service Comparison

| Service | Price (1024x1024) | Quality | Setup Difficulty |
|---------|-------------------|---------|----------------------|
| **DALLE-3 (OpenAI)** | $0.04/image | ⭐⭐⭐⭐⭐ Photo-realistic | Easy (API key) |
| **Stability AI (SDXL)** | $0.002-0.005/image | ⭐⭐⭐⭐ Great quality | Easy (API key) |
| **Replicate** | $0.001-0.002/image | ⭐⭐⭐ Good | Medium (model selection) |
| **Local Stable Diffusion** | $0 (GPU cost) | ⭐⭐⭐ Customizable | Hard (GPU setup) |

**Recommendation for renovation apps**: Start with **DALLE-3** for quality, then optionally switch to **Stability AI** for cost savings at scale.
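
Switching later is easier if the provider is a single configuration value read at startup. A minimal sketch of that idea — the `IMAGE_PROVIDER` variable name and the provider labels are assumptions for illustration, not official settings:

```python
import os

# IMAGE_PROVIDER is an assumed config name, not an official setting.
SUPPORTED_PROVIDERS = {"dalle3", "stability", "replicate"}

def pick_provider() -> str:
    """Read the active image provider from the environment (default: DALLE-3)."""
    provider = os.getenv("IMAGE_PROVIDER", "dalle3").lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unknown IMAGE_PROVIDER: {provider}")
    return provider
```

Failing fast on an unknown value avoids silently falling back to the expensive default after a typo in `.env`.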

---

## Integration Pattern (Python FastAPI)

### Step 1: Environment Setup

Create `.env.example` (never commit real keys):
```
OPENAI_API_KEY=your_openai_api_key_here

# Optional: only needed when switching providers (see "Switching to Cheaper Services")
STABILITY_API_KEY=your_stability_api_key_here
LEONARDO_API_KEY=your_leonardo_api_key_here

# Image Settings
IMAGE_SIZE=1024x1024
IMAGE_QUALITY=hd
```

Install dependencies:
```bash
pip install openai httpx python-dotenv pillow
```

### Step 2: API Integration Code

```python
from fastapi import FastAPI
import httpx
import os
from dotenv import load_dotenv

load_dotenv()

app = FastAPI()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

async def generate_image(prompt: str, size: str = "1024x1024") -> bytes:
    """Generate image using DALLE-3"""
    if not OPENAI_API_KEY:
        return generate_placeholder()  # Fallback
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.openai.com/v1/images/generations",
            headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
            json={
                "model": "dall-e-3",  # the API's model ID is "dall-e-3", not "dalle-3"
                "prompt": prompt,
                "size": size,
                "quality": os.getenv("IMAGE_QUALITY", "standard"),  # "standard" or "hd"
                "n": 1
            },
            timeout=60.0
        )
        
        if response.status_code != 200:
            return generate_placeholder()
        
        image_url = response.json()["data"][0]["url"]
        img_response = await client.get(image_url)
        return img_response.content
```

### Step 3: Style-Specific Prompts (Renovation Use Case)

**Critical**: To avoid all styles looking similar, add **5-8 specific visual elements** per style. Generic prompts like "clean lines, white walls" produce similar results across styles.

```python
def generate_renovation_prompt(style: str, layout_id: str, room_type: str = "living_room") -> tuple[str, str]:
    """Generate differentiated prompts for interior design styles.
    Returns: (prompt, negative_prompt)
    """
    
    # Room type mapping
    room_prompts = {
        "living_room": "living room, sofa, coffee table, TV cabinet",
        "bedroom": "bedroom, bed, nightstand, wardrobe",
        "kitchen": "kitchen, cabinets, countertops, sink",
    }
    room_desc = room_prompts.get(room_type, room_prompts["living_room"])
    
    # Style prompts with HIGH DIFFERENTIATION (5-8 specific visual elements each)
    style_prompts = {
        "现代简约": f"ultra modern minimalist {room_type}, floor-to-ceiling windows, pure white walls, no decorations, sleek glass coffee table, hidden LED strip lighting, open concept, IKEA-style furniture, concrete floor, high ceiling, 8K realistic photo",
        "modern minimalist": f"ultra modern minimalist {room_type}, floor-to-ceiling windows, pure white walls, no decorations, sleek glass coffee table, hidden LED strip lighting, open concept, IKEA-style furniture, concrete floor, high ceiling, 8K realistic photo",
        
        "北欧风": f"authentic Scandinavian {room_type}, light birch wood floor, sheepskin rug, hygge atmosphere, pastel color palette (mint, blush, sky blue), pendant lamp with fabric shade, minimalist bookshelf, Monstera plant, white walls with wood accents, cozy textiles, 8K realistic photo",
        "Scandinavian": f"authentic Scandinavian {room_type}, light birch wood floor, sheepskin rug, hygge atmosphere, pastel color palette (mint, blush, sky blue), pendant lamp with fabric shade, minimalist bookshelf, Monstera plant, white walls with wood accents, cozy textiles, 8K realistic photo",
        
        "中式风格": f"traditional Chinese {room_type}, dark rosewood furniture with intricate carvings, red lacquer cabinets, golden dragon motifs, silk embroidered cushions, paper lanterns hanging from ceiling, moon gate entrance, bonsai tree, calligraphy wall art, terra cotta floor tiles, 8K realistic photo",
        "Chinese traditional": f"traditional Chinese {room_type}, dark rosewood furniture with intricate carvings, red lacquer cabinets, golden dragon motifs, silk embroidered cushions, paper lanterns hanging from ceiling, moon gate entrance, bonsai tree, calligraphy wall art, terra cotta floor tiles, 8K realistic photo",
        
        "地中海风格": f"authentic Mediterranean {room_type}, white stucco walls with blue accents, arched doorways and windows, terracotta floor tiles, mosaic tile borders in blue and white, exposed wooden ceiling beams, wicker furniture with blue cushions, large clay pots with olive trees, sea view through window, wrought iron details, 8K realistic photo",
        "Mediterranean": f"authentic Mediterranean {room_type}, white stucco walls with blue accents, arched doorways and windows, terracotta floor tiles, mosaic tile borders in blue and white, exposed wooden ceiling beams, wicker furniture with blue cushions, large clay pots with olive trees, sea view through window, wrought iron details, 8K realistic photo",
        "mediterranean": f"authentic Mediterranean {room_type}, white stucco walls with blue accents, arched doorways and windows, terracotta floor tiles, mosaic tile borders in blue and white, exposed wooden ceiling beams, wicker furniture with blue cushions, large clay pots with olive trees, sea view through window, wrought iron details, 8K realistic photo",
    }
    
    base_prompt = style_prompts.get(style, style_prompts["现代简约"])
    
    # Combine with room description
    prompt = f"{base_prompt}, {room_desc}, professional architectural photography, Canon EOS R5, 24mm lens, f/8, high detail"
    
    # Negative prompt to prevent style mixing
    negative_prompt = "cartoon, drawing, sketch, low quality, blurry, distorted, ugly, bad anatomy, watermark, text, signature, other styles mixed, messy, cluttered"
    
    return prompt, negative_prompt
```
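
At the call site, the tuple contract matters as much as the prompts themselves. A trimmed two-style stand-in for the full `style_prompts` dict above, just to show the unpacking pattern (the style texts here are abbreviated for illustration):

```python
def generate_renovation_prompt(style, layout_id, room_type="living_room"):
    # Trimmed stand-in for the full bilingual dict above (illustration only)
    style_prompts = {
        "北欧风": f"authentic Scandinavian {room_type}, light birch wood floor",
        "Scandinavian": f"authentic Scandinavian {room_type}, light birch wood floor",
    }
    prompt = style_prompts.get(style, style_prompts["北欧风"])
    negative_prompt = "cartoon, drawing, low quality, blurry"
    return prompt, negative_prompt

# Every caller must unpack the tuple; passing the tuple itself to the
# image API is a common bug (see Pitfalls)
prompt, negative_prompt = generate_renovation_prompt("Scandinavian", "L1", "bedroom")
```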

**Key Practices**:
1. **Add both Chinese and English keys** in `style_prompts` to handle backend sending either language
2. **Each style needs 5-8 unique visual elements** (materials, colors, furniture, decorations)
3. **Return tuple (prompt, negative_prompt)** - negative prompts prevent style confusion
4. **Include room_type in prompt** using f-string formatting
5. **Avoid generic suffixes** like "floor plan view, photorealistic" that make all styles look similar

---

## Backend Service Integration Pattern

When building a multi-service architecture (e.g., FastAPI backend + AI service), use this pattern:

### AI Service Endpoint (returns JSON with base64)
```python
# ai-service/app.py - Stability AI integration
@app.post("/inference")
async def inference(style: str = "现代简约", layout_id: str = None, room_type: str = "living_room"):
    """Generate image and return JSON (not binary) for easy backend consumption"""
    prompt = generate_prompt(style, layout_id)
    image_bytes = await generate_with_stability(prompt)
    
    image_base64 = base64.b64encode(image_bytes).decode('utf-8')
    return {
        "image_base64": image_base64,
        "style": style,
        "layout_id": layout_id,
        "room_type": room_type,
        "message": "生成成功"  # "generation succeeded"
    }
```

### Backend Calling AI Service
```python
# backend/main.py - Calling AI service
AI_SERVICE_URL = os.getenv("AI_SERVICE_URL", "http://ai-service:8001")

async def call_ai_service(style: str, layout_id: str, room_type: str):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{AI_SERVICE_URL}/inference",
            params={
                "style": style,
                "layout_id": layout_id,
                "room_type": room_type
            },
            timeout=60.0
        )
        
        if response.status_code == 200:
            data = response.json()
            if "image_base64" in data:
                return data["image_base64"]  # Return base64 string
            else:
                print(f"Missing image_base64 in response: {data.keys()}")
                return None
        else:
            print(f"AI service error: {response.status_code} - {response.text}")
            return None
```
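
Once the backend has the base64 string, one option for web clients (see the Flutter Web notes in Pitfalls) is to wrap it as a data URI that renders without a separate image URL. A small sketch — `to_data_uri` is a hypothetical helper, not part of any framework:

```python
import base64

def to_data_uri(image_base64: str, mime: str = "image/png") -> str:
    """Wrap a base64 payload as a data URI web clients can render directly."""
    return f"data:{mime};base64,{image_base64}"

# Round-trip check: encode some bytes, wrap them, then recover the original
payload = base64.b64encode(b"fake-image-bytes").decode("utf-8")
uri = to_data_uri(payload)
```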

**Critical**: Always call `/inference` (POST with params), NOT `/inference/{i}` (old pattern).

---

## Fallback Strategy

Always implement a fallback for when the API is unavailable:

```python
def generate_placeholder(width=512, height=512) -> bytes:
    """Fallback placeholder image"""
    from PIL import Image
    import io
    img = Image.new('RGB', (width, height), (240, 240, 240))
    img_byte_arr = io.BytesIO()
    img.save(img_byte_arr, format='PNG')
    return img_byte_arr.getvalue()
```
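
The flip side of a silent fallback is detecting it later (see Pitfalls). A sketch of a color-count heuristic using Pillow's `Image.getcolors` — the 16-color threshold is an assumption that works for solid-color placeholders like the one above:

```python
from PIL import Image
import io

def looks_like_placeholder(image_bytes: bytes, max_unique_colors: int = 16) -> bool:
    """Heuristic: a solid-color placeholder has very few distinct colors."""
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    # getcolors returns None when the image has MORE than maxcolors colors,
    # so a non-None result means "suspiciously few colors"
    return img.getcolors(maxcolors=max_unique_colors) is not None
```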

---

## Docker Deployment

Update `docker-compose.yml` to pass environment variables:

```yaml
ai-service:
  build: ./ai-service
  ports:
    - "8001:8001"
  environment:
    - OPENAI_API_KEY=${OPENAI_API_KEY}
  env_file:
    - ./ai-service/.env
  restart: unless-stopped
```

**Critical**: Add `.env` to `.gitignore` to prevent committing API keys!

---

## Cost Optimization

| Strategy | Savings |
|----------|---------|
| Use Stability AI instead of DALLE-3 | **80-90%** |
| Generate 512x512 instead of 1024x1024 | **50%** |
| Cache results (same prompt + style) | **100%** for repeats |
| Use Replicate with SDXL model | **95%** |
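
The caching row above can be a thin disk layer keyed by a hash of everything that affects the output. A minimal sketch — the `image_cache/` directory and the `|`-joined key format are assumptions; production apps would likely use Redis or object storage instead:

```python
import hashlib
from pathlib import Path
from typing import Optional

CACHE_DIR = Path("image_cache")  # assumed location; adjust per deployment

def cache_key(prompt: str, style: str, size: str = "1024x1024") -> str:
    """Stable key over everything that changes the generated image."""
    return hashlib.sha256(f"{prompt}|{style}|{size}".encode("utf-8")).hexdigest()

def get_cached(prompt: str, style: str, size: str = "1024x1024") -> Optional[bytes]:
    path = CACHE_DIR / f"{cache_key(prompt, style, size)}.png"
    return path.read_bytes() if path.exists() else None

def put_cached(prompt: str, style: str, image_bytes: bytes, size: str = "1024x1024") -> None:
    CACHE_DIR.mkdir(exist_ok=True)
    (CACHE_DIR / f"{cache_key(prompt, style, size)}.png").write_bytes(image_bytes)
```

Check the cache before calling any provider; a hit costs nothing, which is the "100% for repeats" row above.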

---

## Pitfalls

- **Never commit API keys**: Always use `.env` file + `python-dotenv`, and add `.env` to `.gitignore`
- **API key exposed in chat**: If user pastes `sk-...` in chat, warn them to rotate immediately at https://platform.openai.com/api-keys
- **Timeout issues**: Image generation can take 30-60 seconds. Set `timeout=60.0` in httpx
- **Web client compatibility**: When serving to Flutter Web, ensure:
  - API returns proper CORS headers
  - Image is base64-encoded or accessible via URL
  - Backend handles `application/octet-stream` MIME type for uploads
- **Prompt too long**: DALLE-3 has a 4000 character limit. Keep prompts concise but descriptive
- **Wrong endpoint pattern**: Don't call `/inference/{i}` (this will return 404). Use `/inference` POST with query params (`style`, `layout_id`, `room_type`)
- **AI service returns binary instead of JSON**: If your backend expects JSON, ensure AI service returns `{"image_base64": "..."}` not binary image via `StreamingResponse`. Modify endpoint to return JSON
- **Frontend undefined error**: Always check `data.plans` exists and is an array before calling `.map()`. Add defensive check: `if (!data.plans || !Array.isArray(data.plans))`
- **Docker service communication**: In docker-compose, set `AI_SERVICE_URL=http://ai-service:8001` (service name), not `localhost`. Verify with `docker-compose exec backend printenv | grep AI_SERVICE`
- **Stability AI model ID**: Use `stable-diffusion-xl-1024-v1-0` for 1024x1024 images. Other model IDs may cause "allowed dimensions" errors
- **Base64 response handling**: Stability AI returns base64 in `data["artifacts"][0]["base64"]`. Decode with `base64.b64decode(image_base64)`
- **Style prompts too generic**: Generic prompts like "clean lines, white walls" make ALL styles look similar. Add **5-8 specific visual elements** per style (materials, colors, furniture, decorations). See "Style-Specific Prompts" section
- **Missing bilingual keys**: If backend sends English keys (e.g., "Mediterranean") but prompt dict only has Chinese keys (e.g., "地中海风格"), it will fallback to default style. **Always add both Chinese and English keys** in `style_prompts` dict
- **Negative prompts not used**: Not using `negative_prompt` can cause style mixing. Always pass `{"text": prompt, "weight": 1}` and `{"text": negative_prompt, "weight": -1}` in `text_prompts` array
- **Function returns string not tuple**: If your `generate_prompt()` was updated to return `(prompt, negative_prompt)`, ensure ALL callers unpack it correctly: `prompt, negative_prompt = generate_prompt(style, layout_id, room_type)`
- **Backend container 404 on valid route**: If backend returns 404 on existing route, check: (1) Python syntax is valid (`python -m py_compile`), (2) FastAPI app loaded correctly (check startup logs), (3) No route conflicts. Restart container with `docker-compose up -d --build` to ensure code updates
- **Memory storage lost on restart**: Using in-memory `dict` (like `layouts_db = {}`) loses data on container restart. Uploaded files + layout_ids become invalid. Use persistent storage (PostgreSQL) for production
- **Silent fallback to placeholders from missing API keys**: If any integrated image service (e.g., Leonardo AI) has no API key set, the code will fallback to blank placeholder images (solid color, ~1.9KB). These are hard to detect visually. **Detection steps**:
  1. Check file size: Placeholders are ~1-2KB, real images are 1-2MB (1024x1024 PNG)
  2. Check unique color count: Run `python3 -c "from PIL import Image; import numpy as np; img=Image.open('image.png').convert('RGB'); arr=np.array(img); print(len(np.unique(arr.reshape(-1,3), axis=0)))"` → 1 for placeholder, 200k+ for real
  3. Check mean pixel value: Placeholders are often ~240 (white), real images have varied means (~100-150)
- **Inconsistent service usage when switching providers**: When switching from Stability AI to Leonardo AI (or vice versa), audit ALL endpoints (e.g., `/inference`, `/inference/all`) to ensure they use the new service. Missing updates to secondary endpoints will cause mixed service behavior and unexpected costs.
- **Leonardo AI integration missing key**: Leonardo AI requires `LEONARDO_API_KEY` in `.env`. If not set, `generate_with_leonardo` raises an error, triggering fallback to placeholder. Always verify all required keys are set with:
  ```bash
  python3 -c "import os; from dotenv import load_dotenv; load_dotenv(); print('LEONARDO_API_KEY:', 'SET' if os.getenv('LEONARDO_API_KEY') else 'NOT SET')"
  ```

---

## Switching to Cheaper Services

### Stability AI (Recommended for cost savings)

**Cost**: $0.002-0.005/1024x1024 image (80-90% cheaper than DALLE-3).  
**Model**: Use `stable-diffusion-xl-1024-v1-0` for 1024x1024 images.

**Working Integration Pattern (HTTP API, not SDK)**:
```python
import httpx
import base64
import os

STABILITY_API_KEY = os.getenv("STABILITY_API_KEY")

async def generate_with_stability(prompt: str, negative_prompt: str = "", size: str = "1024x1024") -> bytes:
    """Generate image using Stability AI HTTP API (verified working pattern)"""
    if not STABILITY_API_KEY:
        return generate_placeholder()  # Fallback (defined in "Fallback Strategy" above)
    
    try:
        width, height = map(int, size.split('x'))
        
        # Build text_prompts with negative prompt
        text_prompts = [{"text": prompt, "weight": 1}]
        if negative_prompt:
            text_prompts.append({"text": negative_prompt, "weight": -1})
        
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "https://api.stability.ai/v1/generation/stable-diffusion-xl-1024-v1-0/text-to-image",
                headers={
                    "Authorization": f"Bearer {STABILITY_API_KEY}",
                    "Content-Type": "application/json",
                    "Accept": "application/json"
                },
                json={
                    "text_prompts": text_prompts,
                    "cfg_scale": 7,
                    "clip_guidance_preset": "NONE",
                    "height": height,
                    "width": width,
                    "samples": 1,
                    "steps": 30,
                },
                timeout=60.0
            )
            
            if response.status_code != 200:
                print(f"Stability AI API Error: {response.status_code} - {response.text}")
                return generate_placeholder()
            
            data = response.json()
            # Stability AI returns base64 in artifacts[0]["base64"]
            image_base64 = data["artifacts"][0]["base64"]
            image_bytes = base64.b64decode(image_base64)
            return image_bytes
            
    except Exception as e:
        print(f"Error calling Stability AI API: {e}")
        return generate_placeholder()
```

**Critical**: Stability AI returns **base64-encoded images** in JSON response, NOT binary image directly. You must decode `data["artifacts"][0]["base64"]`.

### Leonardo AI (Cheapest - 50% less than Stability AI)

**Cost**: $0.0015/1024x1024 image (50% cheaper than Stability AI).  
**API Docs**: https://docs.leonardo.ai/

**Integration Pattern**:
```python
import httpx
import os

LEONARDO_API_KEY = os.getenv("LEONARDO_API_KEY")
LEONARDO_API_URL = "https://cloud.leonardo.ai/api/rest/v1"

async def generate_with_leonardo(prompt: str, size: str = "1024x1024") -> bytes:
    """Generate image using Leonardo AI (cheapest option)"""
    if not LEONARDO_API_KEY:
        raise ValueError("LEONARDO_API_KEY not set")
    
    width, height = map(int, size.split('x'))
    
    async with httpx.AsyncClient() as client:
        # Step 1: Create generation job
        response = await client.post(
            f"{LEONARDO_API_URL}/generations",
            headers={
                "Authorization": f"Bearer {LEONARDO_API_KEY}",
                "Content-Type": "application/json"
            },
            json={
                "prompt": prompt,
                "width": width,
                "height": height,
                "num_images": 1,
                "modelId": "6b645e3a-d64f-4341-a370-20de11a56288",  # SDXL model
                "guidance_scale": 7,
                "num_inference_steps": 30
            },
            timeout=30.0
        )
        
        if response.status_code != 200:
            raise Exception(f"Leonardo API Error: {response.text}")
        
        generation_id = response.json()["sdGenerationJob"]["generationId"]
        
        # Step 2: Poll for completion (simplified: fixed wait; production code
        # should poll the generation's status field with a timeout)
        import asyncio
        await asyncio.sleep(10)
        
        # Step 3: Get result
        result = await client.get(
            f"{LEONARDO_API_URL}/generations/{generation_id}",
            headers={"Authorization": f"Bearer {LEONARDO_API_KEY}"}
        )
        
        data = result.json()
        image_url = data["generations_by_pk"]["generated_images"][0]["url"]
        
        # Step 4: Download image
        img_response = await client.get(image_url)
        return img_response.content
```

**Cost Comparison (per 1000 images)**:
- DALLE-3: $40
- Stability AI: $3
- **Leonardo AI: $1.50** ✅ Cheapest

### Replicate (Most flexible)

```python
# Install: pip install replicate  (reads REPLICATE_API_TOKEN from the environment)
import replicate

# Pin an explicit version hash in production; a bare "owner/model" name
# resolves to the latest public version (":latest" is not a valid tag).
output = replicate.run(
    "stability-ai/sdxl",
    input={"prompt": prompt, "width": 512, "height": 512}
)
image_url = output[0]  # run() returns a list of output URLs for this model
```

---

## Verification

After integration:
1. Test with `curl -X POST "http://localhost:8001/inference?style=现代简约"` (quote the URL so the shell does not interpret `?`)
2. Verify image is returned (not placeholder)
3. Check API costs in OpenAI/Stability dashboard
4. Test fallback (temporarily remove API key)

---

## References

- **Material Database Pattern**: See `references/material-database-pattern.md` for JSON structure and recommendation algorithm used in renovation apps.
- **Debug Blank Images**: See `references/debug-blank-images.md` for script to detect placeholder/silent fallback images.
