Loading...
Loading...
Turn video moments into AI-generated hand-drawn storyboards with Gemini-powered frame analysis and social media content generation
npx skill4agent add aradotso/devtools-skills clipsketch-ai-video-storyboardSkill by ara.so — Devtools Skills collection.
git clone https://github.com/RanFeng/clipsketch-ai.git
cd clipsketch-ai
npm install.env.localGEMINI_API_KEY=your_google_gemini_api_key_heregemini-3-pro-image-previewgemini-3-pro-preview# Development mode
npm run dev
# Production build
npm run build
npm start
# Docker deployment
docker run -d --restart=always --name clipsketch-ai -p 3000:3000 earisty/clipsketch-ai:latesthttp://localhost:3000// Bilibili patterns
const bilibiliPatterns = [
/bilibili\.com\/video\/(BV\w+)/,
/b23\.tv\/\w+/
];
// Xiaohongshu patterns
const xhsPatterns = [
/xhslink\.com\/\w+/,
/xiaohongshu\.com\/.*\/(\w+)/
];// User pastes share link (can include promotional text)
const shareText = "超好看的视频!https://b23.tv/abc123 快来看看";
// App extracts and resolves to video URLSpace←/→Tinterface VideoTag {
id: string;
timestamp: number; // milliseconds
thumbnailDataUrl: string; // base64 canvas capture
videoUrl: string;
}
// Tagging implementation pattern
const captureFrame = (videoElement: HTMLVideoElement, currentTime: number): VideoTag => {
const canvas = document.createElement('canvas');
canvas.width = videoElement.videoWidth;
canvas.height = videoElement.videoHeight;
const ctx = canvas.getContext('2d');
ctx?.drawImage(videoElement, 0, 0);
return {
id: `tag-${Date.now()}`,
timestamp: currentTime * 1000,
thumbnailDataUrl: canvas.toDataURL('image/jpeg', 0.9),
videoUrl: videoElement.src
};
};import { GoogleGenAI } from '@google/genai';
const analyzeFrames = async (frames: VideoTag[], apiKey: string) => {
const genAI = new GoogleGenAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'gemini-3-pro-preview' });
const prompt = `
Analyze these video frames and extract:
1. Main narrative steps
2. Key visual elements
3. Character actions
4. Scene transitions
Frames: ${frames.length}
`;
const imageParts = frames.map(frame => ({
inlineData: {
data: frame.thumbnailDataUrl.split(',')[1], // Remove data:image/jpeg;base64,
mimeType: 'image/jpeg'
}
}));
const result = await model.generateContent([prompt, ...imageParts]);
return result.response.text();
};const generateStoryboard = async (
frames: VideoTag[],
analysis: string,
apiKey: string,
customCharacter?: string // Optional character image URL
) => {
const genAI = new GoogleGenAI(apiKey);
const model = genAI.getGenerativeModel({
model: 'gemini-3-pro-image-preview'
});
const prompt = `
Create a cute hand-drawn style storyboard combining these ${frames.length} frames.
Style requirements:
- Cute, friendly illustration style
- Coherent visual narrative
- Warm color palette
- Clear panel separation
Narrative context: ${analysis}
${customCharacter ? `Integrate this character: ${customCharacter}` : ''}
`;
const imageParts = frames.map(frame => ({
inlineData: {
data: frame.thumbnailDataUrl.split(',')[1],
mimeType: 'image/jpeg'
}
}));
if (customCharacter) {
imageParts.push({
inlineData: {
data: customCharacter.split(',')[1],
mimeType: 'image/jpeg'
}
});
}
const result = await model.generateContent([prompt, ...imageParts]);
return result.response.text(); // Returns image data or URL
};const refinePanels = async (
storyboardImage: string,
panelCount: number,
apiKey: string,
useBatchAPI: boolean = false
) => {
const genAI = new GoogleGenAI(apiKey);
const model = genAI.getGenerativeModel({
model: 'gemini-3-pro-image-preview'
});
if (useBatchAPI) {
// Cost-efficient batch processing
const batchPromises = Array.from({ length: panelCount }, (_, i) => ({
prompt: `Enhance panel ${i + 1} from this storyboard in high resolution`,
image: storyboardImage
}));
// Process in batches to save API costs
const results = await model.batchGenerate(batchPromises);
return results;
} else {
// Individual requests for immediate results
const refinedPanels = [];
for (let i = 0; i < panelCount; i++) {
const result = await model.generateContent([
`Enhance and upscale panel ${i + 1} from this storyboard`,
{
inlineData: {
data: storyboardImage.split(',')[1],
mimeType: 'image/jpeg'
}
}
]);
refinedPanels.push(result.response.text());
}
return refinedPanels;
}
};const generateSocialCopy = async (
storyboardImage: string,
videoContext: string,
apiKey: string
) => {
const genAI = new GoogleGenAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'gemini-3-pro-preview' });
const prompt = `
Generate 3 different styles of social media copy (for Xiaohongshu/Little Red Book):
1. Emotional Storytelling Style (200-300 chars)
- Personal, relatable narrative
- Emotional hooks
- Conversational tone
2. Tutorial/Guide Style (150-250 chars)
- Step-by-step breakdown
- Practical tips
- Clear value proposition
3. Short & Punchy Style (80-120 chars)
- Attention-grabbing opener
- Concise key message
- Call-to-action
Video context: ${videoContext}
Include relevant emojis and hashtags.
`;
const result = await model.generateContent([
prompt,
{
inlineData: {
data: storyboardImage.split(',')[1],
mimeType: 'image/jpeg'
}
}
]);
return parseCopyVariants(result.response.text());
};
interface CopyVariant {
style: 'emotional' | 'tutorial' | 'punchy';
text: string;
hashtags: string[];
}
const parseCopyVariants = (aiResponse: string): CopyVariant[] => {
// Parse AI response into structured copy variants
// Implementation depends on AI output format
return [
{ style: 'emotional', text: '...', hashtags: ['#story', '#life'] },
{ style: 'tutorial', text: '...', hashtags: ['#howto', '#tips'] },
{ style: 'punchy', text: '...', hashtags: ['#viral', '#trending'] }
];
};const generateCoverImage = async (
selectedCopy: string,
storyboardImage: string,
originalFrames: VideoTag[],
apiKey: string
) => {
const genAI = new GoogleGenAI(apiKey);
const model = genAI.getGenerativeModel({
model: 'gemini-3-pro-image-preview'
});
const prompt = `
Create a vertical video cover (9:16 aspect ratio) that:
- Incorporates key visual from the storyboard
- Highlights this copy: "${selectedCopy}"
- Optimized for social media thumbnail
- Eye-catching, high-contrast design
- Maintains hand-drawn aesthetic
Cover should work as standalone promotion image.
`;
const imageParts = [
{
inlineData: {
data: storyboardImage.split(',')[1],
mimeType: 'image/jpeg'
}
},
...originalFrames.slice(0, 3).map(frame => ({
inlineData: {
data: frame.thumbnailDataUrl.split(',')[1],
mimeType: 'image/jpeg'
}
}))
];
const result = await model.generateContent([prompt, ...imageParts]);
return result.response.text();
};import JSZip from 'jszip';
const exportTagsAsZip = async (tags: VideoTag[]) => {
const zip = new JSZip();
// Add timestamp index file
const timestampList = tags.map((tag, idx) =>
`Frame ${idx + 1}: ${formatTimestamp(tag.timestamp)}`
).join('\n');
zip.file('timestamps.txt', timestampList);
// Add frame images
for (let i = 0; i < tags.length; i++) {
const imageData = tags[i].thumbnailDataUrl.split(',')[1];
zip.file(`frame_${i + 1}.jpg`, imageData, { base64: true });
}
const blob = await zip.generateAsync({ type: 'blob' });
// Trigger download
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `video-tags-${Date.now()}.zip`;
a.click();
URL.revokeObjectURL(url);
};
const formatTimestamp = (ms: number): string => {
const totalSeconds = Math.floor(ms / 1000);
const minutes = Math.floor(totalSeconds / 60);
const seconds = totalSeconds % 60;
const milliseconds = ms % 1000;
return `${minutes}:${seconds.toString().padStart(2, '0')}.${milliseconds.toString().padStart(3, '0')}`;
};const exportTimeline = (tags: VideoTag[]) => {
const content = tags.map((tag, idx) =>
`[${formatTimestamp(tag.timestamp)}] Frame ${idx + 1}`
).join('\n');
const blob = new Blob([content], { type: 'text/plain' });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `timeline-${Date.now()}.txt`;
a.click();
URL.revokeObjectURL(url);
};// Detect mobile/tablet and adjust layout
const useResponsiveLayout = () => {
const [isMobile, setIsMobile] = useState(false);
useEffect(() => {
const checkMobile = () => {
setIsMobile(window.innerWidth < 768);
};
checkMobile();
window.addEventListener('resize', checkMobile);
return () => window.removeEventListener('resize', checkMobile);
}, []);
return isMobile;
};
// Layout component
const VideoWorkspace: React.FC = () => {
const isMobile = useResponsiveLayout();
return (
<div className={isMobile ? 'flex-col' : 'flex-row'}>
<VideoPlayer className={isMobile ? 'w-full' : 'w-2/3'} />
<TagList className={isMobile ? 'w-full mt-4' : 'w-1/3 ml-4'} />
</div>
);
};// Persist tags and workspace state
const saveToIndexedDB = async (videoUrl: string, tags: VideoTag[]) => {
const db = await openDB('clipsketch-db', 1, {
upgrade(db) {
db.createObjectStore('tags', { keyPath: 'videoUrl' });
}
});
await db.put('tags', { videoUrl, tags, updatedAt: Date.now() });
};
const loadFromIndexedDB = async (videoUrl: string): Promise<VideoTag[]> => {
const db = await openDB('clipsketch-db', 1);
const record = await db.get('tags', videoUrl);
return record?.tags || [];
};gemini-3-pro-image-preview// Use referrerPolicy to bypass some restrictions
<video
crossOrigin="anonymous"
referrerPolicy="no-referrer"
src={videoUrl}
/>
// If still failing, implement server-side proxy
// Add to next.config.js or custom server
async rewrites() {
return [
{
source: '/api/proxy/:path*',
destination: 'https://external-video-cdn.com/:path*'
}
];
}const BATCH_THRESHOLD = 5; // Use batch if >5 panels
const shouldUseBatch = (panelCount: number): boolean => {
return panelCount > BATCH_THRESHOLD;
};
// Implement retry with exponential backoff for rate limits
const generateWithRetry = async (
fn: () => Promise<any>,
maxRetries: number = 3
) => {
for (let i = 0; i < maxRetries; i++) {
try {
return await fn();
} catch (error) {
if (error.status === 429 && i < maxRetries - 1) {
await new Promise(resolve =>
setTimeout(resolve, Math.pow(2, i) * 1000)
);
} else {
throw error;
}
}
}
};// Add touch gesture support for mobile
const useTouchControls = (videoRef: React.RefObject<HTMLVideoElement>) => {
const [touchStartX, setTouchStartX] = useState(0);
const handleTouchStart = (e: React.TouchEvent) => {
setTouchStartX(e.touches[0].clientX);
};
const handleTouchEnd = (e: React.TouchEvent) => {
const touchEndX = e.changedTouches[0].clientX;
const diff = touchEndX - touchStartX;
if (Math.abs(diff) > 50) { // Swipe threshold
const video = videoRef.current;
if (video) {
video.currentTime += diff > 0 ? -2 : 2; // Seek 2 seconds
}
}
};
return { handleTouchStart, handleTouchEnd };
};| Function | Model | Purpose |
|---|---|---|
| gemini-3-pro-preview | Extract narrative structure |
| gemini-3-pro-image-preview | Create hand-drawn composite |
| gemini-3-pro-image-preview | Enhance individual panels |
| gemini-3-pro-preview | Generate 3 copy variants |
| gemini-3-pro-image-preview | Create promotional thumbnail |
process.env.GEMINI_API_KEY