Field notes

Notes on video data infrastructure

Engineering, compliance, and intelligence writing from the StormKeep team. Long-form, technically dense, occasionally opinionated. Subscribe to the newsletter to get new pieces in your inbox.

Latest
# the symptom
$ ./pipeline.py --videos 100000
[ERROR] HTTP 403 Forbidden - 38,221 / 100,000
[ERROR] TLS fingerprint mismatch
[ERROR] rate limited, sleeping 600s
# the cause
eight things, in order →
Technical · Published

Why your in-house yt-dlp pipeline keeps breaking in 2026 (and what to do about it)

yt-dlp works great for one video. At a million videos, it becomes a full-time job. Here's what actually breaks, why, and how production teams handle it in 2026.

May 2, 2026 · 9 min read · StormKeep Engineering

All articles

Compliance · Drafting

ISO/IEC 27037 chain of custody for YouTube video evidence: a practical guide

How to capture and preserve YouTube videos so they hold up in court, audit, or regulatory review. Aligned with ISO/IEC 27037 and ISO/IEC 27042.

Coming soon · 12 min read

Technical · Drafting

YouTube bot detection in 2026: what changed and how to survive

A practical breakdown of YouTube's anti-bot stack — TLS fingerprinting, behavioral analysis, rolling cipher — and what works against each layer.

Coming soon · 11 min read

Compliance · Drafting

RFC 3161 timestamping for digital evidence: a practitioner's guide

How to apply RFC 3161 timestamps to digital evidence (including YouTube video) so the capture is verifiable years later, with sample code.

Coming soon · 9 min read

Compliance · Drafting

How to export YouTube videos to Relativity, Everlaw, and Logikcull

A practical export guide for e-discovery teams: file formats, load files, metadata fields, and gotchas for ingesting YouTube video into the major platforms.

Coming soon · 11 min read

Technical · Drafting

Residential vs datacenter vs mobile proxies for YouTube ingestion

A 2026 comparison of proxy types for production YouTube ingestion — pricing, reliability, geographic coverage, and detection resistance.

Coming soon · 9 min read

Compliance · Drafting

DMCA §1201 after the January 2026 ruling: what AI teams need to know

A US federal magistrate judge ruled that YouTube's rolling cipher counts as access control. Here's what that changes for AI training data sourcing.

Coming soon · 10 min read

Brand intel · Drafting

Multilingual YouTube data: capturing APAC and MENA at scale

Why APAC and MENA YouTube content is structurally harder to capture, what that means for global brands, and how to set up multilingual ingestion.

Coming soon · 8 min read

Technical · Drafting

Bright Data vs Oxylabs vs Apify for YouTube data ingestion: a 2026 comparison

Hands-on comparison of the three major data infrastructure vendors for YouTube ingestion — pricing, reliability, support, and where each one fits.

Coming soon · 10 min read

Compliance · Drafting

Building a defensible video data sourcing methodology for AI

A template methodology document AI teams can adapt and ship to General Counsel before the next training run.

Coming soon · 11 min read

Already running into the failure modes we write about?

A 20-minute walkthrough. We'll show how a real customer's pipeline runs end-to-end. You walk out with a sized quote and a clear answer.