Reading the Fed in 2026: RSS, Blob Fetching, and What You Actually Get From Public Federal Reserve Data
FOMC minutes, Fed speeches, and Beige Book releases all hit RSS feeds within minutes of being published. The catch is that the feed entries are pointers, not bodies. A pipeline that stops at feedparser ingests metadata and zero text. The fix is index-then-blob.
- The Federal Reserve publishes FOMC statements, FOMC minutes, Fed speeches, Beige Book releases, and Federal Register notices through public RSS feeds with sub-minute latency.
- RSS entries are pointers. The full text lives at the linked URL as a PDF, an HTML page, or both. A pipeline that reads only the feed gets metadata and no content.
- Index-then-blob is the correct pattern: ingest the RSS feed for change detection, then fetch the linked artifact and run it through the right parser per content type.
- Watch for content-length and content-type lies. The Fed's CDN occasionally serves a 200-byte error page with content-type application/pdf. Sniff the first few bytes for the %PDF- magic header before handing to a PDF parser.
- Caps matter. A 25 MB blob ceiling per artifact catches accidental video links and PDFs that ballooned during a publication mishap. A 256 KB body ceiling catches feed-spam attempts.
The Federal Reserve publishes the monetary-policy material that moves markets through public RSS feeds, in plain language, with no API key, no rate-limit ceiling worth worrying about, and a publication latency that is usually under a minute from the release moment. The structural reason teams still struggle to ingest it cleanly is that the RSS feed is not the document. The feed is a pointer to the document.
This post walks the public Fed data surface in 2026, the index-then-blob pattern that turns a feed of pointers into a usable corpus, the content-type traps the Fed's CDN sets without meaning to, and the size caps that keep an ingestion pipeline from getting wedged when a release goes wrong.
What the Fed actually publishes in 2026
The Federal Reserve Board operates several distinct RSS feeds on federalreserve.gov. The ones that matter for a markets-research corpus are:
- FOMC monetary policy releases. Statements, minutes, and the quarterly Summary of Economic Projections. The statement and the projection materials drop at the scheduled release time on the meeting's final afternoon, just ahead of the press conference. Minutes drop on a three-week delay, on a published calendar.
- Fed speeches. Public remarks by the seven Governors and the twelve Reserve Bank Presidents. Beat coverage of any individual speaker has to be assembled from the Fed's central speeches feed, because the Reserve Banks each maintain their own pages with inconsistent feed coverage.
- The Beige Book. Published eight times a year, two weeks before each FOMC meeting. Each release contains a national summary and twelve District reports.
- Senior Loan Officer Opinion Survey. Quarterly bank-lending-conditions data, released two weeks before the meeting that uses it.
- Supervisory and regulatory news. Stress-test results, capital-rule announcements, enforcement actions. Less market-moving than monetary policy but high-signal for bank-credit research.
Beyond the Fed itself, the Federal Register publishes proposed and final rules through its own RSS feed at federalregister.gov. The Treasury and the FDIC each operate their own feeds. A complete monetary-and-credit corpus draws from all four sources. None of them require an API key. None of them rate-limit a polite poller.
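One way to keep those sources in one place is a small registry keyed by source name. A minimal sketch follows; only the Fed press feed URL appears elsewhere in this post, and the other three entries are hypothetical placeholders to be filled in from each agency's site, not real feed URLs.
FEED_SOURCES = {
    "fed_press": "https://www.federalreserve.gov/feeds/press_all.xml",  # from this post
    "federal_register": "<federal-register-rules-feed-url>",            # placeholder
    "treasury": "<treasury-press-feed-url>",                            # placeholder
    "fdic": "<fdic-press-feed-url>",                                    # placeholder
}

for name, url in FEED_SOURCES.items():
    # Each source gets its own idempotent poll job against the same dedup table.
    print(f"{name}: polling {url}")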
Why feedparser alone is not enough
A naive ingestion loop looks like:
import feedparser

feed = feedparser.parse("https://www.federalreserve.gov/feeds/press_all.xml")
for entry in feed.entries:
    # Stores only what the feed carries: title, one-line summary, timestamp, link.
    store(entry.title, entry.summary, entry.published, entry.link)
What gets stored: the title of the release, a one-sentence summary the Fed wrote for the RSS feed, a publication timestamp, and a URL. What is missing: the document itself. The FOMC minutes are not in entry.summary. The minutes are in the PDF at entry.link.
For a corpus aimed at semantic search, retrieval-augmented generation, or any downstream analytical use, an RSS-only ingest produces metadata noise and zero substance. The embedding model has nothing to embed. The retriever has nothing to retrieve. The dashboard reads thousands of documents ingested and recall close to zero, because there is nothing in the index to recall against.
The fix is index-then-blob: use the RSS feed for change detection and pagination, then fetch the linked artifact and parse it according to its content type.
The index-then-blob pattern
import feedparser, httpx
from your_parsers import parse_pdf, parse_html  # placeholder parser module

FEED_URL = "https://www.federalreserve.gov/feeds/press_all.xml"  # same feed as above

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Change detection: skip anything the dedup table has already seen.
    if already_ingested(entry.link):
        continue
    response = httpx.get(entry.link, follow_redirects=True, timeout=30.0)
    body = response.content
    # Route by content type: PDF parser for minutes and reports, HTML parser for press pages.
    if looks_like_pdf(entry.link, response.headers.get("content-type"), body):
        text = parse_pdf(body)
    else:
        text = parse_html(body.decode("utf-8", errors="ignore"))
    store_document(entry.link, entry.title, entry.published, text)
Three things make this pattern hold up in production.
Idempotent change detection. already_ingested(entry.link) should hit a small dedup table keyed on the URL plus the published timestamp. RSS feeds occasionally rewrite the same entry on minor edits. A change-detection layer that drops duplicate URL plus identical timestamp is the cheapest insurance against double-ingestion.
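A minimal sketch of that change-detection layer, assuming SQLite for the dedup table. The two-argument signature and the mark_ingested companion are assumptions of this sketch, not something the loop above spells out; the loop would pass entry.published as the second argument to key on both fields.
import sqlite3

conn = sqlite3.connect("fed_corpus.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS ingested "
    "(url TEXT, published TEXT, PRIMARY KEY (url, published))"
)

def already_ingested(url, published=""):
    # Keyed on URL plus published timestamp: a re-dated entry is re-ingested,
    # an identical repeat of the same entry is dropped.
    row = conn.execute(
        "SELECT 1 FROM ingested WHERE url = ? AND published = ?", (url, published)
    ).fetchone()
    return row is not None

def mark_ingested(url, published=""):
    conn.execute(
        "INSERT OR IGNORE INTO ingested (url, published) VALUES (?, ?)", (url, published)
    )
    conn.commit()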
Streaming downloads with a size ceiling. A 25 MB cap per artifact is generous for any Fed PDF. Beige Book releases run two to three megabytes. FOMC minutes run under a megabyte. A 25 MB ceiling catches video links accidentally pasted into a feed, archive tarballs, and the rare PDF that ballooned during a publication mishap.
Content-type detection that does not trust the server. The next section is about why.
What the Fed's CDN gets wrong about content-type
The federalreserve.gov CDN is reliable on availability and unreliable on metadata. Three failure modes show up often enough that any production ingester has to handle them:
- PDF mis-labeled as application/octet-stream. The CDN occasionally returns a generic binary content-type on a perfectly valid PDF. An ingester that branches on content-type alone routes the artifact to the HTML parser and gets a parse exception.
- Legacy aliases for the PDF content-type. In addition to application/pdf, the wild includes application/x-pdf, application/x-bzpdf, application/x-gzpdf, application/acrobat, and (rarely) applications/vnd.pdf with the trailing typo from a long-ago Apache release. A content-type matcher that accepts only application/pdf misses 1 to 2 percent of artifacts on a large corpus.
- An HTML error page served with a content-length of 200 bytes and a content-type of application/pdf. When the CDN's origin is briefly unavailable, the response body is sometimes a stock HTML error page wearing PDF clothing. A PDF parser handed that body raises an exception, and the ingester counts the artifact as failed forever unless retries are wired up.
The reliable signal is the first few bytes of the response body. A real PDF starts with the literal bytes 25 50 44 46 2D, which spell %PDF-. A magic-byte sniff catches all three failure modes above without trusting the server header at all.
def looks_like_pdf(url, content_type, body_head):
    # Once the body is in hand, the magic bytes decide: a real PDF starts with
    # %PDF- regardless of what the URL or the server header claim. This is what
    # catches all three failure modes above.
    if body_head:
        return body_head[:5] == b"%PDF-"
    # Pre-download heuristics, for callers that have not fetched the body yet.
    if url.lower().endswith(".pdf"):
        return True
    if content_type and content_type.lower().split(";")[0].strip() in {
        "application/pdf",
        "application/x-pdf",
        "application/x-bzpdf",
        "application/x-gzpdf",
        "application/acrobat",
        "applications/vnd.pdf",
    }:
        return True
    return False
The same pattern applies to Treasury, FDIC, and Federal Register artifacts. None of these sources are hostile. All of them have rough edges that show up only at the third decimal place of reliability.
Size caps that pay rent
A 25 MB ceiling per artifact looks like an arbitrary number until the first time an ingester pulls down a 380 MB video file because someone at the Fed mis-pasted a media URL into a press-release feed. A 256 KB ceiling on the RSS feed itself catches a different failure mode: a malformed feed file (or a hostile one in some other context) that streams forever. Both ceilings should fail loud, not silent.
The implementation has two layers. The first is a pre-check on the Content-Length header: if the declared length exceeds the cap, the ingester returns a 413-equivalent without reading the body. The second is a mid-stream byte counter that aborts the download when the cumulative read exceeds the cap. Servers that omit Content-Length (more common than you would think) still hit the mid-stream check.
MAX_BLOB = 25 * 1024 * 1024  # 25 MB ceiling per fetched artifact

def fetch_capped(url, cap=MAX_BLOB):
    with httpx.stream("GET", url, follow_redirects=True, timeout=30.0) as r:
        # Layer one: reject on the declared Content-Length before reading any body bytes.
        declared = int(r.headers.get("content-length", "0") or "0")
        if declared > cap:
            return None
        # Layer two: count bytes mid-stream and abort past the ceiling, which also
        # covers servers that omit Content-Length entirely.
        chunks = []
        total = 0
        for chunk in r.iter_bytes():
            total += len(chunk)
            if total > cap:
                return None
            chunks.append(chunk)
        return b"".join(chunks)
Polling cadence and the embargo question
A pipeline that polls the Fed every 10 to 15 minutes catches new releases within one publication cycle and burns about 100 to 150 feed fetches per source per day. The Fed's CDN does not throttle that volume. Polling every 60 seconds is wasted effort except in two cases: the announced FOMC release window (statement plus projections plus press conference, on a published schedule), and the Beige Book release window (a known weekday and time). In both cases a window-scoped polling boost for ten to fifteen minutes around the release is worth doing. Outside those windows, a 10-minute cron beats sub-minute polling on every dimension that matters.
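A sketch of that window-scoped boost; the release timestamp below is a placeholder, not an actual FOMC calendar entry, and the function name is this sketch's own.
from datetime import datetime, timedelta, timezone

# Placeholder release moments (statement drop, Beige Book) -- fill from the published calendar.
RELEASE_WINDOWS = [datetime(2026, 1, 1, 19, 0, tzinfo=timezone.utc)]

def poll_interval(now):
    # Sub-minute polling only inside a 15-minute window around an announced release;
    # everywhere else the 10-minute baseline cadence applies.
    for release in RELEASE_WINDOWS:
        if abs(now - release) <= timedelta(minutes=15):
            return 60
    return 600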
The cleanest scheduler shape is one cron expression per source. FOMC speeches and Beige Book material can be on the same 10-minute cadence. FOMC minutes can be on a slower 30-minute cadence because the publication calendar is known. Each cron job runs idempotently against the dedup table and ingests only what is new.
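One way to write that scheduler shape down, with made-up job names and the cadences from this section expressed as cron strings.
# One cron expression per source; every job re-runs idempotently against the dedup table.
SCHEDULE = {
    "fed_speeches": "*/10 * * * *",  # every 10 minutes
    "beige_book":   "*/10 * * * *",  # every 10 minutes
    "fomc_minutes": "*/30 * * * *",  # every 30 minutes; the publication calendar is known
}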
What a finished Fed-data corpus contains
After a few weeks of clean ingestion, a Fed-data corpus carries:
- Every FOMC statement back to the start of the feed, parsed from the press-release HTML.
- Every FOMC minutes release, parsed from the PDF, with per-section structure preserved if the parser is decent.
- Every Fed speech the central speeches feed advertises, full text and speaker identification.
- Beige Book releases by Federal Reserve District, parseable as twelve sub-documents per release.
- The Senior Loan Officer Opinion Survey results, parsed from the PDF tables.
That corpus is the raw material for everything else: a Fed-speaker sentiment tracker, a hawk-dove score per Governor, a retrieval-augmented chatbot that answers questions about monetary policy with citations, a sector-by-sector Beige Book delta over time. The data is public. The discipline is in not losing the document during ingestion.
What to read next
- Why retrieval drift goes undetected: once a corpus is ingested, the next discipline is publishing the operational health of the index where outsiders can see it.
- Honest zeros vs demo data: a Fed corpus on day one has zero documents. The right answer is to ship the zero, not to backfill with synthetic fixtures.
- Scheduled vs event-driven ingestion: the case for cron over webhooks for any source that does not push notifications.
What does the Federal Reserve actually publish on RSS in 2026?
The Fed publishes monetary-policy press releases (FOMC statements, FOMC minutes, projection materials), Fed speeches by Governors and Reserve Bank Presidents, the Beige Book on its eight-releases-a-year cycle, Senior Loan Officer Opinion Survey results, and supervisory and regulatory announcements. All of it arrives through public RSS feeds on federalreserve.gov, plus a separate feed at the Federal Register for proposed and final rules.
Why is feedparser alone not enough to ingest Fed data?
feedparser gives you the entry's title, summary (usually one sentence), published timestamp, and link. The link points to the actual artifact, which is a PDF for minutes, an HTML transcript for speeches, and a press release page for FOMC statements. The body of the document is not in the RSS feed. A pipeline that stops at feedparser ends up with thousands of one-sentence summaries and zero full-text corpus.
How do you tell whether the linked URL is a PDF or an HTML page?
Three signals. Path heuristic: if the URL ends in .pdf, it is probably a PDF. Content-type header: application/pdf or any of the legacy aliases (application/x-pdf, application/acrobat). Magic-byte sniff on the first few bytes of the body: the literal bytes 25 50 44 46 2D, which spell %PDF-. The sniff is the most reliable, and once the body is downloaded it should override the other two, because CDNs lie about content-type more often than people think.
How often should a Fed ingestion pipeline poll?
Every 10 to 15 minutes is enough for monetary-policy releases. FOMC statements drop on a published schedule, but Fed speeches and Beige Book material can arrive at unpublished times. A scheduler that fires every 10 minutes catches new releases within one publication cycle without melting the feed source. Embargo windows (FOMC release dates) are the only case where sub-minute polling helps.