If you have ever embedded an audio player on a podcast site, a music portfolio, or a media archive, you have probably noticed that pretty rendered waveform behind the playhead. Libraries like Wavesurfer.js and Peaks.js can draw those visuals on the client, but decoding a 60-minute MP3 in the browser is slow, memory-hungry, and unreliable on mobile. The clean solution is to pre-compute the waveform data on the server and serve it as a tiny JSON file. That is exactly what d9media/audiowaveform-server does – it wraps the BBC audiowaveform CLI inside a small Flask API so any application can POST an audio file and receive ready-to-render waveform JSON.
What Does This Thing Actually Do?
The audiowaveform-server is a lightweight Alpine-based Docker image that exposes a single HTTP endpoint. You upload an audio file, choose a resolution and a bit depth, and the server returns a JSON document containing the peak data needed to render a visual waveform anywhere – in WordPress, a Vue dashboard, a static site, a mobile app, you name it.
Under the hood it combines a few well-known pieces, and a handful of design choices keep it easy to integrate:
- BBC audiowaveform – The reference CLI built by the BBC R&D team for generating waveform data from MP3, WAV, FLAC, and Ogg Vorbis sources
- Flask – A minimal Python web framework that exposes the CLI through a REST endpoint
- Alpine Linux – Keeps the resulting image around 217 MB so it boots in seconds and runs comfortably on a small VPS
- Multipart File Upload – Accepts audio files via standard multipart/form-data, which means every HTTP client and every CMS can talk to it
- Dynamic Parameters – The resolution (zoom factor) and bit_depth are passed per request, so you can generate compact 8-bit overviews for thumbnails or detailed 16-bit data for full editors
- JSON Output – Returns the waveform array directly in the response body, ready to be cached or piped into Wavesurfer.js, Peaks.js, or any custom canvas renderer
- Stateless Design – No database, no queue, no auth layer in the way; it does one thing per request and gets out of the way
The original use case mentioned by the maintainer is integrating with WordPress to show waveforms on podcast posts, but the API is generic enough for any backend that can POST a file.
Docker Compose Setup
Because the service holds no persistent state and uses a single container, the compose file is refreshingly small. Here is a complete configuration that adds a small uploads cache and binds the API to a sensible internal port:
```yaml
services:
  audiowaveform:
    image: d9media/audiowaveform-server:latest
    container_name: audiowaveform-server
    restart: unless-stopped
    ports:
      - "5000:5000"
    environment:
      FLASK_ENV: "production"
      MAX_CONTENT_LENGTH: "104857600"
      TMPDIR: "/tmp/audiowaveform"
    volumes:
      - audiowaveform-tmp:/tmp/audiowaveform
    networks:
      - audiowaveform-network
    healthcheck:
      # Assumes GET / answers with a success status. If the app only serves
      # POST /generate_waveform, this probe will report the container unhealthy.
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:5000/"]
      interval: 30s
      timeout: 5s
      retries: 3

volumes:
  audiowaveform-tmp:

networks:
  audiowaveform-network:
```
For most installations the defaults work out of the box and you can trim the compose file down to just the image, port, and restart policy. The named volume only matters if you process very large files and want to keep the temporary directory off the container’s writable layer.
Installation Steps
1. System Requirements
This is one of the lightest containers you will ever run, but audiowaveform itself is CPU-bound when decoding compressed audio. Plan for:
- Docker and Docker Compose installed
- 512 MB RAM minimum (1 GB comfortable)
- 1 vCPU for casual use, 2+ vCPUs if you process long files in parallel
- Disk space proportional to the largest audio file you intend to upload (the file is buffered in /tmp during processing)
2. Create the Project Directory
```shell
mkdir -p ~/docker/audiowaveform
cd ~/docker/audiowaveform
```
3. Save the Compose File
Create a docker-compose.yml in the new directory and paste the configuration from above.
4. Pull the Image
```shell
docker compose pull
```
The image is around 217 MB so the pull is quick on most connections.
5. Launch the Service
```shell
docker compose up -d
```
6. Verify the Container is Running
```shell
docker compose ps
docker compose logs -f audiowaveform
```
7. Generate Your First Waveform
Pick any local MP3 or WAV file and post it to the API. The endpoint expects a multipart/form-data payload with three fields: the file, the resolution, and the bit depth.
```shell
curl -X POST http://localhost:5000/generate_waveform \
  -F "file=@/path/to/episode.mp3" \
  -F "resolution=4096" \
  -F "bit_depth=8" \
  -o waveform.json
```
If the call succeeds, waveform.json contains the peak array, sample rate, and the metadata needed by any waveform renderer.
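To sanity-check the result, a short Python sketch can summarize the returned data. It assumes the response mirrors the BBC audiowaveform JSON layout (fields sample_rate, samples_per_pixel, bits, length, channels, and a data array of interleaved min/max values); the Flask wrapper may differ, so compare against a real response first. The summarize_waveform helper and the sample values are hypothetical.

```python
def summarize_waveform(waveform: dict) -> dict:
    """Derive a few human-readable facts from audiowaveform-style JSON.

    Assumes `length` counts min/max pairs per channel and `data`
    interleaves them as [min, max, min, max, ...].
    """
    channels = waveform.get("channels", 1)  # older files may omit this field
    points = waveform["length"]
    duration = points * waveform["samples_per_pixel"] / waveform["sample_rate"]
    return {
        "points": points,
        "channels": channels,
        "bits": waveform["bits"],
        "duration_seconds": round(duration, 3),
        # Each point contributes one min and one max value per channel.
        "data_is_consistent": len(waveform["data"]) == points * 2 * channels,
    }


# Made-up sample in the assumed layout; replace it with json.load() on waveform.json.
sample = {
    "version": 2,
    "channels": 1,
    "sample_rate": 44100,
    "samples_per_pixel": 4096,
    "bits": 8,
    "length": 3,
    "data": [-12, 15, -40, 38, -5, 7],
}
summary = summarize_waveform(sample)
print(summary)
```

If data_is_consistent comes back False, the payload probably uses a different layout than assumed here, and the renderer configuration should be checked before caching the file.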
Environment Variables Explained
The image is intentionally minimal and does not require environment variables to start. Everything that matters is passed per request. The variables listed here are useful adjustments for production deployments.
FLASK_ENV
Purpose: Sets the Flask runtime mode.
Format: String (production or development)
Default: production
```shell
FLASK_ENV=production
```
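Always keep this on production for live deployments. The development mode enables verbose stack traces and the debugger, which is a security risk on a public host. Note that FLASK_ENV was deprecated in Flask 2.2 and removed in Flask 2.3; if the image ships a newer Flask, the variable has no effect and debug mode is controlled by FLASK_DEBUG instead.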
MAX_CONTENT_LENGTH
Purpose: Maximum upload size accepted by the Flask server, expressed in bytes.
Format: Integer (bytes)
Example: 104857600 for 100 MB
```shell
MAX_CONTENT_LENGTH=104857600
```
Increase this when processing long-form content like full DJ mixes or audiobooks. A two-hour 320 kbps MP3 is roughly 288 MB (about 275 MiB), so a 300 MiB ceiling (314572800 bytes) is a safe value for podcasting workflows.
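The sizing arithmetic is easy to reproduce for other bitrates and durations. The sketch below assumes a constant-bitrate file and ignores ID3 tags and multipart overhead; the function name is illustrative.

```python
def estimated_audio_bytes(bitrate_kbps: int, duration_seconds: int) -> int:
    """Approximate size of a constant-bitrate audio file in bytes."""
    # 1 kbps is 1000 bits per second; divide by 8 to convert bits to bytes.
    return bitrate_kbps * 1000 * duration_seconds // 8


two_hour_mix = estimated_audio_bytes(320, 2 * 60 * 60)
limit_300_mib = 300 * 1024 * 1024  # candidate value for MAX_CONTENT_LENGTH

print(two_hour_mix)                   # size of the upload in bytes
print(limit_300_mib)                  # limit in bytes
print(two_hour_mix < limit_300_mib)   # does the file fit under the limit?
```

Leaving some headroom above the estimate is sensible, because the multipart envelope and any embedded artwork add to the request size.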
TMPDIR
Purpose: Directory used to store the uploaded file while audiowaveform reads it.
Format: Absolute filesystem path
Default: /tmp
```shell
TMPDIR=/tmp/audiowaveform
```
Pointing this at a named volume keeps the temporary writes off the container layer and lets the host monitor disk usage if you process large batches.
PYTHONUNBUFFERED
Purpose: Forces Python to flush stdout/stderr immediately so logs appear in real time.
Format: Integer (1 to enable)
```shell
PYTHONUNBUFFERED=1
```
Useful when you tail the container logs with docker compose logs -f and want to see Flask output without buffering delays.
GUNICORN_WORKERS (Optional)
Purpose: If you swap the built-in Flask development server for Gunicorn behind a reverse proxy, this controls the worker count.
Format: Integer
Default: Not set (the image runs Flask directly)
```shell
GUNICORN_WORKERS=4
```
A common rule of thumb is 2 * CPU + 1. Only relevant if you build a derivative image with Gunicorn baked in.
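As a quick illustration of that rule of thumb, the following sketch derives a worker count from the CPUs visible to the process. The helper name is hypothetical, and the result is only a starting point: waveform generation is CPU-bound, so fewer workers may perform better on small hosts.

```python
import os
from typing import Optional


def recommended_workers(cpu_count: Optional[int] = None) -> int:
    """Apply the 2 * CPU + 1 rule of thumb for Gunicorn worker counts."""
    cpus = cpu_count or os.cpu_count() or 1  # fall back to 1 if undetectable
    return 2 * cpus + 1


print(recommended_workers(2))  # value for a 2 vCPU host
print(recommended_workers())   # value for the current machine
```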
Volume Mounts Explained
audiowaveform-tmp
Purpose: Holds uploaded audio files briefly while audiowaveform decodes them.
Mount Point: /tmp/audiowaveform
```yaml
- audiowaveform-tmp:/tmp/audiowaveform
```
The container does not require a persistent volume to function – everything lives in memory or in /tmp for the duration of a request. A named volume is still useful because it keeps temporary I/O off the container’s overlay filesystem and survives container recreations.
Bind Mount for Bulk Processing (Optional)
Purpose: If you batch-process a fixed library of audio files, mount the source directory into the container so you can shell in and run the CLI directly.
Mount Point: /audio (or any path you choose)
```yaml
volumes:
  - /srv/podcasts:/audio:ro
```
Mounting read-only protects the original library from accidental writes. You can still docker exec into the container and call the underlying audiowaveform binary directly when you need raw control.
Using the API
Endpoint Reference
The image exposes a single endpoint:
```
POST /generate_waveform
Content-Type: multipart/form-data

Fields:
  file        (required) - audio file binary
  resolution  (optional) - integer, samples per pixel, defaults around 256
  bit_depth   (optional) - integer, 8 or 16
```
cURL Example
```shell
curl -X POST http://localhost:5000/generate_waveform \
  -F "file=@track.mp3" \
  -F "resolution=2048" \
  -F "bit_depth=8" \
  -H "Accept: application/json"
```
PHP Example
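```php
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://audiowaveform:5000/generate_waveform');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, [
    'file' => new CURLFile('/var/uploads/episode.mp3'),
    'resolution' => 4096,
    'bit_depth' => 8,
]);

$response = curl_exec($ch);
if ($response === false) {
    // Transport-level failure (DNS, connection refused, timeout).
    throw new RuntimeException('Waveform request failed: ' . curl_error($ch));
}
curl_close($ch);

$waveform = json_decode($response, true);
if (!is_array($waveform)) {
    throw new RuntimeException('Waveform response was not valid JSON');
}

// Cache the raw JSON only after it has been validated.
file_put_contents('/var/cache/waveforms/episode.json', $response);
```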
Node.js Example
```javascript
import fs from "node:fs";
import FormData from "form-data";
import fetch from "node-fetch";

const form = new FormData();
form.append("file", fs.createReadStream("./song.wav"));
form.append("resolution", "1024");
form.append("bit_depth", "16");

const res = await fetch("http://localhost:5000/generate_waveform", {
  method: "POST",
  body: form,
});

// Fail early instead of caching an error payload.
if (!res.ok) {
  throw new Error(`Waveform request failed with HTTP ${res.status}`);
}

const waveform = await res.json();
fs.writeFileSync("./song.waveform.json", JSON.stringify(waveform));
```
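Python Example
For Python callers, the same request can be sketched with the standard library alone. The helper names build_multipart and generate_waveform are hypothetical; the field names and URL follow the endpoint reference above, and the whole file is read into memory, which is fine for a sketch but worth streaming for very large uploads.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_header.encode() + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def generate_waveform(path, resolution=4096, bit_depth=8,
                      url="http://localhost:5000/generate_waveform", timeout=300):
    """POST an audio file to the service and return the parsed JSON."""
    audio = Path(path)
    body, content_type = build_multipart(
        {"resolution": resolution, "bit_depth": bit_depth},
        "file", audio.name, audio.read_bytes(),
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling generate_waveform("episode.mp3") against a running container should return the parsed waveform document; the generous timeout leaves room for long files on small CPUs.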
Wavesurfer.js Frontend
Once the JSON is generated, feed it straight into Wavesurfer.js to skip client-side decoding entirely:
```javascript
const wavesurfer = WaveSurfer.create({
  container: "#waveform",
  waveColor: "#9ca3af",
  progressColor: "#2563eb",
  height: 96,
});

const peaks = await fetch("/cache/episode.json").then(r => r.json());
// wavesurfer.js 6 accepts the flat array; on wavesurfer.js 7 pass one array per channel, i.e. [peaks.data].
wavesurfer.load("/audio/episode.mp3", peaks.data);
```
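Some renderers expect peak values as floats between -1 and 1 rather than the signed integers that audiowaveform emits. Whether your Wavesurfer.js version needs this depends on its normalization settings, so treat the following as an optional pre-processing sketch. It assumes the BBC audiowaveform layout with a bits field and an integer data array; normalize_peaks is a hypothetical helper.

```python
def normalize_peaks(waveform: dict) -> list:
    """Scale audiowaveform integer peaks to floats in the range -1.0..1.0."""
    # 8-bit data spans -128..127 and 16-bit data spans -32768..32767,
    # so dividing by 2**(bits - 1) maps full scale to 1.0.
    full_scale = float(2 ** (waveform["bits"] - 1))
    return [round(value / full_scale, 4) for value in waveform["data"]]


eight_bit = {"bits": 8, "data": [-128, 127, 0, 64]}
print(normalize_peaks(eight_bit))
```

Running this once on the server and caching the normalized array keeps the browser code free of any scaling logic.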
Common Use Cases
Podcast Hosting
Run the server alongside your CMS and pre-generate a waveform JSON every time an episode is uploaded. Visitors load a tiny JSON file instead of decoding a 100 MB MP3 in JavaScript, which dramatically improves time-to-first-paint on podcast pages.
WordPress Audio Players
The maintainer’s original target audience. Hook into the add_attachment action, post the file to http://audiowaveform:5000/generate_waveform, and store the JSON next to the audio file in wp-content/uploads. Themes and plugins can then render the waveform without an external API.
DJ and Music Portfolio Sites
Producers showcasing tracks, mixes, or remix stems get a consistent waveform across browsers without forcing visitors to download the full audio. High-resolution variants (resolution 1024 or lower) work nicely for full-page hero sections.
Audio Editors and Annotation Tools
Pair audiowaveform-server with Peaks.js to build interview annotation tools, transcription editors, or radio production dashboards. The 16-bit JSON output gives you enough resolution to draw zoomable timelines.
Bulk Library Processing
Mount your audio archive into the container, exec in, and loop over files with the bundled CLI. The container becomes a portable processing environment without polluting the host with libraries like libmad, libsndfile, or libid3tag.
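A batch run of this kind can be sketched in Python by building one audiowaveform command per file. The flags used are the CLI's input (-i), output (-o), zoom (-z), and bits (-b) options; the /audio and /tmp/waveforms paths and the helper name are assumptions for illustration, with output written outside the read-only mount.

```python
from pathlib import PurePosixPath

AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}


def build_commands(audio_paths, output_dir, zoom=4096, bits=8):
    """Return one audiowaveform argument list per supported audio file."""
    commands = []
    for raw_path in sorted(audio_paths):
        audio = PurePosixPath(raw_path)  # container paths are POSIX-style
        if audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip cover art, notes, and other non-audio files
        target = PurePosixPath(output_dir) / (audio.stem + ".json")
        commands.append([
            "audiowaveform",
            "-i", str(audio),
            "-o", str(target),
            "-z", str(zoom),
            "-b", str(bits),
        ])
    return commands


library = ["/audio/ep2.WAV", "/audio/ep1.mp3", "/audio/notes.txt"]
for command in build_commands(library, "/tmp/waveforms"):
    print(" ".join(command))
```

Inside the container, each argument list can be handed to subprocess.run(command, check=True) so that a failing file stops the batch instead of being silently skipped.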
Microservice Inside a Larger Stack
Drop the container next to your API gateway and let internal services call http://audiowaveform:5000/generate_waveform. Because there is no auth, keep the port internal to the Docker network and let the upstream service handle access control.
Tuning Resolution and Bit Depth
The two parameters drive both the visual fidelity and the size of the response. Use this rough guide:
- Resolution 8192, bit_depth 8 – Tiny preview thumbnails, list views, sidebars
- Resolution 4096, bit_depth 8 – Standard podcast players (sweet spot for most sites)
- Resolution 2048, bit_depth 8 – Detailed full-width hero waveforms
- Resolution 1024, bit_depth 16 – Editing interfaces with zoom and selection
- Resolution 256, bit_depth 16 – High-fidelity DAW-style scrubbing (large payload)
The resolution value is “samples per pixel” – higher numbers mean fewer data points and smaller JSON. Bit depth affects the precision of each peak; 8-bit is more than enough for visual playback bars, while 16-bit only matters for editing tools.
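To see how resolution drives payload size, the number of data points can be estimated from the duration, the sample rate, and the samples-per-pixel value. The sketch below is approximate: it assumes a 44.1 kHz source and a single merged channel, and the exact count produced by audiowaveform may differ by a point or so.

```python
import math


def estimate_points(duration_seconds, sample_rate=44100, samples_per_pixel=4096):
    """Approximate number of min/max pairs for one channel."""
    return math.ceil(duration_seconds * sample_rate / samples_per_pixel)


def estimate_values(duration_seconds, sample_rate=44100,
                    samples_per_pixel=4096, channels=1):
    """Approximate length of the JSON data array (one min and one max per point)."""
    points = estimate_points(duration_seconds, sample_rate, samples_per_pixel)
    return points * 2 * channels


one_hour = 60 * 60
for resolution in (8192, 4096, 2048, 1024, 256):
    values = estimate_values(one_hour, samples_per_pixel=resolution)
    print(f"resolution {resolution}: about {values} values")
```

Halving the resolution value doubles the array length, which is why the 256 setting in the guide above is flagged as a large payload.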
Useful Links
- Docker Hub: https://hub.docker.com/r/d9media/audiowaveform-server
- BBC audiowaveform Project: https://github.com/bbc/audiowaveform
- Wavesurfer.js: https://wavesurfer.xyz/
- Peaks.js by BBC R&D: https://github.com/bbc/peaks.js
- Pull Command: docker pull d9media/audiowaveform-server
Conclusion
Audiowaveform-server solves a tightly scoped problem with admirable focus: turn an audio file into renderable waveform JSON over a single HTTP call. There is no database to manage, no auth to misconfigure, no queue to monitor. The container starts in seconds, fits on the cheapest VPS, and slots cleanly into any stack that can speak multipart/form-data.
For podcasters, WordPress site owners, and any team building audio interfaces, it removes the painful step of decoding audio in the browser and replaces it with cached JSON that loads instantly on every device. Pair it with Wavesurfer.js or Peaks.js and you have a complete, self-hosted pipeline for beautiful audio visualization without a single third-party API key.
Spin it up once, point your application at port 5000, and forget it is running – exactly what a good single-purpose microservice should feel like.
FAQ
What is audiowaveform-server?
It is a Docker image that wraps the BBC’s audiowaveform command-line tool inside a small Flask HTTP API. You POST an audio file and receive JSON peak data that any waveform renderer can consume.
Who maintains the image?
The image is published on Docker Hub by d9media. It packages BBC R&D’s open-source audiowaveform binary together with a thin Python wrapper.
Is it free to use?
Yes. The container is freely available on Docker Hub, and the underlying BBC audiowaveform tool is open source, released under the GNU General Public License v3.
What audio formats does it support?
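It supports MP3, WAV, FLAC, and Ogg Vorbis, following what the bundled BBC audiowaveform CLI accepts. Recent audiowaveform releases also read Opus, so Opus support depends on the version packaged in the image. M4A and AAC are not supported directly and need to be transcoded first.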
What is the typical image size?
Around 217 MB. It is built on Alpine Linux to keep the footprint minimal.
What port does the API run on?
Port 5000. You can map it to any host port in your compose file, or keep it internal-only on a Docker network.
What is the API endpoint?
POST /generate_waveform with a multipart/form-data body containing file, resolution, and bit_depth fields.
What does the JSON response look like?
The response contains the peak data array along with metadata such as sample rate, channel count, length, and bit depth. The structure mirrors the JSON output produced by the BBC audiowaveform CLI.
How do resolution and bit_depth affect the output?
Resolution is samples per pixel – higher numbers mean fewer points and smaller JSON. Bit depth controls precision; 8-bit produces compact files suitable for visual playback, 16-bit yields detailed data for editing tools.
What is a good default resolution?
4096 with bit depth 8 is a balanced choice for most podcast and music players. It produces a small JSON file that still looks crisp on full-width waveforms.
Does it require authentication?
No. The image has no built-in auth layer. If you expose it publicly, put it behind a reverse proxy with basic auth, an API gateway, or restrict it to an internal Docker network.
Is HTTPS required?
Not for the container itself. If you call it from another container in the same network, plain HTTP is fine. If you expose it to the internet, terminate TLS at a reverse proxy in front of the container.
How big can uploaded files be?
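Flask itself imposes no upload limit unless the application sets MAX_CONTENT_LENGTH, so the effective ceiling is whatever the image configures. You can adjust it with the MAX_CONTENT_LENGTH environment variable. 100–300 MB is a comfortable range for most podcast workflows.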
Why does my upload time out?
Decoding very long MP3s on a small CPU can exceed the default request timeout of upstream proxies. Increase the proxy timeouts (Nginx proxy_read_timeout, for example) or process large files asynchronously.
Can I generate waveforms for multiple files in parallel?
Yes, but the bundled Flask development server is not designed for sustained concurrent load. For reliable parallel processing, run multiple replicas behind a load balancer or build a derivative image that uses Gunicorn.
How do I integrate with WordPress?
Hook into the add_attachment action and POST the new audio file to the container with wp_remote_post. Save the JSON response next to the audio file and use a frontend library like Wavesurfer.js to render it.
How do I integrate with Wavesurfer.js?
Pass the JSON’s data array as the peaks argument to wavesurfer.load(audioUrl, peaks). Wavesurfer will skip in-browser decoding and render directly from the supplied peaks.
Is the container safe to run on the public internet?
By itself, no. There is no rate limiting, no authentication, and no content scanning. Always run it on an internal network or behind a proxy that enforces those policies.
How much memory does it need?
A few hundred MB is usually enough. Long files briefly spike memory while audiowaveform reads them, but the process exits quickly once the JSON is returned.
Can I run it on a Raspberry Pi?
Compatibility depends on the published architectures. If the image only ships for amd64, ARM hosts cannot run it directly. Check the supported tags on Docker Hub or build a derivative image for arm64.
What happens if I send an unsupported file?
The Flask wrapper returns an error response and audiowaveform exits with a non-zero status. Always check the HTTP status code and any status or error field in the JSON body before consuming the data.
Why am I getting a 413 error?
The upload exceeds MAX_CONTENT_LENGTH. Raise the variable in your compose file or split the audio file before uploading.
Can I cache the JSON output?
Yes, and you should. Waveform JSON is deterministic for a given audio file plus parameters, so a simple file-based cache keyed by hash works perfectly.
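One way to build such a key is to hash the file contents together with the request parameters. This is a minimal sketch; the key format and the helper name are illustrative choices rather than anything the image prescribes.

```python
import hashlib


def waveform_cache_key(audio_bytes: bytes, resolution: int, bit_depth: int) -> str:
    """Return a stable key for one audio file plus its waveform parameters."""
    digest = hashlib.sha256()
    digest.update(audio_bytes)
    # Include the parameters so different resolutions get separate cache entries.
    digest.update(f"|resolution={resolution}|bit_depth={bit_depth}".encode())
    return digest.hexdigest()


key = waveform_cache_key(b"fake audio bytes", 4096, 8)
print(f"/var/cache/waveforms/{key}.json")
```

Because the key depends on the bytes rather than the filename, re-uploading an unchanged episode under a new name still hits the cache.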
How do I clear temporary files?
If you bind a host directory to /tmp/audiowaveform, you can run a periodic cleanup with a cron job or a sidecar container. The Flask wrapper deletes its own temp files when a request completes successfully.
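A periodic cleanup of that directory could look like the sketch below, which removes files older than a chosen age. The one-hour threshold and the function name are assumptions; the demonstration runs in a throwaway directory so it is safe to try anywhere.

```python
import os
import tempfile
import time
from pathlib import Path


def remove_stale_files(directory, max_age_seconds=3600, now=None):
    """Delete regular files older than max_age_seconds and return their names."""
    now = time.time() if now is None else now
    removed = []
    for entry in Path(directory).iterdir():
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)


# Self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    stale = Path(demo_dir, "old-upload.mp3")
    fresh = Path(demo_dir, "new-upload.mp3")
    stale.touch()
    fresh.touch()
    two_hours_ago = time.time() - 2 * 60 * 60
    os.utime(stale, (two_hours_ago, two_hours_ago))  # backdate the stale file
    removed = remove_stale_files(demo_dir, max_age_seconds=3600)
    remaining = sorted(p.name for p in Path(demo_dir).iterdir())

print(removed, remaining)
```

In a real deployment the same function would be pointed at the host directory bound to /tmp/audiowaveform and scheduled from cron or a sidecar, with a threshold comfortably longer than the slowest expected request.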
How do I view container logs?
Use docker compose logs -f audiowaveform to follow the live output. Set PYTHONUNBUFFERED=1 if log lines appear delayed.
Can I run audiowaveform CLI directly inside the container?
Yes. docker compose exec audiowaveform audiowaveform --help runs the BBC binary directly. This is convenient for batch jobs or producing PNG previews.
Does it support PNG output?
The HTTP wrapper returns JSON only. To produce PNG images you can exec into the container and call audiowaveform with the -o file.png argument.
How does it compare to alternatives like csandman/docker-audiowaveform?
Other images expose the audiowaveform CLI without a web layer, so you must call it via docker exec or volumes. d9media adds an HTTP API on top, which is more convenient for remote applications.
Can it be used inside a Kubernetes cluster?
Yes. It is a stateless container with a single port and no persistent storage requirements – an ideal fit for a small Kubernetes Deployment behind a ClusterIP service.
How do I scale horizontally?
Run multiple replicas of the container behind a load balancer. Each request is independent, so load balancing is straightforward and there is no state to share.
What is the response time like?
Typically a fraction of a second per minute of audio on modern hardware. A 60-minute MP3 usually finishes in a few seconds on a single vCPU.
