Day 46: AudioWaveform Server – Self-Hosted Waveform JSON Generation – 7 Days of Docker

4. May 2026

If you have ever embedded an audio player on a podcast site, a music portfolio, or a media archive, you have probably noticed that pretty rendered waveform behind the playhead. Libraries like Wavesurfer.js and Peaks.js can draw those visuals on the client, but decoding a 60-minute MP3 in the browser is slow, memory-hungry, and unreliable on mobile. The clean solution is to pre-compute the waveform data on the server and serve it as a tiny JSON file. That is exactly what d9media/audiowaveform-server does – it wraps the BBC audiowaveform CLI inside a small Flask API so any application can POST an audio file and receive ready-to-render waveform JSON.

What Does This Thing Actually Do?

The audiowaveform-server is a lightweight Alpine-based Docker image that exposes a single HTTP endpoint. You upload an audio file, choose a resolution and a bit depth, and the server returns a JSON document containing the peak data needed to render a visual waveform anywhere – in WordPress, a Vue dashboard, a static site, a mobile app, you name it.

Under the hood it combines a handful of well-known pieces and design choices:

  • BBC audiowaveform – The reference CLI built by the BBC R&D team for generating waveform data from MP3, WAV, FLAC, and Ogg Vorbis sources
  • Flask – A minimal Python web framework that exposes the CLI through a REST endpoint
  • Alpine Linux – Keeps the resulting image around 217 MB so it boots in seconds and runs comfortably on a small VPS
  • Multipart File Upload – Accepts audio files via standard multipart/form-data, which means every HTTP client and every CMS can talk to it
  • Dynamic Parameters – The resolution (zoom factor) and bit_depth are passed per request, so you can generate compact 8-bit overviews for thumbnails or detailed 16-bit data for full editors
  • JSON Output – Returns the waveform array directly in the response body, ready to be cached or piped into Wavesurfer.js, Peaks.js, or any custom canvas renderer
  • Stateless Design – No database, no queue, no auth layer in the way; it does one thing per request and gets out of the way

The original use case mentioned by the maintainer is integrating with WordPress to show waveforms on podcast posts, but the API is generic enough for any backend that can POST a file.

Docker Compose Setup

Because the service holds no persistent state and uses a single container, the compose file is refreshingly small. A complete configuration needs little more than the image, a restart policy, a port mapping for the API on port 5000, and an optional named volume that serves as a small uploads cache.

For most installations the defaults work out of the box and you can trim the compose file down to just the image, port, and restart policy. The named volume only matters if you process very large files and want to keep the temporary directory off the container’s writable layer.
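
The configuration described above can be sketched as a small Python script that writes the compose file. This is a minimal sketch, not the author's original file: the service name audiowaveform, the volume audiowaveform-tmp, the mount point /tmp/audiowaveform, and port 5000 come from this article, while the unpinned image tag, the restart policy, and the assumption that the wrapper honours TMPDIR and PYTHONUNBUFFERED are illustrative.

```python
from pathlib import Path

# Compose file built from the values named in this article.
# Assumptions: the image tag is left unpinned and the wrapper honours TMPDIR.
COMPOSE_TEMPLATE = """\
services:
  audiowaveform:
    image: d9media/audiowaveform-server
    restart: unless-stopped
    ports:
      - "{host_port}:5000"
    environment:
      TMPDIR: /tmp/audiowaveform
      PYTHONUNBUFFERED: "1"
    volumes:
      - audiowaveform-tmp:/tmp/audiowaveform

volumes:
  audiowaveform-tmp:
"""


def render_compose(host_port: int = 5000) -> str:
    """Return the compose file text with the chosen host port filled in."""
    return COMPOSE_TEMPLATE.format(host_port=host_port)


def write_compose(directory: Path, host_port: int = 5000) -> Path:
    """Write docker-compose.yml into `directory` and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "docker-compose.yml"
    target.write_text(render_compose(host_port), encoding="utf-8")
    return target
```

Calling write_compose(Path("audiowaveform-server")) creates the file in a project directory of that (hypothetical) name. Dropping the environment and volumes keys gives the trimmed variant mentioned above.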

Installation Steps

1. System Requirements

This is one of the lightest containers you will ever run, but audiowaveform itself is CPU-bound when decoding compressed audio. Plan for:

  • Docker and Docker Compose installed
  • 512 MB RAM minimum (1 GB comfortable)
  • 1 vCPU for casual use, 2+ vCPUs if you process long files in parallel
  • Disk space proportional to the largest audio file you intend to upload (the file is buffered in /tmp during processing)

2. Create the Project Directory

Create an empty directory for the stack, under any name you like, and change into it.

3. Save the Compose File

Create a docker-compose.yml in the new directory and paste the configuration from above.

4. Pull the Image

Run docker compose pull to fetch the image. It is around 217 MB, so the pull is quick on most connections.

5. Launch the Service

Run docker compose up -d to start the container in the background.

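6. Verify the Container is Running

Run docker compose ps and check that the audiowaveform service is listed as running. If it is not, docker compose logs -f audiowaveform shows the startup output.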

7. Generate Your First Waveform

Pick any local MP3 or WAV file and post it to the API. The endpoint expects a multipart/form-data payload with three fields: the file, the resolution, and the bit depth.

If the call succeeds, the response body (saved as waveform.json in this walkthrough) contains the peak array, sample rate, and the metadata needed by any waveform renderer.
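
The upload described in this step can be sketched with nothing but the Python standard library. The endpoint path and the field names file, resolution, and bit_depth are taken from this article; the localhost:5000 address, the timeout, and the helper names are assumptions for illustration.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumes the container port is published on the local host.
API_URL = "http://localhost:5000/generate_waveform"


def encode_multipart(fields: dict[str, str], filename: str, payload: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The audio itself goes into the "file" field.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'
        ).encode()
        + payload
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def generate_waveform(audio_path: Path, resolution: int = 4096, bit_depth: int = 8,
                      url: str = API_URL, timeout: float = 120.0) -> dict:
    """POST an audio file to the server and return the decoded JSON response."""
    body, content_type = encode_multipart(
        {"resolution": str(resolution), "bit_depth": str(bit_depth)},
        audio_path.name,
        audio_path.read_bytes(),
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx answers such as 413.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling generate_waveform(Path("episode.mp3")) and writing the result out with json.dump produces the waveform.json file used in this walkthrough; episode.mp3 is a placeholder name.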

Environment Variables Explained

The image is intentionally minimal and does not require environment variables to start. Everything that matters is passed per request. The variables listed here are useful adjustments for production deployments.

FLASK_ENV

Purpose: Sets the Flask runtime mode.

Format: String (production or development)

Default: production

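Always keep this on production for live deployments. The development mode enables verbose stack traces and the debugger, which is a security risk on a public host. Note that FLASK_ENV was deprecated in Flask 2.2 and removed in Flask 2.3, so whether the variable has any effect depends on the Flask version bundled in the image.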

MAX_CONTENT_LENGTH

Purpose: Maximum upload size accepted by the Flask server, expressed in bytes.

Format: Integer (bytes)

Example: 104857600 for 100 MB

Increase this when processing long-form content like full DJ mixes or audiobooks. A two-hour 320 kbps MP3 is roughly 280 MB, so a 300 MB ceiling is a safe value for podcasting workflows.
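
The arithmetic behind that recommendation is easy to check. The sketch below assumes a constant-bitrate file and ignores tags and frame headers; both helper names are made up for this example.

```python
def mp3_size_bytes(bitrate_kbps: int, duration_seconds: int) -> int:
    """Approximate size of a constant-bitrate MP3 (ignores tags and headers)."""
    return bitrate_kbps * 1000 * duration_seconds // 8


def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to the byte count MAX_CONTENT_LENGTH expects."""
    return mib * 1024 * 1024


two_hour_mix = mp3_size_bytes(320, 2 * 60 * 60)  # two hours at 320 kbps
limit = mib_to_bytes(300)                        # candidate MAX_CONTENT_LENGTH
print(f"upload: {two_hour_mix} bytes, limit: {limit} bytes, fits: {two_hour_mix <= limit}")
```

The estimate comes out at 288,000,000 bytes (about 275 MiB), which matches the rough 280 MB figure, and mib_to_bytes(100) reproduces the 104857600 example value.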

TMPDIR

Purpose: Directory used to store the uploaded file while audiowaveform reads it.

Format: Absolute filesystem path

Default: /tmp

Pointing this at a named volume keeps the temporary writes off the container layer and lets the host monitor disk usage if you process large batches.

PYTHONUNBUFFERED

Purpose: Forces Python to flush stdout/stderr immediately so logs appear in real time.

Format: Integer (1 to enable)

Useful when you tail the container logs with docker compose logs -f and want to see Flask output without buffering delays.

GUNICORN_WORKERS (Optional)

Purpose: If you swap the built-in Flask development server for Gunicorn behind a reverse proxy, this controls the worker count.

Format: Integer

Default: Not set (the image runs Flask directly)

A common rule of thumb is 2 * CPU + 1. Only relevant if you build a derivative image with Gunicorn baked in.

Volume Mounts Explained

audiowaveform-tmp

Purpose: Holds uploaded audio files briefly while audiowaveform decodes them.

Mount Point: /tmp/audiowaveform

The container does not require a persistent volume to function – everything lives in memory or in /tmp for the duration of a request. A named volume is still useful because it keeps temporary I/O off the container’s overlay filesystem and survives container recreations.

Bind Mount for Bulk Processing (Optional)

Purpose: If you batch-process a fixed library of audio files, mount the source directory into the container so you can shell in and run the CLI directly.

Mount Point: /audio (or any path you choose)

Mounting read-only protects the original library from accidental writes. You can still docker exec into the container and call the underlying audiowaveform binary directly when you need raw control.

Using the API

Endpoint Reference

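The image exposes a single endpoint: POST /generate_waveform. It accepts a multipart/form-data body with the fields file, resolution, and bit_depth and answers with the waveform JSON.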

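cURL Example

A minimal call looks like curl -F "file=@episode.mp3" -F "resolution=4096" -F "bit_depth=8" -o waveform.json http://localhost:5000/generate_waveform, where episode.mp3 stands in for your own file and localhost:5000 assumes the port is published on the host.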

PHP Example

Node.js Example

Wavesurfer.js Frontend

Once the JSON is generated, feed it straight into Wavesurfer.js to skip client-side decoding entirely.
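
Depending on the Wavesurfer.js version, the renderer may expect peaks as floats between -1 and 1 rather than raw 8-bit or 16-bit integers. A minimal sketch of that scaling step, done server-side before the JSON is stored, assuming the response carries the bits and data keys of the BBC audiowaveform JSON format:

```python
def normalize_peaks(waveform: dict, digits: int = 3) -> list[float]:
    """Scale integer min/max peaks to floats in the range -1.0 to 1.0."""
    bits = waveform["bits"]              # 8 or 16 in audiowaveform output
    full_scale = float(2 ** (bits - 1))  # 128 for 8-bit, 32768 for 16-bit
    return [round(value / full_scale, digits) for value in waveform["data"]]


# Hand-made sample in the shape of an 8-bit response (not real output).
sample = {"bits": 8, "data": [-128, 127, -64, 64, 0, 0]}
print(normalize_peaks(sample))
```

Rounding to three digits keeps the stored JSON small; raise digits if you normalize 16-bit data for an editing interface.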

Common Use Cases

Podcast Hosting

Run the server alongside your CMS and pre-generate a waveform JSON every time an episode is uploaded. Visitors load a tiny JSON file instead of decoding a 100 MB MP3 in JavaScript, which dramatically improves time-to-first-paint on podcast pages.

WordPress Audio Players

The maintainer’s original target audience. Hook into the add_attachment action, post the file to http://audiowaveform:5000/generate_waveform, and store the JSON next to the audio file in wp-content/uploads. Themes and plugins can then render the waveform without an external API.

DJ and Music Portfolio Sites

Producers showcasing tracks, mixes, or remix stems get a consistent waveform across browsers without forcing visitors to download the full audio. High-resolution variants (resolution 1024 or lower) work nicely for full-page hero sections.

Audio Editors and Annotation Tools

Pair audiowaveform-server with Peaks.js to build interview annotation tools, transcription editors, or radio production dashboards. The 16-bit JSON output gives you enough resolution to draw zoomable timelines.

Bulk Library Processing

Mount your audio archive into the container, exec in, and loop over files with the bundled CLI. The container becomes a portable processing environment without polluting the host with libraries like libmad, libsndfile, or libid3tag.
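
Such a loop could look like the following sketch, run inside the container. It assumes the audiowaveform binary is on the PATH and that a Python interpreter is available (the image ships Flask, so this is likely); the directory arguments are hypothetical. The -i, -o, -z, and -b flags are the CLI's input, output, zoom (samples per pixel), and bits options.

```python
import subprocess
from pathlib import Path

AUDIO_SUFFIXES = {".mp3", ".wav", ".flac", ".ogg"}


def build_command(source: Path, target: Path, zoom: int = 4096, bits: int = 8) -> list[str]:
    """Assemble one audiowaveform call for a single input file."""
    return ["audiowaveform", "-i", str(source), "-o", str(target),
            "-z", str(zoom), "-b", str(bits)]


def process_library(source_dir: Path, output_dir: Path, zoom: int = 4096, bits: int = 8) -> list[Path]:
    """Generate one JSON file per audio file; returns the outputs written."""
    written = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file() or source.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        # Mirror the library's folder layout below output_dir.
        target = output_dir / source.relative_to(source_dir).with_suffix(".json")
        if target.exists():
            continue  # already processed on an earlier run
        target.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_command(source, target, zoom, bits), check=True)
        written.append(target)
    return written
```

With a read-only library at /audio, process_library(Path("/audio"), Path("/tmp/audiowaveform/out")) would write the JSON files to a writable location instead of next to the originals.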

Microservice Inside a Larger Stack

Drop the container next to your API gateway and let internal services call http://audiowaveform:5000/generate_waveform. Because there is no auth, keep the port internal to the Docker network and let the upstream service handle access control.

Tuning Resolution and Bit Depth

The two parameters drive both the visual fidelity and the size of the response. Use this rough guide:

  • Resolution 8192, bit_depth 8 – Tiny preview thumbnails, list views, sidebars
  • Resolution 4096, bit_depth 8 – Standard podcast players (sweet spot for most sites)
  • Resolution 2048, bit_depth 8 – Detailed full-width hero waveforms
  • Resolution 1024, bit_depth 16 – Editing interfaces with zoom and selection
  • Resolution 256, bit_depth 16 – High-fidelity DAW-style scrubbing (large payload)

The resolution value is “samples per pixel” – higher numbers mean fewer data points and smaller JSON. Bit depth affects the precision of each peak; 8-bit is more than enough for visual playback bars, while 16-bit only matters for editing tools.
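
To get a feel for how resolution translates into payload size, the relationship can be sketched as below. The numbers are approximations for mono output at a 44.1 kHz sample rate; the exact rounding the CLI applies at the end of a file is not taken from its source.

```python
import math


def waveform_points(duration_seconds: float, sample_rate: int, resolution: int) -> int:
    """Approximate number of points: one per `resolution` input samples."""
    return math.ceil(duration_seconds * sample_rate / resolution)


def data_values(points: int, channels: int = 1) -> int:
    """Each point stores a minimum and a maximum value per channel."""
    return points * 2 * channels


# One hour of audio at each resolution from the guide above.
for resolution in (8192, 4096, 2048, 1024, 256):
    points = waveform_points(3600, 44100, resolution)
    print(f"resolution {resolution:>5}: {points:>7} points, {data_values(points):>8} values")
```

Halving the resolution value doubles the number of values in the JSON, which is why the 256 setting is flagged as a large payload.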

Conclusion

Audiowaveform-server solves a tightly scoped problem with admirable focus: turn an audio file into renderable waveform JSON over a single HTTP call. There is no database to manage, no auth to misconfigure, no queue to monitor. The container starts in seconds, fits on the cheapest VPS, and slots cleanly into any stack that can speak multipart/form-data.

For podcasters, WordPress site owners, and any team building audio interfaces, it removes the painful step of decoding audio in the browser and replaces it with cached JSON that loads instantly on every device. Pair it with Wavesurfer.js or Peaks.js and you have a complete, self-hosted pipeline for beautiful audio visualization without a single third-party API key.

Spin it up once, point your application at port 5000, and forget it is running – exactly what a good single-purpose microservice should feel like.

FAQ

What is audiowaveform-server?

It is a Docker image that wraps the BBC’s audiowaveform command-line tool inside a small Flask HTTP API. You POST an audio file and receive JSON peak data that any waveform renderer can consume.

Who maintains the image?

The image is published on Docker Hub by d9media. It packages BBC R&D’s open-source audiowaveform binary together with a thin Python wrapper.

Is it free to use?

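Yes. The container is freely available on Docker Hub, and the underlying BBC audiowaveform tool is open source, released under the GNU General Public License v3. That is a copyleft license rather than a permissive one, which matters if you redistribute modified builds.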

What audio formats does it support?

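It supports MP3, WAV, FLAC, and Ogg Vorbis – the same formats the BBC audiowaveform CLI accepts. M4A and AAC are not supported directly and need to be transcoded first. Newer audiowaveform releases can also read Opus, so Opus support depends on the CLI version bundled in the image; transcode if in doubt.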

What is the typical image size?

Around 217 MB. It is built on Alpine Linux to keep the footprint minimal.

What port does the API run on?

Port 5000. You can map it to any host port in your compose file, or keep it internal-only on a Docker network.

What is the API endpoint?

POST /generate_waveform with a multipart/form-data body containing file, resolution, and bit_depth fields.

What does the JSON response look like?

The response contains the peak data array along with metadata such as sample rate, channel count, length, and bit depth. The structure mirrors the JSON output produced by the BBC audiowaveform CLI.
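
A client can sanity-check the response before caching it. The sketch below assumes the wrapper passes the CLI's JSON through unchanged, with the keys sample_rate, samples_per_pixel, bits, length, data, and (in newer format versions) channels; the Waveform class and function name are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Waveform:
    sample_rate: int
    samples_per_pixel: int
    bits: int
    channels: int
    length: int       # number of min/max points per channel
    data: list[int]   # interleaved min/max values

    @property
    def duration_seconds(self) -> float:
        """Audio duration implied by the point count and resolution."""
        return self.length * self.samples_per_pixel / self.sample_rate


def parse_waveform(payload: dict) -> Waveform:
    """Validate a decoded response and wrap it in a Waveform."""
    channels = payload.get("channels", 1)  # older format versions omit this key
    waveform = Waveform(
        sample_rate=payload["sample_rate"],
        samples_per_pixel=payload["samples_per_pixel"],
        bits=payload["bits"],
        channels=channels,
        length=payload["length"],
        data=payload["data"],
    )
    expected = waveform.length * channels * 2
    if len(waveform.data) != expected:
        raise ValueError(f"expected {expected} values, got {len(waveform.data)}")
    return waveform
```

A missing key raises KeyError and a truncated data array raises ValueError, so a broken response never reaches the cache.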

How do resolution and bit_depth affect the output?

Resolution is samples per pixel – higher numbers mean fewer points and smaller JSON. Bit depth controls precision; 8-bit produces compact files suitable for visual playback, 16-bit yields detailed data for editing tools.

What is a good default resolution?

4096 with bit depth 8 is a balanced choice for most podcast and music players. It produces a small JSON file that still looks crisp on full-width waveforms.

Does it require authentication?

No. The image has no built-in auth layer. If you expose it publicly, put it behind a reverse proxy with basic auth, an API gateway, or restrict it to an internal Docker network.

Is HTTPS required?

Not for the container itself. If you call it from another container in the same network, plain HTTP is fine. If you expose it to the internet, terminate TLS at a reverse proxy in front of the container.

How big can uploaded files be?

The limit is whatever the wrapper configures; Flask itself does not cap uploads unless MAX_CONTENT_LENGTH is set. You can set or raise the ceiling with the MAX_CONTENT_LENGTH environment variable. 100–300 MB is a comfortable range for most podcast workflows.

Why does my upload time out?

Decoding very long MP3s on a small CPU can exceed the default request timeout of upstream proxies. Increase the proxy timeouts (Nginx proxy_read_timeout, for example) or process large files asynchronously.

Can I generate waveforms for multiple files in parallel?

Yes, but the bundled Flask development server handles requests serially. For concurrent processing, run multiple replicas behind a load balancer or build a derivative image that uses Gunicorn.

How do I integrate with WordPress?

Hook into the add_attachment action and POST the new audio file to the container with wp_remote_post. Save the JSON response next to the audio file and use a frontend library like Wavesurfer.js to render it.

How do I integrate with Wavesurfer.js?

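Pass the JSON’s data array as the peaks argument to wavesurfer.load(audioUrl, peaks). Wavesurfer will skip in-browser decoding and render directly from the supplied peaks. Depending on the Wavesurfer.js version, the integer values may first need to be scaled to the range -1 to 1 and wrapped in one array per channel.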

Is the container safe to run on the public internet?

By itself, no. There is no rate limiting, no authentication, and no content scanning. Always run it on an internal network or behind a proxy that enforces those policies.

How much memory does it need?

A few hundred MB is usually enough. Long files briefly spike memory while audiowaveform reads them, but the process exits quickly once the JSON is returned.

Can I run it on a Raspberry Pi?

Compatibility depends on the published architectures. If the image only ships for amd64, ARM hosts cannot run it directly. Check the supported tags on Docker Hub or build a derivative image for arm64.

What happens if I send an unsupported file?

The Flask wrapper returns an error response and audiowaveform exits with a non-zero status. Always check the JSON status field before consuming the data.

Why am I getting a 413 error?

The upload exceeds MAX_CONTENT_LENGTH. Raise the variable in your compose file or split the audio file before uploading.

Can I cache the JSON output?

Yes, and you should. Waveform JSON is deterministic for a given audio file plus parameters, so a simple file-based cache keyed by hash works perfectly.
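
One way to build such a cache key is sketched below: hash the file contents together with the two request parameters, so a change to either produces a new entry. The function names and the cache directory layout are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def cache_key(audio_path: Path, resolution: int, bit_depth: int) -> str:
    """Hash the file contents together with the parameters that shape the output."""
    digest = hashlib.sha256()
    with audio_path.open("rb") as handle:
        # Read in 1 MiB chunks so large files never sit in memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    digest.update(f"|{resolution}|{bit_depth}".encode())
    return digest.hexdigest()


def cache_path(cache_dir: Path, audio_path: Path, resolution: int, bit_depth: int) -> Path:
    """Location of the cached JSON for this file and parameter combination."""
    return cache_dir / f"{cache_key(audio_path, resolution, bit_depth)}.json"
```

Before posting a file, check whether cache_path(...) exists and serve that JSON instead of calling the container again.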

How do I clear temporary files?

If you bind a host directory to /tmp/audiowaveform, you can run a periodic cleanup with a cron job or a sidecar container. The Flask wrapper deletes its own temp files when a request completes successfully.
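
A periodic cleanup job of that kind can be sketched in a few lines. The one-hour default is an assumption; pick an age comfortably longer than your slowest request so files still being decoded are never removed.

```python
import time
from pathlib import Path


def remove_stale_files(directory: Path, max_age_seconds: float = 3600.0) -> list[Path]:
    """Delete regular files older than max_age_seconds; returns what was removed."""
    cutoff = time.time() - max_age_seconds
    removed = []
    for entry in directory.iterdir():
        # Only touch plain files whose last modification predates the cutoff.
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink(missing_ok=True)
            removed.append(entry)
    return removed
```

Run from cron against the host side of the mount, for example remove_stale_files(Path("/srv/audiowaveform-tmp")), where the path is a hypothetical bind-mount location.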

How do I view container logs?

Use docker compose logs -f audiowaveform to follow the live output. Set PYTHONUNBUFFERED=1 if log lines appear delayed.

Can I run audiowaveform CLI directly inside the container?

Yes. docker compose exec audiowaveform audiowaveform --help runs the BBC binary directly. This is convenient for batch jobs or producing PNG previews.

Does it support PNG output?

The HTTP wrapper returns JSON only. To produce PNG images you can exec into the container and call audiowaveform with the -o file.png argument.

How does it compare to alternatives like csandman/docker-audiowaveform?

Other images expose the audiowaveform CLI without a web layer, so you must call it via docker exec or volumes. d9media adds an HTTP API on top, which is more convenient for remote applications.

Can it be used inside a Kubernetes cluster?

Yes. It is a stateless container with a single port and no persistent storage requirements – an ideal fit for a small Kubernetes Deployment behind a ClusterIP service.

How do I scale horizontally?

Run multiple replicas of the container behind a load balancer. Each request is independent, so load balancing is straightforward and there is no state to share.

What is the response time like?

Typically a fraction of a second per minute of audio on modern hardware. A 60-minute MP3 usually finishes in a few seconds on a single vCPU.
