Introduction

A few months back I got interested in how the audio bits of ARM-based hobby-level SBCs work and how much effort it would take to add support for them to FreeBSD. I went through several SBC models, reading schematics, datasheets, and drivers. The text below is a summary of my findings. It’s by no means a comprehensive description; it only covers the most common design and aims to be a quick introduction to the subject.

It is assumed that the reader is familiar with the basic concepts of digital audio: what a sample, a sample size, and a sampling rate are.

Overview

From the very top level, the audio subsystem can be seen as a black box: audio samples are fed into it from memory at one end, and acoustic waves come out of it at the other. If you were to look closer, instead of a single box you would see a series of interconnected smaller boxes, as in this diagram:

[Diagram: audio path]

The concrete hardware implementing each functional block varies from SBC to SBC, but the conceptual scheme remains more or less the same: there are two main blocks, an I2S controller and a CODEC, connected by an I2S bus.

CODEC

The component I’d like to start with is the CODEC. It’s a chip that integrates an ADC (analog-to-digital converter) and a DAC (digital-to-analog converter) along with some other functionality. On the playback path, the CODEC takes audio samples from a digital input port and produces an analog signal as output. On the capture path, it’s the other way around: an analog signal arrives at an input port, and the CODEC converts it to a stream of samples and sends them to a digital output port.

A CODEC is usually more complex than just an ADC and a DAC combined in one package. It may have more than one digital interface and more than one analog interface, e.g., headphones, speaker, microphone 1, microphone 2, line-in, line-out. In addition to I/O and the ADC/DAC, a CODEC may have internal analog amplifiers, attenuators, DSP blocks for audio processing, and digital amplification. A CODEC acts as a complex mixing console: a mono audio stream may be taken from digital port 1, extended to stereo by mirroring the input channel, converted to analog, mixed with the input from the microphone, amplified, and sent to a headphone output, or converted back to digital form and sent to digital port 2.

[Figure: Realtek ALC5640]

A CODEC may be a separate physical chip or a part of an SoC. In the latter case, the OS can access the CODEC’s registers through a memory-mapped window: a write to a certain physical memory location triggers a write to a CODEC register. In the former case, a special control channel needs to exist between the SoC and the CODEC chip. Normally it’s an I2C bus, and the OS accesses the CODEC’s registers by sending data to and receiving data from a predefined I2C address.

If a CODEC does not have a built-in amplifier, or the built-in one is not powerful enough to drive an external speaker, it’s not uncommon to put an amplifier between the CODEC’s analog output and the speaker. This amplifier may also be controlled externally: switched on during playback and off when idle to conserve power.
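To make the two access paths concrete, here is a toy model in Python. It is only a sketch: the class names, the fake I2C bus, the register number, and the register widths (32-bit for the memory-mapped window, 8-bit over I2C) are assumptions made up for illustration, not the layout of any real chip.

```python
class MmioCodec:
    """CODEC integrated on the SoC: registers sit in a memory-mapped window."""

    def __init__(self, window: bytearray, stride: int = 4):
        self.window = window    # stands in for the mapped physical memory
        self.stride = stride    # assumption: one 32-bit word per register

    def write(self, reg: int, value: int) -> None:
        off = reg * self.stride
        self.window[off:off + 4] = value.to_bytes(4, "little")

    def read(self, reg: int) -> int:
        off = reg * self.stride
        return int.from_bytes(self.window[off:off + 4], "little")


class FakeI2CBus:
    """Hypothetical stand-in for an I2C controller; keeps 8-bit registers per device."""

    def __init__(self):
        self.devices = {}

    def transfer_write(self, addr: int, payload: bytes) -> None:
        reg, value = payload[0], payload[1]
        self.devices.setdefault(addr, {})[reg] = value

    def transfer_read(self, addr: int, reg: int) -> int:
        return self.devices.get(addr, {}).get(reg, 0)


class I2cCodec:
    """External CODEC chip: every register access is an I2C transfer."""

    def __init__(self, bus: FakeI2CBus, addr: int):
        self.bus = bus
        self.addr = addr

    def write(self, reg: int, value: int) -> None:
        self.bus.transfer_write(self.addr, bytes([reg, value & 0xFF]))

    def read(self, reg: int) -> int:
        return self.bus.transfer_read(self.addr, reg)


def set_bits(codec, reg: int, mask: int, value: int) -> None:
    """Read-modify-write that works the same way on either access path."""
    old = codec.read(reg)
    codec.write(reg, (old & ~mask) | (value & mask))
```

The point of the common `read`/`write` pair is that the rest of a CODEC driver, such as `set_bits` here, does not need to know which of the two paths it is talking through.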

I2S Controller

The component that sends audio samples to the CODEC and receives them from it is the I2S controller. Theoretically, when integrated on an SoC, a CODEC could access samples directly, either through DMA requests or in the more CPU-intensive PIO mode. In practice, however, even an integrated CODEC block still uses I2S, a serial bus designed specifically for connecting audio devices, to exchange data with a controller.

The OS driver configures the I2S controller to send/receive samples over the bus at a certain rate by setting audio parameters (sampling rate, number of channels, sample size) in the controller’s registers and then initiates a TX or RX operation. The I2S controller maintains two internal buffers, TXFIFO and RXFIFO, for samples to be transmitted and samples recently received. Sometimes there are more than just two FIFOs.

To start a TX operation, a driver prefills the TXFIFO and notifies the controller that it should start transmitting. Whenever the amount of data in the FIFO drops below a certain threshold, the I2S controller raises an interrupt to let the driver know that more samples are needed. The same happens for RX, only in the opposite direction: whenever the amount of data in the RXFIFO rises above a threshold, the driver is notified to read out the received samples and free space in the FIFO for new ones.
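The TX side of this prefill-and-refill cycle can be sketched as a small simulation. Everything here is a simplification for illustration: the `TxFifo` class, the `play` function, and the depth and threshold values are hypothetical, and the "interrupt" is just a flag returned to the caller rather than a real asynchronous event.

```python
from collections import deque


class TxFifo:
    """Toy model of a controller's TXFIFO with a low-water threshold."""

    def __init__(self, depth: int, threshold: int):
        self.depth = depth
        self.threshold = threshold
        self.slots = deque()

    def free(self) -> int:
        return self.depth - len(self.slots)

    def push(self, samples) -> int:
        """Driver side: write as many samples as fit, return how many were taken."""
        n = min(self.free(), len(samples))
        self.slots.extend(samples[:n])
        return n

    def shift_out(self):
        """Controller side: send one sample to the bus.

        Returns (sample, irq); sample is None on underrun, irq is True when
        the fill level has dropped below the threshold.
        """
        sample = self.slots.popleft() if self.slots else None
        irq = len(self.slots) < self.threshold
        return sample, irq


def play(samples, depth: int = 8, threshold: int = 4):
    fifo = TxFifo(depth, threshold)
    pos = fifo.push(samples)            # prefill before starting TX
    sent, irqs = [], 0
    while len(sent) < len(samples):
        sample, irq = fifo.shift_out()
        sent.append(sample)
        if irq and pos < len(samples):  # "interrupt handler": refill the FIFO
            irqs += 1
            pos += fifo.push(samples[pos:])
    return sent, irqs
```

With 20 samples, a depth of 8, and a threshold of 4, `play(list(range(20)))` delivers all samples in order and services three refill interrupts. A lower threshold means fewer interrupts but less time for the driver to react before the FIFO runs dry.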

I2S Bus

The link between an I2S controller and a CODEC is the I2S bus, an interface designed for connecting audio devices. The standard was developed by Philips in 1986. The simplest version of the I2S bus has three signals: SCK (serial clock, also known as bit clock or BCLK), WS (word select, also known as left/right clock, LRCLK, or frame sync, FS), and SD (serial data, also known as SDIN, SDOUT, DACDAT, or ADCDAT). The bus may include more than one data line and two LRCLK lines instead of one, one for TX and one for RX.

[Figure: I2S timing diagram]
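What the timing diagram shows can also be written down as code. The sketch below models one stereo frame in the standard I2S format as a list of (WS, SD) pairs, one per SCK period: WS low selects the left channel, data goes out MSB first, and WS changes one SCK period before the MSB of the next word. The function names are made up for this example, and the model ignores clock edges entirely.

```python
def i2s_frame(left: int, right: int, bits: int = 16):
    """Return [(ws, sd), ...], one pair per SCK period, for one stereo frame."""
    mask = (1 << bits) - 1
    word = ((left & mask) << bits) | (right & mask)   # left word goes first
    total = 2 * bits
    ticks = []
    for t in range(total):
        sd = (word >> (total - 1 - t)) & 1            # MSB first
        ws = 0 if (t + 1) % total < bits else 1       # WS leads the data by one SCK
        ticks.append((ws, sd))
    return ticks


def i2s_decode(ticks):
    """Receiver side: the WS value seen one SCK earlier selects the channel."""
    left = right = 0
    prev_ws = 0            # the last tick of the previous frame has WS low
    for ws, sd in ticks:
        if prev_ws == 0:
            left = (left << 1) | sd
        else:
            right = (right << 1) | sd
        prev_ws = ws
    return left, right
```

For example, `i2s_frame(0b1010, 0b0011, bits=4)` shows WS going high on the tick that still carries the last bit of the left word, which is exactly the one-clock lead visible in the diagram. The decoder returns the words as unsigned values; interpreting them as two's complement samples is left out.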

If an audio component, e.g., a CODEC, has internal digital logic, like a DSP, it needs to be driven by a clock. This clock can be provided by another, optional line, Master Clock (MCLK). Usually, the MCLK frequency is the sampling rate multiplied by 256. As an alternative, the CODEC or the controller can derive MCLK internally from the received SCK by multiplying it, or use an external fixed oscillator or some other independent clock source.

Either component, the CODEC or the controller, may generate SCK or WS, so to communicate properly, the devices need to agree on which of them provides which clock.

Since its invention, the I2S standard has spawned several variants (or formats): the standard one, left-justified, right-justified, DSP, PCM. They operate on more or less the same principle but differ in details such as where a word starts and how FS signals a channel change. Modern chips support at least the most common formats, and the actual variant can be configured by setting specific values in a control register.
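The clock relationships are simple arithmetic, so a short calculation may help. The sketch assumes the common case described above (MCLK is 256 times the sampling rate) and that SCK carries one slot of `slot_bits` per channel in every frame; the function name and the dictionary keys are just labels chosen for this example.

```python
def i2s_clocks(rate_hz: int, channels: int = 2, slot_bits: int = 16,
               mclk_fs: int = 256):
    """Compute WS, SCK, and MCLK frequencies in Hz for a given sampling rate."""
    ws = rate_hz                          # one WS period per frame
    sck = rate_hz * channels * slot_bits  # one SCK period per transmitted bit
    mclk = rate_hz * mclk_fs              # assumed MCLK-to-rate ratio
    # An integer divider exists only when MCLK is an exact multiple of SCK.
    divider = mclk // sck if mclk % sck == 0 else None
    return {"ws": ws, "sck": sck, "mclk": mclk, "mclk_to_sck": divider}
```

For 48 kHz stereo with 16-bit slots this gives an SCK of 1.536 MHz and an MCLK of 12.288 MHz, so SCK can be obtained from MCLK by dividing by 8. With 24-bit slots the ratio is no longer an integer, which is one reason other MCLK multipliers exist.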
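To illustrate how three of these formats differ, the sketch below places one sample inside a channel slot that is wider than the sample, counting SCK periods from the WS transition. It is a toy model: the function and table names are invented for this example, padding bits are assumed to be zero, the DSP/PCM modes are not covered, and for the standard format the slot must be at least one bit wider than the sample so the delayed word still fits.

```python
# Offset of the sample's MSB from the WS transition, in SCK periods.
MSB_OFFSET = {
    "i2s": lambda sample_bits, slot_bits: 1,                          # one-bit delay
    "left_j": lambda sample_bits, slot_bits: 0,                       # starts at the edge
    "right_j": lambda sample_bits, slot_bits: slot_bits - sample_bits,  # LSB ends the slot
}


def slot_on_wire(sample: int, sample_bits: int, slot_bits: int, fmt: str) -> str:
    """Return the SD bits of one channel slot as a string, first bit on the left."""
    offset = MSB_OFFSET[fmt](sample_bits, slot_bits)
    if offset + sample_bits > slot_bits:
        raise ValueError("sample does not fit into the slot in this toy model")
    data = format(sample & ((1 << sample_bits) - 1), "0{}b".format(sample_bits))
    return "0" * offset + data + "0" * (slot_bits - offset - sample_bits)
```

A 4-bit sample `1011` in an 8-bit slot comes out as `01011000` in the standard format, `10110000` left-justified, and `00001011` right-justified. If the two ends of the bus are configured for different formats, the receiver reads a shifted word, which typically shows up as distorted or very quiet audio rather than silence.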

Device Tree and Drivers

Information about this interconnected structure of I2S controller, CODEC, and external amplifiers is made available to the OS through an FDT (flattened device tree) blob passed to the kernel by the bootloader or built into the kernel during compilation. Below is an excerpt (slightly modified for illustration purposes) of the Pinebook Pro FDT source file with the pieces relevant to the audio subsystem.

rk3399-pinebook-pro.dts
/ {
    /* Audio components */
    es8316-sound {
        compatible = "simple-audio-card";
        pinctrl-names = "default";
        pinctrl-0 = <&hp_det_gpio>;
        simple-audio-card,name = "rockchip,es8316-codec";
        simple-audio-card,format = "i2s";
        simple-audio-card,mclk-fs = <256>;

        simple-audio-card,aux-devs = <&speaker_amp>;
        simple-audio-card,bitclock-master = <&i2s1>;
        simple-audio-card,frame-master = <&i2s1>;

        simple-audio-card,cpu {
            sound-dai = <&i2s1>;
        };

        simple-audio-card,codec {
            sound-dai = <&es8316>;
        };
    };

    speaker_amp: speaker-amplifier {
        compatible = "simple-audio-amplifier";
        enable-gpios = <&gpio4 RK_PD3 GPIO_ACTIVE_HIGH>;
        sound-name-prefix = "Speaker Amplifier";
        VCC-supply = <&pa_5v>;
    };

    i2s1: i2s@ff890000 {
        compatible = "rockchip,rk3399-i2s", "rockchip,rk3066-i2s";
        reg = <0x0 0xff890000 0x0 0x1000>;
        interrupts = <GIC_SPI 40 IRQ_TYPE_LEVEL_HIGH 0>;
        dmas = <&dmac_bus 2>, <&dmac_bus 3>;
        dma-names = "tx", "rx";
        clock-names = "i2s_clk", "i2s_hclk";
        clocks = <&cru SCLK_I2S1_8CH>, <&cru HCLK_I2S1_8CH>;
        pinctrl-names = "default";
        pinctrl-0 = <&i2s_8ch_mclk_gpio>, <&i2s1_2ch_bus>;
        power-domains = <&power RK3399_PD_SDIOAUDIO>;
        #sound-dai-cells = <0>;
        rockchip,capture-channels = <8>;
        rockchip,playback-channels = <8>;
        status = "okay";
    };

    i2c1 {
        es8316: es8316@11 {
            compatible = "everest,es8316";
            reg = <0x11>;
            clocks = <&cru SCLK_I2S_8CH_OUT>;
            clock-names = "mclk";
            #sound-dai-cells = <0>;
        };
    };
};

As you can see, there is an es8316 node that describes the ES8316 CODEC chip. It’s located on I2C bus 1, has I2C address 0x11, and its Master Clock signal is derived from one of the clocks in the RK3399 SoC clock tree, namely SCLK_I2S_8CH_OUT.

Then there is the i2s1 node that corresponds to one of the three I2S controller blocks on the SoC. The reg property describes the physical memory window through which the CPU can access the controller’s registers.

The ES8316 does not have analog amplifiers, only attenuators, so an external amplifier, speaker_amp, is there to drive the laptop’s speaker. Control-wise it’s not a smart device; it can be powered up or down through the power regulator and enabled/disabled by a GPIO signal.

Tying them all together is a virtual "sound card", es8316-sound. The node’s simple-audio-card,cpu subnode references i2s1 as the CPU side of the audio path, the side that exchanges samples with memory. es8316 is referenced as the CODEC side by the simple-audio-card,codec subnode, and simple-audio-card,aux-devs lists references to optional auxiliary devices, in this case only one device, the amplifier.

To start playback, a userland app opens an audio device and configures it for a specific sample format and sample rate using kernel-provided APIs. At this point, the simple-audio-card driver has all the information required to configure the CPU and CODEC nodes. It calculates the MCLK frequency and instructs both ends of the audio path to set up system clocks if required. Then it instructs both nodes to configure the I2S bus parameters so they can communicate: the nodes need to agree on the sample size, I2S data format, and clock roles, and to calculate and set the BCLK and WS frequencies. Once that’s done, userland starts sending samples to the kernel, and the kernel passes them to the I2S controller. If the user changes the volume, the command goes through the simple-audio-card driver to the CODEC node.
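The sequence above can be modelled in a few lines. This is a sketch of the flow as described in the text, not the code of any actual driver: the `Dai` class, its method names, and `card_hw_params` are hypothetical, and each call only records what a real CPU or CODEC driver would program into its registers. The defaults mirror the device tree excerpt (format "i2s", mclk-fs of 256, the I2S controller providing the bit and frame clocks).

```python
class Dai:
    """Hypothetical stand-in for one end of the audio path (CPU or CODEC)."""

    def __init__(self, name: str):
        self.name = name
        self.log = []    # records the calls a real driver would act on

    def set_sysclk(self, hz: int) -> None:
        self.log.append(("sysclk", hz))

    def set_format(self, fmt: str, clock_provider: bool) -> None:
        self.log.append(("format", fmt, clock_provider))

    def hw_params(self, rate: int, channels: int, sample_bits: int) -> None:
        self.log.append(("hw_params", rate, channels, sample_bits))


def card_hw_params(cpu: Dai, codec: Dai, rate: int, channels: int,
                   sample_bits: int, mclk_fs: int = 256, fmt: str = "i2s",
                   clock_provider: str = "cpu") -> int:
    """Configure both ends of the path the way the paragraph above describes."""
    mclk = rate * mclk_fs                 # step 1: calculate MCLK
    for dai in (cpu, codec):              # step 2: set up system clocks
        dai.set_sysclk(mclk)
    for dai in (cpu, codec):              # step 3: agree on format and clock roles
        dai.set_format(fmt, clock_provider=(dai.name == clock_provider))
    for dai in (cpu, codec):              # step 4: sample size, channels, rate
        dai.hw_params(rate, channels, sample_bits)
    return mclk
```

Calling `card_hw_params(Dai("cpu"), Dai("codec"), 48000, 2, 16)` leaves both logs with the same clock and stream parameters, and exactly one side marked as the clock provider, which is the agreement the two nodes need before samples start flowing.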

When simple-audio-card,codec is not a CODEC

There are cases when a CODEC chip is not present in the audio path. The most common one is HDMI audio. HDMI is a digital interface, so there is no point in converting a digital signal to analog. Still, the simple-audio-card,codec subnode must be present and refer to the HDMI framer node. In this case, the framer chip acts only as an I2S receiver: it receives samples and passes them to the TV or monitor as bytes in a bitstream, mixed with pixels and other data.