May 2024 – Feb 2025·Individual·Completed

DCT in JPEG Compression

Signal Processing · Independent Research

Executed the core JPEG compression sequence on my own raw photographic data. Computed the Discrete Cosine Transform analytically, applied standard quantization matrices, and validated reconstruction accuracy against Fast Fourier Transform benchmarks.

PythonAlgorithm ImplementationSignal ProcessingImage Compression

View Paper

At a glance (one-pager)

Findings

Low-Frequency Dominance Validated at $-777.0$ DC

Manual computation of the primary $8\times 8$ pixel matrix yielded a DC coefficient of $-777.0$ . This metric directly proves that the vast majority of photographic energy is concentrated in low-frequency block averages. Subsequent Inverse DCT reconstruction produced a near-lossless output, mathematically validating the aggressive truncation of high-frequency AC coefficients during the quantization phase.

Key Parameters

Pixel Depth

8-bit[0, 255]

Level-shifted to [−127, 127] before DCT.

Block Size

8 × 8pixels

JPEG standard — a 512×512 image contains 4096 such blocks.

JPEG Quality

50(standard)

Balances compression ratio against perceptual fidelity.

DC Coefficient

−777.0(first block)

Largest coefficient by far — confirms low-freq content dominates.

DCT Reconstruction

[7,15,23…63]vs original [8,16…64]

Near-identical. FFT gave [24,12,20…48] — visible artifacts.

Compression Algorithm

Box Filter

Downsampling of pixel values to reduce complexity.

Objective

JPEG remains the global standard for image compression by relying heavily on frequency-domain transformations. The objective of this analysis was to mathematically validate the efficiency of the Discrete Cosine Transform (DCT) by isolating its core mathematical pipeline. This involved extracting pixel intensities via Python and computing the transform algorithm manually to evaluate the mechanics strictly prior to encoding.

Methodology & reflection

Takeaways & position

Bypassing the entropy coding phase exposed a critical vulnerability in the models. Without Huffman encoding, the Inverse DCT reconstruction produced minor boundary clipping artifacts, outputting negative pixel intensities that required manual zeroing. This limitation provided a direct lesson for future biomedical signal processing research: theoretical algorithm isolation is insufficient. Future compression or kinematic models must be validated through complete, end-to-end pipelines to guarantee absolute data integrity at the physical output layer.

Methodology

I converted a personal beach photo to greyscale in Python, extracted the first $8 \times 8$ pixel block, and computed $\text{DCT} = T \times M \times T'$ , where $T$ is the cosine transform matrix. The pipeline included level-shifting pixel values from $[0, 255]$ to $[-127, 127]$ , element-wise division by the quality-50 quantisation matrix $Q$ , and reconstruction via IDCT. The derivation of the DCT from the DFT was worked through analytically in the essay body.

Process

The derivation began with the Discrete Fourier Transform and took only its real part, eliminating imaginary frequency nodes to arrive at a variant of the DCT. The 2D DCT-II used in JPEG follows step-by-step:

F(i,j) = \frac{2}{N}C(i)C(j)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1}f(x,y)\cos\!\frac{(2x+1)i\pi}{2N}\cos\!\frac{(2y+1)j\pi}{2N}

This established why the DCT is used in JPEG rather than treating it as an arbitrary historical choice. A secondary comparison against FFT reconstruction on the same input was included to test the claim that DCT is better-suited for natural images.

Analysis

The manual calculation of the first $8\times 8$ block yielded a DC coefficient of $-777.0$ . This magnitude vastly outweighed the high-frequency AC coefficients, mathematically validating the JPEG quantization strategy of aggressively zeroing out high-frequency data. IDCT reconstruction produced a near-lossless output matrix, with minor clipping artifacts $(2, 3 \to 0)$ stemming from the explicit exclusion of Huffman encoding in this model.

Additional photos

The box filter. Used to introduce local spatial averaging (filtering) before the shift to frequency-domain compression via the DCT.

The beach photo converted to greyscale. The first 8×8 pixel block (top-left) was extracted for the manual DCT calculation.

Python implementation of the DCT pipeline computing the transform matrix T. The level-shifted pixel block M is computed as T × M × T'.

Visualised DCT coefficient matrix for the first 8×8 block. The top-left DC coefficient (−777.0) dominates, most image energy is in low frequencies.

DCT (black) vs FFT (red) reconstruction against the original signal (green). DCT values stay close to the line; FFT values deviate visibly.