The Art of Photoelectric Conversion: How Image Sensors Define the DNA of Images and the Limits of HDR

Introduction

In the world of digital imaging, image sensors play a crucial role. They are not merely the "eyes" that capture light but also the key determinants of image quality. This note delves into the physical characteristics of image sensors, starting from the fundamental principles of photoelectric conversion, analyzing pixel science and the causes of noise, and showing how hardware-level High Dynamic Range (HDR) technology is implemented. It combines practical engineering experience in optimizing image sensor registers with advanced topics such as pixel architecture evolution, the Photon Transfer Curve (PTC) model and environmental effects, optical coupling issues, shutter technology trade-offs, and in-depth HDR techniques, aiming to provide readers with content that is both profound and practical.

1. Photoelectric Conversion and Pixel Science: The Birth of an Image

The generation of an image begins with the photoelectric conversion process of the image sensor. When photons pass through the microlens and color filter, they enter the pixel's depletion region, where their energy excites electrons in the semiconductor material, generating electron-hole pairs. These electrons are then collected in the potential well within the pixel, and their number is proportional to the number of incident photons, completing the conversion from optical signal to electrical signal.

Evolution of Pixel Architecture: From FSI to Stacked Sensor and Logic Layer Integration

The physical structure of image sensors has undergone several major transformations, each improving image quality while laying the foundation for more complex functional integration:

  1. Front-Side Illuminated (FSI): The metal wiring layers sit above the photodiode, so part of the incident light is blocked or reflected before reaching it, limiting quantum efficiency, especially as pixels shrink.
  2. Back-Side Illuminated (BSI): The wafer is flipped and thinned so that light enters from the back and reaches the photodiode directly, markedly improving quantum efficiency and low-light performance.
  3. Stacked Sensor: The pixel array and logic circuitry are fabricated on separate wafers and bonded together, freeing area in the pixel layer and enabling faster readout and richer on-chip functionality in the logic layer.

Pixel Size, Full Well Capacity, and Dynamic Range

Pixel Size and Full Well Capacity (FWC) together determine the physical limits of an image sensor. FWC refers to the maximum number of electrons a single pixel can store before saturation, directly determining the "highlight" ceiling of the image. The Native Dynamic Range of an image sensor is calculated by the formula:

DR = 20 * log10(FWC / Read Noise)

From the formula, it can be seen that FWC determines the upper limit of the dynamic range (i.e., the brightest signal that can be captured), while read noise determines the lower limit of the dynamic range (i.e., the darkest signal that can be identified). Higher FWC and lower read noise can effectively extend the native dynamic range of the image sensor.
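As a quick sanity check, the formula above can be computed directly; the FWC and read-noise values below are illustrative, not taken from any particular sensor:

```python
import math

def native_dynamic_range_db(full_well_electrons, read_noise_electrons):
    """Native dynamic range in dB: DR = 20 * log10(FWC / read noise)."""
    return 20.0 * math.log10(full_well_electrons / read_noise_electrons)

# Example: a pixel with 10,000 e- full well and 2 e- read noise
dr_db = native_dynamic_range_db(10000, 2.0)
print(f"Native DR: {dr_db:.1f} dB")  # ~74.0 dB
```

This also shows why the two levers are independent: doubling FWC and halving read noise each add about 6 dB.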

2. The Science of Noise and PTC Model: Quantifying Image Quality

Noise is an unavoidable factor in digital images, which reduces image clarity and detail. To accurately quantify the performance of image sensors, engineers typically establish a Photon Transfer Curve (PTC) model.

PTC and Noise Model Establishment

The Photon Transfer Curve (PTC) is a plot of noise (the standard deviation of the signal) versus mean signal on a log-log scale. By analyzing the PTC, the various noise components of an image sensor can be accurately separated, and key parameters such as Conversion Gain and Read Noise can be extracted.

PTC curves typically include the following regions:

  1. Read Noise: Dominant in the low signal region (dark areas), appearing as a horizontal segment of the PTC curve. Noise in this region mainly comes from the readout circuit, defining the weakest signal that the image sensor can resolve, which is the limit of "dark detail" in the image.
  2. Shot Noise: Increases with increasing signal, appearing as a straight line with a slope of 0.5 on a logarithmic scale. It originates from the randomness of photons arriving at the image sensor and follows a Poisson distribution. Shot noise is proportional to the square root of the number of signal electrons (√(N)), thus dominating in highlight areas or high signal conditions.
  3. Fixed Pattern Noise (FPN): Increases linearly with the signal, appearing as a straight line with a slope of 1 on a logarithmic scale. FPN typically originates from physical differences between pixels, such as Dark Current Non-Uniformity or Photo Response Non-Uniformity (PRNU).

Through PTC, engineers can establish accurate noise models for image sensor calibration, performance evaluation, and optimization of image processing algorithms.
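As a sketch of how this works in practice, the following simulation estimates conversion gain with the mean-variance method that underlies PTC analysis (frame differencing cancels FPN, leaving temporal noise); the gain value and signal levels are assumptions chosen for the simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2.0  # assumed conversion gain in e-/DN (ground truth for this simulation)

means, variances = [], []
for electrons in [200, 500, 1000, 2000, 4000]:
    # Two flat-field frames per level; differencing cancels FPN,
    # leaving only temporal (shot-dominated) noise
    f1 = rng.poisson(electrons, size=100_000) / K  # frame 1 in DN
    f2 = rng.poisson(electrons, size=100_000) / K  # frame 2 in DN
    means.append((f1.mean() + f2.mean()) / 2)
    variances.append(np.var(f1 - f2) / 2)          # per-frame temporal variance, DN^2

# In the shot-noise region var = mean / K, so the fitted slope estimates 1/K
slope = np.polyfit(means, variances, 1)[0]
print(f"Estimated conversion gain: {1/slope:.2f} e-/DN")  # close to 2.0
```

Real PTC measurements add a dark (zero-signal) point to read off read noise and extend to saturation to read off FWC.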

Low-Light Performance: Dark Current and Temperature Effects

In low-light environments, especially in long-exposure scenarios, Dark Current becomes a critical factor affecting image quality. Dark current is the electron flow generated by thermal excitation in semiconductor materials even in the complete absence of light. These electrons are collected by pixels and are mistakenly interpreted as light signals, causing noise points or uneven brightness in the image.

Dark current is extremely sensitive to temperature: as a rule of thumb, it doubles for roughly every 6-8°C rise in image sensor temperature. This explains why professional cameras and astronomical photography equipment often include cooling systems to lower the sensor temperature and reduce dark current noise. In ISP optimization, although temperature is the physical cause of dark current, analog gain amplifies all signals, including the pedestal contributed by dark current; in practice, therefore, the strength of Black Level Correction (BLC) is usually staged according to ISO Level (Gain). Establishing precise BLC parameters is an important means of suppressing dark current and other pedestal noise effects.
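The doubling rule of thumb can be written as a simple scaling function; the reference current, temperatures, and the 7°C doubling step below are illustrative values, not measurements of any real sensor:

```python
def dark_current_at(temp_c, ref_current_e_per_s, ref_temp_c, doubling_step_c=7.0):
    """Scale dark current by the rule of thumb: it doubles every ~6-8 C."""
    return ref_current_e_per_s * 2.0 ** ((temp_c - ref_temp_c) / doubling_step_c)

# Example: 1 e-/s at 25 C; cooling to -10 C with a 7 C doubling step
print(f"{dark_current_at(-10, 1.0, 25):.3f} e-/s")  # ~0.031 e-/s (a 32x reduction)
```

This is why a modest 35°C of cooling buys well over an order of magnitude less dark current.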

Why not correct black level based on exposure time?

Although dark current is directly related to exposure time, in ISP practice, Black Level Correction (BLC) is generally not directly dependent on exposure time for correction, mainly for three reasons:

  1. Dark current is usually much smaller than offset: In addition to dark current, the image sensor's output signal also includes the circuit's own fixed offset. In most application scenarios, this offset is much larger than the dark current, so prioritizing offset correction is more critical.
  2. Gain has a greater impact on black level: Analog Gain (AG) directly amplifies all signals, including dark current and offset. Therefore, the black level at different ISO Levels (Gain) will be significantly different, making staged correction based on Gain more effective and practical.
  3. ISP pipeline requires stable LUT: To ensure the stability and efficiency of the ISP processing flow, a Look-Up Table (LUT) based on ISO Level is usually established for BLC, rather than dynamically adjusting according to exposure time, which helps simplify system design and tuning.
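A gain-indexed BLC LUT of the kind described in point 3 might be sketched as follows; the breakpoints and black levels are hypothetical, not taken from any sensor datasheet:

```python
# Hypothetical gain-indexed BLC table: (max_total_gain, black_level_DN).
# Breakpoints and levels are illustrative only.
BLC_LUT = [
    (2.0, 64),           # up to 2x gain
    (8.0, 66),           # up to 8x gain
    (16.0, 70),          # up to 16x gain
    (float("inf"), 78),  # everything above
]

def black_level_for_gain(total_gain):
    """Pick the staged black level for the current ISO level (gain)."""
    for max_gain, level in BLC_LUT:
        if total_gain <= max_gain:
            return level
    return BLC_LUT[-1][1]

print(black_level_for_gain(4.0))  # 66
```

In a real pipeline these entries come from dark-capture calibration at each gain stage, often with interpolation between breakpoints.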

Application Scenarios for Dark Frame Calibration

Although BLC in general consumer-grade imaging systems does not rely on exposure time, applications that are extremely noise-sensitive and require long exposures, such as astronomy cameras, scientific CMOS imaging, and long-exposure photography, employ a more precise technique: Dark Frame Calibration.

The principle is to capture a "dark frame" image with no light under the same exposure time and temperature as the actual shot, and then subtract this dark frame image pixel by pixel from the actual shot image to accurately eliminate the effects of dark current and fixed pattern noise.
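A minimal sketch of the pixel-by-pixel subtraction described above (the sample values are arbitrary 12-bit data, not from a real capture):

```python
import numpy as np

def subtract_dark_frame(raw, dark, black_level=0):
    """Pixel-wise dark-frame subtraction; clip so values stay non-negative."""
    corrected = raw.astype(np.int32) - dark.astype(np.int32) + black_level
    return np.clip(corrected, 0, None).astype(raw.dtype)

raw = np.array([[120, 130], [115, 4095]], dtype=np.uint16)  # long-exposure shot
dark = np.array([[20, 25], [18, 30]], dtype=np.uint16)      # matched dark frame
print(subtract_dark_frame(raw, dark))
# [[ 100  105]
#  [  97 4065]]
```

The key requirement is that the dark frame be captured at the same exposure time and temperature as the light frame, since both dark current and its FPN depend on them.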

3. Optical Coupling and Physical Challenges: System-Level Considerations

Image sensors are not isolated components; their optical coupling with lenses and the choice of shutter technology directly affect the final imaging quality and system performance.

Crosstalk, CRA, and Optical Coupling Issues

The matching between the image sensor and the lens is crucial; improper optical coupling can lead to several image problems:

  1. Crosstalk: Light or charge intended for one pixel leaks into its neighbors, optically through the color filter and microlens stack, or electrically within the silicon, reducing sharpness and color fidelity.
  2. CRA Mismatch: If the lens's Chief Ray Angle profile does not match the sensor's microlens shift design, off-axis light strikes the pixels at the wrong angle, causing color shading and brightness falloff (vignetting) toward the image edges.

4. Engineering Trade-offs of Shutter Technology: Rolling Shutter and Global Shutter

Shutter technology determines how image sensors expose and read out pixel signals, and it behaves differently in motion scenes:

Shutter Technology Engineering Comparison

  1. Rolling Shutter: Rows are exposed and read out sequentially. Pros: simpler pixel circuit, higher fill factor, lower cost and noise. Cons: fast-moving subjects exhibit skew ("jello") artifacts and banding under flickering light, because each row samples a slightly different instant.
  2. Global Shutter: All pixels start and end exposure simultaneously before readout. Pros: no motion skew, well suited to machine vision and fast-moving subjects. Cons: the extra in-pixel storage node typically costs fill factor, full well capacity, or price.

5. In-depth HDR Technology: Beyond the Limits of Native Dynamic Range and Physical Constraints

Although the native dynamic range of image sensors continues to improve, in scenes with extreme light ratios (such as backlit portraits, indoor views outside windows), a single exposure still struggles to capture all details in both highlights and shadows simultaneously. To address this challenge, High Dynamic Range (HDR) technology emerged, expanding the dynamic range of images through various strategies.

Staggered HDR

Multi-exposure HDR, often referred to as Staggered HDR, is one of the most classic technologies for achieving high dynamic range. Its core principle is to continuously capture at least two frames (usually three) with different exposure times within a very short period: a long exposure to capture dark details, a medium exposure for transition, and a short exposure to preserve highlight details. These images are then transmitted to an external Image Signal Processor (ISP) or Application Processor (AP), where they are merged into a complete HDR image through advanced synthesis algorithms.
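A toy two-frame merge illustrates the core idea; real pipelines use smooth per-pixel weighting, motion compensation, and tone mapping, and the saturation threshold here is an assumption:

```python
import numpy as np

def merge_staggered_hdr(long_f, short_f, ratio, sat_level=4095):
    """Minimal two-frame merge: trust the long frame until it nears saturation,
    then fall back to the short frame scaled up by the exposure ratio."""
    long_f = long_f.astype(np.float64)
    short_lin = short_f.astype(np.float64) * ratio  # bring short onto long's scale
    use_short = long_f >= 0.9 * sat_level           # highlight (near-clipped) pixels
    return np.where(use_short, short_lin, long_f)

long_f = np.array([100.0, 2000.0, 4095.0])  # last pixel is clipped
short_f = np.array([6.0, 125.0, 300.0])     # 16x shorter exposure
print(merge_staggered_hdr(long_f, short_f, ratio=16))  # [ 100. 2000. 4800.]
```

The hard switch at 90% of saturation is exactly the kind of discontinuity that production algorithms replace with a smooth blend, for the knee-point reasons discussed later.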

Single-Frame Dual Conversion Gain (iDCG / Smart-ISO Pro)

Single-frame dual conversion gain, or intra-scene Dual Conversion Gain (iDCG), is a representative single-exposure HDR solution in high-end image sensors in recent years. Its core idea is to allow the pixel circuit to simultaneously read accumulated charges with two different conversion gains within a single exposure time:

  1. High Conversion Gain (HCG): Used to process dark signals. In HCG mode, a small number of electrons can generate a larger voltage signal, thereby effectively reducing read noise and preserving rich dark details.

  2. Low Conversion Gain (LCG): Used to process bright signals. In LCG mode, more electrons are required to generate the same voltage change, which is equivalent to indirectly increasing the Full Well Capacity (FWC), allowing it to accommodate more charges without saturation, thereby completely preserving highlight details.
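The HCG/LCG stitch can be sketched as a per-pixel selection; the gain ratio, switch threshold, and sample values below are all assumed for illustration, not taken from any DCG sensor:

```python
import numpy as np

def combine_dcg(hcg_dn, lcg_dn, gain_ratio, switch_dn=3000):
    """Sketch of single-frame DCG stitching: use the low-noise HCG readout for
    dark pixels and the high-FWC LCG readout (rescaled) for bright ones."""
    lcg_scaled = lcg_dn.astype(np.float64) * gain_ratio  # put LCG on the HCG scale
    hcg = hcg_dn.astype(np.float64)
    return np.where(hcg < switch_dn, hcg, lcg_scaled)

hcg = np.array([200.0, 1500.0, 4095.0])  # HCG saturates in the bright pixel
lcg = np.array([50.0, 375.0, 1500.0])    # LCG readout, 4x lower gain
print(combine_dcg(hcg, lcg, gain_ratio=4))  # [ 200. 1500. 6000.]
```

Because both readouts come from the same exposure, this scheme avoids the motion artifacts of staggered HDR; the placement of `switch_dn` is exactly the conversion-gain switching point discussed in the tuning section.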

Dual Slope Gain (DSG)

Dual Slope Gain (DSG) is another single-exposure HDR technology, similar in principle to iDCG but implemented slightly differently. DSG uses two different integration slopes during the pixel's charge-to-voltage conversion phase: in the early part of the response, a higher slope (similar to HCG) amplifies weak dark signals; once the signal exceeds a threshold, the circuit switches to a lower slope (similar to LCG) so that bright signals do not saturate prematurely. This effectively creates two gain ranges within a single pixel's response curve.

Spatially Interleaved HDR

Spatially Interleaved HDR is a technology that extends dynamic range in the spatial dimension. Its core idea is to set adjacent pixels on the image sensor to different exposure parameters. For example, a checkerboard-like pattern can be used, where some pixels undergo long exposure and others undergo short exposure. After readout, the ISP uses advanced interpolation and reconstruction algorithms to merge the information from these two different exposure pixels into a complete HDR image.

Line Interleaved HDR

Line Interleaved HDR is a technology that achieves high dynamic range by alternating exposures row by row. Similar to spatially interleaved HDR, it is also completed within a single frame, but the exposure difference occurs at the row level of the image sensor. For example, odd rows may use long exposure to capture dark details, while even rows use short exposure to preserve highlight details. Subsequently, the Image Signal Processor (ISP) reconstructs and interpolates these rows with different exposures to generate a complete HDR image.
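A toy reconstruction shows the idea, assuming even rows carry the long exposure and odd rows the short one; real ISPs use far more sophisticated interpolation than the nearest-row repetition here:

```python
import numpy as np

def split_and_upscale(frame, ratio):
    """Toy line-interleaved HDR front end: separate the long-exposure (even)
    and short-exposure (odd) rows, bring the short rows onto the long-row
    scale, and fill each missing row by repeating its captured neighbor.
    The two full-height images can then be merged as in multi-frame HDR."""
    long_rows = frame[0::2].astype(np.float64)
    short_rows = frame[1::2].astype(np.float64) * ratio  # onto the long scale
    long_full = np.repeat(long_rows, 2, axis=0)          # nearest-row upscaling
    short_full = np.repeat(short_rows, 2, axis=0)
    return long_full, short_full

frame = np.array([[100, 100],    # long row
                  [10, 10],      # short row (8x shorter)
                  [4095, 4095],  # long row, clipped
                  [500, 500]], dtype=np.float64)
long_full, short_full = split_and_upscale(frame, ratio=8)
print(long_full.shape, short_full.shape)  # (4, 2) (4, 2)
```

The price of this scheme is vertical resolution: each exposure is captured at half the row count, which the reconstruction step must recover.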

Physical Limits and Implementation Differences of HDR Technology

The goal of HDR technology is to extend the dynamic range of images, but its ultimate performance is still limited by the physical characteristics of the image sensor. In addition to FWC determining the highlight ceiling and read noise determining the dark floor, the quality of the Signal-to-Noise Ratio (SNR) knee point is a major challenge in HDR implementation. When synthesizing data from different exposures or gains, if the connection is improper, obvious noise discontinuities, color breaks, or brightness discontinuities may occur in the image. This requires precise algorithms and underlying register tuning to ensure a smooth transition between different signals.
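A simple noise model makes the knee-point concern concrete: with read noise sigma_r, SNR = N / sqrt(N + sigma_r^2), so switching between two readout regimes with different read noise produces a visible SNR step at the stitch. The HCG/LCG read-noise values and switch point below are illustrative assumptions:

```python
import numpy as np

def snr_db(electrons, read_noise_e):
    """Shot-noise + read-noise SNR model: SNR = N / sqrt(N + sigma_r^2)."""
    return 20 * np.log10(electrons / np.sqrt(electrons + read_noise_e**2))

# Illustrative read-noise values for the two modes (not from a datasheet)
n = 2000.0  # electrons at a hypothetical switch point
drop = snr_db(n, 1.5) - snr_db(n, 8.0)
print(f"SNR step at the switch point: {drop:.2f} dB")  # a small but real step
```

Even a sub-1 dB step can be visible as a texture change in smooth gradients, which is why the stitch requires careful blending and register-level tuning.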

Furthermore, different HDR technologies have significant differences in motion artifact suppression, power consumption, processing delay, and dependence on the Image Signal Processor (ISP). For example, multi-exposure HDR can provide extremely high dynamic range, but its motion artifact problem makes it unsuitable for all scenes; while Smart ISO Pro achieves HDR within a single frame, its performance in extreme highlights may not be as good as multi-exposure. Engineers need to choose the most suitable HDR implementation scheme and optimize it according to specific application scenarios and performance requirements.

6. Practical Optimization: Image Sensor Register Tuning Experience

In actual image engineering projects, especially when dealing with highly configurable image sensors, precisely adjusting their internal registers can greatly optimize image quality, especially in terms of HDR performance. Industry experience shows that the combined adjustment of gain and exposure time is key to achieving optimal HDR effects.

Combined Optimization of Gain and Exposure Time

  1. Gain Tuning: Engineers need to balance Analog Gain (AG) and Digital Gain (DG). In practice, gain allocation usually follows the strategy of "raise AG first, up to the sensor's analog gain limit, then extend ISO with DG." AG amplifies the signal early in the readout chain, ahead of much of the downstream circuit noise, so input-referred read noise drops and SNR improves; DG is applied after digitization and amplifies signal and noise equally, so SNR does not improve. In low light, AG is therefore prioritized, while watching whether coarse AG register steps cause visible brightness jumps. The typical gain ramp is: increase ISO -> increase Analog Gain -> reach AG max -> start Digital Gain; a common strategy is to raise AG to around 8x before engaging DG. In HDR applications, precise AG settings are crucial for balancing dark detail against noise, extending the usable brightness range while keeping noise low.
  2. Exposure Time Tuning: In multi-exposure HDR, the choice of long, medium, and short exposure times and their ratios determines the final HDR image quality; a typical combination might use 1x, 4x, 16x exposure-time ratios. The exposure ratio is also bounded by the sensor's line time and frame timing: the exposure combination must comply with the sensor's timing constraints, such as the Line Time Constraint. This means the sum of all exposure times must fit within the Frame Time, or, in Staggered HDR sensors, the short exposure must fit within the Readout Overlap Window. Violating these restrictions can cause frame drops, HDR merge misalignment, or rolling bands, so exposure ratios chosen in actual tuning (for example 16:4:1) must be validated against them.

  3. Conversion Gain Switching Tuning: In HDR image sensors that employ Dual Conversion Gain (DCG), such as Samsung Smart ISO Pro, the imaging system simultaneously utilizes two different readout modes: High Conversion Gain (HCG) and Low Conversion Gain (LCG). This approach balances low-light signal-to-noise ratio (SNR) performance with the Full Well Capacity (FWC) in highlight areas.

    In practical engineering tuning, precisely adjusting the switching threshold (Conversion Gain Switching Point) between HCG and LCG through image sensor registers can optimize the continuity of the overall SNR curve and the efficiency of dynamic range utilization.

    If the switching point is set improperly, the following problems may occur:

    • Noise Discontinuity: The noise level changes abruptly at the junction of HCG and LCG, making image graininess shift suddenly.

    • Tone Discontinuity: Tonal breaks appear at the junction of highlight and mid-tone areas of the image.

    • Color Shift: Because the RGB channel responses differ slightly between conversion gain modes, color casts may appear in highlight areas.
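The timing constraints from the exposure-time discussion above can be checked programmatically before committing register settings; the function name, line-time units, and limit values here are all hypothetical:

```python
def check_staggered_timing(exposures_lines, frame_length_lines, overlap_lines):
    """Sanity-check a staggered-HDR exposure set against sensor timing limits,
    expressed in line times: all exposures must fit within the frame, and the
    shortest exposure must fit in the readout overlap window."""
    errors = []
    if sum(exposures_lines) > frame_length_lines:
        errors.append("sum of exposures exceeds frame time -> frame drop risk")
    if min(exposures_lines) > overlap_lines:
        errors.append("short exposure exceeds readout overlap window")
    return errors

# A 16:4:1 ratio in line times, for a hypothetical 1125-line frame
print(check_staggered_timing([800, 200, 50], 1125, 60))  # [] (valid)
print(check_staggered_timing([900, 225, 80], 1125, 60))  # two violations
```

Automating such checks catches frame-drop and merge-alignment problems at configuration time rather than in captured footage.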

Practical Skills and Challenges

In the actual image sensor register optimization process, engineers usually need to combine hardware understanding, image quality analysis, and iterative experimental tuning, adjusting registers, capturing test scenes, and measuring the results, to achieve optimal image quality.

Conclusion

The design of image sensors is an exquisite trade-off between physics, optics, and circuits. From the evolution of pixel architecture to the establishment of noise models, and then to the breakthrough of HDR technology, each technology is trying to approach the physical limits of photoelectric conversion. As image engineers, understanding these underlying logics and performing precise tuning through registers is a key process that gives images "soul." This is the charm of the art of photoelectric conversion and the goal that image engineers constantly pursue.

Disclaimer

This note is for technical exchange and learning reference only and does not represent the official position of any image sensor manufacturer, platform supplier, or related enterprise. All technical details, practical experience, and suggestions are based on public information, industry common knowledge, and the author's personal understanding and experience. The technical illustrations and schematic diagrams used in this article are all AI-assisted generations, serving only as visual references and do not represent actual products or designs. In actual engineering applications, please refer to the official Data Sheet, development documents, and technical support provided by the image sensor manufacturer. The author and platform are not responsible for any direct or indirect losses caused by the use of the content of this note.