Properties emerging from simple photographic equations, and a new concept for a bulky photographic smartphone

I wanted to share some of my observations concerning the physical limits of photographic equipment. Decades ago 35 mm film was common, and now most photographs are made with slim pocket devices. Dedicated cameras are available, but not many people that I know actually own them (I certainly don’t). In practice this means that popular cheap cameras produced in the USSR used film with an image size of 36 × 24 mm (864 mm²), while modern marvels of engineering from capitalist China sport image sensors that rarely exceed 8 × 6 mm (48 mm²).

Modern digital sensors have better sensitivity and resolution than ancient film made from dead animals (gelatin is an essential ingredient), so some of the problems with a small frame size are alleviated. Depth of field, as I will show later, depends only on the diameter of the entrance pupil (for a given angle of view), so having a bigger camera does not mean that the range of distances at which objects remain sharp is extended. But if the reduced depth of field caused by an increased entrance pupil diameter is acceptable, then more light can enter the lens. The problem of the diffraction limit is also alleviated: the absolute size of the Airy disk on a sensor depends on the lens f-number (focal length divided by the diameter of the entrance pupil), which means the Airy disk takes up a smaller portion of a bigger sensor, assuming the f-number stays the same.


Focal length, angle of view, what can fit within the frame, and light-gathering capability of a lens

First I will explain how to visualize what portion of the frame an object of a given size, located at some distance from the lens, will occupy, assuming a given focal length and a given dimension of the imaging sensor (height, width, or diagonal). It may be obvious to some people, but I actually learned about this stuff only at the age of 28. Let’s look at the following illustration (which is not to scale):

An object at a distance 1000 times greater than the focal length can be 1000 times taller than the image sensor and still fit inside the frame. It means that a full-frame camera, with an image sensor height of 24 mm and a focal length of 50 mm, can fit a 24 m tall building that is 50 m away inside the frame (assuming the camera is located 12 m above the ground). An object located 5 m away can be at most 2.4 m tall if it is to be captured whole. This is theoretical of course, as a lens can introduce geometric distortions, and a change of focus can influence focal length (breathing).

Below you can find equations that describe the maximum size of an object that can fit in the frame and the angle of view. The following equation describes the maximum height/width/diagonal of an object


maximum object dimension = ( s * sensor size ) / f

(1)

where s is the distance to which the lens is set, sensor size is the height/width/diagonal of the image sensor, and f is the focal length. The angular extent (angle of view) of the scene in a given direction is


α = 2 * arctan( sensor size / ( 2 * f ) )

(2)
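
As a quick sanity check of equations (1) and (2), here is a minimal Python sketch (the function names are my own; the numbers reproduce the full-frame 50 mm example above):

    import math

    def max_object_dimension(distance, sensor_size_mm, focal_length_mm):
        # Equation (1): largest object dimension that still fits in the frame.
        # The result has the same unit as the distance argument.
        return distance * sensor_size_mm / focal_length_mm

    def angle_of_view_deg(sensor_size_mm, focal_length_mm):
        # Equation (2): angle of view along the chosen sensor dimension.
        return math.degrees(2 * math.atan(sensor_size_mm / (2 * focal_length_mm)))

    print(max_object_dimension(50, 24, 50))   # 24.0 m tall object fits at 50 m
    print(max_object_dimension(5, 24, 50))    # 2.4 m at 5 m
    print(angle_of_view_deg(24, 50))          # ~27 degrees vertically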

When it comes to the light-gathering capability of a lens, it depends mostly on the angle of view and the diameter of the entrance pupil. Imagine that a full-frame camera is taking a photo of a uniformly lit flat surface that fills the whole frame (a large TV screen displaying only a white background, for example), with a lens of some focal length; then we can easily determine from the above equations what portion of this flat surface emits light that is captured by the lens (assuming that the optical axis is normal to the surface). If we multiply the focal length by 2, the width and height of the surface that is within the field of view of the lens are halved, and light from only ¼ of the original area can reach the sensor.

To keep the amount of light that enters the camera constant while doubling the focal length f, the effective aperture diameter d (the diameter of the entrance pupil) has to increase by a factor of 2, increasing the aperture area 4 times. This diameter is defined by the following equation:


d = f / N

(3)

where N is the f-number, usually written in the form f/# (which is basically the right-hand side of the above equation; # is the numerical value we are interested in).

Keeping the f-number constant while changing the focal length ensures that the same quantity of light passes through the lens (at least in the case of photographing a uniformly lit surface normal to the optical axis with ideal hardware). A crop sensor would require a shorter focal length to keep the angle of view constant, but the diameter of the entrance pupil would have to stay the same if a constant number of photons is expected to reach the image sensor in a given period of time.

The T-43 lens used in the Smena 8(M), which is probably the most-produced camera ever, had a focal length of 40 mm and a lowest f-number of 4. This means that a 10 mm entrance pupil diameter could be achieved. A camera with a Micro Four Thirds sensor, which has a crop factor of 2 (its diagonal is half as long as that of a 35 mm film frame [43.27 mm is the actual diagonal]), would have to be equipped with a 20 mm f/2 lens to provide the same angle of view and light-gathering capability.

The best smartphones have sensors of about 13.1 × 9.8 mm (~2.6 crop factor), ~8.5 mm focal length (~22 mm full-frame equivalent), and ~5.3 mm maximum effective aperture diameter (f/1.6, f/4.3 full-frame equivalent). Even some zoom lenses for mirrorless cameras can’t achieve a 10 mm entrance pupil diameter when working at 40 mm (or its full-frame equivalent). This is the case with the Canon RF 24-105mm F4-7.1 IS STM (full frame), the Sony E PZ 16-50mm F3.5-5.6 OSS II (APS-C), or the Panasonic Lumix 12-60mm f/3.5-5.6 (MFT).
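
The Smena-to-Micro-Four-Thirds comparison and the smartphone numbers above can be reproduced with a short script (a sketch with my own helper names, using the approximate sensor dimensions quoted in the text):

    import math

    FULL_FRAME_DIAGONAL_MM = math.hypot(36, 24)  # ~43.27 mm

    def crop_factor(sensor_width_mm, sensor_height_mm):
        return FULL_FRAME_DIAGONAL_MM / math.hypot(sensor_width_mm, sensor_height_mm)

    def entrance_pupil_mm(focal_length_mm, f_number):
        # Equation (3): d = f / N.
        return focal_length_mm / f_number

    # Smena 8(M): 40 mm f/4 -> 10 mm entrance pupil.
    print(entrance_pupil_mm(40, 4))

    # Micro Four Thirds equivalent (crop factor 2): 20 mm f/2 -> still 10 mm.
    print(entrance_pupil_mm(20, 2))

    # Large smartphone sensor (~13.1 x 9.8 mm), 8.5 mm f/1.6:
    crop = crop_factor(13.1, 9.8)            # ~2.6
    print(8.5 * crop, 1.6 * crop)            # ~22 mm, ~f/4.2-4.3 full-frame equivalent
    print(entrance_pupil_mm(8.5, 1.6))       # ~5.3 mm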


Airy disk

Light has a wave-like nature, and when it travels through an aperture (hole, opening) it undergoes diffraction at the circumference (or edge). The aperture effectively becomes a secondary source of the propagating wave, and this secondary source interferes with the rest of the light, producing a diffraction pattern. So even if a lens has perfect optical elements and the ability to focus perfectly, a point source of light will be rendered as a diffuse set of concentric rings with a bright central region. The Airy disk diameter used here is the diameter of the dark band surrounding that bright central region:


 d_Airy ≈ 2.43932 * λ * N

(4)

λ is the wavelength (typical human vision in bright conditions is most sensitive around 555 nm), and N is the f-number of the lens.
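
A minimal sketch of equation (4), evaluated for the f-numbers that appear in the depth-of-field graphs later on:

    def airy_disk_diameter_um(f_number, wavelength_nm=555):
        # Equation (4), with the result converted to micrometres.
        return 2.43932 * (wavelength_nm * 1e-3) * f_number

    for n in (1.6, 1.8, 4.0, 7.1):
        print(f"f/{n}: Airy disk diameter ~ {airy_disk_diameter_um(n):.1f} um")
    # f/1.8 gives ~2.4 um, f/7.1 gives ~9.6 um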

The Rayleigh criterion states that two point sources of light need to be separated only by the radius of the Airy disk, so that the center of the second diffraction pattern lies directly over the first minimum of the first diffraction pattern, in order to be just resolvable. But because I discuss digital sensors with relatively large pixels, and non-ideal lenses that do not focus to infinitesimally small points, I will assume that the minimum separation of the centers of the light sources is the diameter of the Airy disk.

As a side note, the equation for the diameter of the dark band surrounding the bright central region is derived from the equation for the minimal angular separation of the photographed point sources under a few assumptions. For very small angles, the value in radians, the sine, and the tangent of the angle are all almost equal. The angle is very small only when the diameter of the Airy disk is much smaller than the focal length (which is usually the case).


Depth of field

For a given circle of confusion (the maximum allowable diameter of a spot on the image sensor that corresponds to an infinitesimally small point in the photographed scene) and a lens with a given f-number and focal length, there exists a distance to which the lens can be set such that all objects beyond it, as well as objects lying between half of this distance and the distance itself, are imaged acceptably sharp (acceptable sharpness being defined by the size of the circle of confusion):


H ≈ f^2 / ( N * c ) = ( f * d ) / c

(5)

This distance is called the hyperfocal distance; the circle of confusion is represented by the variable c. The following equations for the near and far DOF limits can be written to depend only on the hyperfocal distance H and the distance s to which the lens is set. You can see that when the focal length, circle of confusion, and f-number are all multiplied by a common factor k, meaning that the lens and sensor are scaled while keeping the angle of view, the diameter of the entrance pupil, and the size of the circle of confusion relative to the sensor constant, this factor cancels out between numerator and denominator. It shows that the diameter of the entrance pupil d is the parameter that determines hyperfocal distance and depth of field when taking a picture of a particular scene, with a given composition of objects in the frame, regardless of camera size.


H ≈ (k*f)^2 / ( (k*N) * (k*c) ) = ( (k*f) * d ) / (k*c)

(6)
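
A quick numerical check of the cancellation in equation (6): scaling the focal length, f-number, and circle of confusion of a full-frame 50 mm f/1.8 setup by k = 0.5 (the 0.03 mm circle of confusion is an assumed value) leaves the hyperfocal distance unchanged:

    def hyperfocal_mm(f_mm, n, c_mm):
        # Equation (5): H ~ f^2 / (N * c).
        return f_mm ** 2 / (n * c_mm)

    k = 0.5
    print(hyperfocal_mm(50, 1.8, 0.03))               # full frame: ~46300 mm
    print(hyperfocal_mm(50 * k, 1.8 * k, 0.03 * k))   # scaled copy: ~46300 mm, identical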

Near and far limits of the depth of field for a lens set to a distance s are:


D_N ≈ ( H * s ) / ( H + s ) = ( f^2 * s ) / ( f^2 + s * N * c ) = ( f * s * d ) / ( f * d + s * c )

(7)


D_F ≈ ( H * s ) / ( H - s ) = ( f^2 * s ) / ( f^2 - s * N * c ) = ( f * s * d ) / ( f * d - s * c )

(8)

Depth of field is the difference between these limits:


DOF ≈ D_F - D_N = ( 2 * H * s^2 ) / ( H^2 - s^2 ) = ( 2 * N * c * f^2 * s^2 ) / ( f^4 - N^2 * c^2 * s^2 ) = ( 2 * f * s^2 * d * c ) / ( f^2 * d^2 - s^2 * c^2 )

(9)
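
The near limit, far limit, and DOF from equations (7), (8), and (9) can be computed directly; here is a minimal sketch for the two full-frame cameras discussed in the next section (50 mm lens focused at 20 m, with an assumed circle of confusion of 0.03 mm):

    def hyperfocal_mm(f_mm, n, c_mm):
        # Equation (5): H ~ f^2 / (N * c).
        return f_mm ** 2 / (n * c_mm)

    def dof_limits_mm(f_mm, n, c_mm, s_mm):
        # Equations (7) and (8); the far limit becomes infinite once s >= H.
        h = hyperfocal_mm(f_mm, n, c_mm)
        near = h * s_mm / (h + s_mm)
        far = h * s_mm / (h - s_mm) if s_mm < h else float("inf")
        return near, far

    for n in (1.8, 7.1):
        near, far = dof_limits_mm(50, n, 0.03, 20_000)
        far_text = "infinity" if far == float("inf") else f"{far / 1000:.1f} m"
        print(f"f/{n}: sharp from {near / 1000:.1f} m to {far_text}")
    # f/1.8: roughly 14.0 m to 35.2 m; f/7.1: roughly 7.4 m to infinity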


Distribution of sharpness

I created a spreadsheet (download link below) that uses the equations from this post to plot what sharpness and useful resolution can be expected for photographed objects located at different distances, for various cameras and settings.

Below you can find a graph that shows the behavior of two full-frame cameras (36 × 24 mm sensors) with 50 mm lenses focused at 20 meters. The red line represents a camera with an f-number of 1.8; the blue one is a camera with an f-number of 7.1 (a 4-stop difference; keeping exposure constant would require a 16 times longer shutter speed or 16 times higher ISO sensitivity, because an entrance pupil with ¼ of the diameter has 16 times smaller area). The smooth parts of the lines are calculated using equations 7 and 8. The horizontal lines represent the cut-off of maximum sharpness due to diffraction (the vertical lines are just crude limiting of the graph size).

The calculation of the maximum number of lines per millimeter of the image sensor assumes that the diffraction limit is reached when the images of infinitesimally thin monochromatic parallel lines are spaced on the sensor so that the first dark stripe of the diffraction pattern of a single line is shared by the diffraction patterns of the neighboring lines. For a digital sensor with square pixels this is not ideal, because at least 2 rows of pixels are needed to represent a single line (one for the line, one for the background [the background row is shared with another line]). If the centers of the diffraction patterns were located not in the middle of the pixels but on their edges, then all rows of pixels would be illuminated by the same amount of light. The “stripes of confusion” used to calculate DOF at a particular lp/mm have a width that is half of the spacing of the hypothetical infinitesimally thin lines imaged perfectly sharp on the sensor. If the “stripes of confusion” fit perfectly within pixels of the same width, then perfect contrast would be achieved: rows of pixels would alternate between being lit and completely dark. However, if the “stripes of confusion” fell on the edges of the pixels, then all of them would receive exactly the same amount of light.

It is worth pointing out that increasing the diameter of the entrance pupil (decreasing the f-number) beyond some value may reduce the maximum achievable sharpness of a real lens, even if the Airy disk gets smaller. A large effective aperture exposes the outer parts of the optical elements to the transmitted light, inducing larger aberrations. A real lens will also provide a sharper image in the center of the image sensor than in its corners.

The next graph shows what the maximum number of pixels in a horizontal row spanning the width of the image sensor would be, if the width of every pixel were equal to the diameter of the circle of confusion used to calculate the near and far DOF limits (or to the radius of the Airy disk for the highest possible resolution). The maximum useful horizontal resolution equals the number of line pairs per millimeter, multiplied by 2 (a line needs a background) and by the width of the sensor in millimeters. The word “maximum” is a bit of a misnomer, as more pixels would allow better differentiation between imperfect images of lines or points, but a real lens would introduce additional distortions to the image of objects of some width, so the number of pixels in the graph below should be a good approximation of the maximum useful digital resolution.
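
Here is a hedged sketch of how I read this model (line centers spaced one Airy disk diameter apart, pixel pitch equal to the Airy disk radius; the exact formulas in the spreadsheet may differ):

    def diffraction_lppmm(f_number, wavelength_nm=555):
        # Line pairs per mm when line centres are spaced one Airy-disk diameter apart.
        airy_diameter_mm = 2.43932 * wavelength_nm * 1e-6 * f_number
        return 1.0 / airy_diameter_mm

    def max_useful_pixels(f_number, sensor_width_mm, wavelength_nm=555):
        # Pixels across the sensor width when the pixel pitch equals the Airy-disk radius.
        return 2 * diffraction_lppmm(f_number, wavelength_nm) * sensor_width_mm

    print(round(diffraction_lppmm(1.8)))         # ~410 lp/mm
    print(round(max_useful_pixels(1.8, 36)))     # ~29500 pixels across a 36 mm wide sensor
    print(round(max_useful_pixels(7.1, 36)))     # ~7500 pixels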


Colors

Note that pixels in a camera sensor are different from the ones in displays. Cameras usually use a Bayer filter, which means that every photodiode is covered by either a red, green, or blue color filter (there are 2 green filters for every red or blue filter) and a microlens (one of its functions is redirecting light coming in at a high angle onto the particular photodiode, which results in smaller pixel crosstalk and higher photosensitivity). Each photodiode counts as a pixel, despite the fact that only 1/3 of the color information for that spot was collected. The remaining colors for that pixel are obtained through the process of demosaicing, which uses the information from neighboring pixels to interpolate color values.
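
Just to illustrate the principle of demosaicing (not what any real camera pipeline does), here is a crude sketch that fills in the missing channels of an RGGB Bayer mosaic by averaging same-colored neighbors in a 3×3 window:

    import numpy as np

    def bayer_color(row, col):
        # RGGB layout: which color filter covers this photodiode?
        if row % 2 == 0:
            return "R" if col % 2 == 0 else "G"
        return "G" if col % 2 == 0 else "B"

    def naive_demosaic(raw):
        # Average same-color neighbors in a 3x3 window around each photodiode.
        h, w = raw.shape
        rgb = np.zeros((h, w, 3))
        channel = {"R": 0, "G": 1, "B": 2}
        for r in range(h):
            for c in range(w):
                sums, counts = np.zeros(3), np.zeros(3)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w:
                            k = channel[bayer_color(rr, cc)]
                            sums[k] += raw[rr, cc]
                            counts[k] += 1
                rgb[r, c] = sums / np.maximum(counts, 1)
        return rgb

    # Every photodiode becomes a full RGB pixel, even though it sampled only one color:
    print(naive_demosaic(np.random.rand(4, 4)).shape)  # (4, 4, 3)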

Typical display uses pixels that consist of 3 individual subpixels (red, green, and blue). So a TV with a 3840 × 2160 resolution has approximately 8.3 million pixels (megapixels), and 24.9 million subpixels.

There is some wisdom in making image sensor pixels regular polygons, especially when they are covered by lenses (which do not have to be “blobs” of material; digital microlenses can also be used). The Bayer filter uses twice as many green pixels (luminance-sensitive elements) as red or blue ones (chrominance-sensitive elements), because human vision is most sensitive to green. The human retina has a much greater density of M and L cone cells, which are sensitive to green, than of S cones, which are sensitive almost only to blue light. Maybe imitating human vision in a camera, but providing a more chromatic rendition of the real world in a display (which is itself viewed by the human eye), is the optimal solution. Nevertheless, when I learned that megapixels of digital image sensors are not the same as megapixels of displays (which are more familiar, as they can be seen with the naked eye from close up), I felt cheated.

Other arrays of color filters exist besides the one invented by Bryce Bayer. One of them is X-Trans, which is supposed to reduce the occurrence of Moiré patterns, but it does not do so perfectly and exhibits problems with color reproduction (fewer chrominance-sensitive elements, more luminance-sensitive elements). Displays also sometimes lack pixels consisting of 3 subpixels used only by that pixel, as is the case with PenTile RGBG.

It is worth noting that one patch of a color filter in an image sensor may be shared by a few photodiodes. Lots of smartphones (the ones with a crazy number of megapixels) use the Quad Bayer filter pattern, where 4 neighboring photodiodes with 4 individual microlenses are sensitive to the same color. Those photodiodes can be binned, resulting in good sensitivity in low-light conditions. The noise performance of a binned pixel array can possibly be better than if they were one single photodiode, at least according to this source, which assumes the presence of Gaussian white noise. Another source claims that binned pixels in astrophotography can reduce the time needed to achieve a high SNR when imaging at a dark site with a high-read-noise camera and a high f-number or a narrowband filter. In the other mode the photodiodes can work independently, in order to increase resolution (albeit only slightly).

Those independent photodiodes could also be used to take 2 pictures at the same time, one with a shorter exposure and another with a longer one. Those two (or more) images can be combined into one HDR picture of a moving object (if 2 consecutive shots were taken, the moving object would be in a different position in each of them) that preserves details of objects in the shadows as well as those that are brightly lit. This feature is extremely beneficial to a camera that is supposed to compete with old gelatin-based film technology, as digital sensors have a narrow dynamic range and are prone to overexposure, which is particularly important when taking pictures of exhaust jets (the ones full of glowing particulates) during rocket launches, as detailed in this Curious Droid video (which also notes the rolling shutter present in many digital cameras as another drawback of modern equipment).

Quad Bayer (also called Tetracell, 4-cell, or Quad CFA) uses a 2×2 photodiode array that shares the same color filter, but variants with 3×3 or 4×4 arrays also exist.

Arrays of 2 or 4 photodiodes sharing a color filter can also share the same microlens, which covers the whole color filter. They are used in dual- or quad-pixel phase-detection autofocus. Photodiodes arranged in this way have directional sensitivity to light, so if light coming from opposite sides of the lens has different intensity, the object imaged onto this pixel array is out of focus. Patterns in the data collected from multiple PDAF pixels can be analyzed and used to quickly change the focus of the camera.


Idea for a not-so-flat smartphone with an extendable physically large lens

A larger camera should be able to do the same things as a smaller one at a high f-number, with the advantage of using only the central portion of the lens and having a larger number of pixels in the image sensor, and also do other things at a lower f-number, such as collect more light and possibly make extremely sharp photographs of objects lying within a shallow depth of field (the ability to blur objects lying outside the DOF may be an advantage in some artistic cases).

I propose that a camera composed of 3 parts could be attached to the back of the smartphone. In the “flat” state those 3 parts would lie on the back of the smartphone. To achieve the “extended” state, scissor mechanisms would move them away from the smartphone, and actuators (probably the same ones used to extend the scissor mechanisms) on the interface between the scissor mechanism and the smartphone would change the positions of the parts. I imagine that 4 actuators would be required for the part of the camera that “rises above” the smartphone and changes position, 2 for the part that “rises” but stays in the same place “above” the back of the smartphone, and one actuator for the part that moves but does not “rise”.

Additional sliding-element linear guides could be used to interlock the parts and provide greater stiffness for the camera (some carriages or rails may have to be actuated over a narrow range). Mechanisms that firmly lock the parts in place would be a nice addition.

In the image above I showed only bare scissor legs, but I imagine that the final construction would be surrounded on all sides by a “wall” that restricts movement of the scissor mechanisms in undesired directions, preventing them from collapsing when the camera is not extended while pointing at the zenith. This “wall” could be filled with batteries, heatsinks, or other components.

A flat “plate” that covers the entire mechanism and protects it should also be employed. Actually, only 2 out of the 3 parts of the camera need to be covered by the “plate”. If the plate is further split into two parts mounted on hinges, then only one of them has to remain continuously rotated out of the way (and can provide additional support for the weight of the camera). The other part of the “plate” needs to move out of the way only during the extension process, and can then return to its resting position.

This mechanism would not provide great weather resistance (at least during the extension process). The simplest solution I can think of would be an integrated mount for a small hi-tech umbrella, which would have to be stabilized.

The parts of the camera have a height of 25 mm and a width of 50 mm. When the smartphone is in the “flat” configuration, it has a thickness of about 30 mm. That is much more than what Apple, Samsung, and Huawei offer nowadays, but years ago cellular phones thicker than 20 mm were common. The rugged Unihertz Tank 3 Pro, with a built-in DLP projector, a 23.8 Ah battery, and a cooling fan, has a thickness of 31 mm.

When extended, my photographic smartphone would have a thickness of about 80 mm. This should be enough to fit an APS-C image sensor (23.5 × 15.6 mm) and a 35 mm f/2 lens (or something similar). Other extension mechanisms are possible, but they do not allow filling the space inside the lens with thick optical elements. If a longer focal length were required, a combination of this mechanism with the more common one found in compact cameras could accommodate it.
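
For context, the proposed APS-C module can be compared with the earlier examples using the same equations (the crop factor and equivalents below are my own calculation):

    import math

    crop = math.hypot(36, 24) / math.hypot(23.5, 15.6)   # ~1.53 for APS-C
    print(round(35 * crop), round(2 * crop, 1))           # ~54 mm, ~f/3.1 full-frame equivalent
    print(35 / 2)                                         # 17.5 mm entrance pupil

That entrance pupil diameter is well above both the 10 mm of the Smena 8(M) and the ~5.3 mm of the best current smartphones discussed earlier.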


Files:
photographic_equations_and_smartphone_v1.zip, mirror, SHA256: 61b1923926dc427c5de7a58c8ad7f124a5e12eaf7a469c5cf9b7a72fa4f04453


I dedicate this work to J. and A. Their need to buy cameras of decent quality inspired me to learn more about photography, and to create spreadsheets with calculations that model the behavior of lenses. Obviously, free-market economics left the desire to shoot better photos and videos unfulfilled.
