AUGMENTED REALITY SYSTEM AND METHOD FOR SUBSTRATES, COATED ARTICLES, INSULATING GLASS UNITS, AND/OR THE LIKE
Certain example embodiments relate to an electronic device, including a user interface, and processing resources including at least one processor and a memory. The memory stores a program executable by the processing resources to simulate a view of an image through at least one viewer-selected product that is virtually interposed between a viewer using the electronic device and the image by performing functionality including: acquiring the image; facilitating viewer selection of the at least one product in connection with the user interface; retrieving display properties associated with the at least one viewer-selected product; generating, for each said viewer-selected product, a filter to be applied to the acquired image based on retrieved display properties; and generating, for display via the electronic device, an output image corresponding to the generated filter(s) being applied to the acquired image. The electronic device in certain example embodiments may be a smartphone, tablet, and/or the like.
This application is a continuation of U.S. Application Serial No. 16/583,881 filed on Sep. 26, 2019, which claims priority to U.S. Application Serial No. 62/736,538 filed on Sep. 26, 2018, the entire contents of which are hereby incorporated herein by reference in their entireties.
TECHNICAL FIELD
Certain example embodiments of this invention relate to augmented reality systems and methods for substrates, coated articles, insulating glass units, and/or the like.
BACKGROUND AND SUMMARY
Glass has long been incorporated into buildings and other structures for aesthetic purposes. For example, design architects designing buildings oftentimes will desire a particular coloration, amount of visible light transmission, amount of visible light reflection, and/or other aesthetic properties for a given project, e.g., to enhance its aesthetic appeal, set it apart from other projects, comport with a particular “neighborhood feel,” etc.
Glass exhibits multiple effects, many of which are subtle. Yet even subtle effects can have a profound impact on aesthetics if magnified over a broad area as in the case of, for example, an office building with many stories. Although more easily perceivable aspects such as stated coloration can at some level be grasped by design architects, it oftentimes is difficult to gauge how minor changes in transmission and/or reflection might affect a project. Of course, even with a property as seemingly simple as coloration, there are many fine gradations that may not be readily appreciated.
Moreover, “off-axis” properties related to transmission, reflection, coloration, haze, and the like, also can have a profound impact on the overall aesthetics of a project. In other words, although properties such as coloration, transmission, and reflection typically are reported as nominal values, such nominal values generally assume an orthogonal viewpoint and thus do not fully and accurately reflect how a facade (for example) might be perceived when viewed at an angle, or how the outside of a building might be viewed when standing or looking at an angle.
To help combat these issues, design architects may have on-hand a collection of sample products. Additionally, some architects order sample products that are built to their specifications. Unfortunately, however, it oftentimes is difficult to maintain a large collection of samples. For example, such collections can become outdated fairly quickly, e.g., as technology related to functional coatings (e.g., low-emissivity, antireflection, and/or other coatings) continues to advance, as building certifications and standards change, etc. The storage requirements for a meaningful collection of samples also may be problematic.
When it comes to ordering sample products, and even those that are built to precise specifications, there is unfortunate waste created. It also has been found that there oftentimes is a large divergence between what a design architect thinks is being ordered and what actually is being delivered. It has been observed that this discrepancy has led to increasing frustration on the part of architects in terms of the sample delivery and disposal approach. It also has been observed that architecture firms across the board are moving to eliminate their sample libraries.
The assignee of the instant application has developed a “glass calculator” that provides precise optical information for a range of configurable products. This glass calculator allows users to specify types of glass, glass thicknesses, coatings, coating locations, spacer configurations for insulating glass (IG) units, etc. Once specified, the glass calculator generates and displays detailed information about the optical properties expected for the custom configuration. The glass calculator has proven to be a valuable tool for glass fabricators. Unfortunately, however, design architects may not be able to fully understand the detailed output of the glass calculator. Moreover, in the design realm, “seeing is believing” -- and the calculator cannot take the place of actual samples in this respect. And because the glass calculator was created with fabricators in mind, it is not easy or intuitive for design architects to quickly solve design challenges in a format and “language” that is meaningful to them.
Thus, it will be appreciated that there is a need in the art for tools designed to help design architects visualize the performance of windows, including when standing “outside” the window and “looking in,” and when standing “inside” the window and “looking out.”
Certain example embodiments help address these and/or other concerns. For instance, certain example embodiments apply augmented reality (AR) techniques so that an electronic device can be used as if it were a window. In certain example embodiments, the user can select aspects of the window. The user uses the device to “frame” an object or objects, e.g., by placing the device between the user and the object or objects and looking “through” the device (using cameras, displays, and/or other hardware elements of and/or coupled to the device) as if the device were a frame for the object or objects. “Framing” in certain example embodiments involves image processing based on a determination of the orientation of the user’s gaze relative to the device and an object. One or more cameras (e.g., a 360 degree camera, two 180 degree cameras, and/or the like) are used to help determine where a human user is looking and what a human user is looking at, e.g., using face and/or eye-tracking algorithms. Once determined, the object(s) being imaged is/are displayed on a display, with a custom filter being applied thereon to simulate the effects of the window. The filter is custom-generated in some instances based on specifications related to the selected aspects of the window, as well as the user’s position and/or orientation relative to the virtual window (and may in some instances be driven by or at least involve spectrum data).
It is noted that the use of two-sided fish eye cameras on mobile devices in general is rare, and that certain example embodiments make use of this rare arrangement. For instance, certain example embodiments isolate processing of the transmitted image of a glass sample through a filter that is applied only to the target-facing camera (or camera view), and likewise isolate processing of the reflected properties related to the image from the user-facing camera (or camera view).
Certain example embodiments relate to an electronic device, comprising a user interface, and processing resources including at least one processor and a memory. The memory stores a program executable by the processing resources to simulate a view of an image through at least one viewer-selected product that is virtually interposed between a viewer using the electronic device and the image by performing functionality comprising: acquiring the image; facilitating viewer selection of the at least one product in connection with the user interface; retrieving display properties associated with the at least one viewer-selected product; generating, for each said viewer-selected product, a filter to be applied to the acquired image based on retrieved display properties; and generating, for display via the electronic device, an output image corresponding to the generated filter(s) being applied to the acquired image. The electronic device in certain example embodiments may be a smartphone, tablet, and/or the like.
Certain example embodiments relate to methods of using the electronic device described herein. Similarly, certain example embodiments relate to non-transitory computer readable storage media tangibly storing a program that, when executed by a processor of a computing device, performs such methods. Still other example embodiments relate to the program per se.
The features, aspects, advantages, and example embodiments described herein may be combined to realize yet further embodiments.
These and other features and advantages may be better and more completely understood by reference to the following detailed description of exemplary illustrative embodiments in conjunction with the drawings, of which:
Certain example embodiments of this invention relate to augmented reality (AR) systems and methods for substrates, coated articles, insulating glass units, and/or the like. For instance, certain example embodiments may be used to simulate the performance of glass including, for example, reflection, transmission, coloration, haze, and/or the like. In certain example embodiments, streaming composite images are generated that replicate the unique optical characteristics of windows (e.g., uncoated substrates, coated articles such as float glass supporting sputtered and/or other coatings, insulating glass (IG) units, etc.). Such renderings may be created in real-time, e.g., in dependence on the user’s perspective (including position and/or orientation), as well as the user’s surroundings. This approach may create an experience that is similar to the user holding a “real” window or piece of glass. Certain example embodiments enable an earlier-photographed scene to be selected, e.g., enabling the technology disclosed herein to be used to showcase (for example) how the glass looks in locations that are different from the user’s current location. Motion in this predetermined scene can be driven by the same face-tracking system disclosed herein; using gyroscopes, accelerometers, and/or other inertial sensors of and/or coupled to the device; and/or the like.
Referring now more particularly to the drawings,
It will be appreciated that the situations shown in
More particularly,
Although certain example viewing angles and positions are shown and described in connection with
The hardware processor 702 of device 106′ is operably coupled to a device memory 704, which may be any suitable combination of transitory and/or non-transitory storage media (including, for example, flash memory, a solid state drive, a hard disk drive, RAM, and/or the like). The memory 704 includes an operating system (OS) 706 that helps the device 106′ run. The OS 706 may be an embedded OS in some instances, and it may, for example, provide for other capabilities of the device such as, for example, camera functionality, telephone calls, Internet browsing, game playing, etc.
Hardware drivers and/or application programming interfaces (APIs) 708 enable program modules stored to the device 106′ to access at least the hardware features thereof. For example, the hardware drivers and/or APIs 708 may enable the AR control program logic program module 712 to access the user-facing and/or target-facing camera 110a/110b so that it can take pictures or otherwise receive video input, provide output to the display device 108, download updates via the network interface 710, etc.
The AR control program 712 includes a number of modules or sub-modules that may be customized for or otherwise related to the desired application. For the AR window use case, for example, a product configurator 714 may be provided so that the user can specify the component of the window to be simulated. In certain example embodiments, the AR control program logic 712 will present a user interface in connection with the display device 108 that guides the user through the selection of predefined configurations, the selection of parts for a “build-your-own” arrangement, and/or the selection of aesthetic properties.
The component store 716 may store information about the user-selectable components that may be selected using the product configurator 714. In this regard, the component store 716 may be arranged as a database (e.g., a relational database, XML database, or the like). In certain example embodiments, different database structures may be used for the different design configuration options. In certain example embodiments, separate tables may be provided for each of predefined configurations, substrate types (e.g., with specification of materials such as glass, plastic, etc.; thickness; product or trade name; etc.), spacer types (e.g., warm edge or cool edge, aluminum, trade name, etc.), coating types (e.g., specific low-emissivity, antireflection, antifouling, and/or other coatings), laminating material (e.g., PVB, EVA, PET, PU, etc.), and/or the like. Different database structures storing performance metrics (e.g., coloration, transmission, reflection, etc.) additionally or alternatively may be provided and linked to the predefined and/or component information. In certain example embodiments, these performance metrics may be stored in the attribute store 718 (although separate databases and/or database structures need not necessarily be provided). Performance metrics may include and/or facilitate the calculation of CIE color coordinate information (e.g., a*, b*, L*, etc.), light-to-solar gain (LSG) values, solar heat gain coefficient (SHGC) data, and/or the like, with angular data being provided or being calculable where relevant.
A user may, for instance, select a single 3 mm thick clear glass substrate with a single low-E coating thereon (e.g., a SunGuard, ClimaGuard, or other specific coating commercially available from the assignee), a laminated article with two 3 mm glass substrates laminated together with a PVB interlayer, an IG unit including two 3 mm glass substrates separated by 10 mm with an 80/20 Argon-to-Oxygen backfill separated by an IET spacer with a ClimaGuard coating on surface 2, etc. These are, of course, merely examples of user selections. Fully or partially pre-configured coated articles, glazings, IG units, VIG units, and/or the like, may be user-selectable. Coatings, substrates, configurations, etc., from the assignee, or other parties, and/or the like may be selected, as well. A database of selectable options may be remotely updated and/or remotely maintained (and in this latter instance accessed by the device over the Internet or other suitable network connection) so that further (e.g., future-developed) coatings, substrates, and/or the like may be added and selectable over time. Selections may be more or less fine-grained in certain example embodiments. Also, as noted above, selections may be specified in whole or in part in terms of performance characteristics (e.g., coloration, visible transmission, reflection, emissivity, etc.), selections may be based on predefined configurations that may be accepted or further customized in some instances, etc. Motion effects also may be used to help communicate reflected color information and/or other optical performance. Dynamic color representation may be shown, for example, in “digital swatches,” that are viewable using a device according to certain example embodiments. Such swatches could be selected by a user and then superimposed on an obtained image.
The gaze recognition module 716 recognizes where the user is looking. The gaze recognition module 716 may be static or real-time, e.g., so that the user’s gaze is tracked. An example of how a user’s gaze may be tracked is provided below (e.g., in connection with
The transform module 720 helps perform the operations shown schematically in
Example image processing techniques for determining face detection and position, as well as front (user-facing) and rear (target-facing) image rendering will now be provided. It will be appreciated that these approaches are provided by way of example and without limitation. For ease of illustration, the electronic device 106′ in the examples that follow is assumed to be a 2017 iPad Pro 10.5” (A1701 or A1709) held in the “portrait” orientation. It will be appreciated that different devices and/or orientations may be used in different implementations. In such cases, assumptions concerning the pixels-per-inch, screen dimensions, camera specifications, etc., may be adjusted. In this example:
- p = 264, pixels-per-inch of the screen
- ω = 1668, the screen width in pixels
- η = 2224, the screen height in pixels
Different coordinate spaces are involved in the example image transforms described herein. A first coordinate space relevant to certain example embodiments is the fisheye image space, expressed in Cartesian coordinates. This space is a square two-dimensional space parameterized by the Cartesian coordinates (u, v). As can be appreciated from
- (0, 0) is located at the bottom left of the square.
- u and v are measured in pixels.
- (u, v) = (d, d) is located at the top right of the square (where d = diameter).
- The fisheye image captured within this square is assumed to lie wholly within the circle of radius d/2 centered at the center of the square, such that (u, v) = (d/2, d/2). It is noted that this assumption and other assumptions may be made in certain example embodiments, but other assumptions may be used in different example embodiments. Furthermore, certain example embodiments may automatically calculate and/or determine factors such as these. As an example, certain example embodiments may make the assumption that the fish eye image will be centered on the screen, whereas different example embodiments may instead automatically identify the position of the fish eye image without exact manual centering, e.g., using image processing routines.
- The fisheye image captured within this square is assumed to be upright. It is noted that this assumption was made because most existing face tracking APIs require an upright image. However, this assumption might not stay true for other embodiments of the invention, e.g., where the user is to be visible at non-upright angles relative to the device.
A point (u0, v0) in fisheye image space (with Cartesian coordinates) can be converted to a point (r0, θ0) in fisheye image space (with polar coordinates) using the formulae:
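The formulae themselves are not reproduced in this text. By way of example and without limitation, one conversion consistent with the definitions above (origin at the bottom left, image circle centered at (d/2, d/2), θ = π/2 pointing up) may be sketched in Python as follows. The function name is hypothetical and is used for illustration only.

```python
import math

def fisheye_cartesian_to_polar(u0, v0, d):
    """Convert a fisheye pixel (u0, v0) to polar (r0, theta0).

    Assumes the image circle is centered at (d/2, d/2) and that v
    increases upward, so theta0 = pi/2 points directly up.
    """
    du = u0 - d / 2.0          # horizontal offset from the image center
    dv = v0 - d / 2.0          # vertical offset from the image center
    r0 = math.hypot(du, dv)    # radial distance in pixels
    theta0 = math.atan2(dv, du)  # angle in (-pi, pi]
    return r0, theta0
```

For instance, with d = 200 the top-center pixel (100, 200) maps to a radius of 100 pixels at an angle of π/2.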
It follows that a second coordinate space relevant to certain example embodiments is the fisheye image space, expressed in polar coordinates. This space is a circular two-dimensional space parameterized by the polar coordinates (r, θ). As can be appreciated from
- r is measured in pixels and varies between 0 and d/2 .
- θ is measured in radians and varies between -π and π, where θ = π/2 points directly up.
- The fisheye image is assumed to completely cover this space.
- The fisheye image captured within this space is assumed to be upright.
A point (r0, θ0) in fisheye image space (with polar coordinates) can be converted to a point (u0, v0) in fisheye image space (with Cartesian coordinates) using the formulae:
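The formulae are not reproduced in this text. A minimal sketch of the inverse conversion, assuming the same image center (d/2, d/2) described above, might look as follows; the function name is hypothetical.

```python
import math

def fisheye_polar_to_cartesian(r0, theta0, d):
    """Convert polar (r0, theta0) back to fisheye pixel coordinates (u0, v0).

    Assumes the pole of the polar system sits at the image center (d/2, d/2).
    """
    u0 = d / 2.0 + r0 * math.cos(theta0)
    v0 = d / 2.0 + r0 * math.sin(theta0)
    return u0, v0
```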
A point (r0, θ0) in fisheye image space (with polar coordinates) can be converted to camera space (with optical coordinates) using the formulae:
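The formulae are not reproduced in this text, and the exact mapping depends on the lens. The sketch below assumes an ideal equidistant fisheye model with a 180 degree field of view, so that the radius r in [0, d/2] maps linearly onto ϕ in [0, π/2], and it assumes the image is not mirrored, so that Ψ equals θ. Both assumptions, and the function name, are illustrative only.

```python
import math

def fisheye_polar_to_optical(r0, theta0, d):
    """Map fisheye polar (r0, theta0) to optical angles (psi0, phi0).

    ASSUMPTION: equidistant projection over a 180 degree field of view,
    i.e. phi grows linearly from 0 at the center to pi/2 at r = d/2.
    ASSUMPTION: no mirroring, so the azimuth psi equals theta.
    """
    psi0 = theta0
    phi0 = math.pi * r0 / d
    return psi0, phi0
```

A real lens would typically require a calibrated radial function in place of the linear term.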
A third coordinate space relevant to certain example embodiments is the target image space. This space is a rectangular two-dimensional space parameterized by the Cartesian coordinates (x, y). As can be appreciated from
- (0, 0) is located at the bottom left of the rectangle.
- x and y are measured in pixels.
- (x, y) = (w, h) is located at the top right of the rectangle, where w and h satisfy w:h = 3:4 to match the aspect ratio of the example iPad screen (1668:2224).
- The output image displayed in this space is assumed to be upright.
A point (x0, y0) in target image space can be converted to a point (X0, Y0, 0) in screen space using the formulae:
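The formulae are not reproduced in this text. By way of illustration, the sketch below assumes that the target image is stretched to fill the whole screen, so that its center lands on the screen-space origin, and it uses the example values p = 264, ω = 1668, and η = 2224 given above. The function and constant names are hypothetical.

```python
PPI = 264          # p, pixels-per-inch of the example screen
SCREEN_W = 1668    # omega, screen width in pixels
SCREEN_H = 2224    # eta, screen height in pixels

def target_to_screen(x0, y0, w, h):
    """Convert target-image pixel (x0, y0) to screen space (X0, Y0, 0) in inches.

    ASSUMPTION: the w-by-h target image fills the entire screen.
    """
    X0 = (x0 / w - 0.5) * SCREEN_W / PPI
    Y0 = (y0 / h - 0.5) * SCREEN_H / PPI
    return X0, Y0, 0.0
```

Under these assumptions the bottom-left pixel maps to roughly (-3.16, -4.21, 0) inches, i.e., half the screen width and height away from the center.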
A fourth coordinate space relevant to certain example embodiments is the screen space. The screen space is a three-dimensional space parameterized by the Cartesian coordinates (X, Y, Z). As can be appreciated from
- (0, 0, 0) is located at the center of the example iPad’s screen.
- X, Y, and Z are measured in inches.
- The X-Y plane is coplanar with the example iPad’s screen, with the X-axis parallel to the short side of the example iPad and increasing from left-to-right as viewed from the front, and the Y-axis parallel to the long side of the example iPad and increasing from bottom-to-top.
- The Z-axis is perpendicular to the example iPad’s screen. The half-space {Z > 0} lies wholly in front of the example iPad.
- (X, Y, Z) form a right-handed coordinate system.
The front / user-facing camera is assumed to be located at F = (0, 4.57, 0) in screen space. The back camera is assumed to be located at B = (3.07, 4.55, 0) in screen space. It will be appreciated that different locations will be provided for different cameras of or connected to different devices, and that the example conversions discussed immediately below can be modified to take into account these different placements.
A point (X0, Y0, Z0) in screen space can be converted to back camera space (with Cartesian coordinates) using the formulae:
A point (X0, Y0, Z0) in screen space can be converted to front camera space (Cartesian coordinates) using the formulae:
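The formulae for both conversions are not reproduced in this text. The sketch below assumes that each camera's optical axis is perpendicular to the screen, that "above the camera" coincides with the screen's +Y direction, and that camera space uses the right-handed (right, front, up) axes described in the text. Under those assumptions the front camera looks along +Z and the back camera along -Z. The function names are hypothetical.

```python
F = (0.0, 4.57, 0.0)    # front (user-facing) camera location, inches
B = (3.07, 4.55, 0.0)   # back (target-facing) camera location, inches

def screen_to_front_camera(X0, Y0, Z0, cam=F):
    """Screen space -> front camera space (I, J, K).

    The front camera faces +Z, so its 'right' is the screen's -X.
    """
    return (-(X0 - cam[0]), Z0 - cam[2], Y0 - cam[1])

def screen_to_back_camera(X0, Y0, Z0, cam=B):
    """Screen space -> back camera space (I, J, K).

    The back camera faces -Z, so its 'right' is the screen's +X.
    """
    return (X0 - cam[0], -(Z0 - cam[2]), Y0 - cam[1])
```

A point 10 inches straight out from either camera therefore maps to (0, 10, 0) in that camera's space.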
The front (user-facing) and back (target-facing) camera spaces as expressed with Cartesian coordinates also are relevant to certain example embodiments. These spaces are three-dimensional spaces parameterized by the Cartesian coordinates (I, J, K). As will be appreciated from
- The camera is located at (0, 0, 0).
- From the perspective of the camera: (1, 0, 0) is directly to the right of the camera; (0, 1, 0) is directly in front of the camera; and (0, 0, 1) is directly above the camera.
A point (I0, J0, K0) in camera space (with Cartesian coordinates) can be converted to camera space (with optical coordinates) using the formulae:
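The formulae are not reproduced in this text. One conversion consistent with the optical coordinates as described in this document (Ψ = 0 directly to the camera's right, ϕ = 0 along the optical axis) may be sketched as follows, taking the J axis as the optical axis. The function name is hypothetical.

```python
import math

def camera_cartesian_to_optical(I0, J0, K0):
    """Camera space (I, J, K) -> optical angles (psi0, phi0).

    psi0 is the azimuth about the optical axis, measured from the camera's
    right (+I) toward up (+K); phi0 is the angle away from the optical
    axis (+J).
    """
    psi0 = math.atan2(K0, I0)
    phi0 = math.atan2(math.hypot(I0, K0), J0)
    return psi0, phi0
```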
A point (I0, J0, K0) in front camera space (with Cartesian coordinates) can be converted to screen space using the formulae:
The front (user-facing) and back (target-facing) camera spaces as expressed with optical coordinates also are relevant to certain example embodiments. These spaces are two-dimensional hemispheres of arbitrary radii parameterized by the angles (Ψ, ϕ). As will be appreciated from
- Both angles are measured in radians.
- Ψ lies in the range [-π, π], where the line Ψ = 0 lies directly to the right of the camera.
- ϕ lies in the range [0, π/2], where the line ϕ = 0 corresponds to the camera’s optical axis and the plane ϕ = π/2 is coplanar with the example iPad.
A point (Ψ0, ϕ0) in camera space (with optical coordinates) can be converted to fisheye image space (with polar coordinates) using the formulae:
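The formulae are not reproduced in this text. As a sketch only, and assuming an ideal equidistant fisheye lens with a 180 degree field of view and no mirroring (assumptions that are not stated in the text), the mapping could be written as follows; the function name is hypothetical.

```python
import math

def optical_to_fisheye_polar(psi0, phi0, d):
    """Optical angles (psi0, phi0) -> fisheye polar (r0, theta0).

    ASSUMPTION: equidistant projection, so phi in [0, pi/2] maps linearly
    onto r in [0, d/2]; the azimuth is carried over unchanged.
    """
    r0 = d * phi0 / math.pi
    theta0 = psi0
    return r0, theta0
```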
A point (Ψ0, ϕ0) in camera space (with optical coordinates) at a distance d from the camera can be converted to camera space (with Cartesian coordinates) using the formulae:
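The formulae are not reproduced in this text. A minimal sketch, assuming the J axis is the optical axis and that the azimuth is measured from the camera's right (+I) toward up (+K), follows. The function name and the parameter name `dist` (used here for the distance from the camera) are hypothetical.

```python
import math

def optical_to_camera_cartesian(psi0, phi0, dist):
    """Optical angles plus a distance -> camera space (I, J, K).

    phi0 is the angle away from the optical axis (+J); psi0 is the
    azimuth about that axis, with psi0 = 0 pointing along +I.
    """
    I0 = dist * math.sin(phi0) * math.cos(psi0)
    J0 = dist * math.cos(phi0)
    K0 = dist * math.sin(phi0) * math.sin(psi0)
    return I0, J0, K0
```

Calling the function with `dist = 1` yields a unit vector, which is the normalized form used when comparing two directions.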
Given this background and with respect to this example electronic device 106′, further detail concerning the face detection and positioning image processing will now be provided. In this regard,
In step S902, the eye positions (u1, v1) and (u2, v2) are converted from fisheye image space (with Cartesian coordinates) to fisheye image space (with polar coordinates). The following formulae may be used in this regard:
In step S904, the eye positions are converted from fisheye image space (with polar coordinates) to front camera space (with optical coordinates). The following formulae may be used in this regard:
In step S906, the front camera space (with optical coordinates) is converted to front camera space (with Cartesian coordinates), normalized such that |I1| = |I2| = 1. It is noted that the next step assumes that the input vectors are already normalized. The following formulae may be used in this regard:
In step S908, the central angle δ between two points on a sphere is computed using a well-conditioned vector formula: δ = atan2(|I1 × I2|, I1 · I2). As is known to those skilled in the art, a central angle is subtended by the arc between those two points, and on a circle of radius one the arc length equals the central angle (measured in radians). The central angle is also known as the arc’s angular distance.
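The vector formula for δ can be sketched in plain Python as follows. The helper and function names are hypothetical, and the two inputs are taken to be the eye direction vectors in front camera space.

```python
import math

def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def central_angle(i1, i2):
    """Central angle between two direction vectors.

    Uses delta = atan2(|i1 x i2|, i1 . i2), which stays well conditioned
    for very small and near-pi angles, unlike acos of the dot product.
    """
    c = _cross(i1, i2)
    return math.atan2(math.sqrt(_dot(c, c)), _dot(i1, i2))
```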
In step S910, the distance D (in inches) between the user and the front (user-facing) camera is estimated. The following formula may be used in this regard: D = (s/2)/tan(δ/2). In some instances, it may be necessary or advantageous to account for great circle distance (as opposed to linear distance) to improve accuracy as the user approaches the camera.
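As a sketch of the estimate D = (s/2)/tan(δ/2), and assuming that s denotes the physical separation between the user's eyes in inches (an assumption, since s is not defined in the surviving text), the calculation could be written as follows. The function name is hypothetical.

```python
import math

def estimate_user_distance(delta, eye_separation):
    """Estimate the user-to-camera distance D = (s/2) / tan(delta/2).

    delta is the central angle between the two eye directions (radians);
    eye_separation is s, ASSUMED here to be the distance between the
    user's eyes, in inches.
    """
    if not 0.0 < delta < math.pi:
        raise ValueError("delta must lie strictly between 0 and pi")
    return (eye_separation / 2.0) / math.tan(delta / 2.0)
```

The guard reflects that the formula is undefined when the two eye directions coincide (δ = 0).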
In step S912, the average eye location in fisheye image space (with Cartesian coordinates) is computed. The following formulae may be used in this regard:
In step S914, these locations in fisheye image space (with Cartesian coordinates) are converted to fisheye image space (with polar coordinates). The following formulae may be used in this regard:
In step S916, these locations in fisheye image space (with polar coordinates) are then converted to front camera space (with optical coordinates). The following formulae may be used in this regard:
It is noted that it may not be feasible to simply average Ψ1,2 and ϕ1,2 to obtain Ψa and ϕa because the transformation from fisheye image space to front camera space is nonlinear.
In step S918, the average eye location in front camera space (with optical coordinates) is projected a distance D from the camera, and a conversion to front camera space (with Cartesian coordinates) is performed. The following formulae may be used in this regard:
In step S920, this information is converted to screen space. The following formulae may be used in this regard:
It will be appreciated that the location of the user may be important because the user typically will move the device to “point at” or otherwise “frame” the object of interest. That is, the user is in essence framing a shot with a device, much like a professional photographer would do. Taking this into account, certain example embodiments may simplify the calculations that are performed by determining only where the user is rather than alternatively or additionally determining where the device is and/or how it is oriented. Of course, determinations as to where the device is and/or how it is oriented can be taken into account in certain example embodiments, e.g., to verify that the image processing is being performed correctly. This may be accomplished in connection with accelerometers, gyroscopes, and/or other devices that typically are found in smart devices like tablets and smartphones. In any event, once the user is positioned, further image processing is possible to handle changes in perspective, augment reality (e.g., to simulate what looking through a window might be like), etc.
Here, X0 = (X0, Y0, 0) denotes the target point in screen space.
In step S1004, the equation of the line in screen space passing through the target point X0 and the user’s eyes U is then given as:
Here, D = (X0 - U)/∥X0 - U∥ is a vector of length 1 pointing from the user’s eyes U to the target point X0 , and λ is a real number that represents distance traveled along this vector starting from X0.
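Equation (1) is not reproduced in this text. From the description of D and λ, the line can be taken to have the parametric form X(λ) = X0 + λD, which may be sketched as follows. The function names are hypothetical.

```python
import math

def sight_direction(x0, u):
    """Unit vector D pointing from the user's eyes U to the target point X0."""
    diff = (x0[0] - u[0], x0[1] - u[1], x0[2] - u[2])
    norm = math.sqrt(diff[0] ** 2 + diff[1] ** 2 + diff[2] ** 2)
    if norm == 0.0:
        raise ValueError("target point and eye position coincide")
    return (diff[0] / norm, diff[1] / norm, diff[2] / norm)

def point_on_line(x0, direction, lam):
    """Point X0 + lam * D at distance lam along the line of sight from X0."""
    return (x0[0] + lam * direction[0],
            x0[1] + lam * direction[1],
            x0[2] + lam * direction[2])
```

For a user directly in front of the screen center, D is (0, 0, -1), so positive values of λ trace points behind the screen.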
The correct portion of the half-space {Z < 0} is to be displayed according to the user’s perspective. However, the image of that half-space is captured from the back camera’s perspective. To attempt to allow correction for this perspective offset, the half-space {Z < 0} is (normally) collapsed onto the hemisphere {Z < 0, (X - B) · (X - B) = R²} of radius R centered at the back camera location B. Then, it is possible to either (a) pick R according to some heuristic, or (b) allow users to calibrate the value of R manually. See step S1006 in these regards.
In step S1008, to find the point on the hemisphere that should be displayed at the screen coordinate X0 , the line from equation (1) is extended (in essence tracing the user’s line of sight) until it intersects with the hemisphere. See
It is noted that when R >> ∥B - U∥, ∥X0 - B∥, the solutions of this equation are approximately ±R and, thus, the same solutions that would be expected if the offset were completely ignored. To determine whether to pick the positive or negative root in equation (2), the Z component of X0 + λ0D, which is -λ0W/∥X0 - U∥, is considered. This should be negative so as to pick the intersection point that lies behind the example iPad, not the intersection point that lies in front of the example iPad. This should only be possible if λ0 > 0. Therefore, the positive root in equation (2) typically will be selected:
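Equation (2) and its root are not reproduced in this text. As a sketch under the definitions above, substituting the line X0 + λD into the sphere (X - B) · (X - B) = R², with D of length 1, gives the quadratic λ² + 2λ D · (X0 - B) + ∥X0 - B∥² - R² = 0. The Python below solves it and keeps the positive root; the function name is hypothetical.

```python
import math

def intersect_sphere_positive_root(x0, direction, center, radius):
    """Intersect the line X0 + lam * D with a sphere and return (lam0, Xi).

    direction must be a unit vector. The root
    lam0 = -b + sqrt(b^2 - |m|^2 + R^2), with m = X0 - center and
    b = D . m, is the larger of the two solutions and is positive
    whenever X0 lies inside the sphere.
    """
    m = (x0[0] - center[0], x0[1] - center[1], x0[2] - center[2])
    b = sum(di * mi for di, mi in zip(direction, m))
    disc = b * b - sum(mi * mi for mi in m) + radius * radius
    if disc < 0.0:
        raise ValueError("line of sight does not reach the sphere")
    lam0 = -b + math.sqrt(disc)
    xi = tuple(x + lam0 * di for x, di in zip(x0, direction))
    return lam0, xi
```

When the radius is much larger than the offsets involved, `lam0` approaches the radius, matching the observation in the text that the solutions are approximately ±R.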
The intersection point Xi = X0 + λ0D in screen space is now known, and this is converted to back camera space (with Cartesian coordinates) in step S1010. The following formulae may be used in this regard:
As above, these formulae may be modified based on physical parameters of the example device, etc.
In step S1012, these coordinates are converted to back camera space (with optical coordinates). The following formulae may be used in this regard:
In step S1014, these coordinates are converted to fisheye space (polar coordinates). The following formulae may be used in this regard:
In step S1016, the fisheye space (with Cartesian coordinates) values are determined. The following formulae may be used in this regard:
The front image rendering techniques of certain example embodiments may be similar to the rear image rendering techniques used therein.
Here, X0 = (X0, Y0, 0) denotes the target point in screen space.
In step S1104, the equation of the line in screen space passing through the target point X0 and the user’s eyes U is then given as:
Here, D = (X0 - U)/∥X0 - U∥ is a vector of length 1 pointing from the user’s eyes U to the target point X0 , and λ is a real number that represents distance traveled along this vector starting from X0.
The equation of the line in screen space representing the incident ray that will be reflected along the line in equation (3) is then:
Here, E is a reflected vector of length 1 pointing away from the target point X0, and µ is a real number that represents distance traveled along this reflected vector starting from X0. Because reflection occurs in the plane {Z = 0} in certain example embodiments, E can be obtained from D by switching the sign of the Z-component. See step S1106 in this regard.
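The reflection in the plane {Z = 0} can be sketched in a few lines of Python. The function name is hypothetical.

```python
def reflect_in_screen_plane(direction):
    """Reflect a direction vector D in the plane {Z = 0}.

    Only the Z component changes sign, so the length of the vector is
    preserved and a unit vector D yields a unit vector E.
    """
    dx, dy, dz = direction
    return (dx, dy, -dz)
```

For example, a line of sight D = (0.6, 0, -0.8) heading behind the screen yields E = (0.6, 0, 0.8), which heads back into the half-space in front of the screen.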
The correct portion of the half-space {Z > 0} is to be displayed according to the user’s perspective. However, the image of that half-space is captured from the front (user-facing) camera’s perspective. To attempt to allow correction for this perspective offset, the half-space {Z > 0} is (normally) collapsed onto the hemisphere {Z > 0, (X - F) · (X - F) = R²} of radius R centered at the user-facing camera location F. Then, it is possible to either (a) pick R according to some heuristic, or (b) allow users to calibrate the value of R manually. See step S1108 in these regards.
In step S1110, to find the point on the hemisphere that should be displayed at the screen coordinate X0 , the line from equation (4) is extended (in essence tracing the user’s reflected line of sight) until it intersects with the hemisphere. This intersection point is then given by one root of the following equation:
It is noted that when R >> ∥F - U∥, ∥X0 - F∥, the solutions of this equation are approximately ±R and, thus, the same solutions that would be expected if the offset were completely ignored. To determine whether to pick the positive or negative root in equation (5), the Z component of X0 + µ0E, which is µ0W/∥X0 - U∥, is considered. This should be positive so as to pick the intersection point that lies in front of the example iPad, not the intersection point that lies behind the example iPad. This should only be possible if µ0 > 0. Therefore, the positive root in equation (5) typically will be selected:
The intersection point Xi = X0 + µ0E in screen space is now known, and this is converted to front camera space (with Cartesian coordinates) in step S1112. The following formulae may be used in this regard:
As above, these formulae may be modified based on physical parameters of the example device, etc.
In step S1114, these coordinates are converted to front camera space (with optical coordinates). The following formulae may be used in this regard:
In step S1116, these coordinates are converted to fisheye space (polar coordinates). The following formulae may be used in this regard:
In step S1118, the fisheye space (with Cartesian coordinates) values are determined. The following formulae may be used in this regard:
It will be appreciated that the scene shown in
One or more inertial sensors such as an accelerometer, gyro, and/or the like, may be used to detect changes in orientation of the device. In certain example embodiments (e.g., where a static or pre-recorded image is used), tools such as, for example, Apple’s ARKit (which uses both gyros and image tracking) may be used to obtain highly accurate spatial positioning of the device.
It will be appreciated that the camera tracking version (e.g., with no gyros used) of a mobile application could be implemented for laptops, desktop computers, large screens mounted on a wall (e.g., like the Microsoft Surface Hub), etc. In general, it will be appreciated that the example application described in connection with these example screenshots may be provided on any suitable electronic device including, for example, a smart phone, tablet, laptop, desktop, video wall, and/or the like.
As noted above, different products may be simulated in different areas of the display. For example, tapping anywhere in a quadrant may produce a list of pre-configured options that may be selected by a user to change the appearance of that quadrant to simulate the selected option, e.g., as shown in
When looking at glass in the real world, it is seen as a composite of a scene reflected in the glass and a scene transmitted through the glass, with the colors of each being modified by the base material and coating (e.g., in the case of a coated article).
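By way of example and without limitation, this compositing may be sketched for a single pixel as follows. The per-channel transmission and reflection factors, the additive model with clamping, and the function names are assumptions made for illustration only; they are not taken from any product data.

```python
def tint(pixel, factors):
    """Scale each RGB channel (0.0 to 1.0) by the matching factor."""
    return tuple(p * f for p, f in zip(pixel, factors))

def composite_pixel(transmitted, reflected, t_factors, r_factors):
    """Sum the tinted transmitted and reflected pixels, clamped to 1.0."""
    t = tint(transmitted, t_factors)
    r = tint(reflected, r_factors)
    return tuple(min(1.0, a + b) for a, b in zip(t, r))
```

In this sketch, a product with high transmission factors and low reflection factors yields an output dominated by the transmitted scene, and vice versa.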
Multiple effects may be combined. For example, transmission and reflection compositing may be combined with day and night views. In this regard,
As indicated above, dynamic swatch features, involving selection of a substrate, coating, or combination thereof, may be implemented in certain example embodiments. In general, swatches are small visual examples of product aesthetics shown individually next to product performance details or in an array of different products for easier aesthetic comparison. Traditional glass swatches are sometimes provided on marketing materials and are typically created by photographing actual product in a studio, or alternatively are represented as a solid field of color based on a single color value (e.g., Pantone colors, RGB/CMYK values, etc.). Glass products are often represented by two swatches, one being an isolation of the transmitted color, and the other being an isolation of the reflected color.
Certain example embodiments improve upon these techniques by creating simulated glass swatches. That is, certain example embodiments involve swatches created digitally, without the need for photographing actual products and/or the limitations of using a single color value. The glass product spectrum curve can be used to convert any solid color or any number of images (typically 1-2 will suffice) into a rendering of glass isolated transmission, isolated reflection, or a composite of both.
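By way of example and without limitation, the conversion of a solid color into a simulated swatch may be sketched as follows. The sketch assumes the spectrum curve is available as (wavelength in nanometres, fraction) samples and reduces it to three per-channel factors by averaging over assumed red, green, and blue bands. This band averaging is a crude stand-in for a proper colorimetric conversion, and the band edges and function names are hypothetical.

```python
# Assumed band edges in nanometres; illustrative only.
BANDS = {"r": (600, 700), "g": (500, 600), "b": (400, 500)}

def band_average(curve, lo_nm, hi_nm):
    """Average the curve samples whose wavelength lies in [lo_nm, hi_nm)."""
    values = [v for wavelength, v in curve if lo_nm <= wavelength < hi_nm]
    if not values:
        raise ValueError("no samples in the requested band")
    return sum(values) / len(values)

def channel_factors(curve):
    """Reduce a spectrum curve to (R, G, B) scaling factors."""
    return tuple(band_average(curve, *BANDS[ch]) for ch in ("r", "g", "b"))

def swatch_color(solid_rgb, curve):
    """Apply the curve-derived factors to a solid RGB color (0.0 to 1.0)."""
    return tuple(c * f for c, f in zip(solid_rgb, channel_factors(curve)))
```

The same function could be applied once with a transmission curve and once with a reflection curve to obtain the isolated transmission and isolated reflection swatches.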
In certain example embodiments, the glass swatches can be static or dynamic. It sometimes can be difficult to extrapolate from static swatches what a given product’s actual aesthetic on a building will be, because glass color is dependent on (among other things) what is being reflected in the environment, and the environment conditions are subject to flux.
To help address this issue, certain example embodiments thus may use static swatches and/or dynamic swatches. With respect to the latter, for example, because certain example embodiments use swatches that are generated digitally, it becomes possible to use motion and/or video to illustrate the kind of environmental flux that glass demonstrates in the field. This allows viewers to get a better idea of how a variety of colors/scenes would look in transmission, reflection, or the sum of both for any particular glazing option. Motion effects can be triggered in a variety of ways, from accelerometers and/or gyroscopes on mobile devices, to touches/clicks or swipes/drags on stationary machines, etc. Pre-recorded or “canned” videos can run constantly or initiate while scrolling through a page. Many other user interface options are also possible in different example embodiments.
It will be appreciated that the example techniques described herein may be used to enable design architects, students, specifiers, and/or others to obtain quick visual feedback concerning proposed designs, to quickly and easily contemplate different designs, and/or the like. Advantageously, instead of having to maintain a large collection of samples, users in certain example embodiments can quickly and easily see an up-to-date listing of products that can be grouped and sorted by meaningful metrics, compare predefined products, generate custom products, and order a specific sample tailored based on an intelligent selection made with use of the application running on the device. Reusable and reconfigurable IG unit products, for example, can be rapidly prepared once ordered. In certain example embodiments, the application running on the electronic computing device may be integrated with a remote ordering system, enabling users to initiate orders directly from the application once selections have been made.
Although certain example embodiments have been described as being useful for glass / window simulation, it will be appreciated that the example techniques described herein have the potential for applicability in a wide variety of different applications. For instance, a first suite of additional or alternative applications may be thought of as being “perspective sharing” applications. These applications do not necessarily require AR information to be presented along with the perspective-dependent video stream but nonetheless can benefit from taking into account different perspectives gathered by the 360 degree view camera(s). In this regard,
Another example application relates to remote presence devices. This example functionality is demonstrated in
Another example application relates to immersive panoramas. Software for photographing or stitching together panoramas has become fairly common, and 360 degree cameras are becoming more and more widespread. Many programs have enabled immersive display of these wide format images through the use of scroll swiping, gyroscope motion, and the like. Unfortunately, however, these approaches do not account for the viewer’s perspective, and they therefore do not always create a lifelike immersive experience. By deploying the example techniques disclosed herein to this use case, a much more lifelike and immersive experience can be created. For instance, one aspect of the techniques disclosed herein relates to the increase of the field of view as the user gets closer to the device. This is a powerful visual control when displaying graphically-rich social and other information. This type of effect can be created using still and moving, live or pre-recorded, photographs and video from standard aspect ratios to panorama and full 360 degree images. The perspective can shift based on the six degree of freedom movement of the user.
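By way of example and without limitation, the increase in field of view as the user approaches the device may be illustrated with simple window geometry. The sketch assumes the viewer is centered in front of a flat screen that acts as a window, so that the angle subtended by the screen is 2·atan(w / (2d)) for screen width w and viewing distance d. The function name is hypothetical.

```python
import math

def window_fov_degrees(screen_width, viewer_distance):
    """Angle subtended by a centered screen of the given width, in degrees."""
    if screen_width <= 0 or viewer_distance <= 0:
        raise ValueError("width and distance must be positive")
    return math.degrees(2.0 * math.atan(screen_width / (2.0 * viewer_distance)))
```

Under this assumption, halving the viewing distance always widens the angle, which is the behavior described above.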
Still another example application relates to drone or robot piloting, which in some senses may be thought of as an extension of both the remote presence and immersive panorama applications described above. For instance, certain example embodiments enable the viewer’s perspective to serve as the basis for input to robotics, drones, and/or the like. A mounted 360 degree camera, two 180 degree cameras, and/or the like, provided to a robot or drone, for example, can provide a full field of view. The user could “drive” the device based on the user’s own perspective, providing input for acceleration and/or the like. That is, the example techniques disclosed herein could be used to understand what the user is looking at by comparing the perspective from the user’s device and the camera(s) provided to the drone or robot, and responding to user controls accordingly.
Compositional AR overlay applications (which may in some instances make use of content-aware graphics and metadata) also may benefit from the example technology disclosed herein. These applications involve AR information being presented on top of or otherwise in relation to a video stream or other source of image information. It will be appreciated that the perspective sharing applications noted above may benefit from an AR treatment. In any event, the technology disclosed herein can be used to improve AR games and AR gaming experiences. For instance, new games based around the qualities and features of the example embodiments described herein can be designed, benefiting by turning the device into a more realistic feeling AR overlay window into the real world, e.g., by taking into account perspective and the like.
Targeting systems also can benefit from the example techniques disclosed herein. One drawback of current AR systems when deployed on a mobile device is that there is no sightline alignment between the user, object of interest, and the device. The lack of sightline can make the experience on the device feel disconnected from the real world because the information on the device is superimposing further information on an image that is broken from the user’s own perspective. In contrast, when the further information is only shown to the user when the device breaks the direct line of sight between the user and the target object, the user’s perception is that the user’s own (personal) view is being augmented. This also relieves the need for users to look back and forth between the device perspective and their own. This technology may be seen as being used in connection with or otherwise related to the drone or robot piloting approach discussed above.
There are many potential applications for the techniques disclosed herein with respect to the arena of way-finding. Much like how many video games have a limited local area mini-map with any point of interest outside of the local area marked at the perimeter of the mini-map, certain example embodiments can indicate the location of objects/information of interest that are outside the immediate field of view of the user. These notifications could appear at the edge of the screen and could imply the type of adjustment (rotation, lateral jog, etc.) that needs to be made in order to see the objects of potential interest. This concept could be further expanded using computer vision systems, e.g., to provide scene specific movement instructions to get to the object of interest (e.g., an arrow drawn on the floor or in the air that shows the user how to navigate to a place, go around a corner that is in the field of view, etc.). In-store sales finding is a related area. Shopping experiences can be enhanced by using the example techniques disclosed herein, e.g., by directing the users’ attention to items that might be on sale, items that go well with what they already have in their carts, items from their shopping list, etc. These notifications could light up the on-sale items when the user aims their perspective down a grocery aisle.
Mixed reality (MR) document views also can benefit from the example techniques disclosed herein. For example, when deployed in a space like a grocery store, busy shopping district, or many other options, data about what objects the user sees (or even the objects the user chooses to ignore) can be compiled to improve recommendations and tailor shopping experiences, navigation, etc. This is at least somewhat similar to Internet “page views” but for real-world objects.
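By way of example and without limitation, an edge-of-screen rotation hint of the kind described for way-finding may be sketched as follows. The sketch assumes that the bearing to the object and the viewer's heading are both known in degrees measured clockwise, and that the horizontal field of view is known; the function name and the returned labels are hypothetical.

```python
def edge_hint(bearing_deg, heading_deg, fov_deg):
    """Return (hint, offset) where offset is the signed angle to the object.

    hint is "visible" when the object lies within the field of view,
    otherwise "rotate right" or "rotate left".
    """
    # Wrap the difference into the range [-180, 180).
    offset = (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    if abs(offset) <= fov_deg / 2.0:
        return "visible", offset
    return ("rotate right" if offset > 0 else "rotate left"), offset
```

For instance, with a heading of 0 degrees and a 60 degree field of view, an object at a bearing of 90 degrees would produce a "rotate right" hint in this sketch.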
Certain example embodiments may make use of stereo wide (e.g., 150-180 degree) field of view cameras facing the user. In this configuration, certain example embodiments may leverage image processing techniques to create improved foreground/background motion separation. With one camera, the user may in essence mask part of the background with his/her presence, and scaling techniques (common in the field of adding perspective changing motion graphics to still images) can be used to compensate for the masking. With stereo cameras, scale changes may not always be required to create a similar effect.
Further image processing details are set forth in the Appendices attached hereto, the content of which should be considered a part of this patent filing. That is, the entire content of each of the following Appendices is incorporated herein by reference:
- Appendix A: Example Techniques for Converting Between Different Geometries
- Appendix B: Example Techniques for Dual Image Capture and Balancing
- Appendix C: Example Techniques Regarding Glazing Color Effects
- Appendix D: Example Techniques for White Balancing
- Appendix E: Example Techniques for Handling Chromatic Aberrations
- Appendix F: Example Techniques for Pan/Tilt Mode
- Appendix G: Example Techniques for Distortion Validation
- Appendix H: Example Techniques for Importing Images
- Appendix I: Example Techniques for Processing Reflection Highlights
- Appendix J: Example Techniques for Fisheye Lens Correction
- Appendix K: Example Fisheye Mathematics
- Appendix L: Example Flowcharts for Composing Image
In certain example embodiments, an electronic device is provided. The electronic device includes a user interface; and processing resources including at least one processor and a memory, with the memory storing a program executable by the processing resources to simulate a view of an image through at least one viewer-selected product that is virtually interposed between a viewer using the electronic device and the image by performing functionality comprising: acquiring the image; facilitating viewer selection of the at least one product in connection with the user interface; retrieving display properties associated with the at least one viewer-selected product; generating, for each said viewer-selected product, a filter to be applied to the acquired image based on retrieved display properties; and generating, for display via the electronic device, an output image corresponding to the generated filter(s) being applied to the acquired image.
In addition to the features of the previous paragraph, in certain example embodiments, the program may be executable to perform further functionality comprising facilitating viewer selection of the image from a store of pre-stored images; wherein the acquiring of the image may comprise retrieving the viewer-selected image from the store.
In addition to the features of either of the two previous paragraphs, in certain example embodiments, the acquiring of the image may comprise obtaining the image from a camera operably connected to the electronic device.
In addition to the features of the previous paragraph, in certain example embodiments, the camera may have a field of view of at least 150 degrees.
In addition to the features of any of the four previous paragraphs, in certain example embodiments, the acquired image may be a static image and/or a video.
In addition to the features of any of the five previous paragraphs, in certain example embodiments, the facilitating viewer selection of the at least one product may comprise enabling the viewer to select the at least one product from a plurality of possible preconfigured products.
In addition to the features of the previous paragraph, in certain example embodiments, the plurality of possible preconfigured products may include at least one coated article, at least one insulating glass (IG) unit, at least one vacuum insulating glass (VIG) unit, and/or at least one laminated product.
In addition to the features of either of the two previous paragraphs, in certain example embodiments, each said possible preconfigured product may be specified in terms of its constituent parts, e.g., with the constituent parts including possible substrate material(s), substrate thickness(es), coating(s), coating placement(s), and/or laminate material(s), as appropriate for the respective possible preconfigured products.
In addition to the features of any of the eight previous paragraphs, in certain example embodiments, the facilitating viewer selection of the at least one product may comprise enabling the viewer to configure a customized product, e.g., with the customized product being configurable in terms of constituent parts, the constituent parts potentially including possible substrate material(s), substrate thickness(es), coating(s), coating placement(s), and/or laminate material(s), as appropriate for the customized product.
In addition to the features of any of the nine previous paragraphs, in certain example embodiments, multiple products are viewer-selectable.
In addition to the features of the previous paragraph, in certain example embodiments, different filters may be generated for each said viewer-selected product, and wherein the different filters may be applied to different areas of the acquired image in generating one output image.
In addition to the features of any of the 11 previous paragraphs, in certain example embodiments, the display properties may correspond to optical properties of the at least one viewer-selected product.
In addition to the features of the previous paragraph, in certain example embodiments, the display properties may be associated with transmission, reflection, and color related optical properties of the at least one viewer-selected product.
In addition to the features of any of the 13 previous paragraphs, in certain example embodiments, the display properties may be retrieved from a database.
In addition to the features of the previous paragraph, in certain example embodiments, a communication interface may be provided, and the database may be located remote from the electronic device and accessed via the communication interface.
In addition to the features of any of the 15 previous paragraphs, in certain example embodiments, the display properties may be calculated (e.g., locally or remotely).
In addition to the features of the previous paragraph, in certain example embodiments, the calculating of the display properties may be based at least in part on characteristics of a display device to which the output image is to be provided.
In addition to the features of any of the 17 previous paragraphs, in certain example embodiments, the program may be executable to perform further functionality comprising: detecting relative movement between the electronic device and the viewer; and responsive to a detection of relative movement between the electronic device and the viewer, generating, for display via the electronic device, an updated output image reflecting the detected relative movement.
In addition to the features of the previous paragraph, in certain example embodiments, the program may be executable to perform further functionality comprising responsive to the detection of relative movement between the electronic device and the viewer: determining whether the retrieved display properties associated with the at least one viewer-selected product still apply following the relative movement; and responsive to a determination that the retrieved display properties associated with the at least one viewer-selected product no longer apply following the relative movement: retrieving updated display properties associated with the at least one viewer-selected product; generating, for each said viewer-selected product, an updated filter to be applied to the acquired image based on retrieved display properties; and generating the updated output image in connection with the updated filter(s).
In addition to the features of the previous paragraph, in certain example embodiments, the determining whether the retrieved display properties associated with the at least one viewer-selected product still apply following the relative movement may be based on a determination as to whether the movement corresponds to a change in position without an accompanying change in orientation.
In addition to the features of the previous paragraph, in certain example embodiments, one or more inertial sensors may be provided, e.g., with the one or more inertial sensors being configured to detect changes in orientation of the electronic device.
In addition to the features of any of the three previous paragraphs, in certain example embodiments, the relative movement may correspond to movement of the electronic device (e.g., a change in position and/or orientation thereof), movement of the viewer (e.g., a change in the viewer’s position), a shift of the viewer’s gaze, and/or the like. In addition to the features of any of the three previous paragraphs, in certain example embodiments, at least one user-facing camera may be provided, e.g., with the user-facing camera being configured to provide a signal that is processable by the program to perform eye and/or face tracking in connection with a determination as to whether there has been a shift in the viewer’s gaze.
In addition to the features of the previous paragraph, in certain example embodiments, the eye and/or face tracking may be performable while the device and/or viewer is/are moving.
In addition to the features of any of the five previous paragraphs, in certain example embodiments, the updated display properties may be associated with off-axis transmission, reflection, and color related optical properties of the at least one viewer-selected product.
In addition to the features of any of the 24 previous paragraphs, in certain example embodiments, a display device via which the output image is to be displayed may be provided.
In addition to the features of any of the 25 previous paragraphs, in certain example embodiments, at least one camera having a field of view of at least 150 degrees may be operably connected to the electronic device.
In addition to the features of any of the 26 previous paragraphs, in certain example embodiments, first and second cameras generally oriented towards the viewer and away from the viewer, respectively, may be provided.
In addition to the features of any of the 27 previous paragraphs, in certain example embodiments, the electronic device may be a smartphone or tablet.
In certain example embodiments, a method of simulating a view of an image through at least one viewer-selected product that is virtually interposed between a viewer using an electronic device and the image is provided. The electronic device includes processing resources including at least one processor and a memory. The method comprises: acquiring the image; facilitating viewer selection of the at least one product in connection with a user interface running on the electronic device; retrieving display properties associated with the at least one viewer-selected product; generating, for each said viewer-selected product, a filter to be applied to the acquired image based on retrieved display properties; and generating, for display via the electronic device, an output image corresponding to the generated filter(s) being applied to the acquired image.
In addition to the features of the previous paragraph, in certain example embodiments, viewer selection of the image from a store of pre-stored images may be facilitated; wherein the acquiring of the image may comprise retrieving the viewer-selected image from the store.
In addition to the features of either of the two previous paragraphs, in certain example embodiments, the acquiring of the image may comprise obtaining the image from a camera operably connected to the electronic device.
In addition to the features of the previous paragraph, in certain example embodiments, the camera may have a field of view of at least 150 degrees.
In addition to the features of any of the four previous paragraphs, in certain example embodiments, the acquired image may be a static image and/or a video.
In addition to the features of any of the five previous paragraphs, in certain example embodiments, the facilitating viewer selection of the at least one product may comprise enabling the viewer to select the at least one product from a plurality of possible preconfigured products.
In addition to the features of the previous paragraph, in certain example embodiments, the plurality of possible preconfigured products may include at least one coated article, at least one insulating glass (IG) unit, at least one vacuum insulating glass (VIG) unit, and/or at least one laminated product.
In addition to the features of either of the two previous paragraphs, in certain example embodiments, each said possible preconfigured product may be specified in terms of its constituent parts, e.g., with the constituent parts including possible substrate material(s), substrate thickness(es), coating(s), coating placement(s), and/or laminate material(s), as appropriate for the respective possible preconfigured products.
In addition to the features of any of the eight previous paragraphs, in certain example embodiments, the facilitating viewer selection of the at least one product may comprise enabling the viewer to configure a customized product, e.g., with the customized product being configurable in terms of constituent parts, the constituent parts potentially including possible substrate material(s), substrate thickness(es), coating(s), coating placement(s), and/or laminate material(s), as appropriate for the customized product.
In addition to the features of any of the nine previous paragraphs, in certain example embodiments, multiple products are viewer-selectable.
In addition to the features of the previous paragraph, in certain example embodiments, different filters may be generated for each said viewer-selected product, and wherein the different filters may be applied to different areas of the acquired image in generating one output image.
In addition to the features of any of the 11 previous paragraphs, in certain example embodiments, the display properties may correspond to optical properties of the at least one viewer-selected product.
In addition to the features of the previous paragraph, in certain example embodiments, the display properties may be associated with transmission, reflection, and color related optical properties of the at least one viewer-selected product.
In addition to the features of any of the 13 previous paragraphs, in certain example embodiments, the display properties may be retrieved from a database.
In addition to the features of the previous paragraph, in certain example embodiments, a communication interface may be provided, and the database may be located remote from the electronic device and accessed via the communication interface.
In addition to the features of any of the 15 previous paragraphs, in certain example embodiments, the display properties may be calculated (e.g., locally or remotely).
In addition to the features of the previous paragraph, in certain example embodiments, the calculating of the display properties may be based at least in part on characteristics of a display device to which the output image is to be provided.
In addition to the features of any of the 17 previous paragraphs, in certain example embodiments, relative movement between the electronic device and the viewer may be detected; and responsive to a detection of relative movement between the electronic device and the viewer, generating, for display via the electronic device, an updated output image reflecting the detected relative movement.
In addition to the features of the previous paragraph, in certain example embodiments, responsive to the detection of relative movement between the electronic device and the viewer: a determination may be made as to whether the retrieved display properties associated with the at least one viewer-selected product still apply following the relative movement; and responsive to a determination that the retrieved display properties associated with the at least one viewer-selected product no longer apply following the relative movement: additional functionality may comprise retrieving updated display properties associated with the at least one viewer-selected product; generating, for each said viewer-selected product, an updated filter to be applied to the acquired image based on retrieved display properties; and generating the updated output image in connection with the updated filter(s).
In addition to the features of the previous paragraph, in certain example embodiments, the determining whether the retrieved display properties associated with the at least one viewer-selected product still apply following the relative movement may be based on a determination as to whether the movement corresponds to a change in position without an accompanying change in orientation.
In addition to the features of the previous paragraph, in certain example embodiments, one or more inertial sensors may be provided, e.g., with the one or more inertial sensors being configured to detect changes in orientation of the electronic device.
In addition to the features of any of the three previous paragraphs, in certain example embodiments, the relative movement may correspond to movement of the electronic device (e.g., a change in position and/or orientation thereof), movement of the viewer (e.g., a change in the viewer’s position), a shift of the viewer’s gaze, and/or the like. In addition to the features of any of the three previous paragraphs, in certain example embodiments, at least one user-facing camera may be provided, e.g., with the user-facing camera being configured to provide a signal that is processable to perform eye and/or face tracking in connection with a determination as to whether there has been a shift in the viewer’s gaze.
In addition to the features of the previous paragraph, in certain example embodiments, the eye and/or face tracking may be performable while the device and/or viewer is/are moving.
In addition to the features of any of the five previous paragraphs, in certain example embodiments, the updated display properties may be associated with off-axis transmission, reflection, and color related optical properties of the at least one viewer-selected product.
In addition to the features of any of the 24 previous paragraphs, in certain example embodiments, a display device via which the output image is to be displayed may be provided.
In addition to the features of any of the 25 previous paragraphs, in certain example embodiments, at least one camera having a field of view of at least 150 degrees may be operably connected to the electronic device.
In addition to the features of any of the 26 previous paragraphs, in certain example embodiments, first and second cameras generally oriented towards the viewer and away from the viewer, respectively, may be provided.
In addition to the features of any of the 27 previous paragraphs, in certain example embodiments, the electronic device may be a smartphone or tablet.
In certain example embodiments, there is provided a non-transitory computer readable storage medium tangibly storing a program that, when executed by a processor of a computing device, performs the method of any one of the 28 preceding claims.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the claims.
Appendix A: Example Techniques for Converting Between Different Geometries
For the purposes of this example, the descriptions that follow assume the host iPad is in the “Portrait” orientation. It will be appreciated that these example techniques may be used together with, or in place of, the examples provided above.
I. Introduction
A. Constants
The host device is a 2017 iPad Pro 10.5” (A1701 or A1709). p = 264, the pixels-per-inch of the screen.
With respect to the camera, v = π, the field of view of the fisheye camera (assumed uniform).
B. Spaces & Coordinate Systems
1. Full Equirectangular Image Space
Full equirectangular image space is an ω pixel wide by η pixel high rectangle, where ω and η satisfy ω : η = 2 : 1. This space is used to represent 360° by 180° scenes captured from a single point of view.
With respect to Cartesian coordinates, in this coordinate system, full equirectangular image space is parameterized by Cartesian coordinates (s, t), where: (0, 0) is located at the bottom left of the space, and (s, t) = (ω, η) is located at the top right of the space. See
In terms of conversions, a point (s0, t0) in full equirectangular image space (Cartesian coordinates) can be converted to full equirectangular image space (geographic coordinates) using the formulae:
With respect to a geographic coordinate system, the full equirectangular image space is parameterized by geographic coordinates (α, β), where α represents latitude and varies between -π/2 and π/2, and β represents longitude and varies between -π/2 and 3π/2. See
In terms of conversions, a point (α0, β0) in full equirectangular image space (geographic coordinates) can be converted to full equirectangular image space (Cartesian coordinates) using the formulae:
A point (α0, β0) in full equirectangular image space (geographic coordinates) can be projected to full world space (360-cam-centered geographic coordinates) using the formulae A0 = α0; B0 = β0; and C0 = arbitrary.
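The conversion formulae between the Cartesian and geographic parameterizations are not reproduced in the text above. As a minimal sketch, assuming a simple linear mapping that is consistent with the stated corner locations and angle ranges (an assumption, not necessarily the mapping used in the example embodiments), the two directions might be written in Python as follows. The function names are hypothetical.

```python
import math


def equirect_cartesian_to_geographic(s, t, omega, eta):
    """Map pixel (s, t), origin at bottom left, to (latitude, longitude) in radians.

    Assumed linear mapping: t in [0, eta] -> latitude in [-pi/2, pi/2],
    s in [0, omega] -> longitude in [-pi/2, 3*pi/2].
    """
    alpha = math.pi * (t / eta - 0.5)
    beta = 2.0 * math.pi * s / omega - math.pi / 2.0
    return alpha, beta


def equirect_geographic_to_cartesian(alpha, beta, omega, eta):
    """Inverse of the assumed linear mapping above."""
    s = omega * (beta + math.pi / 2.0) / (2.0 * math.pi)
    t = eta * (alpha / math.pi + 0.5)
    return s, t
```

Under this assumed mapping, the image center corresponds to latitude 0 and longitude π/2, and the left half of the image covers the longitude range [-π/2, π/2] used by half equirectangular image space.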
2. Half Equirectangular Image Space
Half equirectangular image space is an η pixel wide by η pixel high square. This space is used to represent 180° by 180° scenes captured from a single point of view.
With respect to Cartesian coordinates, in this coordinate system, half equirectangular image space is parameterized by Cartesian coordinates (s, t), where (0, 0) is located at the bottom left of the space, and (s, t) = (η, η) is located at the top right of the space. See
With respect to geographic coordinates, half equirectangular image space is parameterized by geographic coordinates (α, β), where α represents latitude and varies between -π/2 and π/2, and β represents longitude and varies between -π/2 and π/2. See
A point (α0, β0) in half equirectangular image space (geographic coordinates) can be converted to half equirectangular image space (Cartesian coordinates) using the formulae:
3. Fisheye Image Space
Fisheye image space is a d pixel by d pixel square. This space is used to represent 180° by 180° scenes captured from a single point of view. It is assumed that the captured scene lies wholly within the central disc of diameter d, and the captured scene is upright.
With respect to Cartesian coordinates, in this coordinate system, fisheye image space is parameterized by Cartesian coordinates (u, v) where (0, 0) is located at the bottom left of the space, and (u, v) = (d, d) is located at the top right of the space. See
A point (u0, ν0) in fisheye image space (Cartesian coordinates) can be converted to fisheye image space (polar coordinates) using the formulae:
With respect to polar coordinates, in this coordinate system, fisheye image space is parameterized by polar coordinates (r, θ), where r varies between 0 and d/2; and θ is measured in radians and varies between -π and π, and where θ = π/2 points directly up. See
A point (r0, θ0) in fisheye image space (polar coordinates) can be converted to fisheye image space (Cartesian coordinates) using the formulae: u0 = d/2 + r0 cos(θ0) and ν0 = d/2 + r0 sin(θ0).
A point (r0, θ0) in fisheye image space (polar coordinates) can be converted to half world space (cam-centered optical coordinates) using the formulae:
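The fisheye conversions above can be sketched in Python as follows. The polar-to-Cartesian direction is taken directly from the stated formulae; the Cartesian-to-polar and polar-to-optical directions are simply their inverses (the latter inverting r0 = dϕ0/v, which is given later in this appendix). The function names are hypothetical, and the two-argument arctangent is used so that θ falls in [-π, π].

```python
import math


def fisheye_cartesian_to_polar(u, v, d):
    """(u, v) pixels with origin at bottom left -> (r, theta) about the image center."""
    du, dv = u - d / 2.0, v - d / 2.0
    return math.hypot(du, dv), math.atan2(dv, du)


def fisheye_polar_to_cartesian(r, theta, d):
    """u0 = d/2 + r0 cos(theta0), v0 = d/2 + r0 sin(theta0)."""
    return d / 2.0 + r * math.cos(theta), d / 2.0 + r * math.sin(theta)


def fisheye_polar_to_optical(r, theta, d, fov=math.pi):
    """Invert r = d * phi / fov; returns (Theta, phi). The radius Gamma is arbitrary."""
    return theta, fov * r / d
```

For example, with d = 1000 the top-center pixel (500, 1000) maps to r = 500 and θ = π/2, which in turn corresponds to ϕ = π/2, the edge of the 180° field of view.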
4. Display Image Space
Display image space is a w pixel wide by h pixel high rectangle, where w and h satisfy w:h = 3:4 to match the aspect ratio of an iPad screen. This space is used to represent images that will be displayed on an iPad screen.
With respect to Cartesian coordinates, in this coordinate system, display image space is parameterized by Cartesian coordinates (x, y), where (0, 0) is located at the bottom left of the rectangle, (x, y) = (w, h) is located at the top right of the rectangle, and the output image displayed in this space is assumed to be upright. See
A point (x0, y0) in display image space (Cartesian coordinates) can be converted to full world space (screen-centered Cartesian coordinates) using the formulae: X0 = (x0 - w/2)/p; Y0 = (y0 - h/2)/p; Z0 = 0.
5. Full World Space
The full world space is a three-dimensional space representing the physical world. All lengths are given in inches in this example.
With respect to screen-centered Cartesian coordinates, in this coordinate system, full world space is parameterized by Cartesian coordinates (X, Y, Z), where: X, Y, and Z all lie in the range (-∞, ∞); (0, 0, 0) is located at the center of the iPad’s screen; the X-Y plane is coplanar with the iPad’s screen, with the X-axis parallel to the short side of the iPad and increasing from left to right as viewed from the front, and the Y-axis parallel to the long side of the iPad and increasing from bottom to top with this orientation; and the Z-axis is perpendicular to the iPad’s screen, with the half-space {Z > 0} lying wholly in front of the iPad. See
The coordinate system used by the iOS framework to express physical device attitude (pitch, roll, and yaw) is almost identical to screen-centered Cartesian coordinates. See
The iPad’s front camera is assumed to be located at Xf = (Xf, Yf, Zf) = (0, 4.57, 0) in full world space (screen-centered Cartesian coordinates). The iPad’s back camera is assumed to be located at Xb = (Xb, Yb, Zb) = (3.07, 4.55, 0) in full world space (screen-centered Cartesian coordinates).
A point (X0, Y0, Z0) in full world space (screen-centered Cartesian coordinates) can be converted to half world space (back-cam-centered Cartesian coordinates) using the formulae: I0 = X0 - Xb; J0 = -Z0; and K0 = Y0 - Yb.
A point (X0, Y0, Z0) in full world space (screen-centered Cartesian coordinates) can be converted to half world space (front-cam-centered Cartesian coordinates) using the formulae: I0 = -X0; J0 = Z0; and K0 = Y0 - Yf.
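The display-to-world and world-to-camera conversions above are simple translations and axis swaps, and might be sketched in Python as follows. The constants come from the text; the function names and the 1668 by 2224 pixel screen size used in the usage note are assumptions made for illustration.

```python
P = 264.0                     # pixels per inch of the screen
FRONT_CAM = (0.0, 4.57, 0.0)  # Xf, in inches, screen-centered
BACK_CAM = (3.07, 4.55, 0.0)  # Xb, in inches, screen-centered


def display_to_world(x, y, w, h, p=P):
    """Display pixel (x, y) -> screen-centered world point on the plane Z = 0."""
    return ((x - w / 2.0) / p, (y - h / 2.0) / p, 0.0)


def world_to_back_cam(X, Y, Z):
    """Screen-centered world point -> back-cam-centered (I, J, K)."""
    return (X - BACK_CAM[0], -Z, Y - BACK_CAM[1])


def world_to_front_cam(X, Y, Z):
    """Screen-centered world point -> front-cam-centered (I, J, K)."""
    return (-X, Z, Y - FRONT_CAM[1])


def front_cam_to_world(I, J, K):
    """Inverse of world_to_front_cam."""
    return (-I, K + FRONT_CAM[1], J)
```

With an assumed 1668 by 2224 pixel display, display_to_world(834, 1112, 1668, 2224) returns the world origin. A point behind the device (Z < 0) maps to J > 0 for the back camera, and a point in front of the device (Z > 0) maps to J > 0 for the front camera, so J is always measured along the respective camera's viewing direction.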
A point (X0, Y0, Z0) in full world space (screen-centered Cartesian coordinates) can be converted to full world space (360-cam-centered geographic coordinates) using the formulae:
With respect to 360-cam-centered geographic coordinates, in this coordinate system, full world space is parameterized by geographic coordinates (A, B, C), where A and B are both measured in radians; A represents latitude and varies between -π/2 and π/2; B represents longitude and varies between -π/2 and 3π/2; and C lies in the range [0, ∞). See
A point (A0, B0, C0) in full world space (360-cam-centered geographic coordinates) can be collapsed to a point (α0, β0) in full equirectangular image space (geographic coordinates) using the formulae α0 = A0 and β0 = B0.
A point (A0, B0, C0) in full world space (360-cam-centered geographic coordinates) can be converted to full world space (screen-centered Cartesian coordinates) using the formulae: X0 = cos(A0) sin(B0); Y0 = sin(A0); and Z0 = -cos(A0) cos(B0).
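As a rough illustration, the geographic-to-Cartesian formulae above and an inverse can be sketched in Python. The inverse uses A = arcsin(Y) and B = arctan(X, -Z), which appear later in this appendix for unit-length inputs; dividing by the vector length and shifting B into the range [-π/2, 3π/2) are additions made here so that the sketch accepts arbitrary non-zero points. The function names are hypothetical.

```python
import math


def geographic_to_world(A, B):
    """(latitude A, longitude B) -> unit vector in screen-centered coordinates."""
    return (math.cos(A) * math.sin(B), math.sin(A), -math.cos(A) * math.cos(B))


def world_to_geographic(X, Y, Z):
    """Screen-centered point -> (A, B, C) with B shifted into [-pi/2, 3*pi/2)."""
    C = math.sqrt(X * X + Y * Y + Z * Z)
    A = math.asin(max(-1.0, min(1.0, Y / C)))
    B = math.atan2(X, -Z)       # two-argument arctangent, result in (-pi, pi]
    if B < -math.pi / 2.0:
        B += 2.0 * math.pi      # match the stated longitude range
    return A, B, C
```

Note that longitude B = 0 points along the negative Z-axis, that is, directly behind the screen.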
6. Half World Space
The half world space is a three-dimensional half-space representing half the physical world. All lengths are given in inches in this example.
With respect to (front/back) cam-centered Cartesian coordinates, in this coordinate system, half world space is parameterized by Cartesian coordinates (I, J, K), where I and K are in the range (-∞, ∞), J lies in the range [0, ∞), and the camera is located at (0, 0, 0). From the perspective of the camera, (1, 0, 0) lies directly to the right of the camera, (0, 1, 0) lies directly in front of the camera, and (0, 0, 1) lies directly above the camera. In these regards, see
A point (I0, J0, K0) in half world space (cam-centered Cartesian coordinates) can be converted to half world space (cam-centered optical coordinates) using the formulae:
A point (I0, J0, K0) in half world space (front-cam-centered Cartesian coordinates) can be converted to full world space (screen-centered Cartesian coordinates) using the formulae: X0 = -I0; Y0 = K0 + Yf = K0 + 4.57; and Z0 = J0.
A point (I0, J0, K0) in half world space (cam-centered Cartesian coordinates) can be converted to half world space (cam-centered geographic coordinates) using the formulae:
With respect to (front/back) cam-centered optical coordinates, in this coordinate system, half world space is parameterized by optical coordinates (Θ, ϕ, Γ), where Θ and ϕ are both measured in radians; Θ lies in the range [-π, π], where the line Θ = 0 lies directly to the right of the camera; ϕ lies in the range [0, π/2], where the line ϕ = 0 corresponds to the camera’s optical axis and {ϕ = π/2} is coplanar with the iPad; and Γ lies in the range [0, ∞). See
A point (Θ0, ϕ0, Γ0) in half world space (cam-centered optical coordinates) can be collapsed to fisheye image space (polar coordinates) using the formulae r0 = dϕ0/v = dϕ0/π and θ0 = Θ0.
A point (Θ0, ϕ0, Γ0) in half world space (cam-centered optical coordinates) can be converted to half world space (cam-centered Cartesian coordinates) using the formulae: I0 = Γ0 sin(ϕ0) cos(Θ0); J0 = Γ0 cos(ϕ0); and K0 = Γ0 sin(ϕ0) sin(Θ0).
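A short Python sketch of the optical-to-Cartesian formulae above, together with an inverse obtained by solving those same formulae for Θ, ϕ, and Γ, is given below. The inverse is a derivation made here rather than a quotation from the text, and the function names are hypothetical.

```python
import math


def optical_to_cartesian(Theta, phi, Gamma=1.0):
    """(Theta, phi, Gamma) -> cam-centered Cartesian (I, J, K)."""
    I = Gamma * math.sin(phi) * math.cos(Theta)
    J = Gamma * math.cos(phi)
    K = Gamma * math.sin(phi) * math.sin(Theta)
    return I, J, K


def cartesian_to_optical(I, J, K):
    """Cam-centered Cartesian (I, J, K) -> (Theta, phi, Gamma)."""
    Gamma = math.sqrt(I * I + J * J + K * K)
    Theta = math.atan2(K, I)                  # angle in the I-K plane
    phi = math.atan2(math.hypot(I, K), J)     # angle away from the optical axis J
    return Theta, phi, Gamma
```

The expression atan2(hypot(I, K), J) is mathematically equal to arccos(J/Γ) but behaves better numerically for points close to the optical axis.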
With respect to cam-centered geographic coordinates, in this coordinate system, half world space is parameterized by geographic coordinates (A, B, C), where: A and B are both measured in radians; A represents latitude and varies between -π/2 and π/2; B represents longitude and varies between -π/2 and π/2; and C lies in the range [0, ∞). See
A point (A0, B0, C0) in half world space (cam-centered geographic coordinates) can be collapsed to half equirectangular image space (geographic coordinates) using the formulae α0 = A0 and β0 = B0.
II. Dual Dynamic Fisheye to User-Perspective-Adjusted
A. Estimate Face Location
This technique may be used - given eye locations (u1, v1) and (u2, v2) in front fisheye image space (Cartesian coordinates), and S = 2.5 (the assumed physical separation (in inches) of two eyes) - to compute the user’s average eye location Xa = (Ua, Va, Wa) (Wa > 0) in full world space (screen-centered Cartesian coordinates). To do this, certain example embodiments may:
Step 1. Convert the eye positions (u1, v1) and (u2, v2) from fisheye image space (Cartesian coordinates) to fisheye image space (polar coordinates):
Step 2. Project into half world space (front-cam-centered optical coordinates). Radius is arbitrary at this step so fix Γ1 = Γ2 = 1 without loss of generality:
Step 3. Convert to half world space (front-cam-centered Cartesian coordinates) (note: the next step assumes input vectors are normalized, which these are):
- I1 = sin(ϕ1) cos(Θ1)
- J1 = cos(ϕ1)
- K1 = sin(ϕ1) sin(Θ1)
- I2 = sin(ϕ2) cos(Θ2)
- J2 = cos(ϕ2)
- K2 = sin(ϕ2) sin(Θ2)
Step 4. Compute the central angle δ between two points on a sphere using a well-conditioned vector formula: δ = arctan(|I1 × I2|, I1 · I2), where I1 and I2 here denote the unit vectors (I1, J1, K1) and (I2, J2, K2) from Step 3.
Step 5. Given the angle δ, estimate the distance D in inches between the user and the front camera: D = (S/2) / tan(δ / 2). Great circle distance (as opposed to linear distance) may be accounted for to improve accuracy as the user approaches the camera.
Step 6. Compute the average eye location in fisheye image space (Cartesian coordinates):
Step 7. Convert to fisheye image space (polar coordinates):
Step 8. Project into half world space (front-cam-centered optical coordinates):
It will be appreciated that it may not be appropriate to simply average Θ1,2 and ϕ1,2 to obtain Θa and ϕa because the transformation from fisheye image space to half world space is nonlinear.
Step 9. Convert to half world space (front-cam-centered Cartesian coordinates):
Step 10. Convert to full world space (screen-centered Cartesian coordinates):
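The ten steps above can be sketched end to end in Python as follows. This is a minimal sketch rather than a definitive implementation: the formulae for the steps whose equations are not reproduced above are filled in from the conversions defined in Section I, the distance D from Step 5 is used as the radius Γ in Step 9, and the helper names are hypothetical.

```python
import math

FRONT_CAM_Y = 4.57     # Yf, in inches
EYE_SEPARATION = 2.5   # S, in inches


def _unit_ray(u, v, d, fov=math.pi):
    """Fisheye pixel -> unit vector in front-cam-centered Cartesian coordinates."""
    r = math.hypot(u - d / 2.0, v - d / 2.0)
    theta = math.atan2(v - d / 2.0, u - d / 2.0)
    phi = fov * r / d                      # inverse of r = d * phi / fov
    return (math.sin(phi) * math.cos(theta),
            math.cos(phi),
            math.sin(phi) * math.sin(theta))


def estimate_face_location(eye1, eye2, d, s=EYE_SEPARATION):
    """Return the average eye location (Ua, Va, Wa) in screen-centered inches."""
    a = _unit_ray(eye1[0], eye1[1], d)     # Steps 1-3 for the first eye
    b = _unit_ray(eye2[0], eye2[1], d)     # Steps 1-3 for the second eye
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    delta = math.atan2(math.sqrt(sum(c * c for c in cross)), dot)   # Step 4
    if delta == 0.0:
        raise ValueError("eye locations coincide; distance is undefined")
    dist = (s / 2.0) / math.tan(delta / 2.0)                        # Step 5
    ua = (eye1[0] + eye2[0]) / 2.0                                  # Step 6
    va = (eye1[1] + eye2[1]) / 2.0
    i, j, k = _unit_ray(ua, va, d)                                  # Steps 7-8
    I, J, K = dist * i, dist * j, dist * k                          # Step 9
    return (-I, K + FRONT_CAM_Y, J)                                 # Step 10
```

As a sanity check, two eyes placed symmetrically about the center of a 1000 pixel fisheye image, at (450, 500) and (550, 500), yield a location directly in front of the front camera: X is zero, Y equals 4.57, and Z equals 1.25/tan(0.05π), roughly 7.9 inches.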
B. Rear Image Warping
This technique may be used - given a target point (x0, y0) in display image space (Cartesian coordinates), and the user’s eye location Xu = (Xu, Yu, Zu) (Zu > 0) in full world space (screen-centered Cartesian coordinates) - to compute the source point (u0, ν0) in rear fisheye image space (Cartesian coordinates) that should be displayed at the target point. To do this, certain example embodiments may:
Step 1. Convert the target point to full world space (screen-centered Cartesian coordinates):
Let X0 = (X0, Y0, 0) denote this point.
Step 2. The equation of the line in full world space (screen-centered Cartesian coordinates) passing through the target point X0 and user’s eyes Xu is then X = X0 + λD (1), where D = (X0 -Xu) / | X0 - Xu | is a vector of length 1 pointing from the user’s eyes Xu to the target point X0 and λ is a real number that represents distance traveled in the direction of this vector starting from X0.
Step 3. It would be desirable to display the correct portion of the half-space {Z < 0} according to the user’s perspective. However, the image of that half-space is captured from the back camera’s perspective. To attempt to allow correction for this perspective offset, imagine (normally) collapsing the half-space {Z < 0} onto the hemisphere {Z < 0, |X - Xb|² = R²} of radius R centered at the back camera location Xb. Then, either (1) pick R according to some heuristic, or (2) allow users to calibrate the value of R manually.
Step 4. To find the point on the hemisphere that should be displayed at X0, imagine extending the line (1), tracing the user’s line of sight until it intersects with the hemisphere. This intersection point is given by one root of the equation:
When R >> |Xb - Xu|, |X0 - Xb|, the solutions of this equation are approximately ±R; the same solutions that would be expected if ignoring the perspective offset altogether. To determine whether to pick the positive or negative root in (2), consider the Z component of X0 + λ0D, which is -λ0W/|X0 - Xu|, where W denotes the Z component of the user’s eye location Xu. This should be negative, as the intersection point that lies behind the iPad should be picked rather than the intersection point in front of the iPad, which is possible when λ0 > 0. The positive root in (2) therefore is picked.
See
Step 5. The intersection point Xi = X0 + λ0D in full world space (screen-centered Cartesian coordinates) is now known. It should be converted to half world space (back-cam-centered Cartesian coordinates): I0 = Xi - 3.07; J0 = -Zi; and K0 = Yi - 4.55.
Step 6. Convert to half world space (back-cam-centered optical coordinates):
Step 7. Collapse to fisheye image space (polar coordinates): r0 = dϕ0/v = dϕ0/π and θ0 = Θ0.
Step 8. Convert to fisheye image space (Cartesian coordinates): u0 = d/2 + r0 cos(θ0) and ν0 = d/2 + r0 sin(θ0).
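The rear warping steps above might be sketched in Python as follows. The intersection equation is not reproduced in the text; the sketch assumes it is the quadratic obtained by substituting the line X = X0 + λD into |X - Xb|² = R², namely λ² + 2λ D·(X0 - Xb) + |X0 - Xb|² - R² = 0, and takes its positive root. The function name and the example values of w, h, d, and R in the usage note are hypothetical.

```python
import math

P = 264.0                    # pixels per inch
BACK_CAM = (3.07, 4.55, 0.0)  # Xb, in inches


def rear_source_point(x0, y0, eye, w, h, d, R, p=P, fov=math.pi):
    """Display pixel (x0, y0) -> rear fisheye pixel (u0, v0) for an eye at `eye`."""
    # Step 1: target point on the screen plane, in inches.
    X0 = ((x0 - w / 2.0) / p, (y0 - h / 2.0) / p, 0.0)
    # Step 2: unit direction from the eye through the target point.
    diff = [a - b for a, b in zip(X0, eye)]
    norm = math.sqrt(sum(c * c for c in diff))
    D = [c / norm for c in diff]
    # Steps 3-4: intersect the line of sight with the sphere of radius R about Xb.
    M = [a - b for a, b in zip(X0, BACK_CAM)]
    b_half = sum(dc * mc for dc, mc in zip(D, M))
    c_term = sum(mc * mc for mc in M) - R * R
    disc = b_half * b_half - c_term
    if disc < 0.0:
        raise ValueError("line of sight does not reach the sphere")
    lam = -b_half + math.sqrt(disc)           # positive root
    Xi = [x + lam * dc for x, dc in zip(X0, D)]
    # Step 5: back-cam-centered Cartesian coordinates.
    I, J, K = Xi[0] - BACK_CAM[0], -Xi[2], Xi[1] - BACK_CAM[1]
    # Step 6: optical coordinates (atan2 form of arccos(J / Gamma)).
    Theta = math.atan2(K, I)
    phi = math.atan2(math.hypot(I, K), J)
    # Steps 7-8: collapse to fisheye polar, then Cartesian, coordinates.
    r = d * phi / fov
    return d / 2.0 + r * math.cos(Theta), d / 2.0 + r * math.sin(Theta)
```

For instance, if the eye sits 10 inches in front of the back camera's position and looks at the screen point directly over that camera, the line of sight coincides with the camera's optical axis and the function returns the fisheye image center (d/2, d/2).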
C. Front Image Warping
This technique may be used - given a target point (x0, y0) in display image space (Cartesian coordinates), and the user’s eye location Xu = (Xu, Yu, Zu) (Zu > 0) in full world space (screen-centered Cartesian coordinates) - to compute the source point (u0, ν0) in front fisheye image space (Cartesian coordinates) that should be displayed at the target point. To do this, certain example embodiments may:
Step 1. Convert the target point to full world space (screen-centered Cartesian coordinates):
Let X0 = (X0, Y0, 0) denote this point.
Step 2. The equation of the line in full world space (screen-centered Cartesian coordinates) passing through the target point X0 and user’s eyes Xu is then X = X0 + λD (3), where D = (X0 - Xu) / |X0 -Xu | is a vector of length 1 pointing from the user’s eyes Xu to the target point X0 and λ is a real number that represents distance traveled in the direction of this vector starting from X0.
Step 3. The equation of the line in full world space (screen-centered Cartesian coordinates) representing the incident ray that will be reflected along the line (3) is then X = X0 + µE (4), where E is a reflected vector of length 1 pointing away from the target point X0, and µ is a real number that represents distance traveled along this reflected vector starting from X0. Here, because reflection occurs in the plane {Z = 0}, E can be obtained from D by switching the sign of the Z-component.
Step 4. It would be desirable to display the correct portion of the half-space {Z > 0} according to the user’s perspective. However, the image of that half-space is captured from the front camera’s perspective. To attempt to allow correction for this perspective offset, imagine (normally) collapsing the half-space {Z > 0} onto the hemisphere {Z > 0, |X - Xf|² = R²} of radius R centered at the front camera location Xf. Then, either (1) pick R according to some heuristic, or (2) allow users to calibrate the value of R manually.
Step 5. To find the point on the hemisphere that should be displayed at X0, imagine extending the line (4), tracing the user’s reflected line of sight until it intersects with the hemisphere. This intersection point is given by one root of the equation:
When R >> |Xf - Xu|, |X0 - Xf|, the solutions of this equation are approximately ±R; the same solutions that would be expected if ignoring the perspective offset altogether. To determine whether to pick the positive or negative root in (5), consider the Z component of X0 + µ0E, which is µ0W/|X0 - Xu|, where W denotes the Z component of the user’s eye location Xu. This should be positive, as the intersection point that lies in front of the iPad should be picked rather than the intersection point behind the iPad, which is possible when µ0 > 0. The positive root in (5) therefore is picked.
Step 6. The intersection point Xi = X0 + µ0E in full world space (screen-centered Cartesian coordinates) is now known. It should be converted to half world space (front-cam-centered Cartesian coordinates): I0 = -Xi; J0 = Zi, and K0 = Yi - 4.57.
Step 7. Convert to half world space (front-cam-centered optical coordinates):
Step 8. Collapse to fisheye image space (polar coordinates): r0 = dϕ0/v = dϕ0/π and θ0 = Θ0.
Step 9. Convert to fisheye image space (Cartesian coordinates): u0 = d/2 + r0 cos(θ0) and ν0 = d/2 + r0 sin(θ0).
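What distinguishes the front case from the rear case is the reflection in the screen plane. A minimal Python sketch of that part (reflecting the viewing direction, intersecting the reflected ray with the sphere about the front camera, and converting to front-cam-centered coordinates) is shown below. As with the rear case, the quadratic µ² + 2µ E·(X0 - Xf) + |X0 - Xf|² - R² = 0 is assumed here because the equation itself is not reproduced in the text, and the function name is hypothetical.

```python
import math

FRONT_CAM = (0.0, 4.57, 0.0)  # Xf, in inches


def front_reflected_hit(X0, eye, R):
    """Return front-cam-centered (I, J, K) of the reflected line-of-sight hit point.

    X0 is the target point on the screen plane (Z = 0) and `eye` is the user's
    eye location, both in screen-centered inches.
    """
    diff = [a - b for a, b in zip(X0, eye)]
    norm = math.sqrt(sum(c * c for c in diff))
    D = [c / norm for c in diff]
    E = [D[0], D[1], -D[2]]                  # mirror in the plane Z = 0
    M = [a - b for a, b in zip(X0, FRONT_CAM)]
    b_half = sum(e * m for e, m in zip(E, M))
    c_term = sum(m * m for m in M) - R * R
    disc = b_half * b_half - c_term
    if disc < 0.0:
        raise ValueError("reflected ray does not reach the sphere")
    mu = -b_half + math.sqrt(disc)           # positive root
    Xi = [x + mu * e for x, e in zip(X0, E)]
    return (-Xi[0], Xi[2], Xi[1] - FRONT_CAM[1])
```

The returned point always satisfies I² + J² + K² = R² and, whenever the target point lies inside the sphere, J > 0, so it can be passed on to the optical and fisheye conversions of the remaining steps.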
III. Single Static Equirectangular to Device-Attitude-Adjusted
A. Equirectangular Image Warping
Here, a high-level goal may be thought of as - given a single equirectangular image representing a 360° by 180° scene, where the left half of the image represents the 180° by 180° scene captured by the front-facing portion of a 360° camera and the right half of the image represents the 180° by 180° scene captured by the rear-facing portion of a 360° camera, and information about the current device attitude relative to the reference attitude at which the equirectangular image was captured - construct the equirectangular image representing the original 360° by 180° scene as it would have been captured from the same position and the current device attitude. From an implementation perspective, this may be thought of as - given a target point (s0, t0) in full equirectangular image space (Cartesian coordinates), and a CMAttitude instance describing relative device attitude - compute the source point (s1, t1) in full equirectangular image space (Cartesian coordinates) that should be displayed at the target point. (CMAttitude according to Apple developer guidelines represents the device’s orientation relative to a known frame of reference at a point in time.) This may be accomplished in certain example embodiments by:
Step 1. Convert to full equirectangular image space (geographic coordinates):
Step 2. Project to full world space (360-cam-centered geographic coordinates). Radius is arbitrary (see step 5), so fix C0 = 1 without loss of generality: A0 = α0 and B0 = β0.
Step 3. Convert to full world space (screen-centered Cartesian coordinates): X0 = cos(A0) sin(B0); Y0 = sin(A0); and Z0 = -cos(A0) cos(B0).
Step 4. Given a CMAttitude instance that expresses current device attitude relative to a reference attitude, we can obtain a corresponding rotation matrix R. (A rotation matrix in linear algebra describes the rotation of a body in three-dimensional Euclidean space.) Values in R are to be interpreted in device-centered (equiv. screen-centered) Cartesian coordinates. Through experimentation, it has been determined that:
- A pure relative pitch of Ψ radians produces the rotation matrix
-
- A pure relative roll of Ψ radians produces the rotation matrix
-
- A pure relative yaw of Ψ radians produces the rotation matrix
-
Because a source point for a given target point is being sought, the inverse operation is performed. In other words, given a point with position X0 in device-centered coordinates after device rotation, determine the position of that point in device-centered coordinates before rotation. This can be achieved by applying the inverse rotation matrix: X1 = R⁻¹X0. Because R is a rotation matrix, this is equivalent to X1 = RᵀX0.
Step 5. Convert to full world space (360-cam-centered geographic coordinates): A1 = arcsin(Y1); B1 = arctan(X1, -Z1); and C1 is irrelevant here.
Step 6. Collapse to full equirectangular image space (geographic coordinates): α1 = A1 and β1 = B1.
Step 7. Convert to full equirectangular image space (Cartesian coordinates):
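The seven steps above can be sketched in Python as follows. Two assumptions are made loudly here. First, the pixel-to-angle conversions in Steps 1 and 7 are not reproduced in the text, so a linear mapping consistent with the stated ranges is assumed. Second, the rotation matrix R is taken as an input (a 3 by 3 nested list) rather than derived from a CMAttitude instance, because the pitch, roll, and yaw matrices are likewise not reproduced. The function name is hypothetical.

```python
import math


def warp_equirect_point(s0, t0, omega, eta, R):
    """Target pixel (s0, t0) -> source pixel (s1, t1) for relative rotation R."""
    # Step 1 (assumed linear mapping) and Step 2.
    A0 = math.pi * (t0 / eta - 0.5)
    B0 = 2.0 * math.pi * s0 / omega - math.pi / 2.0
    # Step 3: unit vector in screen-centered Cartesian coordinates.
    X0 = (math.cos(A0) * math.sin(B0), math.sin(A0), -math.cos(A0) * math.cos(B0))
    # Step 4: X1 = R^T X0 (the transpose is the inverse of a rotation matrix).
    X1 = [sum(R[row][col] * X0[row] for row in range(3)) for col in range(3)]
    # Step 5: back to geographic coordinates.
    A1 = math.asin(max(-1.0, min(1.0, X1[1])))
    B1 = math.atan2(X1[0], -X1[2])
    if B1 < -math.pi / 2.0:
        B1 += 2.0 * math.pi            # keep longitude in [-pi/2, 3*pi/2)
    # Steps 6-7 (assumed linear mapping).
    s1 = omega * (B1 + math.pi / 2.0) / (2.0 * math.pi)
    t1 = eta * (A1 / math.pi + 0.5)
    return s1, t1
```

With the identity matrix the function returns its input. With a quarter-turn about the Y-axis, written as R = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]], the source point is shifted horizontally by a quarter of the image width while its height is unchanged, as would be expected for a rotation about the vertical axis.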
Here, a high-level goal may be thought of as - given a single equirectangular image representing a 360° by 180° scene, where the η pixel by η pixel left half of the image represents the 180° by 180° scene captured by a front-facing camera assumed to have been located at the center of the screen, and the η pixel by η pixel right half of the image represents the 180° by 180° scene captured by a rear-facing camera assumed to have been located at the center of the screen - construct two fisheye images representing the two 180° by 180° scenes in the left (front-facing) and right (rear-facing) halves of the equirectangular image. From an implementation perspective, this may be thought of as - given a target point (x0, y0) in fisheye image space (Cartesian coordinates) - compute the source point (s0, t0) in half equirectangular image space (Cartesian coordinates) that should be displayed at the target point. This may be accomplished in certain example embodiments as follows. It will be appreciated that the steps that follow refer to the front scene/fisheye image, but the same steps and conversions work for the rear scene/fisheye image.
Step 1. Convert to fisheye image space (polar coordinates):
Step 2. Project into half world space (front-cam-centered optical coordinates). Radius is arbitrary (see step 5), so this can be fixed without loss of generality: Θ0 = θ0 and ϕ0 = vr0/η = πr0/η.
Step 3. Convert to half world space (front-cam-centered Cartesian coordinates): I0 = sin(ϕ0) cos(Θ0); J0 = cos(ϕ0); and K0 = sin(ϕ0) sin(Θ0).
Step 4. Convert to half world space (front-cam-centered geographic coordinates):
Step 5. Collapse to half equirectangular image space (geographic coordinates): α0 = A0 and β0 = B0.
Step 6. Convert to half equirectangular image space (Cartesian coordinates):
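A possible Python sketch of the six steps above follows. The formulae for Steps 1, 4, and 6 are not reproduced in the text, so this sketch assumes that latitude is measured toward the camera's "up" axis (A = arcsin(K)), that longitude is measured from the optical axis toward the camera's right (B = arctan(I, J)), and that the half equirectangular image maps angles to pixels linearly. These are assumptions chosen to agree with the stated ranges, and the function name is hypothetical.

```python
import math


def fisheye_to_half_equirect(u, v, eta, fov=math.pi):
    """Fisheye pixel (u, v) in an eta-by-eta image -> half equirectangular pixel (s, t)."""
    # Step 1: polar coordinates about the image center.
    r = math.hypot(u - eta / 2.0, v - eta / 2.0)
    theta = math.atan2(v - eta / 2.0, u - eta / 2.0)
    # Step 2: optical coordinates on the unit sphere.
    Theta, phi = theta, fov * r / eta
    # Step 3: cam-centered Cartesian coordinates.
    I = math.sin(phi) * math.cos(Theta)
    J = math.cos(phi)
    K = math.sin(phi) * math.sin(Theta)
    # Step 4 (assumed): geographic coordinates.
    A = math.asin(max(-1.0, min(1.0, K)))
    B = math.atan2(I, J)
    # Steps 5-6 (assumed linear mapping): angles in [-pi/2, pi/2] -> [0, eta].
    s = eta * (B / math.pi + 0.5)
    t = eta * (A / math.pi + 0.5)
    return s, t
```

Under these assumptions the image center maps to the image center, and points on the horizontal and vertical axes through the center map to themselves, since both projections are equidistant along those axes. Longitude is ill-defined at the poles (the top and bottom edge points of the fisheye disc), so results there should be treated with care.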
The following description provides example techniques for transmitted / reflected image capture. For instance, the following description outlines steps that may be followed in a dynamic experience to capture separate front and rear input images, as if they were taken by the same camera subject to uniform exposure and white balance adjustments.
Step 1: Allow the user to capture a rear fisheye image using auto white balance and auto exposure.
Step 2: Configure the front camera using the metadata from the rear capture. The front capture duration is set to exactly match the rear capture duration. With some hardware, the front capture aperture is fixed at f/2.2; this cannot be balanced with the rear capture aperture, which is always f/1.8. To compensate for the forced difference in aperture, the ISO used for rear captures may be adjusted by a factor of (2.2/1.8)². The square accounts for the fact that f-number is linearly related to aperture diameter, but sensor illuminance is linearly related to aperture area. This compensation is valid independent of lens focal length differences, by design of the f-number scale.
In this regard, a 100 mm focal length f/4 lens has an entrance pupil diameter of 25 mm. A 200 mm focal length f/4 lens has an entrance pupil diameter of 50 mm. The 200 mm lens’ entrance pupil has four times the area of the 100 mm lens’ entrance pupil, and thus collects four times as much light from each object in the lens’ field of view. But compared to the 100 mm lens, the 200 mm lens projects an image of each object twice as high and twice as wide, covering four times the area, and so both lenses produce the same illuminance at the focal plane when imaging a scene of a given luminance.
The front camera white balance (per-channel multiplicative) gains are set to match the rear camera white balance gains, with the exception of the red channel gain which is multiplied by a factor of about 1.82886/2.30566. This factor was determined experimentally by photographing a single scene under the same illumination with both front and rear cameras, and comparing the white balance gains computed by auto white balance adjustment.
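The exposure and white balance matching described above amounts to a few multiplications, sketched below in Python. The sketch interprets the ISO adjustment as scaling the rear capture's ISO by (2.2/1.8)² to obtain the front camera's ISO, since the front aperture admits less light; that direction is an interpretation of the text, and the function name and dictionary keys are hypothetical.

```python
def front_settings_from_rear(rear_iso, rear_duration_s, rear_gains,
                             rear_f_number=1.8, front_f_number=2.2,
                             red_gain_factor=1.82886 / 2.30566):
    """Derive front-camera settings from rear-capture metadata.

    rear_gains is an (R, G, B) tuple of per-channel white balance gains.
    """
    # Illuminance scales with aperture area, i.e. with 1 / f_number ** 2.
    iso_factor = (front_f_number / rear_f_number) ** 2
    red, green, blue = rear_gains
    return {
        "iso": rear_iso * iso_factor,
        "duration_s": rear_duration_s,                 # matched exactly
        "gains": (red * red_gain_factor, green, blue),  # only red is rescaled
    }
```

The ISO factor evaluates to roughly 1.49 and the red gain factor to roughly 0.79. In practice the resulting ISO would also need to be clamped to the range the front camera supports, which is outside the scope of this sketch.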
Appendix C: Example Techniques Regarding Glazing Color Effects
The description that follows describes an example technique for adjusting RGB image color based on glazing spectral data. In general, the problem to be solved has the following form:
where:
- k ∈ {R, G, B}
- Ix,k represents the k-th channel pixel intensity corresponding to a point x in the scene
- L(λ) represents the spectral power distribution of the scene illuminant,
- Rx (λ) represents the spectral reflectance of a point x in the scene, and
- Ck(λ) represents the capture system spectral sensitivity in the k-th channel
where G(λ) represents the spectral attenuation due to a glazing sample.
In general, L, Rx, and Ck are unknown. Values for G(λ) are available in the IGDB.
Assumptions may be made to help make this problem well-posed, e.g., to help compensate for the fact that integration discards information about the form of the integrand’s components. Known capture system sensitivity and uniform scene luminance, for example, may be taken into account in this regard.
For example, if it is assumed that L(λ)Rx(λ) = Bx throughout a lit scene - such that the scene radiance is constant across all wavelengths for each point x in the scene, and such that the capture system spectral sensitivities Ck(λ) are known - then:
and
so that
This can be viewed as loosely analogous to the “gray world theory” used in white balancing algorithms, in which it is assumed that the scene “should” be 18% gray on average.
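Under the flat-radiance assumption just described, the unknown factor Bx cancels, and each channel of a pixel is scaled by the ratio of the integral of G(λ)Ck(λ) to the integral of Ck(λ). The equations themselves are not reproduced above, so the following Python sketch should be read as one consistent interpretation rather than the definitive formulation. The function names are hypothetical, and the spectra are assumed to be sampled at common wavelengths.

```python
def trapezoid(ys, xs):
    """Trapezoidal-rule integral of samples ys taken at increasing positions xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


def glazing_channel_scale(wavelengths, glazing, sensitivity):
    """Ratio of integral(G * C_k) to integral(C_k) for one color channel k."""
    weighted = [g * c for g, c in zip(glazing, sensitivity)]
    return trapezoid(weighted, wavelengths) / trapezoid(sensitivity, wavelengths)


def apply_glazing(pixel, scales):
    """Scale an (R, G, B) pixel by the per-channel glazing factors."""
    return tuple(value * scale for value, scale in zip(pixel, scales))
```

A spectrally flat glazing that transmits half the light at every wavelength yields a scale of 0.5 in every channel, leaving hue unchanged, whereas a glazing that attenuates long wavelengths more strongly lowers the red channel scale relative to the others.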
Appendix D: Example Techniques for White Balancing
White balancing aims to solve the following problem: Given per-pixel generic RGB measurements of a scene, where those measurements are influenced by spectral properties/sensitivities of (at least) the scene, the illuminant(s), the camera lenses and coatings, the camera IR filter, the camera Bayer filter, and the camera sensor photosites themselves, compute the sRGB values that would have been observed by a human if the scene was instead illuminated by a known and relatively-spectrally-neutral light.
Solving this problem compensates for the fact that a human viewing the original scene under the original illumination would enjoy the benefits of color constancy when interpreting colors in the scene, but that same human viewing a photographic representation of the original scene under the original illumination would not enjoy those same benefits (because color constancy relies heavily on contextual cues that are missing when viewing the photographic representation, e.g. “ambient light”).
This problem is complicated at least because:
It is impossible to reverse information lost due to spectrally-limited light. In extreme cases, two spectrally-distinct materials subjected to unfortunate lighting may be recorded as identical generic RGB values. No amount of post-capture correction can recover this lost information; localized inference and restoration is required.
The equivalence classes of colors identifiable by a camera and the equivalence classes of colors identifiable by humans are not identical. More plainly, there are some colors that a camera can distinguish that a human cannot, and vice versa. While this property of cameras does not directly impact white balancing, it could skew results if unacknowledged. This is reflected in the Sensitivity Metamerism Index for example.
Most solutions take a black box approach. That is, rather than attempting to reason in the spectral domain using knowledge of the spectral properties of camera components, all reasoning occurs in three-dimensional color spaces and the focus is on neutral output. This is a practical choice, since scene spectral data is almost never available.
There are a number of algorithms that may be used in connection with certain example embodiments. ColorChecker Correction is a first example. When many areas of known spectral properties are captured in the photographed scene (e.g. a ColorChecker chart with 24 swatches), it may be posited that the “actual” R/G/B at a given pixel (as would be observed under a more neutral illuminant) are functions of the observed R/G/B values:
(The cross terms that mix observed color channels are usually unimportant, and higher powers typically fail to improve accuracy.)
Using 24 sets of 3 observed values, it is possible to write a system of 3 sets of 24 equations, each for 10 unknowns, then approximately solve for those 10 unknowns using linear least squares. It is possible for this approximation method to fail if matrices are ill-conditioned (e.g., when known-distinct ColorChecker colors appear identical to the observation device under the original illuminant).
Another algorithm approach that may be used is neutral patch correction. In this regard, if a known-neutral patch is captured in the photographed scene (e.g., a white balance card), it may be posited that the “actual” R/G/B at a given pixel (as would be observed under a more neutral illuminant) are linear functions of the observed R/G/B values:
Here, ρ, γ, and β match the per-channel “white balance gains” as reported. Because the patch measured is known to be neutral, Ractual = Gactual = Bactual = λ. When scaled appropriately, these equations then become:
It is sometimes desirable to choose to avoid blowing or tinting highlights.
Yet another algorithm that may be used in connection with certain example embodiments is neutral estimate correction. If no known-neutral patch is captured in the photographed scene, a region of neutrality is guessed. This may be achieved using a gray world model (averaging the entire image), retinex theory (looking at the coloration of near-highlights), or similar. Then the steps from the neutral patch correction technique are repeated.
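The neutral patch and neutral estimate corrections can be sketched together in Python, since they differ only in where the neutral reference comes from. The scaling used here, which fixes the green gain at 1 so that the corrected patch takes the observed green value, is an assumption; the text leaves the choice of scaling open. The gray world estimate is the plain image average. The function names are hypothetical.

```python
def gains_from_neutral(neutral_rgb):
    """Per-channel gains that map the observed neutral color to gray.

    Assumed scaling: the green gain is fixed at 1.
    """
    red, green, blue = neutral_rgb
    return (green / red, 1.0, green / blue)


def gray_world_estimate(pixels):
    """Estimate the neutral color as the mean (R, G, B) over all pixels."""
    count = len(pixels)
    return tuple(sum(p[ch] for p in pixels) / count for ch in range(3))


def apply_gains(pixel, gains):
    """Apply per-channel multiplicative white balance gains to one pixel."""
    return tuple(value * gain for value, gain in zip(pixel, gains))
```

For a known-neutral patch observed as (0.5, 0.4, 0.2), the gains are (0.8, 1.0, 2.0) and the corrected patch is (0.4, 0.4, 0.4). When no such patch is available, gains_from_neutral(gray_world_estimate(pixels)) supplies the gains instead. A smaller overall scale could be chosen where clipped or tinted highlights are a concern.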
Additional information is provided in, for example:
- http://www.odelama.com/photo/Developing-a-RAW-Photo-by-hand/
- http://www.odelama.com/photo/Developing-a-RAW-Photo-by-hand/Developing-a-RAW-Photo-byhand_Part-2
- https://www.dxomark.com/About/In-depth-measurements/Measurements/Color-sensitivity
- http://therefractedlight.blogspot.com/2011/09/white-balance-part-2-gray-world.html
- https://www.rawdigger.com/howtouse/color-is-a-slippery-trickster
- https://en.wikipedia.org/wiki/Color_temperature#Digital_photography
- https://en.wikipedia.org/wiki/Color_balance
- https://en.wikipedia.org/wiki/White_point
- https://www.mathworks.com/help/images/examples/comparison-of-auto-white-balance-algorithms.html
- https://www.adobe.com/digitalimag/pdfs/understanding_digitalrawcapture.pdf
There are two types of chromatic aberration (CA): lateral and longitudinal. Image correction techniques for both involve estimating a distortion map based on the geometries of the three RGB channels captured in a RAW image. For longitudinal CA, the distortion is generally radial. For lateral CA, the distortion is generally modeled via some polynomial function with unknown coefficients. Landmark (edge) detection is then run separately on the three color channels and used to estimate unknown scaling/parameters via least squares.
These fixes are agnostic of all upstream capture technology and processing and thus need not be used in connection with certain example embodiments that relate to color modeling.
Appendix F: Example Techniques for Pan/Tilt Mode
The description that follows explains example computations that may be used if the device is used in a pan/tilt mode. Pan is the direction of the vector from the user’s eyes (assumed to be fixed in space) to the center of the device, and tilt is the angle formed between the vector from the user’s eyes to the center of the device and the vector normal to the device’s screen.
In general, it is difficult to simultaneously infer pan and tilt from device attitude changes alone, since there is no reliable way to distinguish between (for example) a change in device roll due to the user rotating the device sideways about its center and a change in device roll due to the user translating the device sideways around themselves. However, if it is known whether the user is currently tilting or panning, it is possible to interpret observed changes in device attitude appropriately. The description that follows describes how this may be accomplished.
When dealing with iPads, for example, the total relative device attitude dictates which part of the static scene is in front of the iPad and which part of the static scene is behind the iPad. This is true whether the total relative device attitude was arrived at through panning the device, tilting the device, or some combination of the two. Furthermore, tilting the device influences the position of the user’s eyes in screen-centered coordinates, whereas panning does not.
Certain example embodiments therefore receive as inputs the total relative device attitude, to compute the front and rear scenes from our static image; and the net tilt-only relative device attitude (e.g., the portion of the total relative device attitude due to movements made while in tilt mode), to compute the net user position in screen-centered coordinates.
The former is easily computed for each device motion update by comparing to the current center reference attitude, which changes infrequently. The latter can be more complicated because a subset of device attitude changes may need to be processed. The following may be implemented:
Step 1. Keep a reference to the last total relative device attitude received (so that it becomes possible to compute the change in device attitude for each frame).
Step 2. Keep a reference to a net tilt matrix that matches the net tilt-only relative device attitude (as a CMAttitude instance may not be usable to track this information because instances may not be modified or newly constructed).
Step 3. When a new total relative device attitude is received while in tilt mode, compute the change in device attitude since the last frame; convert this device attitude change to a tilt increment matrix; and update the net tilt matrix by multiplying it by the tilt increment matrix.
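Steps 1-3 may be sketched as follows, by way of example and without limitation. The sketch assumes that each attitude is supplied as a 3x3 rotation matrix (so that its inverse is its transpose) and that the change since the last frame is taken as the inverse of the last attitude multiplied by the new attitude; the class name, method name, and multiplication order are hypothetical choices and would need to match the conventions of the actual device motion framework.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def rot_y(angle):
    """Rotation about the Y axis, used here only to build sample attitudes."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


class TiltTracker:
    def __init__(self):
        self.last_attitude = None                      # Step 1
        self.net_tilt = rot_y(0.0)                     # Step 2 (starts as identity)

    def update(self, attitude, in_tilt_mode):
        """Process one device motion update and return the net tilt matrix."""
        if self.last_attitude is not None and in_tilt_mode:
            # Step 3: change since the last frame, then accumulate it.
            increment = matmul(transpose(self.last_attitude), attitude)
            self.net_tilt = matmul(self.net_tilt, increment)
        self.last_attitude = attitude                  # Step 1 bookkeeping
        return self.net_tilt
```

In this sketch, attitude changes that arrive while panning still update the stored last attitude, so that the next tilt-mode increment is measured only from the most recent frame and panning motion does not leak into the net tilt matrix.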
Appendix G: Example Techniques for Distortion Validation
The description that follows helps demonstrate how certain example embodiments can compute global transformation functions for certain limited device movements in order to allow validation of the fully-general transformations allowed by our primary image processing code. This includes, for example, pure roll. Consider a fixed Cartesian coordinate system whose origin coincides with the center of the iPad. (This is not the screen-centered coordinate system discussed above in Appendix A.) Imagine that the iPad lies in the plane {Z = 0}.
When the iPad is rolled with its center fixed in space, the fully-general image processing code should skew the image displayed on the iPad’s screen so that it appears exactly stationary when viewed by a user located at (0, 0, D). Equivalently, the iPad can be thought of as a portal through which we are viewing a fixed scene. The skewing can be validated as being correct by taking the skewed image and mathematically projecting it back into the original reference plane {Z = 0}. This skewed-then-projected image should exactly match the original image displayed by the iPad (possibly cropped) when overlaid.
This check for pure rolls may be performed as the calculation of the projection is relatively straightforward compared to general movements. The same applies to pure pitches and pure yaws.
In this regard, let the amount of roll in radians be α, so that a point (X0, Y0, 0) on the iPad’s screen moves to (X0 cos(α), Y0, -X0 sin(α)).
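The projection of this point back into the plane {Z = 0} as seen from the user located at (0, 0, D) is (X1, Y1, Z1) = (D / (D + X0 sin(α))) · (X0 cos(α), Y0, 0), i.e., the rolled point scaled along the line of sight from the user until it meets the plane.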
To implement this projection as a CIFilter, it is useful to compute the reverse transformation (given a target (projected) point, find the source (screen) point):
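For purposes of illustration only, the forward projection and its reverse for a pure roll may be sketched as follows. The sketch assumes a perspective projection along the line from the viewer at (0, 0, D) through the rolled point onto the plane {Z = 0}; the reverse expressions are obtained by solving the forward expressions for the screen point and are not quoted from any source. The function names are hypothetical.

```python
import math


def project_rolled_point(x0, y0, alpha, d):
    """Roll the screen point (x0, y0, 0) by alpha, then project onto Z = 0.

    The projection follows the ray from the viewer at (0, 0, d).
    """
    xr, yr, zr = x0 * math.cos(alpha), y0, -x0 * math.sin(alpha)
    t = d / (d - zr)          # ray parameter at which the ray reaches Z = 0
    return (t * xr, t * yr)


def unproject_to_screen(x1, y1, alpha, d):
    """Reverse transformation: projected point (x1, y1) to screen point (x0, y0)."""
    denom = d * math.cos(alpha) - x1 * math.sin(alpha)
    x0 = d * x1 / denom
    y0 = d * y1 * math.cos(alpha) / denom
    return (x0, y0)
```

A round trip through both functions should return the starting screen point, which provides a simple numerical check of the kind of validation described above.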
Appendix H: Example Techniques for Importing Images
The following description demonstrates how images can be imported and used in connection with certain example embodiments. Although any image can be imported, it may be most desirable to use equirectangular 360 images.
The description that follows describes example processes that may be used to modify glazing filters to preserve more highlights in reflections. An example technique that may be used involves calculating the relative luminance, or perceptual brightness of pixels, to modify the reflective glazing filter. This relative luminance is calculated the same way that the Y value is calculated in CIE XYZ color space when being converted from Linear RGB.
Given a linear RGB value for a pixel, relative luminance is calculated with the formula Y = 0.2126 R_linear + 0.7152 G_linear + 0.0722 B_linear. This normalized Y value is applied to the glazing filter to alter the amount of filtering that is applied to the input image. As relative luminance for a pixel increases, less glazing effect and more of the original image are shown. The glazing effect is not completely removed in some implementations.
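A minimal sketch of this luminance-weighted blending follows, by way of example and without limitation. The luminance weights are those stated above; the linear falloff of the glazing effect and the `min_effect` floor (which keeps the glazing effect from being completely removed) are assumptions made for illustration, as are the function names.

```python
def relative_luminance(r, g, b):
    """Relative luminance Y of a linear RGB pixel (the CIE XYZ Y value)."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def blend_glazing(original, glazed, min_effect=0.2):
    """Show less glazing and more of the original as luminance increases.

    original, glazed: linear RGB tuples for the same pixel.
    min_effect: hypothetical floor on the glazing weight at full luminance.
    """
    y = min(max(relative_luminance(*original), 0.0), 1.0)
    effect = 1.0 - (1.0 - min_effect) * y   # assumed linear falloff
    return tuple(effect * gl + (1.0 - effect) * o
                 for o, gl in zip(original, glazed))
```

With these assumptions, a black pixel receives the full glazing effect, whereas a pure white pixel retains only the `min_effect` fraction of it.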
Additionally, this highlight emphasis Ymodified has been parameterized around threshold p and linearity s of the effect.
[Equations defining Ymodified in terms of p and s are not reproduced in this text.]
This formula is used to de-linearize the highlight effect so that the parameters may be adjusted to attempt to achieve the desired effect in a given scene. This de-linearizing formula can break down when certain parameters are used due to divide by zero errors and thus may be replaced with other suitable algorithms.
For further information, see https://en.wikipedia.org/wiki/Relative_luminance and https://en.wikipedia.org/wiki/CIE_1931_color_space.
Appendix J: Example Techniques for Fisheye Lens Correction
There are a number of known fisheye lens correction algorithms. One example that may be used in connection with certain example embodiments is described at http://paulbourke.net/dome/fisheyecorrect/, the entire contents of which are hereby incorporated by reference herein, and which is described in relevant part below.
Appendix K: Example Fisheye Mathematics
Any point P in a linear (mathematical) fisheye defines an angle of longitude and latitude and therefore a 3D vector into the world.
- θ = longitude = atan2(P.y,P.x)
- ø = latitude = r ømax / 2 where ømax is the field of view of the fisheye lens
The vector into the world is given by:
A key to a fisheye is the relationship between latitude ø of the 3D vector and radius on the 2D fisheye image, namely a linear one where ø(r) = r ømax/2, where r is the radius of the point on the fisheye in normalized coordinates (-1 to 1 across the fisheye circle) and ømax is the field of view of the fisheye across the fisheye circle. Note that this one-dimensional function for what is a two-dimensional curve works because fisheye lenses, like other lenses, are formed from a spinning process and are thus radially symmetric.
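By way of example and without limitation, the mapping from a point on an ideal linear fisheye to a 3D vector may be sketched as follows. The longitude and latitude expressions are those stated above; the axis convention (optical axis along +z, with latitude measured from that axis) is an assumption made for this sketch, and the function name is hypothetical.

```python
import math


def fisheye_to_vector(px, py, fov_max):
    """Map a normalized fisheye point to a unit 3D vector.

    px, py: coordinates in the range -1 to 1 across the fisheye circle.
    fov_max: field of view of the fisheye, in radians.
    """
    r = math.hypot(px, py)
    theta = math.atan2(py, px)        # longitude
    phi = r * fov_max / 2.0           # latitude, linear in r
    # Assumed convention: +z is the optical axis.
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))
```

Under this convention, the center of the fisheye circle maps to the optical axis, and a point on the rim of a 180 degree fisheye maps to a vector perpendicular to that axis.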
Where a real world fisheye deviates from the idealized relationship above is that real fisheye lenses are rarely (if ever) perfectly linear. Note that a linear fisheye lens is often called a “true f-theta lens.” Thus, at some stage of any fisheye mapping process, assuming the lens is not a true f-theta lens (or a close enough approximation), there is a function that maps real points on the fisheye to their position on an ideal fisheye.
There are two possible functions depending on whether one needs to convert normalized radii to actual latitude, or whether one needs to convert latitude to actual radius. In both cases a 4th order polynomial function is usually adequate. The form will be ax + bx^2 + cx^3 + dx^4.
Note that because the curve always passes through the origin, there is no constant term; put another way, r = 0 always corresponds to latitude = 0.
This can be illustrated with an actual fisheye lens that is particularly nonlinear, the iZugar MKX22 220 degree fisheye.
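For purposes of illustration only, a least-squares fit of a 4th order polynomial with no constant term may be sketched as follows. The sketch solves the normal equations by Gaussian elimination using only the standard library; this solution method and the function names are assumptions made for the sketch, and any suitable curve-fitting routine may be used instead.

```python
def fit_quartic_through_origin(xs, ys):
    """Fit y = a*x + b*x^2 + c*x^3 + d*x^4 to samples by least squares."""
    powers = (1, 2, 3, 4)
    # Normal equations (A^T A) coeffs = A^T y for the basis x, x^2, x^3, x^4.
    ata = [[sum(x ** (p + q) for x in xs) for q in powers] for p in powers]
    aty = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in powers]
    n = len(powers)
    m = [row + [rhs] for row, rhs in zip(ata, aty)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / m[i][i]
    return coeffs


def eval_quartic(coeffs, x):
    """Evaluate a*x + b*x^2 + c*x^3 + d*x^4 (zero at x = 0 by construction)."""
    a, b, c, d = coeffs
    return x * (a + x * (b + x * (c + x * d)))
```

Calling the fit with measured (latitude, radius) pairs yields the latitude-to-radius function, and calling it with the two lists swapped yields an approximation of the radius-to-latitude function.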
In other words, given a latitude ø of a 3D vector the above gives the normalized radius on the fisheye. This is normally what is used for an image mapping of a fisheye into another image projection type, namely, one computes the 3D vector for each pixel in the output image and the above gives the radius of the pixel in the real fisheye image.
Sometimes, the other mapping is required. In this way, given a radius in the real fisheye image, a determination can be made as to the latitude of the corresponding 3D vector. While in some cases one can invert the polynomial or solve the inverse numerically, it may be easier to just fit another polynomial to the swapped data.
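Where solving the inverse numerically is preferred over fitting a second polynomial, one possible approach is bisection, sketched below by way of example and without limitation. The sketch assumes that the latitude-to-radius polynomial is monotonically increasing over the interval from 0 to `phi_max`; the function names, tolerance, and iteration limit are hypothetical.

```python
def radius_from_latitude(coeffs, phi):
    """Evaluate r = a*phi + b*phi^2 + c*phi^3 + d*phi^4."""
    a, b, c, d = coeffs
    return phi * (a + phi * (b + phi * (c + phi * d)))


def latitude_from_radius(coeffs, r, phi_max, tol=1e-12):
    """Find the latitude whose radius equals r, by bisection on [0, phi_max].

    Assumes radius_from_latitude is increasing on the whole interval.
    """
    lo, hi = 0.0, phi_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if radius_from_latitude(coeffs, mid) < r:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Bisection is slower than a direct polynomial evaluation, so fitting a polynomial to the swapped data may remain preferable when the mapping is needed for every pixel.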
The description above provides examples of the r, ø curves and how polynomials can be fit to those to either derive the correct r on the fisheye image given a latitude, or calculate the correct latitude given r on the fisheye image. There are a number of ways of calculating the points for these curves, and several examples will be set forth herein.
The concept of the zero parallax point of a fisheye lens/camera system is related to the “nodal point” in panorama photography. Rotating the lens about the zero parallax position ensures there will not be any stitching errors due to parallax. When doing measurements from a fisheye it is this zero parallax point that should be the origin.
The zero parallax point is typically located somewhere along the barrel of the fisheye lens. Sometimes it is marked on the metal barrel of the lens. One can measure the zero parallax position by aligning two objects in a scene that are at different distances, then rotating the lens/camera and inspecting the two objects on the image. At zero parallax, the two objects will stay aligned; rotating about other positions will cause the positions of the near and far objects to separate.
Once the zero parallax position is known, one can directly measure the points for the linearizing curves. One way is with a rotating camera head, e.g., with 5 (or other) degree increments.
Another approach is to place a structure around the camera with markings of equal angle, or at least such that the angle can be measured. One reads off the angle markings on the camera image knowing their latitude in the real world.
The origin for the measurements should be the zero parallax position on the lens barrel so as to reduce the parallax issues. However, because one often deals with small lenses, choosing the front face of the lens or the front plate of the camera is probably only going to be out by a centimeter or less and therefore may be acceptable in some implementations.
In theory, one only needs to do one half of one of the above (or other) procedures. But doing both halves can be a good test for symmetry (is the lens pointing straight down, are you measuring the center of the fisheye circle correctly, and so on), and thus may improve accuracy and reliability.
Appendix L: Example Flowcharts for Composing Image
Claims
1. A method of modifying a reflection of a display device as viewed by a viewer, the method comprising:
- acquiring an image in real-time using an electronic device;
- calculating display properties based on at least one of the image and characteristics of the display device to which an output image is to be provided;
- generating a modified reflection of the display device based on the display properties; and
- generating, for display via the display device, the output image comprising the modified reflection of the display device.
2. The method of claim 1, wherein the display device simulates a glass sample.
3. The method of claim 2, where the modified reflection simulates a reflection of a glass sample based on a viewing angle.
4. The method of claim 2, wherein the modified reflection simulates a reflection of the glass sample with an antireflection coating.
5. The method of claim 1, wherein the modified reflection comprises a reflected color.
6. The method of claim 1, wherein the modified reflection comprises 100% reflection.
Type: Application
Filed: Feb 3, 2023
Publication Date: Jun 8, 2023
Applicant: Guardian Glass, LLC (Auburn Hills, MI)
Inventors: Alexander Sobolev (West Bloomfield, MI), Vijayen S. Veerasamy (Ann Arbor, MI)
Application Number: 18/164,345