# THREE-DIMENSIONAL MODELING FROM SINGLE PHOTOGRAPHS

A method of obtaining a three-dimensional digital model of an artificial object, made up of a plurality of geometric primitives, the artificial object being in a single two-dimensional photograph, the method comprising: using edge detection to define a two-dimensional outline of the artificial object within the photograph; interactively allowing a user to define two-dimensional profiles of successive ones of the geometric primitives; interactively allowing a user to sweep respective profiles over an extent of a corresponding one of the geometric primitives within the image; generating successive three-dimensional model parts from existing detected edges of the corresponding geometric primitives and the sweeping of the respective profile; and aligning the plurality of three-dimensional model parts to form the three-dimensional model.


**Description**

**RELATED APPLICATION**

This application is a continuation of U.S. Patent Application No. 14/177,359 filed on Feb. 11, 2014, which claims the benefit of priority of U.S. Provisional Patent Application No. 61/763,005 filed on Feb. 11, 2013. The contents of the above applications are all incorporated by reference as if fully set forth herein in their entirety.

**FIELD AND BACKGROUND OF THE INVENTION**

The present invention, in some embodiments thereof, relates to three-dimensional modeling from single photographs and, more particularly, but not exclusively, to the modeling of man-made objects with straightforward geometry.

The creation and modeling of 3D objects has always been a difficult task, even for professionals. First, a mental idea of what the model should look like needs to be formed. This conceptual stage requires creativity and inspiration. Then, the idea needs to be implemented through a series of actions using various geometric modeling tools. These steps take time and demand high proficiency and a fair amount of skill. Modeling objects from existing photographs, first, alleviates the mental stage and, second, allows much simpler modeling that can also borrow textures from the image. This forms at least an initial base model that can later be edited and refined. In addition, such abilities can be utilized for manipulating the images themselves using 3D. An example of a suitable object is shown in the accompanying drawings.

Extracting three-dimensional models from a single photo is still a long way from realization at the current state of technology, as it involves numerous complex tasks: the target object must be separated from its background, and its 3D pose, shape and structure must be recognized from its projection. These tasks are difficult since they require some degree of semantic understanding of the object. To alleviate this problem, complex 3D models can be partitioned into simpler parts, but identifying object parts also requires semantic understanding and is difficult to perform automatically. Moreover, once a 3D shape is decomposed into parts, the relations between these parts must also be understood and maintained in the final composition.

Related Work

3D Modeling from a single photo. Images have always been an important resource and were used as references in 3D modeling. There are numerous techniques that model shapes from multiple images [**26**, **28**]. However, modeling from a single photograph is more challenging since there is more ambiguity in the observed geometry. Methods to reconstruct an object from a single image usually require some degree of manual intervention. Oh et al. [**23**] allow the annotation of depth and layer information in a single image and yield impressive image editing at the scene level. Russell et al. [**25**] build a manually annotated database of 3D scenes to assist recovering scene-level geometry and camera pose. Lau et al. [**19**] introduced a “Modeling-in-context” concept, allowing complementary objects of a photograph to fit better to other objects in the photo. Jiang et al. [**15**] recover an architectural model heavily relying on the symmetry of such buildings.

Of particular significance is the work of Xu et al. [**30**] which models a man-made object observed in a single photograph. Their method relies on matching and warping an existing 3D object to the observed object in the photograph. The warp is constrained by semantic geometric (geo-semantic) constraints. However, the success of their method strongly depends on the existence, and retrieval, of a similar 3D shape.

The task of 3D modeling from a single image is closely related to the endeavor of reconstructing a 3D shape from a sketch [**24**]. A number of interactive systems have been developed for this purpose [**13**, **14**, **16**, **32**, **34**]. Free-sketched objects, however, do not necessarily correspond to real man-made objects that may appear in photographs. Modeling such man-made objects remains problematic, since they typically consist of a composition of primitives with certain inter-relations among the components [**9**, **21**], which the systems aimed at free sketches do not address.

Part-based Modeling. Part-based snapping techniques have been used for modeling 3D objects from sketches. Gingold et al. [**10**] introduce an interface to generate 3D models from 2D drawings by manually placing 3D primitives. Tsang et al. [**29**] use a guiding image to assist sketch-based modeling; the user's input curves can snap to the image, and the user is then provided with suggestions for curve completion from a curve database. Recently, Shtof et al. [**27**] have modeled 3D objects from sketches by snapping primitives. In their system, the user drags-and-drops an entire 3D primitive onto its place. Since the fitting problem is ambiguous, the silhouettes of the sketches must be semantically labeled, and the sketch is expected to contain some cues that indicate the part boundaries.

Sweep-based Modeling. Sweep-based models have been studied extensively in Computer-Aided Design. Choi and Lee [**7**] model sweep surfaces by using coordinate transformations and blending. Swirling-Sweepers [**1**] is a volume-preserving modeling technique capable of unlimited stretching while avoiding self-intersection. Hyun et al. [**12**] and Yoon et al. [**33**] use sweeping for human and freeform deformation, respectively. Many CAD works also aim at modeling generalized primitives. Kim et al. [**17**] model and animate generalized cylinders by a translational sweep along the spine or a rotational sweep around the spine. Lee [**20**] models generalized cylinders using a direction-map representation. Based on generalized cylinders, Murugappan et al. [**22**] propose an interesting interaction approach to create 3D shapes by hand gestures. None of these methods has been applied to modeling from photographs or sketches.

Semantic Constraints. Gal et al. [**9**] have introduced a 3D deformation method that preserves some semantic constraints among the object's parts. Such geo-semantic constraints [**35**] have been shown to be useful for quickly editing or deforming man-made models [**30**, **31**]. Li et al. [**21**] and Shtof et al. [**27**] reconstruct 3D shapes while simultaneously inferring the global mutual geo-semantic relations among their parts.

Object-Level Image Editing. Unlike traditional image-based editing, object-based editing allows high-level operations. Operating on the object-level requires extensive user interaction [**8**, **5**] or massive data collection [**18**, **11**]. Barrett et al. [**4**] use wrapping to achieve object-based editing, which is restricted to 3D rotation. Zhou et al. [**37**] fit a semantic model of a human to an image, allowing an object-based manipulation of a human figure in photographs. Recently, Zheng et al. [**36**] have proposed using cuboid proxies for semantic image editing. Man-made objects are modeled by a set of cuboid proxies, possibly together with some geometric relations or constraints, allowing their manipulation in the photo.

**SUMMARY OF THE INVENTION**

The present embodiments provide a method and apparatus for extracting three-dimensional information of objects in single photographs by providing a user with interactivity to draw a cross-section for a part of the object and then sweep the cross section over the part of the object to which it applies. Unlike certain of the above cited works, the present embodiments may focus on the modeling of a single subject that is observed in a photograph and not the whole scene.

The computer then fits the cross-section to the object outline of which it is aware and once all parts of the object have been addressed in this way the computer is able to generate a three-dimensional model of the object, which can then be rotated, or used in animations or in any other way.

Thus, in the present embodiments, the original object is not restricted, as with Xu et al., to pre-stored shapes. Rather, the embodiments work on geometric primitives, so that any shape that can be deconstructed into geometric primitives can be reconstructed as a 3D object. The reconstructed object is thus composed of these generic primitives, providing larger scope and flexibility.

The prior art teaches snapping, and separately teaches sweeping. The present embodiments combine sweeping and snapping to provide automatic alignment of the primitives into an overall object.

According to an aspect of some embodiments of the present invention there is provided a method of obtaining a three-dimensional digital model of an artificial object made up of a plurality of geometric primitives, the artificial object being in a single two-dimensional photograph, the method comprising:

defining a two-dimensional outline of the artificial object within the photograph;

interactively allowing a user to define cross-sectional profiles of successive ones of the geometric primitives, the cross-sectional profiles defining a third dimension;

interactively allowing a user to provide sweep input to sweep respective defined cross-sectional profiles over an extent of a corresponding one of the geometric primitives within the image, the sweeping generating successive three-dimensional model primitives from existing detected edges of the corresponding geometric primitives and the sweeping of the respective profile; and

aligning the plurality of three-dimensional model primitives to form the three-dimensional model.

The method may comprise interactively allowing the user to explicitly define three dimensions of the geometric primitive using three sweep motions, wherein a first two of the three sweeps define a first and second dimension of the cross-sectional profile and a third sweep defines a main axis of the geometric primitive.

The method may comprise, upon the user sweeping the two-dimensional profile over a respective one of the geometric primitives, dynamically adjusting the two-dimensional profile using a pictorial context on the photograph and automatically snapping photograph lines to the profile.

In an embodiment, the snapping allows the three-dimensional model to include three-dimensional primitives that adhere to the object in the photographs, while maintaining global constraints between the plurality of three-dimensional model primitives composing the object.

The method may comprise optimizing the global constraints while taking into account the snapping and the sweep input.

The method may comprise a post-snapping fit improvement for better fitting the primitive to the image, the better fitting comprising searching for transformations within ±10% of the primitive size that create a better fit of the primitive's projection to the profile.

In an embodiment, the defining of the two-dimensional outline comprises edge detecting.

An embodiment may comprise estimating a field of view angle from which the photograph was taken in order to estimate and compensate for distortion of the primitives within the photograph.

An embodiment may comprise using relationships between the primitives in order to define global constraints for the object.

An embodiment may comprise obtaining geo-semantic relations between the primitives to define the three-dimensional digital model, and encoding the relations as part of the model.

An embodiment may comprise inserting the three-dimensional digital model into a second photograph.

The method may comprise extracting a texture from the photograph and applying the texture to sides of the three-dimensional model not visible in the photograph.

In an embodiment, the defining the cross-sectional profiles comprises defining a shape and then distorting the shape to correspond to a three-dimensional orientation angle.

The method may comprise applying different constraints to different parts respectively of a given one of the geometric primitives, or locally modifying different parts respectively of a given one of the geometric primitives.

The method may comprise snapping the first two user sweep motions to the photograph lines, using the endpoints of the first two user sweep motions along with an anchor point on a respective primitive to create a three-dimensional orthogonal system for the respective primitive.

The method may comprise supporting a constraint, the constraint being one member of the group consisting of: parallelism, orthogonality, collinear axis endpoints, overlapping axis endpoints, coplanar axis endpoints and coplanar axes, and for the member testing whether a pair of components is close to satisfying the member, and if the member is satisfied or close to satisfied then adding the constraint to a respective one of the primitives.

In the method, aligning the three-dimensional primitives may comprise finding an initial position for all primitives together by changing only their depth to adhere to geo-semantic constraints, followed by modifying the shapes of the primitives.

The present embodiments may include a user interface for carrying out the above method. The user interface may comprise an outline view of a current photograph on which view to carry out interactive sweeping to define cross sections of respective primitives and on which to snap the cross-sections. The user interface may further comprise a solid model view and a texture view respectively of the current photograph, and selectability for user selection between different basic cross-sectional shapes.

According to a second aspect of the present invention there may be provided a method of digitally forming a three-dimensional geometric primitive from a two-dimensional geometric primitive in a two-dimensional photograph, comprising:

interactively obtaining user input to draw a two-dimensional cross section of the primitive and then using further user input to sweep the cross-section over a length of the primitive.

A geometric primitive is a part of an object whose cross section remains constant, or changes only continuously, along the length of the part.

According to a third aspect of the present invention there is provided a method of forming a derivation of a photograph, the photograph incorporating a two dimensional representation of a three-dimensional object, the two-dimensional representation being a rotation of an original two-dimensional representation, the rotation being formed by:

carrying out the method described hereinabove to form a three-dimensional model of the original two-dimensional representation;

rotating the three-dimensional model; and projecting the rotated three-dimensional model onto a two-dimensional surface to form the derivation.

Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.

Implementation of the method and/or system of embodiments of the invention can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.

For example, hardware for performing selected tasks according to embodiments of the invention could be implemented as a chip or a circuit. As software, selected tasks according to embodiments of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In an exemplary embodiment of the invention, one or more tasks according to exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. The data processor may include a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk, flash memory and/or removable media, for storing instructions and/or data. A network connection may be provided and a display and/or a user input device such as a keyboard or mouse may be available as necessary.

**BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS**

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.

Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.

In the drawings:

**DESCRIPTION OF SPECIFIC EMBODIMENTS OF THE INVENTION**

The present invention, in some embodiments thereof, relates to three-dimensional modeling based on a single photograph.

The present embodiments may provide an interactive technique for modeling 3D objects having a clear geometry, typically but not exclusively man-made objects, by extracting them from a single photograph. The modeling of a 3D shape from a single photograph requires the understanding of the components of the shape, their projections, and relations. These are particularly difficult for automatic algorithms but are simple cognitive tasks for humans. The present interactive method may intelligently combine the cognitive ability of humans with the computational accuracy of the machine. To extract an object from a given photograph, the user draws cross-sectional profiles of parts of the object and sweeps the profile over the part using simple gestures, to progressively define a 3D body that snaps to the shape outline in the photo. The generated part adheres to various geo-semantic constraints imposed by the global 3D structure. As explained below, with the present intelligent interactive modeling tool, the daunting task of object extraction is made simple. Once the 3D object is extracted, it can be quickly edited and placed back into photos or 3D scenes, offering object-driven photo editing tasks which are impossible to achieve in image-space.

More particularly, the present disclosure teaches an interactive technique to model 3D man-made objects from a single photograph utilizing the interplay between humans and computers, while leveraging the strengths of both. The human is involved in perceptual tasks such as recognition, positioning, and partitioning, while the computer performs tasks which are computationally intensive or require accuracy. Guided by the present method, the final model of the object includes its geometry and structure, as well as some of its semantics. This allows the extracted model to be readily available for intelligent editing, while maintaining the shape's semantics.

The present approach is based on the observation that many man-made objects can be decomposed into simpler parts that can be represented by a generalized cylinder or similar primitives. An idea of the present method is to provide the user with an interactive tool to guide the creation of 3D editable primitives. The tool is based on a relatively simple modeling gesture referred to herein as sweep-snap. The sweep-snap gesture allows the user to explicitly define the three dimensions of the primitive using three sweeps. The first two sweeps define the first and second dimension of a 2D profile and the third, longer, sweep is used to define the main curved axis of the primitive.
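By way of illustration only, the decoding of the three sweeps into the primitive's dimensions can be sketched as follows; this is a minimal sketch under assumed conventions (strokes as point lists), and `interpret_three_sweeps` is an invented name, not the claimed implementation:

```python
import math

def interpret_three_sweeps(stroke1, stroke2, stroke3):
    """Interpret three user strokes (lists of 2D points): the first two
    strokes give the two dimensions of the 2D profile, and the third,
    longer stroke gives the main (possibly curved) axis of the part."""
    def span(stroke):
        # stroke extent, taken as the distance between its endpoints
        return math.dist(stroke[0], stroke[-1])
    return {
        "profile_dim1": span(stroke1),   # first profile dimension
        "profile_dim2": span(stroke2),   # second profile dimension
        "axis": list(stroke3),           # sweep axis as a polyline
    }
```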

While the user sweeps the primitive, the computer program dynamically adjusts the progressive profile by sensing the pictorial context on the photograph and automatically snapping to it. With such sweep-snap operations the user models 3D parts that adhere to the object in the photographs, while the computer automatically maintains global constraints with other primitives composing the object. The present embodiments use geo-semantic constraints that define the semantic and geometric relations between the primitive parts of the final 3D model such as parallelism and collinearity.
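Inferring relations such as parallelism and orthogonality between part axes can be sketched as testing whether the angle between axes falls within a tolerance; the tolerance value and function names here are assumptions for illustration, not taken from the text:

```python
import math

def angle_between(u, v):
    """Unsigned angle in degrees between two 3D direction vectors,
    treating opposite directions as parallel."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cosang = min(1.0, abs(dot) / (nu * nv))
    return math.degrees(math.acos(cosang))

def infer_constraints(axes, tol_deg=5.0):
    """Return (i, j, kind) for axis pairs that are close to parallel or
    orthogonal; near-satisfied relations like these can be promoted to
    hard geo-semantic constraints between the primitives."""
    found = []
    for i in range(len(axes)):
        for j in range(i + 1, len(axes)):
            ang = angle_between(axes[i], axes[j])
            if ang < tol_deg:
                found.append((i, j, "parallel"))
            elif abs(ang - 90.0) < tol_deg:
                found.append((i, j, "orthogonal"))
    return found
```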

The present method thus disambiguates the three-dimensional problem by an explicit sweep move of a 2D entity. The present embodiments adopt a geo-semantic constraint inference to assist the modeling of man-made objects. Thanks to the presently disclosed user interaction, the present embodiments may be able to achieve faster modeling than the prior art systems listed above and can support fuzzy and noisy image edges as well as clear sketches and photographs. The present embodiments obviate any requirement for sketch classification and avoid the annoyance of false positives when geo-semantic optimization falls into a local minimum.

As mentioned above, Zheng et al. [**36**] proposed using cuboid proxies for semantic image editing. Man-made objects are modeled by a set of cuboid proxies, possibly together with some geometric relations or constraints, allowing their manipulation in the photo. The method of the present embodiments achieves similar image manipulations with a larger variety and more complex man-made models with more kinds of geo-semantic constraints. The present embodiments may also recover a full 3D model of the object rather than just a proxy, and support various shapes rather than just cuboids. Using the user interaction the present embodiments avoid the need for unreliable image segmentation and unsupervised model fitting. In the present embodiments, the user may provide vital information in the modeling process with little effort.

Using sweep-snap technology, non-professionals can extract various 3D objects from photographs. These objects may then be used to build a 3D scene, or to alter the image itself by manipulating or editing the objects or their parts in 3D and pasting them back into the photograph. The present disclosure contains results of a variety of such examples.

Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.

Referring now to the drawings,

The object in the photograph is typically made up of several geometric parts and needs to be extracted from the single two-dimensional photograph. Edge detection may be used to determine the bounds of the object from the photograph. A typical object is that shown in the accompanying drawings.

The method then generates three-dimensional model parts from existing detected edges of the corresponding geometric parts and the sweeping of the respective profile. The method then aligns the three-dimensional model parts in 3D space to form a consistent three-dimensional model. This alignment is a further snap stage.

Reference is now made to panels (a) through (f) of the drawings.

In more detail, the interactive modeling process takes as input a single photo such as that shown in panel (a).

Although the user interacts with the given photo, the actual modeling algorithm uses an outline image of the object, as shown in panel (b).

To create one part, the user interactively fits a 3D primitive into the given photo. This operation is not trivial since the photo lacks the third dimension and fitting can be ambiguous. The challenge is to provide the interactive means to disambiguate such fitting. The sweep-snap technique of the present embodiments requires the user to generate a 3D model that roughly approximates the target part, and snaps to the extracted outline of the object.

The user thus defines the 3D approximate part by first drawing a 2D profile of the part and then its main axis. The former is done by drawing a 3D rectangle or circle directly over the image, while the latter is done by sweeping the profile along a straight or curved axis to form the 3D part. Defining the profile, as well as the sweeping operation, are simple tasks since they do not demand accuracy. The profile dimensions are guided by the object's left and right outlines, as shown in panels (c) and (d).

As the modeled parts are being gathered, the geometric relations among them serve (i) to assist in disambiguating and defining the depth dimension and (ii) to optimize the positioning of the parts. These geometric relations include parallel, orthogonal, collinear and coplanar parts. Most of these are automatically inferred from the positioning of the parts, but the user can also specify the constraints for selected parts manually. The present embodiments optimize these geo-semantic constraints while taking into account the snapping of the 3D geometry to the object's outlines and the user's sweeping input. The complete model with geo-semantic relations is shown in panels (e) and (f).

Single Primitive Fitting

The main challenge in image-guided modeling of a 3D part is to disambiguate the observed subject and infer the missing depth dimension. Directly fitting a 3D object into the image requires many geometric hints to constrain the non-linear optimization problem [**27**]. The present embodiments explicitly guide the 3D inference with simple user interaction. The sweep-snap modeling tool consists of two stages. In the first, the user draws a 2D profile, explicitly defining its position in 3D. In the second, the user sweeps the profile to implicitly define a volumetric part.

Sweep-snap relies on snapping of primitives to object outlines created from image edges. To extract the image edges and build candidate object outlines, the present embodiments adopt a method for hierarchical edge feature extraction based on spectral clustering [**2**]. Then, a technique is applied to link the detected edge pixels into continuous point sequences [**6**], each shown in a different color in the drawings.

Reference is now made to FIGS. 3a-3e.

Profile. In a first stage, the user draws the 2D profile of the generalized cylinder, usually at one end of the shape, as illustrated in FIGS. 3b and 3c.

Sweeping. Once the base profile is ready, in the second stage, the user sweeps it along a curve that approximates the main axis of the 3D part. In general, this curve should be perpendicular to the profile of the 3D primitive, as indicated by the blue arrows in FIG. 3c.

During drawing, the axis curve is sampled in image space at uniform intervals of five pixels, producing sample points A_{0}, . . . , A_{N}. Then, at each sampled point A_{i}, a copy of the profile is fitted, centered around the curve. The normal of the profile is aligned with the orientation of the curve at A_{i}, and its diameter is adjusted to meet the object's outlines. Together, the adjusted copies of the profile form a discrete set of slices along the generalized cylinder; see FIG. 3e.
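By way of illustration only, the uniform five-pixel sampling of the drawn axis can be sketched as a standard polyline resampling. This is a minimal sketch, not the claimed implementation; the function name and the epsilon guard are invented for illustration:

```python
import math

def resample_polyline(points, step=5.0, eps=1e-9):
    """Resample a 2D polyline at uniform arc-length intervals of `step`
    (five pixels in the text above), producing the sample points
    A_0 ... A_N at which profile copies are placed along the sweep."""
    out = [points[0]]
    carried = 0.0                      # arc length accumulated since the last sample
    prev = points[0]
    for cur in points[1:]:
        seg = math.dist(prev, cur)
        # emit a sample each time another `step` of arc length is covered
        while seg > eps and carried + seg >= step - eps:
            t = (step - carried) / seg
            prev = (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
            out.append(prev)
            seg = math.dist(prev, cur)
            carried = 0.0
        carried += seg
        prev = cur
    return out
```

A straight stroke of length 20 px sampled at 5 px intervals yields five samples, including both endpoints.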

At each point A_{i}, we first copy the profile from A_{i−1} and translate it to A_{i}. Then we rotate it to accommodate the bending of the curve. Next, we consider the two tips of the profile, denoted by p_{i}^{0} and p_{i}^{1}, indicated by yellow points in FIG. 3d. For each tip p_{i}^{j}, j∈[0,1], we cast a 2D ray from point A_{i} along the diameter of the profile, through p_{i}^{j}, seeking an intersection with an image outline.

Finding the correct intersection of the ray with an image outline is somewhat challenging. The image may contain many edges in the vicinity of the new profile. The closest one is not necessarily the correct one, e.g. when hitting occlusion edges. In other cases, the correct edges may be missing altogether. To deal with these issues, we first limit the search for an intersection to a fixed interval, the size of which is governed by limiting the diameter change of adjacent profiles to at most 20% of the length. Second, we search for an intersecting outline that is close to perpendicular to the ray; if the angle between the ray and the outline is larger than π/3, the candidate intersection is discarded.
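The two filters above can be sketched as a single acceptance test. This is illustrative only: the function name is invented, and since the π/3 test admits more than one reading, the sketch treats it as a bound on how far the ray may tilt away from the outline's normal (i.e. from perpendicularity):

```python
import math

def accept_intersection(prev_radius, hit_distance, ray_dir, edge_dir,
                        max_growth=0.2, max_tilt=math.pi / 3):
    """Decide whether a ray/outline intersection is a plausible snap
    target: the implied radius may differ from the previous profile's
    radius by at most 20%, and the ray must be roughly perpendicular
    to the outline (within `max_tilt` of the outline's 2D normal)."""
    # Reject hits that would change the profile diameter too abruptly.
    if abs(hit_distance - prev_radius) > max_growth * prev_radius:
        return False
    # Tilt of the ray away from the outline's normal direction.
    nx, ny = -edge_dir[1], edge_dir[0]          # 2D normal of the outline
    dot = ray_dir[0] * nx + ray_dir[1] * ny
    norm = math.hypot(*ray_dir) * math.hypot(nx, ny)
    tilt = math.acos(min(1.0, abs(dot) / norm))
    return tilt <= max_tilt
```

A hit 10% further out on a perpendicular outline passes; a 50% jump, or a hit on a nearly parallel outline, is rejected.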

When an intersection is found, the position of the contour point p_{i}^{j} is snapped to the intersection position. If both contour points of the profile are snapped, the location of A_{i} may be adjusted to lie at their midpoint. If only one side is successfully snapped, the length of the snapped side may be mirrored to the other side and the other contour point moved accordingly. Lastly, if neither of the two contour points is snapped, the size of the previous profile is maintained. Reference is now made to the drawings.
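The three fallback rules can be sketched in one dimension along the profile's diameter; this is a minimal sketch with an invented name, where hits are scalar coordinates along the diameter axis (or None when no outline was found):

```python
def snap_slice(center, prev_radius, hit_left=None, hit_right=None):
    """Update one profile slice from its two ray-casting results.
    Returns (new_center, new_radius)."""
    if hit_left is not None and hit_right is not None:
        # Both sides snapped: recenter A_i at their midpoint.
        return ((hit_left + hit_right) / 2.0, abs(hit_right - hit_left) / 2.0)
    if hit_left is not None:
        # One side snapped: mirror its length to the other side.
        return (center, abs(hit_left - center))
    if hit_right is not None:
        return (center, abs(hit_right - center))
    # No snap at all: keep the previous profile size.
    return (center, prev_radius)
```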

Numerous primitives can be used. Generalized cuboids are modeled in a similar manner to generalized cylinders. The main difference lies in the first stage of modeling the profile. The two strokes that define the profile of a cuboid follow the two edges of the cuboid base instead of the diameters of the disk, as shown in the bottom row of the drawings.

The above modeling steps follow user gestures closely, especially when modeling the profile. This provides a more intuitive understanding of the shape, but is less accurate. Therefore, after modeling each primitive, we apply a post-snapping stage to better fit the primitive to the image, as well as to correct the view. We search for small transformations (±10% of the primitive size) that create a better fit of the primitive's projection to the edge curves that were snapped during the editing process. We also automatically refine the field-of-view angle (initialized to 45 degrees) after each modeling step for a better fit.
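A toy version of such a post-snapping search can be sketched as a brute-force grid over translations restricted to ±10% of the part size; scale and rotation are omitted for brevity, and the name and scoring are invented for illustration:

```python
import math

def refine_fit(model_pts, edge_pts, size, steps=9):
    """Search small translations (within ±10% of the primitive size)
    for the offset that best aligns the projected model points with
    the snapped edge points."""
    rng = 0.1 * size
    offsets = [rng * (2 * k / (steps - 1) - 1.0) for k in range(steps)]
    best, best_score = (0.0, 0.0), float("inf")
    for dx in offsets:
        for dy in offsets:
            # score: summed distance from each shifted model point
            # to its nearest edge point
            score = sum(min(math.hypot(px + dx - ex, py + dy - ey)
                            for ex, ey in edge_pts)
                        for px, py in model_pts)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best
```

For a model silhouette offset 2 px from its edges (with size 20, so ±2 px is searched), the best offset found is that 2 px shift.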

In many cases, the modeled object has some special properties, or priors, that can be used to constrain the modeling. For example, if we know that a given part has a straight spine, we can constrain the sweep to progress along a straight line. Similarly, we can constrain the sweep to preserve a constant or linearly changing profile radius; in this case, the detected radii are averaged, or fitted to a line, along the sweep. We can also constrain the profile to be a square or a circle. In fact, a single primitive can contain segments with different constraints: it can start with a straight axis and then bend, or use a constant radius only in a specific part. These constraints are extremely helpful when the edge detection results are poor. Lastly, we provide the possibility to interactively adjust the profile diameter locally, for instance in places where the outlines are not salient or are missing altogether.
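The constant-radius and linearly-changing-radius priors can be sketched as simple regularizations of the detected per-slice radii; this is an illustrative sketch with an invented name, not the claimed implementation:

```python
def regularize_radii(radii, mode="constant"):
    """Replace noisy per-slice radii with the chosen prior: a constant
    radius (the average) or a linearly changing radius (least-squares
    line over the slice index)."""
    n = len(radii)
    if mode == "constant":
        mean = sum(radii) / n
        return [mean] * n
    if mode == "linear":
        # least-squares fit r_i ≈ a * i + b over slice indices i
        xs = range(n)
        mx = sum(xs) / n
        mr = sum(radii) / n
        var = sum((x - mx) ** 2 for x in xs)
        a = sum((x - mx) * (r - mr) for x, r in zip(xs, radii)) / var
        b = mr - a * mx
        return [a * i + b for i in xs]
    raise ValueError(mode)
```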

To further ease the modeling interaction, the present embodiments may also provide a copy and paste tool. The user can drag a selected part that is already snapped over to a new location in the image and snap it again in the new position. While copying, the user can rotate, scale, or flip the part.

Inter-part Optimization

The technique described above generates parts that fit the object outlines. The positions of these parts in 3D are still ambiguous and inaccurate. However, as these parts are components of a coherent man-made object, they have certain geometric relations among them derived from the semantics of the object. Constraining the shape based on such geo-semantic inter-parts relations allows modeling coherent shapes [**9**, **35**, **21**, **27**].

A direct global optimization of the positioning of parts that considers their geo-semantic relations is computationally intensive and prone to falling into local minima, since each component has many degrees of freedom. In the present setting, however, the modeled components are also constrained to agree with some outlines of the image, which can significantly reduce the degrees of freedom of the parts. By considering the image constraints, the dimensionality of the optimization space can be lowered and local minima avoided. In the following, we describe how we simplify the general problem and solve a rather small-scale optimization to respect the geo-semantic constraints among the sweep-snapped parts.

The key idea is that by fixing the projection of a part, its position and orientation can be determined by only one or two depth values. We first describe the method for simple parts that can be modeled by a single parameter, namely parts that were modeled along a straight axis. Generalized cylinders and cuboids with curved axes will later be approximated by two connected straight-axis sub-parts at the start and end of the shape.

Reference is made to

The position and orientation of a straight-axis generalized cylinder i can be determined by two points we call anchors, C_{i,1} and C_{i,2}, along its main axis. Similarly, a cuboid part can be represented by six anchors C_{i,j}, j∈[1,6], positioned at the center of each face. Every opposite pair of anchors defines one main axis of the cuboid. Even though four anchors are enough to fix the position and orientation of a cuboid, an embodiment uses six anchors to allow setting various geo-semantic constraints on this part.

As the user defines the 3D part i using three strokes for the three dimensions, as discussed above, a reference point R_{i} is picked on the part's projection. For a cuboid part we pick the point connecting the first and second of the user's strokes, and for a cylinder we pick the point connecting the second and third strokes. Due to the internal orthogonality of the straight part, the profile of the part is perpendicular to the main axis. Therefore, we may use the endpoints of the user's strokes (after snapping them to the image) to define three points that, together with R_{i}, create an orthogonal system. These points allow recovering the unknown depth z_{i}, the z value of R_{i}, by using three orthogonality constraint equations.

Next, the positions of the anchor points C_{i,j} in world coordinates can be defined using the orthogonal local axes. This defines the structure of part i. Since the local axes depend only on the depth value z_{i} of the point R_{i}, we can parameterize the positions of C_{i,j} as a function of z_{i}: C_{i,j}=F_{i,j}(z_{i}). That is, the position and orientation of the whole part become a function of a single unknown z_{i}. F_{i,j} has the form

a(z_{i}+v)+b

for each coordinate component, where a depends only on the x and y-coordinates of the endpoints of the local axes, and b, v are decided by perspective parameters. They are different for each axis endpoint and for each coordinate component.

We may use the anchor points to define the geo-semantic relations among the parts. Specifically, we support six types of constraints: parallelism, orthogonality, collinear axis endpoints, overlapping axis endpoints, coplanar axis endpoints and coplanar axes. During the modeling phase, for each type, we test whether a pair of components is close to satisfying one of the above geo-semantic constraints, and if so, we add the constraint to our system. For example, for two cylinders with indices m and n, if the angle between the vectors (C_{m,1}−C_{m,2}) and (C_{n,1}−C_{n,2}) is smaller than 15 degrees, we may add a parallelism constraint (C_{m,1}−C_{m,2})×(C_{n,1}−C_{n,2})=0 to our system of constraints. Similarly, if any three among the four anchors of two cylinders form a triangle containing an angle larger than 170 degrees, we add a collinear-axes constraint: (C_{1}−C_{2})×(C_{1}−C_{3})=0, as shown in
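By way of non-limiting illustration, the two angle tests just described may be sketched as follows; the thresholds (15 and 170 degrees) follow the text, while the function names and the choice of which triangle angle to test are illustrative assumptions:

```python
import numpy as np

# Sketch of the geo-semantic constraint tests described above: a parallelism
# constraint is proposed when two axes differ by less than 15 degrees, and a
# collinearity constraint when three anchors form an angle above 170 degrees.

def is_parallel(a1, a2, b1, b2, tol_deg=15.0):
    """True when axis (a1, a2) is within tol_deg of axis (b1, b2)."""
    u = np.asarray(a2) - np.asarray(a1)
    v = np.asarray(b2) - np.asarray(b1)
    cosang = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))) < tol_deg

def is_collinear(p1, p2, p3, tol_deg=170.0):
    """True when the angle at p2 of triangle (p1, p2, p3) exceeds tol_deg."""
    u = np.asarray(p1) - np.asarray(p2)
    v = np.asarray(p3) - np.asarray(p2)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))) > tol_deg
```

When a test passes, the corresponding cross-product equation would be appended to the constraint system; checking only the angle at the middle anchor is a simplification of testing every angle of the triangle.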

The derivation of the positions C_{i,j} for the axis endpoints of a cuboid from the depth value z_{i} of the reference point R_{i} is illustrated in the figures.

Suppose we have defined p geo-semantic constraints G_{k} for a set of n components. Together with the objective function of fitting to the image outline, we define the following optimization system:

minimize Σ_{i=1}^{n}w_{i}(Σ_{j=1}^{m_{i}}∥C_{i,j}−F_{i,j}(z_{i})∥^{2})  (1)

subject to G_{k}(C_{1,1}, . . . , C_{n,m_{n}})=0, 1≤k≤p  (2)

where m_{i} is the number of axes of the ith primitive part. We add weights w_{i} proportional to the radius of the base profile of each part and the length of its axis. Larger parts have more impact on the solution since typically larger parts are modeled more accurately. Intuitively, the first equation tries to fit the part's geometry (C_{i,j}) to the image outline and the user's gestures, while the second set of equations defines the geo-semantic constraints.

Solving for C_{i,j }and z_{i }together we have a non-linear non-convex optimization problem with non-linear constraints. Such a system is very hard to solve directly without being trapped in local minima. Hence, we decompose the solution of this system into a two-step procedure. The first step tries to find a good initial position for all parts together by changing only their depth (governed by z_{i}) to adhere to the geo-semantic constraints. In the second step, the full system is solved—allowing the shape of the parts (C_{i,j}) to change as well.

In the first step, we modify the soft constraint in Equation (1) to a hard one, and replace C_{i,j} by F_{i,j}(z_{i}) in all equations. This means Equation (1) is trivially satisfied, and we are left with just the constraints in Equation (2). In effect, this means we fix the projection and find the optimal z_{i} fitting the geo-semantic constraints. This reduces the number of variables to n (z_{i}, 1≤i≤n) and changes Equation (2) into an over-determined system, where each equation contains only two different variables.

We find the least-squares solution ẑ_{i}, for example by conjugate gradient, with all z_{i} values initialized to 0.

This first step provides a good initial condition to find the optimal solution for C_{i,j}, as it should be around the values F_{i,j}(ẑ_{i}), fixing only small inconsistencies with the geo-semantic constraints. Hence, in the second step, we solve the full optimization of Equation (1) with the set of constraints in Equation (2), for example using an augmented Lagrangian method. Both steps are fast, and we are able to avoid local minima due to the better initialization provided by the first step. This leads to an interactive-rate optimization. Note that the nonlinearity of F_{i,j}(·) is due to the assumption of a perspective projection. However, we can approximate this projection linearly since we assume the change in z_{i} is small. This further increases the speed and stability of our solution.
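By way of non-limiting illustration, the first optimization step may be sketched for a toy case of two parts sharing an endpoint. Assuming each anchor coordinate is an affine function of its part's depth (with coefficients a, b and perspective constant v, as discussed above), the overlapping-endpoint constraint becomes linear in the two depths and the over-determined system is solved by least squares. All numeric coefficients below are invented toy values, not outputs of a real modeling session:

```python
import numpy as np

# Toy sketch of the first optimization step: with projections fixed, each
# anchor is C = F(z) = a*(z + v) + b per coordinate, so a geo-semantic
# constraint such as "endpoint of part 1 coincides with endpoint of part 2"
# is linear in the two depths (z1, z2) and solvable by least squares.

v = 0.5                                                          # perspective constant
a1 = np.array([0.8, 0.1, 1.0]); b1 = np.array([0.0, 0.2, 0.0])   # part 1 anchor
a2 = np.array([0.7, 0.0, 1.0]); b2 = np.array([0.1, 0.3, 0.0])   # part 2 anchor

def F(a, b, z):
    return a * (z + v) + b            # anchor position as a function of depth

# Overlap constraint F1(z1) = F2(z2)  ->  a1*z1 - a2*z2 = (b2 - b1) + v*(a2 - a1)
A = np.stack([a1, -a2], axis=1)       # 3 equations, 2 unknowns
rhs = (b2 - b1) + v * (a2 - a1)
(z1, z2), *_ = np.linalg.lstsq(A, rhs, rcond=None)
gap = np.linalg.norm(F(a1, b1, z1) - F(a2, b2, z2))
```

With these toy coefficients the system happens to be consistent, so the residual gap between the two endpoints vanishes; with real, noisy parts the least-squares solution merely minimizes it.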

Lastly, to handle parts with a non-straight axis, we first simplify the problem by assuming that the general axis lies on a plane. Second, we treat the part as being a blend of two straight-axis sub-parts, placed at the two ends of the part. The position of each of these sub-parts is determined by a single depth value in the optimization above, and the whole part is defined by connecting the two subparts with a general axis while constraining the profile snapping.

The Derivation of F

For a straight primitive with reference point R, we denote the three orange points shown in the figures by P_{m}, m∈[1,3]; the order does not matter. Then we have three equations defined by orthogonality in world coordinates: {right arrow over (RP)}_{m}·{right arrow over (RP)}_{n}=0, where the pair (m, n)∈P={(1,2), (2,3), (3,1)}. We denote the world coordinates of P_{m} by (X_{m}, Y_{m}, Z_{m}), the screen coordinates by (x_{m}, y_{m}), and the depth by z_{m}. For R they are (X_{r}, Y_{r}, Z_{r}) etc. So we can write the equations:

(*X*_{m}*−X*_{r})(*X*_{n}*−X*_{r})+(*Y*_{m}*−Y*_{r})(*Y*_{n}*−Y*_{r})+(*Z*_{m}*−Z*_{r})(*Z*_{n}*−Z*_{r})=0,

By inverse perspective transformation, we can change this to:

(x_{m}ẑ_{m}−x_{r}ẑ_{r})(x_{n}ẑ_{n}−x_{r}ẑ_{r})+(y_{m}ẑ_{m}−y_{r}ẑ_{r})(y_{n}ẑ_{n}−y_{r}ẑ_{r})+c^{2}(ẑ_{m}−ẑ_{r})(ẑ_{n}−ẑ_{r})=0,

where ẑ denotes the depth shifted by the perspective constants, and N, u, v (and hence c) are constant when the perspective parameters are fixed. Since the projection is fixed, x_{m}, y_{m}, x_{n}, y_{n} are all fixed. The only variables are the ẑs. To solve these equations, we divide both sides by ẑ_{r}^{2}, which replaces the two unknowns ẑ_{m}, ẑ_{n} by their ratios to ẑ_{r}, and we get:

C_{m,n}(ẑ_{m}/ẑ_{r})(ẑ_{n}/ẑ_{r})−C_{m,r}(ẑ_{m}/ẑ_{r})−C_{n,r}(ẑ_{n}/ẑ_{r})+C_{r,r}=0,

where C_{s,t}=(x_{s}x_{t}+y_{s}y_{t}+c^{2}) and (s, t) can be 1, 2, 3 and r. Solving the three resulting quadratic equations directly gives the representation of ẑ_{m} in terms of ẑ_{r}.

Due to symmetry, m, n, l can be any permutation of 1, 2, 3. Note that the two solutions exactly match the ambiguity of the perspective projection of the primitive. We examine the two solutions and use the one that generates a projection fitting the image edges better. The solution has the form ẑ_{m}=aẑ_{r}, which means that z_{m} is linear with z_{r}. We can easily compute the world coordinates (X_{m}, Y_{m}, Z_{m}) as a function of z_{r} by inverse perspective transformation. Since the axis endpoints C_{i,j} are linear combinations of the P_{m}, we can also determine each of their coordinates as a function of z_{r} in the form

a(z_{r}+v)+b,

where b and v are decided by the perspective, and a is decided by the above derivation.
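By way of non-limiting illustration, the orthogonality system above may also be solved numerically rather than in closed form. The sketch below builds a synthetic orthogonal corner, projects it with an assumed pinhole model x=cX/Z (so the hatted depths reduce to plain depths), forms the three quadratic equations in the depth ratios using the C_{s,t} terms defined above, and solves them by Newton iteration; the camera constant and points are invented test data:

```python
import numpy as np

# Numerical check of the derivation above. t_m denotes the depth ratio
# z_m / z_r (with t_r = 1), and C(s, t) is the C_{s,t} term of the text.
# Either of the two valid roots yields an orthogonal corner in 3D.

c = 1.0
R = np.array([1.0, 0.5, 6.0])
U = np.array([[2.0, 2.0, 1.0],             # a generic orthonormal frame
              [-2.0, 1.0, 2.0],
              [1.0, -2.0, 2.0]]) / 3.0
P3D = R + U                                # the three corner points P_m

project = lambda Q: np.array([c * Q[0] / Q[2], c * Q[1] / Q[2]])
r = project(R)
p = [project(Q) for Q in P3D]
C = lambda s, t: s[0] * t[0] + s[1] * t[1] + c * c
pairs = [(0, 1), (1, 2), (2, 0)]

def equations(t):
    return np.array([C(p[m], p[n]) * t[m] * t[n] - C(p[m], r) * t[m]
                     - C(p[n], r) * t[n] + C(r, r) for m, n in pairs])

def jacobian(t):
    J = np.zeros((3, 3))
    for k, (m, n) in enumerate(pairs):
        J[k, m] = C(p[m], p[n]) * t[n] - C(p[m], r)
        J[k, n] = C(p[m], p[n]) * t[m] - C(p[n], r)
    return J

t = np.ones(3)                             # depth ratios, t_r = 1 by definition
for _ in range(60):                        # Newton's method (lstsq step)
    t = t - np.linalg.lstsq(jacobian(t), equations(t), rcond=None)[0]

# Lift back to 3D (z_r fixed to its true value): an orthogonal corner.
Z = t * R[2]
W = [np.array([q[0] * z / c, q[1] * z / c, z]) for q, z in zip(p, Z)]
```

On this synthetic corner the recovered depths reproduce an orthogonal frame; which of the two roots is reached depends on the starting point, matching the two-fold perspective ambiguity noted above.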

Experimental Results

The sweep-snap interactive technique referred to herein is currently implemented in C++. The system provides an outline view for sweep-snap, a solid model view and a texture view for checking the model and image editing. The user can choose between "cuboid", "cylinder" or "sphere" primitives using a button or key shortcut. The system also provides conventional menu selection, view control and deformation tools. Most of the examples given below were modeled in a few minutes or less. The modeling process is intuitive and fluent enough that even an untrained user with little experience of the technique can handle it. Editing and repositioning the object requires activities which would be familiar to users of other parametric editing techniques.

Once the objects have been modeled, the user may map the texture from the image onto the object, as exemplified in the bottom row of the figures, with parts not visible in the photograph completed using the method of [**3**] from the visible texture.

Modeling from single image and editing. The approximated 3D model and its texture allow semantic image editing. Before editing, the image of the 3D model is cut out from the photo, leaving a black hole (as demonstrated in the figures), which is filled using the method of [**3**].


Reference is now made to

Thus, in the middle row we show the extracted 3D models, repositioned and, in the third row, inserted back into the photo. The rightmost column shows the modeling and repositioning of three objects in one complex photo. Note that the Menorah has been rotated as well as translated on the ground plane.

Reference is now made to

Modeling the Obelisk in Paris from two photos as per the above involves: (a) Taking the base of the Obelisk from a close view, thus capturing detail. (b) Transporting the partial 3D model from the close view to a more distant view, where part of the base is occluded, to complete the modeling. (c) Blending the texture of the transported part into the region it occupied, and rotating the whole. (d) The end result is that details of the base are visible in a close-up view of the model of the Obelisk, when in fact most of the obelisk is taken from the distant photograph.

More particularly, in

Reference is now made to

Thus,

Then, the whole tap is copied and attached to another side of the wall. The bottom left shows a candleholder being modeled and rotated, with its two arms duplicated to a perpendicular position. We also enlarge the middle holder. The top right shows a street lamp with duplicated lamps moved to a lower position, rotated and copied to other positions in the street. The bottom right shows a samovar rotated with multiple copies of its handles pasted across its surface.

Reference is now made to

In

Modeling from sketch. Reference is now made to [**27**].

Recently, Shtof et al. [**27**] presented a method to model objects from 2D sketches. In

The photographs themselves usually have some distortion relative to an ideal perspective projection, especially when an object is too close or is taken with a wide-angle camera. In such cases, fisheye correction should be applied before modeling.

Conclusion

We present an interactive technique to model 3D man-made objects from a single photograph by combining the cognitive ability of humans with the computational accuracy of the machine. The results show that the present embodiments may model a large variety of man-made objects from natural images or photographs, as well as modeling objects from sketches. The modeled objects may be used to achieve semantic editing and composition of images, as well as creating simple 3D scenes by copying items from photographs. One may extend the types of supported primitives to allow modeling of free shapes of natural objects. It is also possible to add symmetry and smoothness constraints on the shapes. Sweep-snap can also be extended for modeling from multi-view images or video without the help of depth data. In terms of applications, we demonstrate editing and manipulation of geometry; furthermore, the recovered 3D model and surface normals can be used to achieve re-lighting and material editing.

It is expected that during the life of a patent maturing from this application many relevant three-dimensional modeling and image processing technologies will be developed, and the scope of the corresponding terms in the present description is intended to include all such new technologies a priori.

The terms “comprises”, “comprising”, “includes”, “including”, “having” and their conjugates mean “including but not limited to”.

The term “consisting of” means “including and limited to”.

As used herein, the singular form “a”, “an” and “the” include plural references unless the context clearly dictates otherwise.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.

Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.

All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

**REFERENCES**

[**1**] Angelidis, A., Cani, M.-P., Wyvill, G., and King, S. 2004. Swirling-sweepers: Constant-volume modeling. In *Computer Graphics and Applications, *2004. *PG *2004. *Proceedings. *12*th Pacific Conference *on, IEEE, 10-15.

[**2**] Arbelaez, P., Maire, M., Fowlkes, C., and Malik, J. 2011. Contour detection and hierarchical image segmentation. *Pattern Analysis and Machine Intelligence, IEEE Transactions on *33, 5, 898-916.

[**3**] Barnes, C., Shechtman, E., Finkelstein, A., and Goldman, D. 2009. Patchmatch: a randomized correspondence algorithm for structural image editing. *ACM Transactions on Graphics*-*TOG *28, 3, 24.

[**4**] Barrett, W., and Cheney, A. 2002. Object-based image editing. In *ACM Transactions on Graphics *(*TOG*), vol. 21, ACM, 777-784.

[**5**] Cheng, M., Zhang, F., Mitra, N., Huang, X., and Hu, S. 2010. Repfinder: finding approximately repeated scene elements for image editing. *ACM Transactions on Graphics *(*TOG*) 29, 4, 83.

[**6**] Cheng, M. 2009. Curve structure extraction for cartoon images. In *Proceedings of The *5*th Joint Conference on Harmonious Human Machine Environment, *13-25.

[**7**] Choi, B., and Lee, C. 1990. Sweep surfaces modelling via coordinate transformation and blending. *Computer*-*Aided Design *22, 2, 87-96.

[**8**] Eitz, M., Sorkine, O., and Alexa, M. 2007. Sketch based image deformation. In *Proceedings of Vision, Modeling and Visualization *(*VMV*), 135-142.

[**9**] Gal, R., Sorkine, O., Mitra, N., and Cohen-Or, D. 2009. iwires: an analyze-and-edit approach to shape manipulation. In *ACM Transactions on Graphics *(*TOG*), vol. 28, ACM, 33.

[**10**] Gingold, Y., Igarashi, T., and Zorin, D. 2009. Structured annotations for 2d-to-3d modeling. In *ACM Transactions on Graphics *(*TOG*), vol. 28, ACM, 148.

[**11**] Goldberg, C., Chen, T., Zhang, F., Shamir, A., and Hu, S. 2012. Data-driven object manipulation in images. In *Computer Graphics Forum, *vol. 31, Wiley Online Library, 265-274.

[**12**] Hyun, D., Yoon, S., Chang, J., Seong, J., Kim, M., and Jüttler, B. 2005. Sweep-based human deformation. *The Visual Computer *21, 8, 542-550.

[**13**] Igarashi, T., Kawachiya, S., Tanaka, H., and Matsuoka, S. 1998. Pegasus: a drawing system for rapid geometric design. In *CHI *98 *conference summary on Human factors in computing systems, *ACM, 24-25.

[**14**] Igarashi, T., Matsuoka, S., and Tanaka, H. 1999. Teddy: a sketching interface for 3d freeform design. In *Proceedings of the *26*th annual conference on Computer graphics and interactive techniques, *ACM Press/Addison-Wesley Publishing Co., 409-416.

[**15**] Jiang, N., Tan, P., and Cheong, L. 2009. Symmetric architecture modeling with a single image. *ACM Transactions on Graphics *(*TOG*) 28, 5, 113.

[**16**] Kaplan, M., and Cohen, E. 2006. Producing models from drawings of curved surfaces. In *EUROGRAPHICS workshop on sketch*-*based interfaces and modeling, *The Eurographics Association, 51-58.

[**17**] Kim, M., Park, E., and Lee, H. 1994. Modelling and animation of generalized cylinders with variable radius offset space curves. *The Journal of Visualization and Computer Animation *5, 4, 189-207.

[**18**] Lalonde, J., Hoiem, D., Efros, A., Rother, C., Winn, J., and Criminisi, A. 2007. Photo clip art. In *ACM Transactions on Graphics *(*TOG*), vol. 26, ACM, 3.

[**19**] Lau, M., Saul, G., Mitani, J., and Igarashi, T. 2010. Modeling-in-context: User design of complementary objects with a single photo. In *Proceedings of the Seventh Sketch*-*Based Interfaces and Modeling Symposium, *Eurographics Association, 17-24.

[**20**] Lee, J. 2005. Modeling generalized cylinders using direction map representation. *Computer*-*Aided Design *37, 8, 837-846.

[**21**] Li, Y., Wu, X., Chrysathou, Y., Sharf, A., Cohen-Or, D., and Mitra, N. 2011. Globfit: Consistently fitting primitives by discovering global relations. In *ACM Transactions on Graphics *(*TOG*), vol. 30, ACM, 52.

[**22**] Murugappan, S., Liu, H., Ramani, K., et al. 2012. Shape-it-up: Hand gesture based creative expression of 3d shapes using intelligent generalized cylinders. *Computer*-*Aided Design. *

[**23**] Oh, B., Chen, M., Dorsey, J., and Durand, F. 2001. Image-based modeling and photo editing. In *Proceedings of the *28*th annual conference on Computer graphics and interactive techniques, *ACM, 433-442.

[**24**] Olsen, L., Samavati, F., Sousa, M., and Jorge, J. 2009. Sketch-based modeling: A survey. *Computers *& *Graphics *33, 1, 85-103.

[**25**] Russell, B., and Torralba, A. 2009. Building a database of 3d scenes from user annotations. In *Computer Vision and Pattern Recognition, *2009. *CVPR *2009. *IEEE Conference on, *IEEE, 2711-2718.

[**26**] Seitz, S., Curless, B., Diebel, J., Scharstein, D., and Szeliski, R. 2006. A comparison and evaluation of multi-view stereo reconstruction algorithms. In *Computer Vision and Pattern Recognition, *2006 *IEEE Computer Society Conference on, *vol. 1, IEEE, 519-528.

[**27**] Shtof, A., Agathos, A., Gingold, Y., Shamir, A., and Cohen-Or, D. 2013. Geosemantic snapping for sketch-based modeling. In *Eurographics. *

[**28**] Snavely, N. 2011. Scene reconstruction and visualization from internet photo collections: A survey. *IPSJ Transactions on Computer Vision and Applications *3, 44-66.

[**29**] Tsang, S., Balakrishnan, R., Singh, K., and Ranjan, A. 2004. A suggestive interface for image guided 3d sketching. In *Proceedings of the SIGCHI conference on Human factors in computing systems, *ACM, 591-598.

[**30**] Xu, K., Zheng, H., Zhang, H., Cohen-Or, D., Liu, L., and Xiong, Y. 2011. Photo-inspired model-driven 3d object modeling. In *ACM Transactions on Graphics *(*TOG*), vol. 30, ACM, 80.

[**31**] Xu, K., Zhang, H., Cohen-Or, D., and Chen, B. 2012. Fit and diverse: Set evolution for inspiring 3d shape galleries. *ACM Transactions on Graphics *(*TOG*) 31, 4, 57.

[**32**] Xue, T., Liu, J., and Tang, X. 2010. Object cut: Complex 3d object reconstruction through line drawing separation. In *Computer Vision and Pattern Recognition *(*CVPR*), 2010 *IEEE Conference on, *IEEE, 1149-1156.

[**33**] Yoon, S., and Kim, M. 2006. Sweep-based freeform deformations. In *Computer Graphics Forum, *vol. 25, Wiley Online Library, 487-496.

[**34**] Zeleznik, R., Herndon, K., and Hughes, J. 2007. Sketch: an interface for sketching 3d scenes. In *ACM SIGGRAPH *2007 *courses, *ACM, 19.

[**35**] Zheng, Y., Fu, H., Cohen-Or, D., Au, O., and Tai, C. 2011. Component-wise controllers for structure-preserving shape manipulation. In *Computer Graphics Forum, *vol. 30, Wiley Online Library, 563-572.

[**36**] Zheng, Y., Chen, X., Cheng, M., Zhou, K., Hu, S., and Mitra, N. 2012. Interactive images: cuboid proxies for smart image manipulation. *ACM Transactions on Graphics *(*TOG*) 31, 4, 99.

[**37**] Zhou, S., Fu, H., Liu, L., Cohen-Or, D., and Han, X. 2010. Parametric reshaping of human bodies in images. *ACM Transactions on Graphics *(*TOG*) 29, 4, 126.

## Claims

1. A method of obtaining a three-dimensional digital model of an artificial object, made up of a plurality of geometric primitives, the artificial object being in a single two-dimensional photograph or drawing, the method comprising:

- defining a two-dimensional outline of said artificial object within the photograph;

- interactively allowing a user to define cross-sectional profiles of successive ones of said geometric primitives, said cross-sectional profiles defining a third dimension;

- interactively allowing a user to provide sweep input to sweep respective defined cross-sectional profiles over an extent of a corresponding one of said geometric primitives within the image, said sweeping generating successive three-dimensional model primitives from existing detected edges of said corresponding geometric primitives and said sweeping of said respective profile; and

- aligning said plurality of three-dimensional model primitives to form said three-dimensional model.

2. The method of claim 1, comprising interactively allowing said user to explicitly define three dimensions of the geometric primitive using three sweep motions, wherein a first two of said three sweeps define a first and second dimension of said cross-sectional profile and a third sweep defines a main axis of the geometric primitive.

3. The method of claim 1, comprising, upon the user sweeping the two-dimensional profile over a respective one of said geometric primitives, dynamically adjusting said two-dimensional profile using a pictorial context on the photograph and automatically snapping photograph lines to said profile.

4. The method of claim 3, wherein said snapping allows said three-dimensional model to include three-dimensional primitives that adhere to the object in the photographs, while maintaining global constraints between said plurality of three-dimensional model primitives composing said object.

5. The method of claim 4, further comprising optimizing said global constraints while taking into account said snapping and said sweep input.

6. The method of claim 4, further comprising a post-snapping fit improvement of better fitting the primitive to the image, said better fitting comprising searching for transformations within ±10% of primitive size that create a better fit of the primitive's projection to said profile.

7. The method of claim 1, wherein said defining said two dimensional outline comprises edge detecting.

8. The method of claim 1, further comprising estimating a field of view angle from which said photograph was taken in order to estimate and compensate for distortion of said primitives within said photograph.

9. The method of claim 1, further comprising using relationships between said primitives in order to define global constraints for said object.

10. The method of claim 9, further comprising obtaining geo-semantic relations between said primitives to define said three-dimensional digital model, and encoding said relations as part of said model.

11. The method of claim 1, further comprising inserting said three-dimensional digital model into a second photograph.

12. The method of claim 1, further comprising extracting a texture from said photograph and applying said texture to sides of said three-dimensional model not visible in said photograph.

13. The method of claim 1, wherein said defining said cross-sectional profiles comprises defining a shape and then distorting said shape to correspond to a three-dimensional orientation angle.

14. The method of claim 4, comprising applying different constraints to different parts respectively of a given one of said geometric primitives, or locally modifying different parts respectively of a given one of said geometric primitives.

15. The method of claim 2, comprising snapping said first two user sweep motions to said photograph lines, using the endpoints of said first two user sweep motions along with an anchor point on a respective primitive to create a three-dimensional orthogonal system for the respective primitive.

16. The method of claim 1, further comprising supporting a constraint, said constraint being one member of the group consisting of: parallelism, orthogonality, collinear axis endpoints, overlapping axis endpoints, coplanar axis endpoints and coplanar axes, and for said member testing whether a pair of components is close to satisfying said member, and if said member is satisfied or close to satisfied then adding said constraint to a respective one of said primitives.

17. The method of claim 1, wherein said aligning said three-dimensional primitives comprises finding an initial position for all primitives together by changing only their depth to adhere to geo-semantic constraints, followed by modifying shapes of the primitives.

18. A user interface for carrying out the method of claim 1, the user interface comprising an outline view of a current photograph on which view to carry out interactive sweeping to define cross-sections of respective primitives and on which to snap said cross-sections.

19. The user interface of claim 18, further comprising a solid model view and a texture view respectively of said current photograph, and selectability for user selection between different basic cross-sectional shapes.

20. A method of digitally forming a three-dimensional geometric primitive from a two-dimensional geometric primitive in a two-dimensional photograph or drawing, comprising:

- interactively obtaining user input to draw a two-dimensional cross section of the primitive and then using further user input to sweep the cross-section over a length of the primitive.

21. A method of forming a derivation of a photograph or drawing, the photograph incorporating a two-dimensional representation of a three-dimensional object, said three-dimensional object comprising geometric primitives, the two-dimensional representation being a rotation or other transformation of an original two-dimensional representation, the rotation being formed by:

- carrying out the method of claim 1 to form a three-dimensional model of said original two-dimensional representation;

- rotating or otherwise transforming said three-dimensional model; and

- projecting said rotated or otherwise transformed three-dimensional model onto a two-dimensional surface to form said derivation.

**Patent History**

**Publication number**: 20170330388

**Type:**Application

**Filed**: Jun 9, 2017

**Publication Date**: Nov 16, 2017

**Applicant**: Ramot at Tel-Aviv University Ltd. (Tel-Aviv)

**Inventors**: Daniel COHEN-OR (Hod-HaSharon), Ariel SHAMIR (Jerusalem), Tao CHEN (Shandong)

**Application Number**: 15/618,175

**Classifications**

**International Classification**: G06T 19/20 (20110101); G06T 17/10 (20060101);