METHOD AND A SYSTEM FOR DETERMINING THE GEOMETRY AND/OR THE LOCALIZATION OF AN OBJECT

Info

Publication number: 20140180629
Type: Application
Filed: Mar 14, 2013
Publication Date: Jun 26, 2014
Applicant: Ecole Polytechnique Federale de Lausanne EPFL (Lausanne)
Inventors: Ivan DOKMANIC (Lausanne), Reza Parhizkar (Ecublens), Andreas Walther (Crissier), Martin Vetterli (Grandvaux), Yue Lu (Arlington, MA)
Application Number: 13/828,761

Abstract

A method for determining the geometry and/or the localisation of an object comprising the steps of: sending one or more signals by using one transmitter; receiving by one or more receivers the transmitted signals and the echoes of the transmitted signals as reflected by one or more reflective surfaces; building by a computing module a first Euclidean Distance Matrix (EDM) comprising the mutual positions of the receivers; adding to the EDM a new row and a new column, the new row and the new column comprising times of arrival of said echoes, and computing its rank or its distance to an EDM; determining the geometry and/or the position of the object based on said rank or distance.

Description

RELATED APPLICATION

The present application claims the priority of the Swiss patent application CH2935/12 of Dec. 22, 2012, the content of which is hereby incorporated by reference.

FIELD OF THE INVENTION

The present invention concerns a method and a system for determining the geometry and/or the localisation of an object, e.g. of a wall, a room, a microphone, a loudspeaker or a person. The invention concerns in particular the estimation of the geometry of a room from its acoustic room impulse responses (RIR).

DESCRIPTION OF RELATED ART

The problem of estimating the geometry of a room from its acoustic room impulse responses (RIR) can be summarized by a question: can a person blindfolded inside a room hear the shape of the room after having snapped his fingers? In other words, can the person reconstruct the 2-D or 3-D geometry of the room from the acoustic room impulse response (RIR)?

Beyond the question of uniqueness, meaning that the RIR is a unique signature of a room, the question of reconstructing the geometry from impulse responses is interesting algorithmically. That is, are there efficient ways to recover the room geometry from measured impulse responses?

Finally, establishing uniqueness would lead to localization inside a known (or unknown) room and algorithms for tracking the trajectory of a moving source listening to the varying RIRs. Key questions are: how many sources, how many receivers, for what room shapes?

Different known documents have tried to give some responses to the question above. Moreover recently, there has been a renewed interest in reconstructing the room shape from acoustic response, as shown by the increasing number of publications on the subject.

Some of these documents have used the image source model in order to cope with the signal reflections. This image source model, along with the first and second order echoes, is illustrated in FIGS. 1 and 2.

FIG. 1 illustrates a room defined by the walls w1, w2 and by other walls not represented and comprising a source or transmitter s and a receiver r. The source can be, for example and in a non-limiting way, a loudspeaker and the receiver a microphone. The walls are reflective surfaces, i.e. surfaces allowing a signal to be reflected, the angle at which the signal is incident on the surface being equal to the angle at which it is reflected.

A first audio signal transmitted by the source s is reflected by the wall w2. The reflected signal or echo e1 is then received by the receiver r. Since there is a single reflection of the transmitted signal before its reception by the receiver r, the echo e1 is a first-order echo. A second audio signal transmitted by the source is reflected first by one of the walls w1, w2 and then by the other: the reflected signal or echo e2 is then received by the receiver r. Since there are two reflections of the transmitted signal before its reception by the receiver r, the echo e2 is a second-order echo.

The time of arrival (TOA) is defined as the travel time of a signal from a source s to a receiver r. The audio signals e1 and e2 can have different times of arrival (TOAs).

FIG. 2 illustrates a system comprising a room defined by some walls (for sake of clarity only three walls are represented), a source or transmitter s and a receiver r. The points p_i and p_{i+1} are the end-points of the i-th wall, n_i is its unit, outward-pointing normal and s̃_i is an image source: in fact the signal e_i received by the receiver r could be considered as generated by the image or virtual source s̃_i, which is the mirror image of the source s with respect to the wall defined by the points p_i and p_{i+1}. s̃_i is a first generation image source as the signal e_i received by the receiver r has been reflected once by the wall. In other words s̃_i is a first generation image source as e_i is a first-order echo.

s̃_ij is the image of s̃_i with respect to the wall (i+1). It is then a second generation image source, generating a second-order echo.

The virtual sources s̃_i or s̃_ij are not real, tangible and concrete sources like the “real” source s. In other words they are abstract objects used for studying the signal reflections, according to the well known image-source theory, used e.g. in optics.

The use of the reflections of a signal for the determination of the position of the real source and/or of the shape of a room is known from US2011317522. However the described algorithm does not find the source location directly, as it involves a large number of intermediate steps and hypotheses.

In U.S. Pat. No. 7,688,678 the volume of a room is determined by using the diffused field, i.e. without image sources.

In GEOMETRICALLY CONSTRAINED ROOM MODELING WITH COMPACT MICROPHONE ARRAYS, F. RIBEIRO, D. A. FLORENCIO, D. E. BA, AND C. ZHANG, IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 20, NO. 5, PP. 1449-1460 2012, it is necessary to know in advance the mutual position of the microphones and of the loudspeaker. Moreover, since many impulse responses have to be measured by putting a fake wall at different positions with respect to the microphone array and the loudspeaker, the resulting matrix of shifted impulse responses is very large and therefore computationally expensive to process.

In INFERENCE OF ROOM GEOMETRY FROM ACOUSTIC IMPULSE RESPONSES ANTONACCI, FILOS, THOMAS, HABETS, SARTI, NAYLOR, TUBARO TO APPEAR ON IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2012, only the 2D geometry of a room is estimated, i.e. the floor and the ceiling are not estimated. In this case the discretized Hough transform is used. Moreover the described algorithm requires the source to be placed in many different positions.

In F. Antonacci, A. Sarti, and S. Tubaro, “Geometric reconstruction of the environment from its response to multiple acoustic emissions” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Dallas, 2010, pp. 2822-2825, the authors propose to move a loudspeaker around a microphone to collect multiple impulse responses and then estimate the distance and the angle of the reflector (a line since they consider a 2-D case) using the tools of projective geometry. Each source-receiver pair defines an ellipse of possible reflection points, and the wall is estimated as the common tangents to all of the ellipses.

M. Kuster, D. de Vries, E. M. Hulsebos, and A. Gisolf, “Acoustic imaging in enclosed spaces: Analysis of room geometry modifications on the impulse response”, Journal of the Acoustical Society of America, vol. 116, no. 4, pp. 2126-2137, 2004, describes an approach based on acoustic imaging. An array comprising many microphones is used to sample the sound field, and wave field inversion is then employed to infer the room geometry.

J. Filos and E. A. P. Habets, “A two-step approach to blindly infer room geometries”, in Proceedings of the International Workshop on Acoustic Echo and Noise Control, 2010, propose to use projective geometry tools to infer the room geometry.

S. Tervo, “Localization and tracing of early acoustic reflections”, Ph.D. thesis, Aalto University, School of Science, Department of Media Technology, 2012, describes a method using directive loudspeakers to scan the room for reflectors. The proposed method requires multiple emissions for the room to be scanned completely.

Some inventors of the present invention have previously worked on a problem of estimating the room geometry from a single RIR in I. Dokmanic, Y. M. Lu, and M. Vetterli, “Can One Hear the Shape of a Room: The 2-D Polygonal Case”, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Prague, 2011, EPFL. The described algorithm is based on the complete knowledge of first and second generation echo Times of Arrivals (TOAs): however the second generation is often difficult to obtain for practical reasons (e.g. attenuation of the signal). Then the proposed algorithm is not applicable in practice since without using second-order echoes a single RIR does not suffice for reconstructing the shape of the room.

The known solutions are therefore often not applicable in practice. They are not exact, since some approximations are necessary (as for example in the Hough transform case). Some of them allow only the 2D geometry of a room to be reconstructed, without considering ceiling and floor. They also require a large number of receivers and/or transmitters.

It is an aim of the present invention to obviate or mitigate one or more of the aforementioned disadvantages.

BRIEF SUMMARY OF THE INVENTION

According to the invention, these aims are achieved by means of a method for determining the geometry and/or the localisation of an object according to claim 1, a system for determining the geometry and/or the localisation of an object according to claim 15, a computer program product determining the geometry and/or the localisation of an object according to claim 19.

The method according to the invention comprises the steps of

sending one or more signals by using one transmitter

receiving by one or more receivers the transmitted signals and the echoes of the transmitted signals as reflected by one or more reflective surfaces

building by a computing module a first Euclidean Distance Matrix (EDM) comprising the mutual positions of the receivers;

adding to the Euclidean Distance Matrix a new row and a new column, the new row and the new column comprising times of arrival of said echoes, and computing the rank of the modified matrix or computing how far the modified matrix is from a true Euclidean Distance Matrix;

determining the geometry and/or the position of the object based on the computed information.

The first EDM (Euclidean Distance Matrix) corresponds to the receivers setup, which is known. For example, given some receivers r_i, the EDM D ∈ R^{M×M} comprises the following elements:

d_ij = ∥r_i − r_j∥₂², 1 ≤ i, j ≤ M

where ∥·∥₂ is the Euclidean norm, so that d_ij is the squared Euclidean distance between the receivers r_i and r_j.

The EDM is then a symmetric matrix with non-negative entries and a zero diagonal.
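By way of non-limiting illustration, the construction of the first EDM from known receiver positions may be sketched as follows. The receiver coordinates and the helper name `squared_edm` are assumptions introduced for this example only.

```python
import numpy as np

def squared_edm(points):
    """Matrix of squared Euclidean distances d_ij = ||r_i - r_j||^2 between the rows of `points`."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]   # pairwise differences, shape (M, M, n)
    return np.sum(diff ** 2, axis=-1)

# Hypothetical receiver positions in metres (M = 4 receivers in 2-D).
receivers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.5, 1.0]])
D = squared_edm(receivers)
```

The resulting matrix is symmetric with a zero diagonal, as stated above, and depends only on the mutual positions of the receivers, not on the choice of coordinate origin.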

Advantageously the proposed method performs an echo labelling. In order to know which of the peaks in the impulse responses received by the receivers (e.g. microphones) correspond to which reflective surface (e.g. wall of a room), intrinsic properties of point sets in Euclidean spaces are used instead of relying on different derived heuristics. A particular property that is easily exploited is the rank property of the EDM, which says that the EDM corresponding to a point set in Rⁿ has rank at most n+2. In 2-D, its rank can thus be at most 4, and in 3-D at most 5.

The matrix D is augmented with a combination of M TOAs. This corresponds to adding a new row and a new column to D. If the augmented matrix D_aug still verifies the rank property (or more generally, the EDM property), then the selected combination of echoes corresponds to an image source, or equivalently, to a reflective surface (e.g. a wall).

Although this requires testing all the combinations of echoes, in practical cases the number of combinations is quite small and does not represent a problem: e.g. with M=4 receivers and 4 reflective surfaces, only 4⁴=256 combinations have to be tested. Moreover there are not many correct combinations, but only one per reflective surface.

The number of combinations may even be smaller, by choosing to combine a particular echo received by one microphone only with those echoes from other microphones that were received within a temporal window corresponding to the size of the microphone setup.
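The effect of such a temporal window can be sketched as follows. The sketch relies on the triangle inequality: with the speed of sound normalised to one, the arrival times of one image source at two receivers cannot differ by more than the distance between those receivers. The TOA values below are made up for illustration and are not geometrically consistent measurements; the function name `candidate_tuples` is likewise hypothetical, and the window is applied relative to the first receiver only.

```python
import itertools
import math

def candidate_tuples(toas_per_mic, window):
    """Yield one-echo-per-microphone tuples whose arrival times all lie
    within `window` of the echo picked at the first microphone."""
    first, rest = toas_per_mic[0], toas_per_mic[1:]
    for t0 in first:
        nearby = [[t for t in toas if abs(t - t0) <= window] for toas in rest]
        for tail in itertools.product(*nearby):
            yield (t0, *tail)

# Hypothetical receiver positions and first-order echo TOAs (units such that c = 1).
receivers = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3), (0.3, 0.3)]
toas = [
    [2.0, 3.1, 4.2, 5.3],
    [2.1, 3.0, 4.4, 5.2],
    [1.9, 3.3, 4.1, 5.5],
    [2.2, 3.2, 4.3, 5.4],
]

# Array diameter: the largest inter-receiver distance bounds the TOA spread of one image source.
window = max(math.dist(a, b) for a, b in itertools.combinations(receivers, 2))

all_combos = list(itertools.product(*toas))      # exhaustive search: 4**4 tuples
pruned = list(candidate_tuples(toas, window))    # tuples surviving the temporal window
```

With a compact array the window is short compared with the spacing between echoes of different walls, so most of the exhaustive combinations are never assembled.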

An advantage of using the EDM approach is that it is exact, not approximate (unlike e.g. the discrete Hough transform). It therefore provides a clear-cut criterion for identifying good combinations of echoes.

It can be applied for many signals (acoustic signals, radio signals, UWB signals, etc.). It is very general and can be extended to multiple sources, multiple microphones very easily (as will be discussed here below, it can be applied to MIMO applications).

It requires only one source or transmitter. It can work with a small number of receivers, i.e. fewer than 5.

It can be used for determining a 3D geometry of a room.

In one preferred embodiment the method considers first-order echoes only; other echoes are not considered, and may be discarded. Therefore the method does not rely on a knowledge of second-order and further-order echoes, which are difficult to measure.

In one preferred embodiment the object is a convex room, the transmitter is a loudspeaker, each receiver is a microphone, the geometry is a 2D geometry and the number of receivers is 3. In other words the proposed method allows the 2D geometry of a room to be determined by using one loudspeaker and only 3 microphones. The proposed method thus uses a reduced number of receivers for accurately determining the room's 2D geometry.

The proposed method based on EDM can be extended to determine the 3D geometry of a room, using at least 5 receivers.

The method according to the invention can comprise the determination of the location of the transmitter by using least-squared-distance trilateration. It can comprise multi-dimensional scaling. It can comprise applying an s-stress criterion.

The present invention concerns also a system for determining the geometry and/or the localisation of an object comprising

a transmitter for sending one or more signals;

one or more receivers for receiving the transmitted signals and the echoes of the transmitted signals as reflected by one or more reflective surfaces;

a first computing module for building a first Euclidean Distance Matrix (EDM) comprising the mutual positions of the receivers, and optionally computing its rank;

a second computing module for adding to the EDM a new row and a new column, the new row and a new column comprising time of arrivals of said echoes and computing its rank or its distance from the first EDM;

a third computing module for determining the geometry and/or the position of the object based on said rank or distance.

In one preferred embodiment the first, second and third modules are the same module.

The present invention concerns also a computer program product for determining the geometry and/or the localisation of an object, comprising:

a tangible computer usable medium including computer usable program code being used for

building a first Euclidean Distance Matrix (EDM) comprising the mutual positions of the receivers and optionally computing its rank;

adding to the EDM a new row and a new column, the new row and a new column comprising time of arrivals of echoes of the signals transmitted by a transmitter as reflected by one or more reflective surfaces and received by one or more receivers and computing its rank or its distance from the first EDM or set of EDMs;

determining the geometry and/or the position of the object by comparing the first rank based on the second rank and/or on said distance.

The present invention concerns also a computer data carrier storing presentation content created with the described method.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be better understood with the aid of the description of an embodiment given by way of example and illustrated by the figures, in which:

FIGS. 1 and 2 show a view of a room comprising a source or transmitter and a receiver.

FIG. 3 shows a view of a room comprising a source or transmitter and four receivers.

FIGS. 4A to 4C show the RIR received by each receiver.

FIG. 5 illustrates some possible room reconstructions due to incorrect echo labelling.

FIG. 6 illustrates the feasible region concept.

FIG. 7 illustrates an embodiment of a system according to the invention.

FIG. 8 illustrates an embodiment of a data processing system in which a method in accordance with an embodiment of the present invention, may be implemented.

DETAILED DESCRIPTION OF POSSIBLE EMBODIMENTS OF THE INVENTION

The present invention will now be described in more detail in connection with its embodiment for determining the geometry of a room. However the present invention finds applicability in connection with many other fields, as will be discussed. Moreover the two-dimensional case will be described first for the sake of simplicity and illustration. The three-dimensional case then follows easily.

The method according to the invention uses the image source model. The idea in the image source model is that if there is a sound source on one side of the wall, then the sound field on the same side can be represented as a superposition of the original sound field and the one generated by a mirror image of the source with respect to the wall.

FIG. 2 illustrates the setup and the image source model. For our purposes, a room is either a convex planar K-polygon or a K-faced convex polyhedron. With the i-th side of the room we associate an outward-pointing unit normal n_i, and define the normal matrix as N = (n₁, . . . , n_K). The reference s̃_i denotes the image of the source s with respect to the i-th side.

In the case of FIG. 2 the following relation is valid

s̃_i = s + 2⟨p_i − s, n_i⟩ n_i (1)
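As an illustrative sketch, reading relation (1) as the mirror image s̃_i = s + 2⟨p_i − s, n_i⟩ n_i of the source with respect to the wall through p_i with normal n_i, the image source and the corresponding first-order echo TOA can be computed as follows. The wall, source and receiver coordinates are hypothetical values chosen for this example.

```python
import numpy as np

def image_source(s, p, n):
    """Mirror the source s across the wall through point p with outward normal n."""
    s, p, n = (np.asarray(v, dtype=float) for v in (s, p, n))
    n = n / np.linalg.norm(n)                 # tolerate a non-normalised normal
    return s + 2.0 * np.dot(p - s, n) * n

# Hypothetical example: wall on the line x = 5 with outward normal (1, 0).
source = np.array([2.0, 1.0])
s_img = image_source(source, p=[5.0, 0.0], n=[1.0, 0.0])   # mirror image at (8, 1)

# With c = 1, the first-order echo TOA at a receiver is its distance to the image source.
receiver = np.array([0.0, 0.0])
toa = np.linalg.norm(s_img - receiver)
```

The midpoint between the source and its image lies on the wall, which is what later allows the wall to be recovered from an estimated image source.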

In the remainder of this report we assume that the choice of units is such that the speed of sound is unity, c=1. Adjustments for the actual speed of sound are trivial.

By observing the impulse response and doing appropriate computations it is possible to access the first-order echoes, but also higher-order echoes.

FIG. 3 illustrates an array of M microphones in a 2-D room (in this case M=4), and a loudspeaker s at an arbitrary position in this room. The geometry enables the microphones to pick up the first-order echoes only.

The references r1 to r4 denote the receivers along with their positions. The same considerations apply to the source.

In general r_m ∈ R² and s ∈ R².

The EDM is the Euclidean Distance Matrix corresponding to the microphones setup, which is known. In the case of FIG. 3, the EDM D ∈ R^{M×M} comprises the following elements:

d_ij = ∥r_i − r_j∥₂², 1 ≤ i, j ≤ M

where ∥·∥₂ is the Euclidean norm, so that d_ij is the squared Euclidean distance between the microphones r_i and r_j.

The EDM is then a symmetric matrix with non-negative entries and a zero diagonal.

If the loudspeaker s fires a pulse, each microphone (it is assumed that all of them are in favourable positions so that they observe echoes for all the walls) will receive the direct sound and K first-order echoes. These echoes correspond to images of s across the K walls. The locations of image sources are valid points of the plane R².

If the distances between the image sources and the microphones are known, it is possible to reconstruct the locations of image sources and hence the 2-D room.

In order to know which of the peaks Pi (see FIGS. 4A to 4C) in the impulse responses RIR_i received by the microphones correspond to which wall (labelling problem), the EDM is used. Instead of relying on different derived heuristics, intrinsic properties of point sets in Euclidean spaces are used. A particular property that is easily exploited is the rank property of the EDM, which says that the EDM corresponding to a point set in Rⁿ has rank at most n+2. In 2-D, its rank can thus be at most 4, and in 3-D at most 5.

The matrix D is augmented with a combination of M TOAs. This corresponds to adding a new row and a new column to D. If the augmented matrix D_aug still verifies the rank property (or more generally, the EDM property), then the selected combination of echoes corresponds to an image source, or equivalently, to a reflective surface (e.g. a wall).

FIG. 5 illustrates some possible room reconstructions due to incorrect echo labelling, of which only a single one (reference 10 in the Figure) satisfies the EDM criterion. The image source location is estimated using least-squared-distance trilateration.
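One common way of carrying out such a trilateration is a linearised least-squares fit, sketched below under stated assumptions. It is not necessarily the exact estimator used for FIG. 5: the unknown position x and its squared norm are treated as independent unknowns in the identity ∥x − r_m∥² − ∥r_m∥² = ∥x∥² − 2⟨r_m, x⟩. Function and variable names, as well as the coordinates, are hypothetical.

```python
import numpy as np

def trilaterate(anchors, sq_dists):
    """Linearised least-squares position estimate from squared distances to known anchors.

    Solves [-2 r_m^T, 1] [x; q] = d_m^2 - ||r_m||^2 for x, with q standing in for ||x||^2.
    Needs at least n + 1 anchors that do not lie in a common hyperplane of R^n.
    """
    anchors = np.asarray(anchors, dtype=float)
    sq_dists = np.asarray(sq_dists, dtype=float)
    A = np.hstack([-2.0 * anchors, np.ones((len(anchors), 1))])
    b = sq_dists - np.sum(anchors ** 2, axis=1)
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution[:-1]

# Hypothetical microphone positions and an image source to be recovered.
microphones = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.5, 1.0]])
true_image = np.array([8.0, 1.0])
squared_toas = np.sum((microphones - true_image) ** 2, axis=1)   # c = 1

estimate = trilaterate(microphones, squared_toas)
```

With noisy TOAs the same call returns the least-squares compromise between the measurements instead of an exact intersection.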

Characterisation of Correct TOA Vectors

Denote by τ_m the set of first-order echo TOAs received by the m-th microphone. The matrix D is now augmented by a vector t so that

$D_{aug} = \begin{pmatrix} D & t \\ t^{T} & 0 \end{pmatrix} \qquad (2)$

where the vector t is formed by taking one TOA from each microphone. In particular t = (t₁², . . . , t_M²)^T, with t_m ∈ τ_m. It is possible to state the following lemma:

Lemma 1.

If an M-tuple of echoes t = {t₁, . . . , t_M} is such that rank D_aug < 5, then (t^T, 0)^T ∈ R((D, t)^T), where R(·) denotes the range (column space) of a matrix. In particular, if M=4 and the microphones are neither collinear nor on a circle, we have that t^T D⁻¹ t = 0.

Proof.

The first part is obvious. For the second part, let t be a vector that corresponds to a four-tuple such that rank(D_aug) < 5. It is possible to write

$D_{aug} = \begin{pmatrix} D & t \\ t^{T} & 0 \end{pmatrix} \qquad (3)$

If the rank of this matrix is 4 or less, it is possible to represent the last column as a linear combination of the first four columns. This in turn means that ∃v such that

$\begin{pmatrix} D \\ t^{T} \end{pmatrix} v = \begin{pmatrix} t \\ 0 \end{pmatrix} \qquad (4)$

By components, one has

Dv=t

t^Tv=0. (5)

Under the assumptions of the second part, D is invertible. The first equation then gives v = D⁻¹t, and substituting this into the second equation yields t^T D⁻¹ t = 0, which is the result.
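The second part of Lemma 1 can be checked numerically, as in the following sketch. The microphone and image source coordinates are hypothetical; the four microphones are chosen neither collinear nor on a common circle, so that D is invertible.

```python
import numpy as np

def squared_edm(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def lemma_quadratic_form(D, t):
    """Return t^T D^{-1} t, which Lemma 1 predicts to vanish for a correct echo combination."""
    return float(t @ np.linalg.solve(D, t))

# Hypothetical microphones (not collinear, not concyclic) and two image sources.
microphones = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
image_a = np.array([8.0, 1.0])
image_b = np.array([2.0, 9.0])
D = squared_edm(microphones)

# Correct combination: all four squared TOAs belong to image source A.
t_correct = np.sum((microphones - image_a) ** 2, axis=1)

# Incorrect combination: the echo at the first microphone belongs to image source B.
t_wrong = t_correct.copy()
t_wrong[0] = np.sum((microphones[0] - image_b) ** 2)

q_correct = lemma_quadratic_form(D, t_correct)   # zero up to rounding
q_wrong = lemma_quadratic_form(D, t_wrong)       # clearly non-zero
```

Since det D_aug = −det(D)·t^T D⁻¹ t by the Schur complement, a vanishing quadratic form is the same statement as D_aug being singular when M=4.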

Corollary 1.

Let

$Z = \left\{ t \in R^{M} : \operatorname{rank} \begin{pmatrix} D & t \\ t^{T} & 0 \end{pmatrix} < 5 \right\} \subset R^{M}.$

Then dim Z<M, that is, μ(Z)=0, where μ is the Lebesgue measure in R^M.

Proof.

Immediate from Lemma 1.

This means that it is possible to test all possible M-tuples generated from the collected RIRs and know that the ones that yield a singular D_aug correspond to image sources. With 4 walls and 4 microphones, we have 4⁴=256 combinations. This amounts to 256 SVDs of a 5×5 matrix, which can be computed very fast, so the combinatorial aspect is not an issue. After finding the corresponding combinations it is possible to triangulate to find the actual locations of the image sources, and from there find the walls.

The described procedure is summarized in Algorithm 1.

Algorithm 1 NOISELESS ROOM RECOVERY
Input: Times of Arrival T₁, . . . , T_M
Output: Room walls
1: for every √t ∈ T₁ × . . . × T_M do
2:   Build the matrix $D_{aug} = \begin{pmatrix} D & t \\ t^{T} & 0 \end{pmatrix}$,
3:   if rank D_aug ≤ 4, or equivalently, [t^T, 0]^T ∈ R([D, t]^T) then
4:     Triangulate the location of the image source corresponding to t,
5:     Compute the wall normal as the vector from the loudspeaker to the image source,
6:     Compute the distance of the wall from the loudspeaker.
7:   end if
8: end for
9: Reconstruct the convex room using the collected information.
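A minimal sketch of Algorithm 1 in the noiseless case is given below, under several assumptions that are not part of the description: the room is a simulated rectangle, the loudspeaker position is taken as known, the rank test is implemented by comparing a singular value ratio against a hypothetical tolerance `tol`, and triangulation uses a linearised least-squares fit. All coordinates and names are illustrative only.

```python
import itertools
import numpy as np

def squared_edm(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def trilaterate(anchors, sq_dists):
    """Linearised least-squares position estimate from squared distances to known anchors."""
    A = np.hstack([-2.0 * anchors, np.ones((len(anchors), 1))])
    b = sq_dists - np.sum(anchors ** 2, axis=1)
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution[:-1]

def recover_walls(receivers, source, toas_per_mic, tol=1e-9):
    """Return (unit normal, distance from source) for each echo combination passing the rank test."""
    M, dim = receivers.shape
    D = squared_edm(receivers)
    walls = []
    for combo in itertools.product(*toas_per_mic):       # step 1: every M-tuple of echoes
        t = np.array(combo) ** 2                         # squared TOAs, c = 1
        D_aug = np.zeros((M + 1, M + 1))                 # step 2: augmented matrix
        D_aug[:M, :M] = D
        D_aug[:M, M] = t
        D_aug[M, :M] = t
        sv = np.linalg.svd(D_aug, compute_uv=False)
        if sv[dim + 2] / sv[0] < tol:                    # step 3: numerical rank <= dim + 2
            image = trilaterate(receivers, t)            # step 4
            offset = image - source
            normal = offset / np.linalg.norm(offset)     # step 5
            walls.append((normal, np.linalg.norm(offset) / 2.0))   # step 6
    return walls

# Simulated rectangular room with walls x = 5, x = -4, y = 5.5, y = -2 (hypothetical values).
receivers = np.array([[0.0, 0.0], [1.0, 0.1], [0.2, 1.1], [1.3, 1.6]])
source = np.array([2.3, 1.2])
true_images = np.array([[7.7, 1.2], [-10.3, 1.2], [2.3, 9.8], [2.3, -5.2]])

# Each microphone only sees its echoes sorted by arrival time, i.e. unlabelled.
toas = [sorted(np.linalg.norm(true_images - r, axis=1)) for r in receivers]
walls = recover_walls(receivers, source, toas)
```

Each returned pair describes one wall by its outward normal and its distance from the loudspeaker; step 9 (intersecting the corresponding half-planes into a convex room) is omitted. Because the test uses the singular value of index dim+2, the same function applies to the three-dimensional case when at least five receivers with three coordinates are supplied.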

Three-Dimensional Case

In the three-dimensional case, at least 5 microphones are needed to apply the EDM method (see below for a method that makes it possible to use 4). Only slight adjustments are needed to reflect the change of the ambient dimension. In fact, it is possible to apply Algorithm 1 directly, but instead of testing whether rank D_aug ≤ 4, one has to test whether rank D_aug ≤ 5.

Uniqueness

The goal is to show that the probability for the described algorithm to fail is 0. To this end, a set of “good” rooms in which the algorithm can be applied is defined, and two theorems about the uniqueness of the solution are then stated. Since the algorithms rely on the knowledge of the first-order TOAs, it is required that the microphones hear them. This defines a “good” room, which is in fact a combination of a room geometry and the microphone array/loudspeaker location.

Definition 1 (Feasibility).

Given a room R and a loudspeaker position s, the point x ∈ R is feasible if a microphone placed at x receives all the first-order echoes of a pulse emitted from s. The interior of the set of all feasible points is called a feasible region.

FIG. 6 illustrates the concept of a feasible region. With this definition it is possible to state the first uniqueness result.

Theorem 1.

Assume we are given a room and a source location. Assume further that the room-loudspeaker combination generates a non-empty feasible region and that the microphones are placed uniformly at random in the feasible region. Then with probability 1, there is only one room corresponding to the collected RIRs and it can be retrieved by the Algorithm 1.

Sketch of proof. Fix any configuration of microphones (r₁, . . . , r_M) such that all r_m are in the feasible region. This microphone configuration induces an M-tuple of first-order TOAs, t₀ = (t₁, . . . , t_M)^T. Now since the feasible region is open, there is some ε = ε(r₁, . . . , r_M) > 0 such that we can achieve any t ∈ B_ε(t₀) by adjusting the microphone positions. To see this, one can observe that it is possible to adjust each t_m independently of the others by moving the corresponding microphone.

Since this is true of any t₀ one might generate, it follows that the space of possible TOA combinations is the union of all such open balls, and thus M-dimensional. By Corollary 1, the dimension of the set of the M-tuples t that pass the EDM test is smaller than M. But μ(A)=0 if dim(A)<M, where μ is the Lebesgue measure in R^M. It is possible to note that the probability distribution induced on the vectors t is non-singular, since the mapping from the microphone positions to t is continuous and the Jacobian of the mapping is non-zero, so the claim follows. Alternatively, by the same token, it is possible to note that the measure of all microphone configurations that give viable vectors t is zero, and directly conclude.

Remark:

A good way to think about this is that one can draw K^M samples from the non-singular (continuous) probability distribution on the set of M-tuples t. By definition of the continuous probability distribution, the probability to draw a sample from a set with Lebesgue measure 0 must be 0 itself. It might appear surprising that even if the probability to hit a correct M-tuple by chance is zero, one always has K correct ones. This is easy to explain by noting that the echoes corresponding to one single wall are not independent of each other, but they are independent of the other echoes.

Theorem 2.

Assume you are given a fixed microphone array and a loudspeaker position. A room is generated at random in such a manner that the array is in the feasible region. Then with probability 1, there is only one room corresponding to the collected RIRs and it can be retrieved by the Algorithm 1.

The meaning of these theorems is essentially that in whatever room one runs the algorithm so that the microphones are in the feasible region, the solution is unique.

A Subspace Approach

The approach described in the previous section requires at least four microphones in the 2-D case, and five microphones in the 3-D case. Another approach is now described that works with a minimal number of microphones (minimal in the sense that one cannot use fewer by exploiting only the first-order TOA information).

It is possible to always choose the origin of the coordinate system so that one has

$\sum_{m=1}^{M} r_{m} = 0 \qquad (6)$

with r_m = (r_m^x, r_m^y)^T. Let s̃_k be the location vector of one image source (with respect to the wall k). Then, up to a possible permutation, one receives at each microphone the squared distance information,

$y_{k,m} \overset{def}{=} \langle \tilde{s}_{k} - r_{m}, \tilde{s}_{k} - r_{m} \rangle = \| \tilde{s}_{k} \|^{2} - 2 \langle \tilde{s}_{k}, r_{m} \rangle + \| r_{m} \|^{2}. \qquad (7)$

Define further

$\tilde{y}_{k,m} \overset{def}{=} -\frac{1}{2} \left( y_{k,m} - \| r_{m} \|^{2} \right) = \langle r_{m}, \tilde{s}_{k} \rangle - \frac{1}{2} \| \tilde{s}_{k} \|^{2}$

We have in vector form

$\begin{pmatrix} \tilde{y}_{k,1} \\ \tilde{y}_{k,2} \\ \vdots \\ \tilde{y}_{k,M} \end{pmatrix} = \begin{pmatrix} r_{1}^{T} & -\frac{1}{2} \\ r_{2}^{T} & -\frac{1}{2} \\ \vdots & \vdots \\ r_{M}^{T} & -\frac{1}{2} \end{pmatrix} \begin{pmatrix} \tilde{s}_{k} \\ \| \tilde{s}_{k} \|^{2} \end{pmatrix} \qquad (8)$

Denote by $\mathbf{M}$ the above matrix,

$\mathbf{M} \overset{def}{=} \begin{pmatrix} r_{1}^{T} & -\frac{1}{2} \\ r_{2}^{T} & -\frac{1}{2} \\ \vdots & \vdots \\ r_{M}^{T} & -\frac{1}{2} \end{pmatrix} \qquad (9)$

and set

$\tilde{y}_{k} \overset{def}{=} (\tilde{y}_{k,1}, \dots, \tilde{y}_{k,M})^{T}, \quad \tilde{u}_{k} \overset{def}{=} (\tilde{s}_{k}^{x}, \tilde{s}_{k}^{y}, \| \tilde{s}_{k} \|^{2})^{T}.$

We write the above expression (8) as $\tilde{y}_{k} = \mathbf{M} \tilde{u}_{k}$.

Thanks to the condition that

$\sum_{m=1}^{M} r_{m} = 0,$

we have that

$1^{T} \tilde{y}_{k} = -\frac{M}{2} \| \tilde{s}_{k} \|^{2}, \qquad (10)$

i.e.

$\| \tilde{s}_{k} \|^{2} = -\frac{2}{M} \sum_{m=1}^{M} \tilde{y}_{k,m}. \qquad (11)$

Furthermore,

$\tilde{s}_{k} = A \tilde{y}_{k}, \qquad (12)$

where A is a matrix such that

$A \mathbf{M} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \qquad (13)$

These two conditions provide a complete characterisation of the distance information. In practice, it is sufficient to verify the linear constraint

$\tilde{y}_{k} \in R(\mathbf{M}), \qquad (14)$

where $R(\mathbf{M})$, the range (column space) of $\mathbf{M}$, is a proper subspace of R^M when M ≥ 4.

This approach makes it possible to formulate theorems and algorithms equivalent to the ones for the EDM formulation, with analogous arguments. Moreover, it is possible to use the nonlinear condition (11) to solve the problem with only 3 microphones in 2-D and 4 microphones in 3-D.
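The two tests of the subspace approach can be sketched as follows, assuming the matrix of relation (9) with rows (r_m^T, −1/2) and microphone coordinates re-centred so that condition (6) holds. The least-squares solve stands in for the matrix A of relation (13); the function name, the coordinates and the way residuals are reported are assumptions made for this example.

```python
import numpy as np

def subspace_check(receivers, sq_dists):
    """Return (image source estimate, linear residual, nonlinear residual) for one echo combination."""
    centroid = receivers.mean(axis=0)
    r = receivers - centroid                               # enforce sum_m r_m = 0, relation (6)
    y_tilde = -0.5 * (sq_dists - np.sum(r ** 2, axis=1))
    M_mat = np.hstack([r, -0.5 * np.ones((len(r), 1))])    # matrix of relation (9)
    u, *_ = np.linalg.lstsq(M_mat, y_tilde, rcond=None)
    linear_residual = np.linalg.norm(M_mat @ u - y_tilde)  # zero iff y_tilde lies in the range, (14)
    s_centred = u[:-1]
    norm_sq = -2.0 / len(r) * np.sum(y_tilde)              # relation (11)
    nonlinear_residual = abs(norm_sq - s_centred @ s_centred)
    return s_centred + centroid, linear_residual, nonlinear_residual

# Hypothetical microphones and image sources (units with c = 1).
receivers = np.array([[0.0, 0.0], [1.0, 0.1], [0.2, 1.1], [1.3, 1.6]])
image_a = np.array([7.7, 1.2])
image_b = np.array([2.3, 9.8])

y_true = np.sum((receivers - image_a) ** 2, axis=1)
y_wrong = y_true.copy()
y_wrong[0] = np.sum((receivers[0] - image_b) ** 2)         # mislabelled echo at microphone 1

estimate, lin_true, nonlin_true = subspace_check(receivers, y_true)
_, lin_wrong, _ = subspace_check(receivers, y_wrong)

# With only three microphones the linear test is vacuous; the nonlinear test still discriminates.
_, _, nonlin_true3 = subspace_check(receivers[:3], y_true[:3])
_, _, nonlin_wrong3 = subspace_check(receivers[:3], y_wrong[:3])
```

In this sketch a correct combination gives both residuals equal to zero up to rounding, whereas the mislabelled one is rejected by the linear test with four microphones and by the nonlinear test with three.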

Theorem 3.

The minimal number of microphones required to hear the room given that they observe the first order echoes is 3 in 2-D and 4 in 3-D.

Proof.

Construct a family of counterexamples for M=2.

Practical Considerations—Working with Uncertainties

In practice one encounters several sources of error. The first error term comes from the uncertainty when measuring the inter-microphone distances, that is

d̃_ij = d_ij + e_ij, (15)

so that

D̃ = D + E, (16)

where E is a symmetric, zero-diagonal error matrix.

This can be dealt with by calibration, but note that the schemes proposed in the following seem to be very stable with respect to uncertainties in array calibration.

The second source of error comes from the effects of the finite sampling rate and the finite precision of peak-picking algorithms. Some of this can be alleviated by using a high sampling rate, and better time-of-arrival estimation algorithms.

However it is better to use some kind of distance measure between the measured/assembled D_aug and some feasible D_aug. One possible approach would be to build a heuristic based on the singular values of D_aug. Such an approach, however, would capture only the rank requirement on the matrix. But the requirement that D_aug be an EDM brings in many more subtle dependencies between its elements. For instance one has that

$\left( I - \frac{1}{n} 1 1^{T} \right) D_{aug} \left( I - \frac{1}{n} 1 1^{T} \right) \preceq 0, \qquad (17)$

where n is the size of D_aug and 1 is the all-ones vector. Furthermore (17) does not allow the ambient dimension of the point set to be specified. Imposing this constraint leads to even more dependencies between the matrix elements, and the resulting space of matrices is no longer a cone (it is actually no longer convex). Nevertheless, it is possible to use a family of algorithms known as multidimensional scaling (MDS) to find the closest EDM generated by points in a fixed ambient dimension.
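The following sketch illustrates condition (17), read here as the requirement that the doubly centred matrix be negative semidefinite, and why it does not by itself fix the ambient dimension. The coordinates, the tolerance and the function names are hypothetical.

```python
import numpy as np

def satisfies_condition_17(D_aug, tol=1e-9):
    """Check that (I - 11^T/n) D_aug (I - 11^T/n) has no positive eigenvalue (up to tol)."""
    n = D_aug.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = J @ D_aug @ J
    eigvals = np.linalg.eigvalsh((G + G.T) / 2)      # symmetrise against rounding
    return bool(eigvals.max() <= tol * max(1.0, np.abs(D_aug).max()))

def augment(receivers, t):
    """Append the vector t of squared TOAs as a new row and column to the receivers' EDM."""
    M = len(receivers)
    diff = receivers[:, None, :] - receivers[None, :, :]
    D_aug = np.zeros((M + 1, M + 1))
    D_aug[:M, :M] = np.sum(diff ** 2, axis=-1)
    D_aug[:M, M] = D_aug[M, :M] = t
    return D_aug

receivers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
image = np.array([8.0, 1.0])
t_true = np.sum((receivers - image) ** 2, axis=1)

in_plane = augment(receivers, t_true)          # genuine image source in the plane of the array
lifted = augment(receivers, t_true + 9.0)      # same point raised 3 units out of the plane
broken = augment(receivers, np.array([0.01, 100.0, 100.0, 100.0]))  # violates the triangle inequality

rank_in_plane = np.linalg.matrix_rank(in_plane, tol=1e-8)
rank_lifted = np.linalg.matrix_rank(lifted, tol=1e-8)
```

Both `in_plane` and `lifted` satisfy (17), because both are EDMs of real point sets, but only `in_plane` has rank 4. The matrix `broken` is not an EDM in any dimension and fails (17).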

Multidimensional Scaling

As pointed out, in the presence of noise it is not advisable to use the rank test on D_aug. A very good way (as verified through simulations) to deal with this difficulty is to measure how close D_aug is to a true EDM. To measure this distance, Multidimensional Scaling can be used to construct a point set in a given dimension (either 2-D or 3-D) which produces the EDM “closest” to D_aug.

Multidimensional Scaling (MDS) was originally proposed in psychometrics as a method for data visualization. Many variations have been proposed to adapt the method for sensor localization.

Here the s-stress criterion as proposed by Takane, Young and de Leeuw (1977) is used. Given an observed noisy matrix {tilde over (D)}, the s-stress criterion is

$s (\tilde{D}) = \underset{D}{minimize} \sum_{i, j} {(d_{i, j}^{2} - {\tilde{d}}_{i, j}^{2})}^{2}$ $\text{subject to } D \in \mathrm{EDM}^{2} .$

We call s({tilde over (D)}) the score of matrix {tilde over (D)}. By EDM² we denote the set of EDMs with embedding dimension 2 (produced by point sets in 2-D). In the 3-D case, EDM² is replaced by EDM³.

From now on, it is assumed that the target space is R². The 3-D adaptation is immediate. If one associates to each point in R² a coordinate vector x_i=(x_i,y_i)^T, one has that d²_i,j=∥x_i−x_j∥₂²=(x_i−x_j)²+(y_i−y_j)².

Thus, the s-stress criterion can be rephrased as

$\begin{matrix} s (\tilde{D}) = \underset{x_{i}, y_{i} \in ℝ}{minimize} \sum_{i, j} {[{(x_{i} - x_{j})}^{2} + {(y_{i} - y_{j})}^{2} - {\tilde{d}}_{i, j}^{2}]}^{2} & (18) \end{matrix}$

The objective function in (18) is not convex. However, it has been shown to have fewer local minima than other MDS criteria. Furthermore, it yields a meaningful definition of the distance of a matrix from an optimal EDM.
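As a minimal sketch, the objective of (18) can be evaluated for a candidate 2-D configuration as follows. The sketch assumes NumPy and assumes that the observed matrix stores distances, which are squared inside the objective. The function name `s_stress_objective` and the unit-square test data are hypothetical.

```python
import numpy as np

def s_stress_objective(coords, D_tilde):
    """Value of the sum in (18) for point coordinates `coords` (n x 2).

    D_tilde is assumed to hold observed distances (not squared).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)          # (x_i - x_j)^2 + (y_i - y_j)^2
    return float(np.sum((sq_dist - D_tilde ** 2) ** 2))

# Hypothetical data: the corners of a unit square and their exact distances.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D_obs = np.linalg.norm(square[:, None, :] - square[None, :, :], axis=-1)

value_true = s_stress_objective(square, D_obs)

# Moving one corner away from its true position increases the objective.
moved = square.copy()
moved[2] += np.array([0.3, -0.2])
value_moved = s_stress_objective(moved, D_obs)

print(value_true, value_moved)
```

The score s({tilde over (D)}) is the minimum of this quantity over all configurations; the function above only evaluates it for one given configuration.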

In order to further avoid the local minima of (18), coordinate alternation can be used to find the optimal EDM: (18) is minimized first over x_i and then over y_i. Although this approach is suboptimal compared to simultaneous minimization with respect to both coordinates of each point, it leads to simpler computations.

Assuming that x_i has to be updated by Δx_i to give the minimum of s({tilde over (D)}), one will have

$\begin{matrix} {s (\tilde{D})}_{i}^{(k + 1)} = \overset{n}{\sum_{j = 1}} {[{(x_{i}^{(k)} + Δ x_{i}^{(k + 1)} - x_{j}^{(k)})}^{2} + {(y_{i}^{(k)} - y_{j}^{(k)})}^{2} - {\tilde{d}}_{i, j}^{2}]}^{2}, & (19) \end{matrix}$

where (•)^(k) denotes the value at iteration k. Taking the derivative of s({tilde over (D)})_i^(k+1) with respect to Δx_i^(k+1), one will have

$\begin{matrix} \frac{\partial {s (\tilde{D})}_{i}^{(k + 1)}}{\partial Δ x_{i}^{(k + 1)}} = 4 \Big[ n {(Δ x_{i}^{(k + 1)})}^{3} + 3 \sum_{j = 1}^{n} (x_{i}^{(k)} - x_{j}^{(k)}) {(Δ x_{i}^{(k + 1)})}^{2} + \sum_{j = 1}^{n} [3 {(x_{i}^{(k)} - x_{j}^{(k)})}^{2} + {(y_{i}^{(k)} - y_{j}^{(k)})}^{2} - {\tilde{d}}_{i, j}^{2}] Δ x_{i}^{(k + 1)} + \sum_{j = 1}^{n} [{(x_{i}^{(k)} - x_{j}^{(k)})}^{3} + (x_{i}^{(k)} - x_{j}^{(k)}) {(y_{i}^{(k)} - y_{j}^{(k)})}^{2} - (x_{i}^{(k)} - x_{j}^{(k)}) {\tilde{d}}_{i, j}^{2}] \Big] . & (20) \end{matrix}$

Setting (20) to zero yields a cubic equation in Δx_i^(k+1), which has at most three real solutions; comparing the value of s({tilde over (D)})_i^(k+1) at these solutions gives the optimal value for Δx_i^(k+1).
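The single-coordinate update can be sketched as follows, assuming NumPy. The cubic obtained by setting (20) to zero is solved with `numpy.roots` (a common constant factor does not change the roots), and the candidate giving the smallest value of the local cost (19) is kept. As in (19), the sums run over all j, including j = i. The names `shift_cost` and `best_shift` and the example data are hypothetical.

```python
import numpy as np

def shift_cost(a, c, delta):
    """Local cost (19): sum_j ((a_j + delta)^2 + c_j)^2."""
    return float(np.sum(((a + delta) ** 2 + c) ** 2))

def best_shift(a, c):
    """Shift delta minimising shift_cost, obtained from the cubic (20).

    a_j = x_i - x_j and c_j = (y_i - y_j)^2 - d_ij^2, for j = 1..n.
    """
    # Coefficients of (20) without the constant factor, highest power first.
    coeffs = [a.size, 3.0 * a.sum(), np.sum(3.0 * a ** 2 + c), np.sum(a ** 3 + a * c)]
    # The real parts of all roots are used as candidates. Real parts of
    # complex roots are only extra candidates in the comparison below.
    candidates = np.roots(coeffs).real
    return float(min(candidates, key=lambda d: shift_cost(a, c, d)))

# Hypothetical data: point i = 3 has a wrong x-coordinate (0.5 instead of 2.0).
true_points = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [2.0, 1.0]])
d_row = np.linalg.norm(true_points[3] - true_points, axis=1)  # i-th row of distances
current = true_points.copy()
current[3, 0] = 0.5

a = current[3, 0] - current[:, 0]                       # x_i - x_j for all j
c = (current[3, 1] - current[:, 1]) ** 2 - d_row ** 2   # (y_i - y_j)^2 - d_ij^2
delta = best_shift(a, c)

print("shift:", delta)
print("cost before:", shift_cost(a, c, 0.0), "cost after:", shift_cost(a, c, delta))
```

Evaluating the cost at every candidate, rather than filtering roots by the size of their imaginary part, is a design choice made here to avoid a tolerance parameter.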

The complete optimization procedure is summarized in Algorithm 2.

Algorithm 2 COORDINATE ALTERNATION FOR S-STRESS OPTIMIZATION
Input: Symmetric and zero-diagonal matrix {tilde over (D)}
Output: Estimated positions x and s({tilde over (D)})
1: Assume an initial configuration for the points x⁰
2: repeat
3: for i = 1 to n do
4: Assume the configuration of the points other than i fixed,
5: Update x_i using the i-th row of {tilde over (D)},
6: Update y_i using the i-th row of {tilde over (D)},
7: end for
8: until convergence or maximum number of iterations is reached.
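One possible reading of Algorithm 2 is sketched below in Python, assuming NumPy. It is not a definitive implementation. The observed matrix is assumed to hold distances (squared inside the cost), the random initial configuration, the stopping tolerance and the names `coordinate_alternation`, `s_stress` and `_best_shift` are hypothetical, and the procedure may stop in a local minimum.

```python
import numpy as np

def s_stress(coords, D_tilde):
    """Sum in (18) for the configuration `coords`; D_tilde holds distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    return float(np.sum((np.sum(diff ** 2, axis=-1) - D_tilde ** 2) ** 2))

def _best_shift(a, c):
    """Minimise sum_j ((a_j + delta)^2 + c_j)^2 using the roots of the cubic (20)."""
    coeffs = [a.size, 3.0 * a.sum(), np.sum(3.0 * a ** 2 + c), np.sum(a ** 3 + a * c)]
    candidates = np.roots(coeffs).real
    costs = [np.sum(((a + d) ** 2 + c) ** 2) for d in candidates]
    return candidates[int(np.argmin(costs))]

def coordinate_alternation(D_tilde, dim=2, init=None, max_iter=200, tol=1e-12, seed=0):
    """Return (coords, score) after alternating single-coordinate updates."""
    n = D_tilde.shape[0]
    if init is None:
        init = np.random.default_rng(seed).standard_normal((n, dim))
    coords = np.array(init, dtype=float)           # step 1: initial configuration (copied)
    sq_obs = D_tilde ** 2
    score = s_stress(coords, D_tilde)
    for _ in range(max_iter):                      # step 2: repeat
        previous = score
        for i in range(n):                         # step 3: for each point i
            for axis in range(dim):                # steps 5-6: update x_i, then y_i
                other = np.delete(np.arange(dim), axis)
                a = coords[i, axis] - coords[:, axis]
                b2 = np.sum((coords[i, other] - coords[:, other]) ** 2, axis=1)
                coords[i, axis] += _best_shift(a, b2 - sq_obs[i])
        score = s_stress(coords, D_tilde)
        if previous - score < tol:                 # step 8: convergence test
            break
    return coords, score

# Hypothetical data: five points in the plane with exact pairwise distances.
true_points = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [1.0, 2.0]])
D_obs = np.linalg.norm(true_points[:, None, :] - true_points[None, :, :], axis=-1)
init = np.random.default_rng(1).standard_normal((5, 2))

initial_score = s_stress(init, D_obs)
coords, score = coordinate_alternation(D_obs, init=init)
print("initial score:", initial_score, "final score:", score)
```

The recovered coordinates are determined at best up to a rigid motion and reflection, so it is the returned score, not the coordinates themselves, that would be compared across candidate matrices.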

FIG. 8 shows an embodiment of a data processing system 300 in which an embodiment of a method of the present invention may be implemented. The data processing system 300 of FIG. 8 may be located and/or otherwise operate at any node of a computer network, which may for example comprise clients, servers, etc., and which is not illustrated in the Figure. In the embodiment illustrated in FIG. 8, data processing system 300 includes communications fabric 302, which provides communications between processor unit 304, memory 306, persistent storage 308, communications unit 310, input/output (I/O) unit 312, and display 314.

Processor unit 304 serves to execute instructions for software that may be loaded into memory 306. Processor unit 304 may be a set of one or more processors or may be a multi-processor core, depending on the particular implementation. Further, processor unit 304 may be implemented using one or more heterogeneous processor systems in which a main processor is present with secondary processors on a single chip. As another illustrative example, the processor unit 304 may be a symmetric multi-processor system containing multiple processors of the same type.

In some embodiments, the memory 306 shown in FIG. 8 may be a random access memory or any other suitable volatile or non-volatile storage device. The persistent storage 308 may take various forms depending on the particular implementation. For example, the persistent storage 308 may contain one or more components or devices. The persistent storage 308 may be a hard drive, a flash memory, a rewritable optical disk, a rewritable magnetic tape, or some combination of the above. The media used by the persistent storage 308 also may be removable such as, but not limited to, a removable hard drive.

The communications unit 310 shown in FIG. 8 provides for communications with other data processing systems or devices. In these examples, communications unit 310 is a network interface card. Modems, cable modems and Ethernet cards are just a few of the currently available types of network interface adapters. Communications unit 310 may provide communications through the use of either or both physical and wireless communications links.

The input/output unit 312 shown in FIG. 8 enables input and output of data with other devices that may be connected to data processing system 300. In some embodiments, input/output unit 312 may provide a connection for user input through a keyboard and mouse. Further, input/output unit 312 may send output to a printer. Display 314 provides a mechanism to display information to a user.

Instructions for the operating system and applications or programs are located on the persistent storage 308. These instructions may be loaded into the memory 306 for execution by processor unit 304. The processes of the different embodiments may be performed by processor unit 304 using computer implemented instructions, which may be located in a memory, such as memory 306. These instructions are referred to as program code, computer usable program code, or computer readable program code that may be read and executed by a processor in processor unit 304. The program code in the different embodiments may be embodied on different physical or tangible computer readable media, such as memory 306 or persistent storage 308.

Program code 316 is located in a functional form on the computer readable media 318 that is selectively removable and may be loaded onto or transferred to data processing system 300 for execution by processor unit 304. Program code 316 and computer readable media 318 form a computer program product 320 in these examples. In one example, the computer readable media 318 may be in a tangible form, such as, for example, an optical or magnetic disc that is inserted or placed into a drive or other device that is part of persistent storage 308 for transfer onto a storage device, such as a hard drive that is part of persistent storage 308. In a tangible form, the computer readable media 318 also may take the form of a persistent storage, such as a hard drive, a thumb drive, or a flash memory that is connected to data processing system 300. The tangible form of computer readable media 318 is also referred to as computer recordable storage media. In some instances, computer readable media 318 may not be removable.

Alternatively, the program code 316 may be transferred to data processing system 300 from computer readable media 318 through a communications link to communications unit 310 and/or through a connection to input/output unit 312. The communications link and/or the connection may be physical or wireless in the illustrative examples. The computer readable media also may take the form of non-tangible media, such as communications links or wireless transmissions containing the program code.

The different components illustrated for data processing system 300 are not meant to provide architectural limitations to the manner in which different embodiments may be implemented. The different illustrative embodiments may be implemented in a data processing system including components in addition to or in place of those illustrated for data processing system 300. Other components shown in FIG. 8 can be varied from the illustrative examples shown. For example, a storage device in data processing system 300 is any hardware apparatus that may store data. Memory 306, persistent storage 308, and computer readable media 318 are examples of storage devices in a tangible form.

Therefore, as explained at least in connection with FIG. 8, the present invention is also directed to a system for determining the geometry and/or the localisation of an element, to a computer program product for determining the geometry and/or the localisation of an element, and to a computer data carrier.

In accordance with a further embodiment of the present invention, a computer data carrier is provided which stores presentation content created while employing the methods of the present invention.

Although the present invention has been described in more detail in connection with its embodiment for determining the geometry of a room, the present invention is applicable in connection with many other fields.

The present invention can be used for determining the exact position of a receiver r, which is represented by a person in FIG. 7. In this case a satellite, e.g. a GPS satellite, is the source s of a radio signal which can be reflected by some buildings B1, B2. If the echo e1 is not taken into account, the localisation of the mobile device r of the person can be computed incorrectly (the mobile device r will be considered to be located at {tilde over (r)}).

Knowing the position of the satellite s, the position of the buildings B1, B2, etc. (this is possible e.g. by using an electronic map) and applying the method according to the invention, it is possible to accurately locate the mobile device r and then the person, without any error.

An application of the method lies in neurology. Neural activity is measured by electrodes introduced into the human or animal brain. These electrodes pick up signals coming from multiple neurons. Neural spike sorting aims at identifying spikes coming from a single neuron: such identification is a labeling or clustering problem. The method according to the invention can be applied to solve it. Clustering is done based on the spike shape and the relative spike amplitudes at different electrodes.

Since the human or animal tissue is homogeneous and the electric signals are observed through the line-of-sight propagation, the relative spike amplitudes depend on the distance between the electrodes and the neurons. The exact amplitude pattern depends on the electrode array geometry and on the mutual position of the electrode array and a given neuron.

In the noiseless case, knowing the characteristics of the propagation in the human or animal tissue, and having a sufficient number of electrodes would uniquely identify the location of each given neuron.

In the noisy case, the method according to the invention makes it possible to find the likely location of each neuron by finding the closest EDM.

The method of the invention can also be used in audio-forensics. For example, a person moving in a room while talking on a phone might enable us to learn the shape of that room based on the audio signal transmitted over the phone channel.

The method according to the invention can also be applied in CDMA, or in general in MIMO communications. A possible application is accurate channel estimation. In multipath propagation (for example in indoor channels), the receiving antenna picks up the direct signal and a number of echoes or reflections. These reflections, as discussed, can be modeled by image sources. It is then possible to estimate the EDM corresponding to multiple emitting and receiving antennas, and then include image sources. It is then possible to estimate the locations of these image sources, and then find the “perfect” locations of the corresponding path components in impulse responses.

Furthermore, if the geometry or the position of the antenna arrays changes, it is likely that the major reflections will still be coming from the same reflectors. It is then possible to efficiently re-estimate the channel by only learning the new geometry of the antenna array.

Advantageously, the method according to the invention can be used to boost the signal power, as already attempted by “RAKE” receivers. However, such receivers try to decide where the individual channel taps are from the estimated impulse responses. By contrast, with the method according to the invention, after estimating the shape of the room it is possible to have perfect knowledge of the image source locations, and this could be used to correctly combine the reflected signals in order to boost the power.

The method according to the invention can be applied to ToF (Time of Flight) cameras, where a single light pulse illuminates the scene, and the scene depth is then computed based on the travel time of light. On the camera side there is a pixel array where pixels are time-resolving sensors (or there is a shutter that has the role of time resolving). The method according to the invention can substantially reduce the number of pixels needed by approximating the scene with a number of planar reflectors and finding the image source corresponding to each planar reflector by using the EDM.

Another possible application of the method according to the invention is the indoor sound source localization, usually considered difficult since the reflections are difficult to predict and they masquerade as sources.

Another set of applications is in teleconferencing and auralization, where one would like, perhaps for different reasons, to compensate for the room influence or to create the illusion that the sound is played in a specific room. This largely consists in compensating for the early reflections, which in turn requires knowledge of the reflector locations. The listed techniques work because knowing the boundary conditions makes it possible to compute the RIR for an arbitrary source-receiver geometry inside the room.

A different field of application is wave field synthesis: knowing the locations of early reflections might enable the development of more specific indoor sampling theorems.

Claims

1. A method for determining the geometry and/or the position of an object, comprising the steps of

sending one or more signals with one transmitter;

receiving by one or more receivers the transmitted signals and echoes of the transmitted signals reflected by one or more reflective surfaces;

building with a computing module a first Euclidean distance matrix corresponding to mutual positions of the receivers;

adding to said matrix a new row and a new column, the new row and the new column corresponding to the times of arrival of at least some of said echoes, and computing the rank of the modified matrix, or computing the distance of the modified matrix from a true Euclidean Distance Matrix;

determining the geometry and/or the position of the object based on the computed information.

2. The method of claim 1, wherein only first order echoes are considered.

3. The method of claim 1, wherein only echoes received during a predetermined time window are considered.

4. The method of claim 1, said object being a convex room, said transmitter being a loudspeaker, each receiver being a microphone, said geometry being a 2D geometry, the number of receivers being 3.

5. The method of claim 1, said object being a convex room, said transmitter being a loudspeaker, each receiver being a microphone, said geometry being a 3D geometry, the number of receivers being higher than 4.

6. The method of claim 1, said object being a receiver, said transmitter being a satellite, said receiver being a mobile device.

7. The method of claim 1, comprising determining the geometry and/or the localisation of an object comprising the step of labelling echoes.

8. The method of claim 1, comprising determining which of the peaks of the impulse response received by each receiver correspond to which reflective surface.

9. The method of claim 1, comprising verifying whether the augmented matrix still satisfies the rank property according to which an EDM in Rn has a rank of at most n+2, n being a positive integer.

10. The method of claim 9, comprising testing at least some echoes combination and selecting the combination for which the rank property is satisfied.

11. The method of claim 1, comprising augmenting said EDM matrix by a vector t formed by the TOA from the transmitter to each receiver.

12. The method of claim 1, comprising determining the location of the transmitter by using least-squared distance trilateration.

13. The method of claim 1, comprising multi-dimensional scaling.

14. The method of claim 13, comprising applying a s-stress criterion.

15. A system for determining the geometry and/or the localisation of an object, comprising:

a transmitter for sending one or more signals;

one or more receivers for receiving the transmitted signals and the echoes of the transmitted signals as reflected by one or more reflective surfaces;

a first computing module for building a first Euclidean Distance Matrix (EDM) corresponding to mutual positions of the receivers;

a second computing module for adding to the EDM a new row and a new column, the new row and a new column comprising time of arrivals of said echoes and computing its second rank or its distance from the first EDM;

a third computing module for determining the geometry and/or the position of the object based on said second rank or distance.

16. The system of claim 15, the first, second and third modules being the same module.

17. The system of claim 15, the transmitter being a loudspeaker, the receiver being a microphone, the object being a room comprising said loudspeaker and said microphone.

18. The system of claim 15, the transmitter being a satellite, the receiver a mobile device, the object being said mobile device.

19. A computer program product, comprising:

a tangible computer usable medium including computer usable program code for determining the geometry and/or the localisation of an object, the computer usable program code being used for building a first Euclidean Distance Matrix (EDM) comprising the mutual positions of the receivers; adding to the EDM a new row and a new column, the new row and a new column comprising time of arrivals of echoes of the signals transmitted by a transmitter as reflected by one or more reflective surfaces and received by one or more receivers and computing its rank or its distance to the first EDM; determining the geometry and/or the position of the object based on said rank or distance.