Description
You are going to track a smart phone through an employee restaurant using the phone’s wifi signals. The restaurant is at the KPMG headquarters in Amstelveen, and the data that you will use are actual wifi signal data from one particular smart phone. The phone’s owner has consented to the use of his data.
To do wifi tracking, the KPMG Big Data team has built a system that measures wifi signals from wifi-enabled devices and has developed algorithms that reconstruct the position of the devices from the measured signals. In this exercise you will develop such an algorithm yourself. The system consists of some eleven normal wifi routers that have been reprogrammed to function as wifi sensors. This basically means that the routers no longer pass information to and from the internet, but only listen to and record all wifi traffic around them.
0.1 Wifi signals
Wifi communication is done in wifi packets whose format follows the 802.11 protocol. For tracking purposes the packet content (e.g. an http request) is not so interesting, but rather the packet header is, which contains among others
• The wifi MAC address of the device the packet is sent from. The MAC address is generally hashed and/or replaced with a pseudonym in the tracking system out of privacy concerns.
• The type and subtype of the packet, indicating the packet’s purpose. A packet can convey data/content like an http request, but can also be a management packet between phone and network conveying information on how to keep a stable connection. For tracking purposes the type and subtype are not so interesting, apart from the fact that they are needed to uniquely identify a packet.
• The sequence number of the packet. Every packet of every type and subtype is sent with a sequence number, which is thereafter incremented before a next packet of the type and subtype is sent. After running through 4096 values (12 bits) the sequence number loops back on itself.
There are other header items that can be important such as the retry flag, the fragment number and destination MAC address, but for convenience the subtleties of these have been taken out of the data set. When the router registers a wifi packet the header information is supplemented with among others
• A time stamp when the packet is registered, in Unix time with millisecond precision.
• The signal strength of the packet as seen by the router, in decibels with respect to 1 mWatt (dBm).
• The name of the router that registered the wifi packet.
0.2 Signal strength and distance
The basic idea behind determining the location of a wifi device is that the signal strength of a wifi packet is larger for routers that are closer to the device than for routers that are farther away. Thus the position of the device is encoded in the different signal strengths of the same wifi packet at different routers. The relation between signal strength at the router and the distance between device and router is given by the Friis free space equation.
(1)
a. Make a plot of Pr as a function of r when Pt = 0 dBm, for r ranging from 0.4 m to 30 m. To which devices is a router more sensitive when it comes to distance, devices that are close or devices that are far away, and why?
b. Invert the Friis equation 1 to give the distance r as a function of Pr.
c. When Pt = 0 dBm, what is the difference in distance between a signal strength of −30 dBm and of −31 dBm, and between a signal strength of −60 dBm and −61 dBm?
d. For which case does an uncertainty on the measured signal strength translate to a larger uncertainty on the distance: for a larger signal strength of e.g. −30 dBm or for a smaller signal strength of e.g. −60 dBm, and why?
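The relation and its inverse can be sketched in a few lines of Python. This is only a sketch under assumptions that are not taken from this document: the common decibel form of the Friis free space equation with unit antenna gains, Pr = Pt + 20 log10(λ / (4π r)), and a 2.4 GHz carrier (λ ≈ 0.125 m). The function names are illustrative; where equation 1 differs, equation 1 takes precedence.

```python
import math

WAVELENGTH_M = 0.125  # assumption: 2.4 GHz wifi, wavelength = c / f


def friis_received_power(r, pt_dbm=0.0, wavelength=WAVELENGTH_M):
    """Received power in dBm at distance r (m), assuming unit antenna gains."""
    return pt_dbm + 20.0 * math.log10(wavelength / (4.0 * math.pi * r))


def friis_distance(pr_dbm, pt_dbm=0.0, wavelength=WAVELENGTH_M):
    """Inverse of friis_received_power: distance (m) for a received power."""
    return wavelength / (4.0 * math.pi) * 10.0 ** ((pt_dbm - pr_dbm) / 20.0)


# How much distance corresponds to a 1 dB drop, for a strong and a weak signal
for pr in (-30.0, -60.0):
    step = friis_distance(pr - 1.0) - friis_distance(pr)
    print(f"{pr:.0f} -> {pr - 1:.0f} dBm: distance grows by {step:.2f} m")
```

Because the signal strength depends on the logarithm of the distance, a fixed step in dBm corresponds to a fixed ratio of distances rather than a fixed number of metres.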
0.3 Position reconstruction
e. Explain and draw an example of why there will not be an exact intersection point.
If the wifi device were at a position [x,y] and transmits with a transmission power Pt, then the expected signal strength for router i at position [xi,yi] is
Where Z is the difference in height between device and router. Since phones are mostly kept at pocket height we don’t keep the z-coordinate as a variable but fix Z at a value of Z = 2 m (the routers are placed approximately 3 m up and a pants pocket is approximately 1 m up).
f. Derive the above equation from the Friis equation 1.
Of course we don’t know the position [x,y] of the device; estimating it is the whole purpose here. What we do have is the signal strength Si measured at router i, with measurement uncertainty σi, which we can compare with the expected signal strength through the normalized residual

$$\frac{S_i - P_{r,i}(x,y)}{\sigma_i}.$$

The normalized residual expresses how close the measured and expected signal strengths are to each other if we assume the device to be at position [x,y]. It expresses the difference in units of the uncertainty, i.e. how many standard deviations it is away from zero. The larger the normalized residual is, the more likely it is that the difference between expectation and measurement is the result of wrong parameters rather than of noise. In this case: the more likely it is that the difference Si − Pri(x,y) between measured and expected signal strength is the result of the estimated position [x,y] being wrong rather than of measurement errors.
g. Make a small simulation, where you have one router at coordinate [0,0] and a mobile device at coordinate [20,0] which transmits with power Pt = 0 dBm. The router measures wifi packets from the mobile device according to the Friis equation 1, but the signal strength at the router obtains an additional random Gaussian fluctuation of 1 dBm around the Friis equation value. Generate 1000 such packets (incl. fluctuations). For each packet calculate the expected signal strength (this is just the Friis equation value) and calculate the normalized residual when you estimate the measurement uncertainties (correctly) at σ = 1 dBm. Plot the distribution of the normalized residuals. This distribution is called a pull distribution. What are the mean and the standard deviation of the pull distribution?
h. Now generate another 1000 packets but with 2 dBm random Gaussian fluctuations. However, in determining the normalized residuals keep σ = 1 dBm in the denominator. This is the case when in truth the noise has an amplitude of 2 dBm while you underestimate it to have an amplitude of 1 dBm. What happens to the pull distribution if you underestimate the noise/fluctuations/measurement uncertainties? And what will happen if you overestimate?
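A pull distribution of this kind could be simulated along the following lines, using only the standard library. The sketch assumes the decibel Friis form with unit antenna gains and a 2.4 GHz wavelength (neither is stated in this document), and the names `expected_dbm` and `simulate_pulls` are illustrative.

```python
import math
import random
import statistics

WAVELENGTH_M = 0.125  # assumption: 2.4 GHz wifi


def expected_dbm(r, pt_dbm=0.0):
    """Assumed Friis form with unit antenna gains."""
    return pt_dbm + 20.0 * math.log10(WAVELENGTH_M / (4.0 * math.pi * r))


def simulate_pulls(n_packets, true_noise_dbm, assumed_sigma_dbm, seed=1):
    """Normalized residuals for one router at [0,0] and a device at [20,0]."""
    rng = random.Random(seed)
    expected = expected_dbm(20.0)
    pulls = []
    for _ in range(n_packets):
        # Measured value = Friis value plus Gaussian noise of the true size
        measured = expected + rng.gauss(0.0, true_noise_dbm)
        pulls.append((measured - expected) / assumed_sigma_dbm)
    return pulls


for noise in (1.0, 2.0):
    pulls = simulate_pulls(1000, true_noise_dbm=noise, assumed_sigma_dbm=1.0)
    print(noise, statistics.mean(pulls), statistics.stdev(pulls))
```

Keeping the true noise and the assumed uncertainty as separate arguments makes it easy to compare a correct estimate with an underestimate or overestimate.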
If we square the normalized residuals and sum them up for all routers,

$$\chi^2(x,y) = \sum_i \left( \frac{S_i - P_{r,i}(x,y)}{\sigma_i} \right)^2, \tag{2}$$

then we obtain a Euclidean distance measure that expresses how close the measured signal strengths are to the expected signal strengths if the device is assumed to be at position [x,y]. Such a distance measure, a sum of squared normalized residuals, is called a chi-squared (χ²). We say that the position [x,y] which has the smallest chi-squared is closest to being the intersection point and is most likely the device’s true position.
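As a rough illustration, the chi-squared can be written as a plain function of the assumed position. The sketch below borrows the four-router layout and the device position [5,5] from the toy Monte Carlo in section 0.4. The Friis form with unit antenna gains and the 2.4 GHz wavelength are assumptions, and all names are illustrative.

```python
import math

WAVELENGTH_M = 0.125  # assumption: 2.4 GHz wifi
HEIGHT_DIFF_M = 2.0   # Z, the height difference between router and device
ROUTERS = [(0.0, 0.0), (0.0, 20.0), (20.0, 20.0), (20.0, 0.0)]


def expected_dbm(x, y, router_xy, pt_dbm=0.0):
    """Expected signal strength at one router (assumed Friis form)."""
    xi, yi = router_xy
    r = math.sqrt((x - xi) ** 2 + (y - yi) ** 2 + HEIGHT_DIFF_M ** 2)
    return pt_dbm + 20.0 * math.log10(WAVELENGTH_M / (4.0 * math.pi * r))


def chi_squared(x, y, routers, signals, sigmas, pt_dbm=0.0):
    """Sum of squared normalized residuals for an assumed position [x, y]."""
    total = 0.0
    for router_xy, signal, sigma in zip(routers, signals, sigmas):
        residual = (signal - expected_dbm(x, y, router_xy, pt_dbm)) / sigma
        total += residual ** 2
    return total


# Noise-free signals for a device at [5, 5], then a scan along x at y = 5
signals = [expected_dbm(5.0, 5.0, router) for router in ROUTERS]
sigmas = [1.0] * len(ROUTERS)
scan = [(x / 2.0, chi_squared(x / 2.0, 5.0, ROUTERS, signals, sigmas))
        for x in range(10, 31)]
best_x, best_chi2 = min(scan, key=lambda item: item[1])
print(best_x, best_chi2)
```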
0.4 Toy Monte Carlo
Before we work on actual data we are going to build a simulation. We often call the type of simulation you will build a Toy Monte Carlo simulation. In such a simulation you build a simplified (toy) world to understand how different factors can influence your real complex system. Often you start by first building a world that agrees exactly with your model and then adding disruptions to see what happens.
Suppose we have four routers at positions [0,0], [0,20], [20,20] and [20,0], and they are all 3 m up high. Suppose furthermore that we have a wifi-enabled device at position [5,5], at pocket height (1 m up high), that sends out wifi packets with transmission power Pt = 0 dBm.
k. Generate a single wifi packet and determine the signal strength it produces at each of the four routers (without noise). Then make a plot of the χ² as a function of assumed x-position for x between 5 m and 15 m and y = 5 m. For the χ² calculation let your estimated measurement uncertainty be σi = 1 dBm for each of the four routers. The χ² should be minimum at the true position [5,5].
l. Again generate a single wifi packet, and determine the signal strength it produces at each of the four routers, but this time add random Gaussian fluctuations of 1 dBm to the measured signal strength at each of the four routers independently. Again make a plot of the χ² as a function of assumed x-position for x between 5 m and 15 m and y = 5 m. Does the minimum occur at the true position [5,5]? And what happens to the minimum if you repeat this procedure several times (with different independent random fluctuations)?
m. The χ² has a minimum at some point [x,y]. Use a minimization procedure (for Python you can use scipy.optimize.minimize) to find the [x,y] for which the χ² is minimum.
n. Generate 1000 wifi packets (each one with random Gaussian fluctuations at the routers). For each one find the [x,y] position that minimizes the χ² and the χ² at this minimum. Plot the positions in a scatter plot. Plot the χ² minima in a histogram. What is the average x- and y-position? What is the average χ² of the minima?
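A single fit with scipy.optimize.minimize might look like the sketch below. The Friis form with unit antenna gains, the 2.4 GHz wavelength, the Nelder-Mead method and the starting point in the middle of the router square are all choices made for this illustration, not prescriptions from this document.

```python
import numpy as np
from scipy.optimize import minimize

WAVELENGTH_M = 0.125  # assumption: 2.4 GHz wifi
Z_M = 2.0             # height difference between routers and device
ROUTERS = np.array([[0.0, 0.0], [0.0, 20.0], [20.0, 20.0], [20.0, 0.0]])


def expected_dbm(pos, pt_dbm=0.0):
    """Expected signal strength at every router for a device at pos = [x, y]."""
    d2 = np.sum((ROUTERS - np.asarray(pos)) ** 2, axis=1) + Z_M ** 2
    return pt_dbm + 20.0 * np.log10(WAVELENGTH_M / (4.0 * np.pi * np.sqrt(d2)))


def chi_squared(pos, signals, sigma=1.0):
    """Sum of squared normalized residuals for an assumed position."""
    return float(np.sum(((signals - expected_dbm(pos)) / sigma) ** 2))


def fit_position(signals, sigma=1.0, start=(10.0, 10.0)):
    """Return the position that minimizes the chi-squared and its value there."""
    result = minimize(chi_squared, x0=np.array(start), args=(signals, sigma),
                      method="Nelder-Mead")
    return result.x, result.fun


# One simulated packet with 1 dBm Gaussian noise at each router
rng = np.random.default_rng(seed=0)
signals = expected_dbm([5.0, 5.0]) + rng.normal(0.0, 1.0, size=len(ROUTERS))
position, chi2_min = fit_position(signals)
print(position, chi2_min)
```

Calling `fit_position` in a loop over freshly generated packets gives the positions and minima needed for the scatter plot and the histogram.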
In an experiment or a simulation as you just did, you expect on average to find a χ² equal to the number of degrees of freedom. The number of degrees of freedom, or NDoF, is the number of data points that you have minus the number of parameters that you are estimating. In this case you have four data points, the signal strengths measured at each of the four routers. And you have two parameters that you are estimating, the x- and y-position of the wifi device. Thus in this case you have two degrees of freedom. The idea behind the NDoF is that not all your data points are ’free’, but a number of them ’are needed’ to fix your parameters. The rest of them are then free to deviate from what you expect and crank up your χ².
In fact you expect your minima to be distributed according to a chi-squared distribution. Go online and find a text on chi-squared distributions, e.g. Wikipedia, just to have a notion of what the distribution looks like for different numbers of degrees of freedom.
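For a quick look at the distribution without leaving Python, scipy.stats.chi2 provides its mean, standard deviation and density. The following sketch only prints a few values and prepares a curve for two degrees of freedom that could be drawn on top of a normalized histogram.

```python
import numpy as np
from scipy.stats import chi2

# Mean and standard deviation for a few numbers of degrees of freedom
for ndof in (1, 2, 4, 10):
    print(ndof, chi2.mean(ndof), chi2.std(ndof))

# Density for two degrees of freedom, e.g. to overlay on a normalized histogram
x = np.linspace(0.0, 20.0, 201)
pdf_two_dof = chi2.pdf(x, df=2)
```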
o. Does the average χ² of the minima that you found in the previous item agree with the NDoF?
q. Again generate 1000 wifi packets but now with random Gaussian fluctuations of 2 dBm. For each one find the [x,y] position that minimizes the χ² and the χ² at this minimum, but still assuming a measurement uncertainty of σi = 1 dBm at each of the routers. Plot the positions in a scatter plot. Plot the χ² minima in a histogram (normalized if you need). Plot the chi-squared distribution with two degrees of freedom on top of the histogram of minima. What is the average x- and y-position? What happened to the cloud of estimated device positions, and why? What is the average χ² of the minima? Does it agree with what you expect? Does your histogram agree with what you expect from a chi-squared distribution? What happened and why?
In the last exercise you simulated the important case where you as an analyst underestimate the measurement uncertainty. In reality you had random fluctuations of 2 dBm, while you estimated them to be 1 dBm, and you saw this in your average χ² value (and in the distribution).
r. Suppose you repeatedly do a chi-squared fit where you have 10 degrees of freedom, and you find your average minimum chi-squared to be 40.0. What does this tell you about your estimate of the measurement/data uncertainties?
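The effect of a wrong uncertainty estimate can be made explicit with a short worked equation. It is a sketch that assumes every router has the same uncertainty; the labels "true" and "assumed" are introduced here for illustration only.

```latex
\chi^2_{\text{assumed}}
  = \sum_i \left( \frac{S_i - P_{r,i}}{\sigma_{\text{assumed}}} \right)^2
  = \left( \frac{\sigma_{\text{true}}}{\sigma_{\text{assumed}}} \right)^2
    \sum_i \left( \frac{S_i - P_{r,i}}{\sigma_{\text{true}}} \right)^2
\quad\Longrightarrow\quad
\langle \chi^2_{\min} \rangle \approx
  \left( \frac{\sigma_{\text{true}}}{\sigma_{\text{assumed}}} \right)^2 \mathrm{NDoF},
\qquad
\sigma_{\text{true}} \approx \sigma_{\text{assumed}}
  \sqrt{ \frac{\langle \chi^2_{\min} \rangle}{\mathrm{NDoF}} }
```

Scaling all uncertainties by a common factor multiplies the chi-squared by a constant, so the position of the minimum does not move; only the value at the minimum and the uncertainties derived from it change.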
0.5 Error on fit parameters
There is debate on how to properly calculate the uncertainty on fit parameters (in this case the device position), but in case you just want a fairly rough estimate of how far from the true position your reconstructed position can be off, it can be done fairly easily by determining the covariance matrix of the parameters.
When you do a chi-squared fit you find the parameters (here the device x- and y-position) which minimize the chi-squared. Call these their optimal values. If you then change your parameters slightly from their optimal values, then of course your chi-squared will increase. The variation in your fit parameters from their optimal values that raises the chi-squared by 1 is then a measure for the uncertainty on your fit parameters.
You can do this search for the variations programmatically by ’scanning’ the chi-squared value around the minimum, but when you have a lot of fit parameters this becomes quite involved. Instead, you can get an approximation by linearizing your model around the optimal values and calculating the necessary variations analytically.
A linearization of our model around the optimal position, call it [x0,y0], is given by

$$P_{r,i}(x,y) \approx P_{r,i}(x_0,y_0) + \left.\frac{\partial P_{r,i}}{\partial x}\right|_{(x_0,y_0)} (x - x_0) + \left.\frac{\partial P_{r,i}}{\partial y}\right|_{(x_0,y_0)} (y - y_0). \tag{3}$$

This is a Taylor expansion of our Friis equation to first order around the optimal value [x0,y0].
s. Fill this approximation of our model in the chi-squared equation 2 to obtain an approximation of the χ² in the neighborhood of its minimum at [x0,y0] (you don’t have to work out the partial derivatives yet, just leave them as symbols). The result should be a quadratic equation in (x − x0) and (y − y0).
From chi-squared minimization we know that the optimal position [x0,y0] is actually the one that solves the two equations

$$\left.\frac{\partial \chi^2}{\partial x}\right|_{(x_0,y_0)} = 0 \tag{4}$$

$$\left.\frac{\partial \chi^2}{\partial y}\right|_{(x_0,y_0)} = 0 \tag{5}$$

t. Use the two equations 4 and 5 in your previous approximation and derive an equation for the χ² in terms of only constant and quadratic terms in (x − x0)², (y − y0)² and (x − x0)(y − y0), with no linear terms in (x − x0) or (y − y0).
u. For convenience now call Δx := (x − x0) and Δy := (y − y0). You can write your approximation as a matrix equation χ²(Δx, Δy) ≈ A + [Δx, Δy] B̂ [Δx, Δy]ᵀ. What is the constant A and what are the matrix elements of B̂ (symbolically)?
Now the diagonal elements of the matrix B̂ that you just built are directly related to the uncertainties σx and σy on the device position; they are in fact the inverse of their variances.
v. How large does Δx have to be to raise the χ² by 1 if you keep Δy = 0? The square of this Δx is the variance on the x-position estimate. Equally, how large does Δy have to be to raise the χ² by 1 if Δx = 0? Its square is the variance on the y-position estimate.
The off-diagonal elements of the matrix are directly related to the correlation between the two coordinates of the device position: inverting the full matrix B̂ gives the covariance matrix of x and y.
w. Generate again a single wifi packet with random Gaussian fluctuations of 1 dBm at each of the routers. Use the minimization procedure to estimate the device’s location assuming a (correct) measurement uncertainty of 1 dBm. Now also calculate the two diagonal elements of the matrix B̂ that you previously derived (differentiate the Friis equation 1 with respect to x and y on paper and program the result in a calculation of the elements of B̂). Use these two diagonal elements to determine the variance of the x- and y-position of the device, and then the standard deviations (square root of the variance). Plot the estimated position of the device and draw around it an ellipse whose width is equal to the standard deviation in x and whose height is equal to the standard deviation in y. The ellipse represents your uncertainty on the estimated device position. Also plot the true location of the device for comparison.
x. You can repeat the above procedure for a few wifi packets if you like.
y. Generate 1000 wifi packets with random Gaussian fluctuations of 1 dBm at each of the routers. For each of the wifi packets estimate the device’s location and the uncertainty on the location, assuming a (correct) measurement error of 1 dBm. Make pull distributions of the estimated x- and y-positions. Are the pull distributions centered where you expect, and are they as wide as you expect? Could you think of reason(s) why the pull distributions would be (slightly) different from what you expect?
z. What would happen to your estimated errors on the x and y positions if you underestimate your measurement error by a factor 2?
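The diagonal-element recipe can be sketched numerically as follows. The sketch assumes that the expected signal strength depends on distance as −20 log10(r) plus a constant, as in the common decibel form of the Friis equation, so that the derivatives depend on neither the wavelength nor Pt. The matrix is built as the sum over routers of the outer product of the gradient with itself, divided by σ²; the function names are illustrative.

```python
import numpy as np

Z_M = 2.0  # height difference between routers and device
ROUTERS = np.array([[0.0, 0.0], [0.0, 20.0], [20.0, 20.0], [20.0, 0.0]])


def signal_gradients(pos):
    """d(expected dBm)/dx and d/dy per router, shape (n_routers, 2).

    Assumes expected dBm = const - 20 * log10(r), r the 3D distance.
    """
    delta = np.asarray(pos) - ROUTERS
    r2 = np.sum(delta ** 2, axis=1) + Z_M ** 2
    return -20.0 / np.log(10.0) * delta / r2[:, None]


def curvature_matrix(pos, sigma=1.0):
    """Matrix of the quadratic chi-squared approximation around pos."""
    g = signal_gradients(pos)
    return g.T @ g / sigma ** 2


b_hat = curvature_matrix([5.0, 5.0], sigma=1.0)
# Diagonal-only approximation from the text: variance = 1 / diagonal element
sigma_x, sigma_y = 1.0 / np.sqrt(np.diag(b_hat))
print(sigma_x, sigma_y)
```

In a real fit the matrix would be evaluated at the fitted position rather than at the true one, which is used here only to keep the example short.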
Finally a bonus question
In reality the error ellipses are not horizontal or vertical as you just plotted them; this is just an approximation (on top of one anyway). In reality they are generally diagonally oriented and could even be skewed (the two axes of the ellipse not perpendicular). Derive the proper form of the ellipses from the matrix B̂ that we derived previously. Why is the ellipse never skewed?
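One way to explore the orientation numerically is an eigen-decomposition of the matrix, sketched below with numpy.linalg.eigh. The matrix values are made up for the example, and the function name `error_ellipse` is illustrative.

```python
import numpy as np


def error_ellipse(b_hat):
    """Semi-axes and orientation of the contour where the chi-squared rises by 1."""
    # eigh is meant for symmetric matrices and returns ascending eigenvalues
    eigenvalues, eigenvectors = np.linalg.eigh(b_hat)
    semi_axes = 1.0 / np.sqrt(eigenvalues)
    major = eigenvectors[:, 0]  # smallest eigenvalue gives the longest axis
    angle_deg = np.degrees(np.arctan2(major[1], major[0]))
    return semi_axes, angle_deg, eigenvectors


b_hat = np.array([[0.30, 0.10], [0.10, 0.20]])  # hypothetical example values
semi_axes, angle_deg, axes = error_ellipse(b_hat)
print(semi_axes, angle_deg)
print(axes.T @ axes)  # compare with the identity matrix
```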
0.6 Wifi tracking
In this section you will use the foregoing to actually track a smart phone through a restaurant. The restaurant is the employee restaurant at the KPMG headquarters in Amstelveen, and the phone’s owner has consented to the use of his data.
You are given a csv file containing the measurements of the wifi signals that were sent out by the phone. Every line in the file is a wifi packet registered by one of the routers. A description of the columns contained in the csv file is given in table 1. Further information about how wifi packets are sent was given at the start of this document.

| column name | description | unit |
|---|---|---|
| sourceMac | The hashed MAC address the wifi packet was sent from | |
| typeNr | Number indicating the type of the packet | 0, 1, or 2 |
| subTypeNr | Number indicating the subtype of the packet | 0,…,11 |
| seqNr | Number indicating which in a sequence of sent packets of this type and subtype this is | 0,…,4096 |
| retryFlag | Flag indicating whether this packet has been sent before or not | 0, 1 |
| measurementTimestamp | Timestamp when the packet was registered, in ms precision | UTC |
| droneId | Name of the router that registered the packet | |
| signal | Signal strength of the packet as measured by the router | dBm |

Table 1: Description of the columns in the csv data file.
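A first step is to group the registrations that belong to the same transmitted packet. The sketch below uses the column names from table 1 on a few made-up rows. It assumes that the timestamp is an integer Unix time in milliseconds and that one packet reaches all routers within one second; both are assumptions to check against the real file, and the 1 s window is an arbitrary choice needed because sequence numbers eventually repeat.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration, not taken from the real data file
SAMPLE_CSV = """sourceMac,typeNr,subTypeNr,seqNr,retryFlag,measurementTimestamp,droneId,signal
abc,2,8,101,0,1433160000000,Lima,-52
abc,2,8,101,0,1433160000003,Hotel,-49
abc,2,8,102,0,1433160000950,Lima,-55
abc,2,8,101,0,1433160900000,Kilo,-61
"""

MAX_SPREAD_MS = 1000  # assumption: one packet is seen by all routers within 1 s


def timestamp_ms(row):
    return int(row["measurementTimestamp"])


def group_packets(rows, max_spread_ms=MAX_SPREAD_MS):
    """Group registrations with the same header fields that are close in time."""
    by_key = defaultdict(list)
    for row in rows:
        key = (row["sourceMac"], row["typeNr"], row["subTypeNr"], row["seqNr"])
        by_key[key].append(row)
    packets = []
    for records in by_key.values():
        records.sort(key=timestamp_ms)
        current = [records[0]]
        for record in records[1:]:
            # A reused sequence number much later in time is a different packet
            if timestamp_ms(record) - timestamp_ms(current[0]) > max_spread_ms:
                packets.append(current)
                current = [record]
            else:
                current.append(record)
        packets.append(current)
    return packets


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for packet in group_packets(rows):
    print({record["droneId"]: int(record["signal"]) for record in packet})
```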
| router name (droneId) | location x,y |
|---|---|
| Lima | 5.82, 5.48 |
| Mike | 11.33, 9.43 |
| Kilo | 12.39, 6.77 |
| Oscar | 2.48, 7.36 |
| Alpha | 8.53, 2.16 |
| India | 2.18, 5.61 |
| Hotel | 5.43, 4.71 |
| Romeo | 10.99, 5.94 |
| Quebec | 6.82, 9.78 |
| Papa | 9.9, 10.39 |

Table 2: Location of the routers.
0.6.1 Your goal
In the previous exercise you have been taken through the process of setting up a chi-squared fit step by step. In this exercise you will use these steps and their implications to track a smart phone from real data. Your goal is to perform all of the following:
a. Identify the various wifi packets that were sent.
b. Estimate the device’s position for each one of the wifi packets.
c. Estimate the uncertainty on the device’s position for each one of the wifi packets.
d. Plot the device’s positions and uncertainties together with the router locations.
e. Estimate our system’s resolution, i.e. how well can we determine a device’s location.
f. Estimate the average transmission power (in dBm) of the device.
g. Some wifi packets were sent so shortly after each other that the device could not have moved very far, if at all. Compose another chi-squared method to combine the locations of wifi packets that are closely spaced in time into one location.
h. Plot the device’s positions and uncertainties from the combination of closely spaced packets together with the router locations.
Document your analysis well, e.g. with inline explanations or additional outside documentation. Make sure your reader can understand what you did, imagine for example that your analysis is a publication going to be peer reviewed. Plots (histograms, scatter plots) help you understand the data, and help us understand how you did this exercise.
0.6.2 Some hints and help
The transmission power Pt is not known, so it is a parameter of the chi-squared method.
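Treating Pt as a third fit parameter could look like the sketch below, which uses the router locations from table 2 and a made-up, noise-free packet. The Friis form with unit antenna gains, the 2.4 GHz wavelength, the Nelder-Mead method and the starting values are assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import minimize

WAVELENGTH_M = 0.125  # assumption: 2.4 GHz wifi
Z_M = 2.0             # height difference between routers and device
ROUTERS = np.array([  # x, y from table 2
    [5.82, 5.48], [11.33, 9.43], [12.39, 6.77], [2.48, 7.36], [8.53, 2.16],
    [2.18, 5.61], [5.43, 4.71], [10.99, 5.94], [6.82, 9.78], [9.9, 10.39],
])


def expected_dbm(params, routers=ROUTERS):
    """Expected signal strengths for params = (x, y, Pt in dBm)."""
    x, y, pt_dbm = params
    r = np.sqrt((routers[:, 0] - x) ** 2 + (routers[:, 1] - y) ** 2 + Z_M ** 2)
    return pt_dbm + 20.0 * np.log10(WAVELENGTH_M / (4.0 * np.pi * r))


def chi_squared(params, signals, sigma, routers=ROUTERS):
    return float(np.sum(((signals - expected_dbm(params, routers)) / sigma) ** 2))


def fit_packet(signals, sigma=1.0, routers=ROUTERS):
    """Fit x, y and Pt; pass only the routers that registered the packet."""
    start = np.array([routers[:, 0].mean(), routers[:, 1].mean(), -10.0])
    result = minimize(chi_squared, start, args=(signals, sigma, routers),
                      method="Nelder-Mead")
    return result.x, result.fun


true_params = (4.0, 7.0, -15.0)  # made-up device position and power
(x, y, pt_dbm), chi2_min = fit_packet(expected_dbm(true_params))
print(x, y, pt_dbm, chi2_min)
```

With three fitted parameters the number of degrees of freedom is the number of routers that registered the packet minus three, so packets seen by only a few routers constrain the position poorly or not at all.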
We actually do not know what the uncertainties on the measured signal strengths are at the routers. They are at least 1 dBm, because the hardware only gives values without a decimal point. However with signal attenuation, reflection, interference from other sources, and non-spherical wave-fronts the measurement uncertainties can really be anything. Perhaps you can give an estimate for us?
Making plots or other (sanity) checks during an analysis is very important to make sure you don’t have bugs or are not making mistakes. E.g. estimated transmission powers shouldn’t be bigger than 0 dBm unless the smart phone’s owner hooked it up with an amplifier (which he didn’t). These also help the reader ’believe’ your results.
The system’s x- and y-resolution you can estimate as the average of the uncertainties on the device’s x- and y-position.
You decide what packets are closely enough spaced in time to be combined, but be sensible and be clear about your decision.
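One simple choice for the combination is sketched below: for each coordinate, minimize the sum over packets of ((value − m) / σ)², whose solution is the inverse-variance weighted mean. Treating x and y separately ignores their correlation, and the 1000 ms gap as well as the sample numbers are made up for the example.

```python
def combine_estimates(values, sigmas):
    """Value m and its uncertainty minimizing sum(((v - m) / s) ** 2)."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, (1.0 / total) ** 0.5


def cluster_by_time(timestamps_ms, max_gap_ms):
    """Group indices of time-sorted packets that follow within max_gap_ms."""
    clusters = []
    for index, t in enumerate(timestamps_ms):
        if clusters and t - timestamps_ms[clusters[-1][-1]] <= max_gap_ms:
            clusters[-1].append(index)
        else:
            clusters.append([index])
    return clusters


# Made-up fitted x-positions (m), their uncertainties (m) and packet times (ms)
times = [0, 500, 900, 5000, 5200]
xs = [4.0, 6.0, 5.0, 9.0, 9.5]
sigma_xs = [1.0, 1.0, 2.0, 1.5, 1.5]
for cluster in cluster_by_time(times, max_gap_ms=1000):
    x, sigma_x = combine_estimates([xs[i] for i in cluster],
                                   [sigma_xs[i] for i in cluster])
    print(cluster, round(x, 2), round(sigma_x, 2))
```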



