This page contains supplemental materials for the paper
A. Censi, R. Murray -- A group-theoretic approach to formalizing bootstrapping problems (PDF)
Data: The original data is from the Rawseeds project.
Video format: The videos are in MP4 format with H264 encoding. They were encoded as .avi/mpeg using mencoder, then converted to .mp4/h264 with ffmpeg. They should play on any recent/decent player; let us know if they don't work for you. Free players that are known to work include MPlayer and VLC.
Click "play" to play the video in the browser using a Flash widget. Or right-click "download" for the direct link to the .mp4 file.
License: You are welcome to use these videos under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.
This animation shows the initial state of knowledge for a bootstrapping agent.
At the beginning, we only have an uninterpreted stream of observations and commands, and the bootstrapping agent must make sense of this initial confusion. The only
semantics assumed is that the commands somehow have a causal effect on the observations.
Can you tell which sensor this is?
At left, you see the observations, which in the paper are called y (white: low, black: high, whatever "high" and "low" mean for the unknown sensor); in the middle, the derivative dy/dt of the observations (red: positive, white: zero, blue: negative); at right, the uninterpreted commands u.
In this case, the commands correspond to linear and angular velocities.
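To make the color coding concrete, here is a minimal sketch (in Python with numpy/matplotlib; all names are mine, not from the paper's code) of how one frame of y, dy/dt, and u could be rendered with the same conventions:

```python
import numpy as np
import matplotlib.pyplot as plt

def show_frame(y, y_dot, u):
    """Render one frame the way the video does: observations, derivative, commands."""
    fig, axes = plt.subplots(1, 3, figsize=(10, 3))
    # Observations y: white = low, black = high (reversed grayscale).
    axes[0].imshow(np.atleast_2d(y), cmap='gray_r', aspect='auto')
    axes[0].set_title('y')
    # Derivative dy/dt: blue = negative, white = zero, red = positive.
    m = max(float(np.abs(y_dot).max()), 1e-9)
    axes[1].imshow(np.atleast_2d(y_dot), cmap='bwr', vmin=-m, vmax=m, aspect='auto')
    axes[1].set_title('dy/dt')
    # Commands u: here two values (linear and angular velocity).
    axes[2].bar(range(len(u)), u)
    axes[2].set_title('u')
    plt.show()
```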
Can you tell which changes in the observations are due to the agent's own actions (motion) and which are due to other things moving in the environment? This is the anomaly detection task considered in the paper. It is a passive task that can be performed on logged data; by contrast, the servoing task considered in our previous work, being active, is more representative of the learned model, but cannot be evaluated on static data.
This video displays the laser data obtained by the two SICK range-finders mounted at approximately 0deg and 180deg with respect to the robot's front.
On the right, the raw readings are displayed by a simple plot. The first 181 readings are from the front laser, and the rest from the back laser.
On the left, the laser readings are plotted in polar form, superimposed on the data from the omnidirectional camera. (Note that the alignment is only approximate.)
In both cases the maximum distance is capped at 20m (for visualization purposes).
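As a rough sketch of how the two scans could be placed in the robot frame for the polar overlay (the angular layout assumed below, 181 beams per scanner over 180 degrees, is inferred from the description above and is only approximate, like the overlay itself):

```python
import numpy as np

def scans_to_points(readings, max_range=20.0):
    """Convert 362 raw readings (181 front + 181 back) to x,y points in the robot frame."""
    readings = np.clip(np.asarray(readings, float), 0.0, max_range)
    front, back = readings[:181], readings[181:]
    # Assumed layout: each SICK scans 180 deg at 1 deg resolution;
    # the rear scanner is rotated by 180 deg with respect to the front one.
    angles_front = np.radians(np.arange(-90, 91))
    angles_back = angles_front + np.pi
    angles = np.concatenate([angles_front, angles_back])
    ranges = np.concatenate([front, back])
    return np.c_[ranges * np.cos(angles), ranges * np.sin(angles)]
```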
This video displays some second-order statistics of the laser data, namely the sample covariance of the readings (left), of the readings derivative (center), and of the sign of the readings derivative (right).
Eventually, when averaged over long trajectories in various environments, all three statistics are functions of the distance between the sensels. However, their convergence properties vary. The covariance of the readings is slow to converge because the robot is driven along stereotypical trajectories (e.g., straight in the middle of a corridor). The derivative of the readings converges faster (informally, differentiating tends to get rid of the slow phenomena).
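A minimal sketch of how these statistics could be accumulated online from a stream of scans (numpy-based; the incremental scheme and names are mine, not the paper's code):

```python
import numpy as np

class SecondOrderStats:
    """Accumulate sample covariances of the readings, their derivative, and its sign."""
    def __init__(self, n_sensels):
        self.n = 0
        self.sums = {k: np.zeros(n_sensels) for k in ('y', 'dy', 'sdy')}
        self.prods = {k: np.zeros((n_sensels, n_sensels)) for k in ('y', 'dy', 'sdy')}
        self.prev = None

    def update(self, y):
        if self.prev is not None:
            dy = y - self.prev
            for k, x in (('y', y), ('dy', dy), ('sdy', np.sign(dy))):
                self.sums[k] += x
                self.prods[k] += np.outer(x, x)
            self.n += 1
        self.prev = y

    def covariance(self, k):
        """k is 'y', 'dy', or 'sdy'."""
        mean = self.sums[k] / self.n
        return self.prods[k] / self.n - np.outer(mean, mean)
```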
From second-order statistics, it is possible to infer similarities between the sensels, and from those similarities, to obtain an embedding of the sensels in the sensel space. The metric information is not recovered precisely, but the topology can be reliably estimated. In this case, this means that, even starting from shuffled values, it is possible to recover the ordering of the sensels. The remaining uncertainty can be considered a diffeomorphism nuisance.
Actually, the most reliable statistic for embedding purposes is the information distance between the sensels, which is not shown here. Here is a Python pickle file containing the information distance matrix estimated from the data.
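For example, assuming the pickle stores the distance matrix directly as an array (the file name below is a placeholder), a one-dimensional embedding can be obtained with classical multidimensional scaling; MDS is just one reasonable choice here, not necessarily the method used in the paper:

```python
import pickle
import numpy as np

with open('information_distance.pickle', 'rb') as f:   # placeholder file name
    D = np.asarray(pickle.load(f))                       # n_sensels x n_sensels distances

# Classical MDS: double-center the squared distances and take the top eigenvector.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigval, eigvec = np.linalg.eigh(B)                        # ascending eigenvalues
coords = eigvec[:, -1] * np.sqrt(max(eigval[-1], 0.0))    # 1-D embedding of the sensels
order = np.argsort(coords)                                # recovered ordering (up to a flip)
```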
Using statistics of the data we can recover the ordering of sensels.
The next step is percentile normalization: the value of each sensel is normalized to the [0, 100] range according to its percentile in the sequence. This step normalizes away a diffeomorphism nuisance acting on the values. For example, suppose that the data of a range-finder are modified by an invertible nonlinearity, such as x --> 1/x, so that the values represent nearness instead of distance. The percentile normalization step equalizes the effect of such a nonlinearity.
Also, the data ends up being represented more densely around more probable values, which makes this an efficient representation. For example, the range-finder readings are mostly in the <10m range, while very few samples are >20m. The percentile representation gives more space to the immediate surroundings.
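A minimal sketch of the percentile normalization over a logged sequence (per sensel, over time; names are mine):

```python
import numpy as np

def percentile_normalize(Y):
    """Map each sensel's values to [0, 100] according to their percentile over the log.

    Y: (T x n_sensels) array of logged observations for one sensor.
    The result is unchanged (up to a flip) by any invertible monotone nonlinearity
    applied to the values, e.g. x -> 1/x (nearness instead of distance)."""
    T = Y.shape[0]
    ranks = np.argsort(np.argsort(Y, axis=0), axis=0)  # per-sensel rank in 0 .. T-1
    return 100.0 * ranks / max(T - 1, 1)
```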
The next step is the population code computation. N cells are assigned to each sensel. Each cell is activated if the sensel value is close to its reference point, according to a certain kernel. The result is a 362xN array which we can display as an image for easy visualization.
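A sketch of the population-code step, assuming N reference points uniformly spaced in [0, 100] and a Gaussian kernel (both the kernel choice and its width are assumptions):

```python
import numpy as np

def population_code(p, N=64, sigma=None):
    """Expand percentile-normalized values p (length 362, in [0, 100]) into a 362 x N array.

    Cell j of sensel i is active when p[i] is close to the j-th reference point."""
    p = np.asarray(p, float)
    refs = np.linspace(0.0, 100.0, N)           # reference points of the N cells
    if sigma is None:
        sigma = 100.0 / N                        # assumed kernel width
    d = p[:, None] - refs[None, :]               # distance of each value to each reference
    return np.exp(-0.5 * (d / sigma) ** 2)       # Gaussian activation, 362 x N
```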
Now, notice that all these operations are generic and data-agnostic. However, for the case of the range-finder, the end result is an image which is diffeomorphic to a polar map of the environment. On the y axis we have the sensel position in the sensel space, which is the angle (up to a diffeomorphism); on the x axis we have the percentile as a population code, which is diffeomorphic to the range.
In the paper we prove that the range-finder data thus preprocessed can be approximated by a BGDS model.
This video shows the tensor H being learned for the range-finder data. Its four slices are shown side-by-side in false colors (red: positive, white: zero, blue: negative). The video is only for 1 log out of 11, so the final results are not as smooth as those shown in the paper's figures.
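As a rough illustration of what "learning the tensor" amounts to, the sketch below accumulates H as the time average of the product of each command, each spatial gradient of the preprocessed frame, and the frame's time derivative. This is one of the simplest estimators consistent with a bilinear model; the estimator actually used in the paper may differ, e.g. in normalization:

```python
import numpy as np

class BGDSTensorLearner:
    """Rough online estimate of the tensor H for a 2-D array of sensels.

    Y: preprocessed frame (here 362 x N), u: commands (here 2 values),
    so H has 2 gradient directions x 2 commands = four slices of shape 362 x N."""
    def __init__(self, shape, n_commands=2):
        self.H = np.zeros((2, n_commands) + tuple(shape))
        self.T = 0
        self.prev = None

    def update(self, Y, u):
        if self.prev is not None:
            Y_dot = Y - self.prev              # time derivative (per frame)
            grads = np.gradient(Y)             # spatial gradients along the two axes
            for d in range(2):
                for i, u_i in enumerate(u):
                    self.H[d, i] += u_i * grads[d] * Y_dot
            self.T += 1
        self.prev = Y

    def tensor(self):
        return self.H / max(self.T, 1)
```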
This is the composite frame used as the data in the experiments.
On the left, the data from the omnidirectional camera. At the top right, a frontal camera with a large field of view; at the bottom right, one of the three Triclops cameras.
Unfortunately, there might be some glitches because the different cameras have different frame rates.
This video shows the computation of the mean values for each pixel. Notice how even simple statistics begin to identify the role of each pixel.
This video shows the computation of the variance for each pixel. Here white = low variance and dark = high variance. In general, high variance is not necessarily equivalent to more information, but in this case the extremely low-variance pixels are non-informative and could be discarded. They correspond to "dead" parts of the image: borders, or the robot itself reflected in the images.
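A sketch of the per-pixel statistics using Welford's online algorithm, so that the whole log need not be kept in memory (the names and the thresholding idea in the final comment are illustrative):

```python
import numpy as np

class PixelStats:
    """Online per-pixel mean and variance (Welford's algorithm)."""
    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.M2 = np.zeros(shape)

    def update(self, frame):
        self.n += 1
        delta = frame - self.mean
        self.mean += delta / self.n
        self.M2 += delta * (frame - self.mean)

    def variance(self):
        return self.M2 / max(self.n - 1, 1)

# Pixels with extremely low variance (borders, the robot body reflected in the mirror)
# carry no information and could be discarded, e.g.:
#   informative = stats.variance() > threshold
```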
This video shows the tensor H being learned for the camera data. Its four slices are shown side-by-side in false colors (red: positive, white: zero, blue: negative).
The video is only for 1 log out of 11, so the final results are not as smooth as those shown in the paper's figures.
In this video the false-color image of each tensor slice is generated independently of the other slices. This visualization exaggerates the importance of the bottom-right slice (vertical gradient, angular velocity), which is theoretically zero and just noise in practice; it would appear white if all slices were normalized together, as in the paper's figures.
This is the same as the previous video, with the difference that we are pre-filtering the camera data using a contrast operation before learning. The results are very similar. In general, filtering the data with a local operation should not change the learning result.
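One simple local contrast operation of this kind (this particular normalization, subtracting the local mean and dividing by the local standard deviation, is an illustrative choice, not necessarily the filter actually used):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast(img, size=5, eps=1e-3):
    """Local contrast normalization of a 2-D grayscale frame."""
    img = img.astype(float)
    mean = uniform_filter(img, size)                       # local mean
    sq_mean = uniform_filter(img ** 2, size)               # local mean of squares
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))    # local standard deviation
    return (img - mean) / (std + eps)
```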
This video shows side-by-side the derivative of the data and the prediction based on the learned tensors.
Note that fast rotations cause problems because of motion blur and because the time resolution becomes relevant, so that the observations can no longer be explained by continuous dynamics. Here we should take a multi-scale approach, reducing the resolution of the image for fast motions.
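For reference, the prediction itself is just the contraction of the learned tensor with the spatial gradients of the current frame and the commands; a simplified sketch (names are mine, matching the learner sketched above):

```python
import numpy as np

def predict_y_dot(H, Y, u):
    """Predict dY/dt from the learned tensor H, the current frame Y, and commands u.

    H: (2 gradient directions, n_commands, *Y.shape), u: (n_commands,)."""
    grads = np.gradient(Y)                      # spatial gradients of the frame
    Y_dot_pred = np.zeros_like(Y, dtype=float)
    for d in range(2):
        for i, u_i in enumerate(u):
            Y_dot_pred += H[d, i] * grads[d] * u_i
    return Y_dot_pred
```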
This video shows the anomaly detection signal (white: no anomaly, black: anomaly).
The false colors are normalized per-frame. This means that in the first part of the sequence, when there are no moving objects, you are looking mainly at noise.
Skip to time 45s or 105s (in log time, displayed in the bottom right) to see the strong signal when people walk past the robot.
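A sketch of how such an anomaly signal could be computed from the prediction error, with the per-frame normalization mentioned above (the exact mismatch measure used in the paper may differ):

```python
import numpy as np

def anomaly_signal(Y_dot, Y_dot_pred):
    """Per-pixel anomaly: large where the observed change disagrees with the prediction."""
    mismatch = np.abs(Y_dot - Y_dot_pred)
    # Per-frame normalization, as in this video: when nothing is moving,
    # this mostly amplifies noise.
    return mismatch / (mismatch.max() + 1e-9)
```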
Also, there are genuine anomalies that do not correspond to moving objects but rather to model failures: