Overview

ViSpa (Vision Spaces) provides vision-based representations for individual images and for prototypes, which are obtained by averaging over images with the same label. ViSpa is intended as a resource for psychology and cognitive science, with possible applications in other fields.

The ViSpa website serves as a convenient tool for anyone interested in using these models. It provides the core functionalities for interacting with the ViSpa system: similarity computations between individual images, between individual images and prototypes, and between prototypes. The website also includes the Picture picker interface, which allows you to browse through the individual images for which representations are available in the current instance of the ViSpa system.

The model

ViSpa is based on a pre-trained version of the VGG-F model (Chatfield et al., 2014) provided in the MatConvNet toolbox for MATLAB (Vedaldi & Lenc, 2015). The VGG-F model is a deep convolutional neural network trained to predict labels for images. Once trained, any image can be given as input to the model. The activation values of the units at the different layers of the network can then be taken as vision-based representations for the input image, with the deeper (i.e., later) layers tending to capture more complex gestalt-level features (Mahendran & Vedaldi, 2015).
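
As a rough illustration of this idea, the following Python sketch extracts the activations of an intermediate fully-connected layer of a pre-trained network and uses them as an image representation. It uses AlexNet from torchvision as a stand-in for VGG-F (which is provided by MatConvNet, not torchvision), so the layer indices and preprocessing are illustrative assumptions rather than the exact ViSpa pipeline.

import torch
from PIL import Image
from torchvision import models, transforms

# Pre-trained CNN; AlexNet here stands in for the VGG-F model used by ViSpa.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()

# Standard ImageNet preprocessing (illustrative; VGG-F uses its own normalization).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def vision_representation(image_path):
    """Return the activations of the first fully-connected block for one image."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feats = model.avgpool(model.features(x)).flatten(1)
        rep = model.classifier[:3](feats)   # dropout (inactive in eval), linear, ReLU
    return rep.squeeze(0).numpy()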


[Figure: The VGG-F model]


The ViSpa system includes not only representations for individual images, but also prototypical representations for classes of images. These are obtained by averaging over 100 to 200 images of that category (for a detailed description, see Petilli, Günther, Vergallito, Ciapparelli, & Marelli, 2021).
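
As a minimal sketch of this averaging step, assuming the individual image representations are already available as rows of a NumPy array (the variable names are hypothetical):

import numpy as np

def prototype(image_vectors):
    """Average the vision-based representations of all images of one category."""
    return np.asarray(image_vectors).mean(axis=0)

# Hypothetical example: 150 images labelled "lion", each with a 400-dimensional vector.
lion_vectors = np.random.randn(150, 400)
LION_PRO = prototype(lion_vectors)    # one prototype vector for the category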


[Figure: The ViSpa system]


Individual images and prototypes

The website allows you to work with both individual images and prototypes. In its current state, it contains vision-based representations for 7,801 different image categories (each with its unique label). For each of these labels, it currently contains vision-based representations for:

  • the prototype representation (for example, LION_PRO), obtained by averaging 100 to 200 vision-based representations for images with that label. As such, the prototype representation is not tied to a specific image, only to a labelled category
  • 16 different randomly selected individual images (for example, LION_001 or LION_012). You can search these images and copy their names using the Picture picker interface
  • 5 additional individual images (for example, LION_000Q, LION_025Q, LION_050Q, LION_075Q, LION_100Q) with different similarities to the prototype representation. Of all 100 to 200 images used to construct the prototype representation (for example, LION_PRO), LION_100Q is the image most similar to the prototype representation, LION_000Q is the least similar image, LION_025Q is the image at the 25th percentile of similarity, and so on (see the sketch below)
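
As referenced in the last item above, the following sketch illustrates how such quantile images could be selected, assuming the image vectors and the prototype vector are available as NumPy arrays; the exact ViSpa procedure is described in Petilli et al. (2021).

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def quantile_images(image_vectors, prototype_vector, quantiles=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return, for each quantile, the index of the image at that rank of similarity to the prototype."""
    sims = np.array([cosine(v, prototype_vector) for v in image_vectors])
    order = np.argsort(sims)                        # least to most similar
    picks = {}
    for q in quantiles:
        rank = int(round(q * (len(order) - 1)))     # 0.0 = least similar, 1.0 = most similar
        picks[q] = int(order[rank])
    return picks

# 0.0 would correspond to LION_000Q, 0.25 to LION_025Q, ..., 1.0 to LION_100Q.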

All functionalities on the website run on image representations and prototype representations.

Available visual space

Currently, the website loads a default visual space on which all computations are run. This space was created using the pre-trained VGG-F network in the MatConvNet toolbox for MATLAB (Vedaldi & Lenc, 2015). In our large-scale evaluation studies (Günther, Marelli, Tureski, & Petilli, in preparation), we identified the 6th layer of the network as best predicting human behavioral data, and thus extracted all vision-based representations used here from this layer. For computational manageability, we reduced the dimensionality of these representations from their original 4,096 dimensions to 400 dimensions using Singular Value Decomposition (SVD); this has virtually no effect on the measures obtained (in our evaluation set, the similarities are essentially identical, with a correlation of > .999).
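
A minimal sketch of this kind of reduction step, assuming the full 4,096-dimensional representations are stacked into a matrix X (the data here is random and purely illustrative):

import numpy as np

# Hypothetical matrix of representations: one row per item, 4,096 dimensions each.
X = np.random.randn(2000, 4096)

# Truncated SVD: keep the 400 dimensions with the largest singular values.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_reduced = U[:, :400] * S[:400]           # 400-dimensional representations

# Similarities computed in the reduced space should correlate very highly with
# similarities computed in the original space (the evaluation above reports r > .999).
print(X_reduced.shape)                     # (2000, 400)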

Menus

Neighbours

This menu allows you to look up the nearest neighbours of individual images or prototypes. Neighbours are defined as the N representations with the highest similarity or smallest distance to the search item. For example, to check which items have the smallest distance to LION_001, CASTLE_025Q or MONKEY_PRO, type them into the input field on separate lines, like this

LION_001
MONKEY_PRO
CASTLE_025Q
MOUNTAIN_PRO

and press Calculate.

You can choose the metric that is used to compute similarity or distance. Using the Vector space option, you can also choose the set of representations in which the neighbours are searched: you can restrict the search to only the nearest prototype representations, only the nearest image representations, or impose no such restriction and search both prototypes and images.

You can load the list of search items into the text field from a file on your disc by clicking on the Load from a file button below the target input field.
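
The following sketch shows the kind of computation this menu performs, assuming the visual space is available as a dictionary mapping item names to vectors and using cosine similarity (a hypothetical representation of the space, not the website's actual code):

import numpy as np

def nearest_neighbours(target_name, space, n=10):
    """Return the n items with the highest cosine similarity to the target item.

    space: dict mapping names such as 'LION_001' or 'MONKEY_PRO' to NumPy vectors.
    """
    target = space[target_name]
    sims = {}
    for name, vec in space.items():
        if name == target_name:
            continue
        sims[name] = float(np.dot(target, vec) /
                           (np.linalg.norm(target) * np.linalg.norm(vec)))
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Restricting the search to prototypes only would amount to filtering the space,
# e.g. {k: v for k, v in space.items() if k.endswith('_PRO')}.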

Matrix

If you need to obtain all possible similarities or distances between a number of images or prototypes, you can use the Matrix menu.

The representations for which you need the similarities or distances should be entered in the input form on the left. Each search item should be entered on a separate line.

An example input could look like this

LION_001
MONKEY_PRO
CASTLE_025Q
MOUNTAIN_PRO

Next, in the dropdown menu you can choose what kind of comparison you want to make. The available options are:

  • distances between all pairs of items in the list
  • distances between the items in the list and all other image and prototype representations in the loaded visual space
  • distances between all items in the left input field and all items in the right input field

When you click on Calculate, the ViSpa website will compute the metrics and, once this is finished, it will initiate the download of a file with the results. The file is in CSV format: it contains a plain-text table with comma-separated columns.

As for neighbours, you can load the list of search items into the text field from a file on your disc by clicking on the Load from a file button below the target input field. The Check availability button can be used to check whether representations for all items specified in the input field are present in the visual space. Keep in mind that, if some of the items are not in the space, the website will ignore them when computing the metrics.
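
As a minimal sketch of what such a matrix computation produces, again assuming a dictionary-like visual space and cosine similarity (the output below is an illustration of a comma-separated table, not the website's exact file layout):

import csv
import numpy as np

def similarity_matrix_csv(items, space, out_path="matrix.csv"):
    """Write all pairwise cosine similarities between the listed items to a CSV table."""
    # Items without a representation in the space are ignored, as on the website.
    items = [name for name in items if name in space]
    vectors = [np.asarray(space[name]) for name in items]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([""] + items)            # header row with item names
        for name_a, vec_a in zip(items, vectors):
            row = [name_a]
            for vec_b in vectors:
                sim = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
                row.append(round(float(sim), 4))
            writer.writerow(row)

# Hypothetical space with random 400-dimensional vectors, for illustration only.
space = {name: np.random.randn(400)
         for name in ["LION_001", "MONKEY_PRO", "CASTLE_025Q", "MOUNTAIN_PRO"]}
similarity_matrix_csv(["LION_001", "MONKEY_PRO", "CASTLE_025Q", "MOUNTAIN_PRO"], space)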

Pairwise

You can use this menu to investigate similarities or distances between individual pairs of images and/or prototypes. Each pair should be entered on a separate row in the input field, and the elements of the pair should be separated with a colon (':'). After clicking on the Calculate button, the website will perform the calculation and initiate the download of a CSV file containing the results.

An example input could look like this

LION_001:LION_PRO
GORILLA_008:MONKEY_PRO
CASTLE_025Q:CATHEDRAL_004
MOUNTAIN_PRO:HILL_PRO

Similarly to the Matrix interface, you can load the list of pairs from a text file or check the availability of the items in the visual space.
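
A minimal sketch of the underlying computation, again assuming a dictionary-like visual space and cosine similarity (the names and helper are hypothetical, not the website's code):

import numpy as np

def pairwise_similarities(pair_lines, space):
    """Compute cosine similarity for each 'ITEM_A:ITEM_B' input line."""
    results = []
    for line in pair_lines:
        a, b = (part.strip() for part in line.split(":"))
        if a not in space or b not in space:
            continue                      # pairs with missing items are skipped
        u, v = space[a], space[b]
        sim = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
        results.append((a, b, sim))
    return results

pairs = ["LION_001:LION_PRO", "GORILLA_008:MONKEY_PRO",
         "CASTLE_025Q:CATHEDRAL_004", "MOUNTAIN_PRO:HILL_PRO"]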

References

Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531.

Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5188–5196).

Petilli, M. A., Günther, F., Vergallito, A., Ciapparelli, M., & Marelli, M. (2021). Data-driven computational models reveal perceptual simulation in word comprehension. Journal of Memory and Language, 117, 104194.

Vedaldi, A., & Lenc, K. (2015). MatConvNet: Convolutional neural networks for MATLAB. In Proceedings of the 23rd ACM International Conference on Multimedia (pp. 689–692).