Overview
ViSpa (Vision Spaces) provides vision-based representations for individual images and prototypes, which are obtained by averaging over images of the same label. ViSpa is intended as a resource for use in psychology and cognitive science, with possible applications for other fields.
The ViSpa website serves as a convenient tool for anyone interested in using these models. It provides the core functionalities for interacting with the ViSpa system: Similarity computations between individual images, between individual images and prototypes, and between prototypes. The website also includes the Picture picker interface that allows you to browse through the individual images for which representations are available in the current instance of the ViSpa system.
The model
ViSpa is based on a pre-trained version of the VGG-F model (Chatfield et al., 2014) provided in the MatConvNet toolbox for MATLAB (Vedaldi & Lenc, 2015). The VGG-F model is a deep convolutional neural network trained to predict labels for images. Once trained, any image can be given as input to the model. The activation values of the units at the different layers of the network can then be taken as vision-based representations for the input image, with the deeper (i.e., later) layers tending to capture more complex gestalt-level features (Mahendran & Vedaldi, 2015).
The VGG-F model
The ViSpa system includes not only representations for individual images, but also prototypical representations for classes of images. These are obtained by averaging over 100 to 200 images of that category (for a detailed description, see Petilli, Günther, Vergallito, Ciapparelli, & Marelli, 2021).
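Conceptually, a prototype is just the element-wise mean of the vision-based vectors of all images sharing a label. A minimal sketch in Python, using made-up random vectors as stand-ins for real VGG-F activations (the array sizes are illustrative, not ViSpa's actual data):

```python
import numpy as np

# Hypothetical illustration: the vectors below are random stand-ins for
# the 400-dimensional vision-based representations of 150 LION images.
rng = np.random.default_rng(0)
lion_images = rng.normal(size=(150, 400))

# The prototype (e.g. LION_PRO) is the mean over all image vectors.
lion_prototype = lion_images.mean(axis=0)

print(lion_prototype.shape)  # (400,)
```

Because the prototype is an average, it lives in the same 400-dimensional space as the individual image vectors, so similarities between images and prototypes can be computed with the same metrics.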
The ViSpa system
Individual images and prototypes
The website lets you work with both individual images and prototypes. In its current state, it contains vision-based representations for 7,801 different image categories (each with its unique label). For each of these labels, it currently contains vision-based representations for:
- the prototype representation (for example, LION_PRO), obtained by averaging 100 to 200 vision-based representations for images with that label. As such, the prototype representation is not tied to a specific image, only to a labelled category
- 16 different randomly-selected individual images (for example, LION_001 or LION_012). You can search these images and copy their names using the Picture picker interface
- 5 additional individual images (for example, LION_000Q, LION_025Q, LION_050Q, LION_075Q, LION_100Q) with different similarities to the prototype representation. Of all 100 to 200 images used to construct the prototype representation (for example, LION_PRO), LION_100Q is the image most similar to the prototype, LION_000Q is the least similar, LION_025Q lies at the 0.25 quantile of similarity, and so on
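The quantile-based selection of these additional images can be sketched as follows. This is an illustrative reconstruction, not ViSpa's actual code; the vectors are random stand-ins and the cosine-similarity metric is an assumption:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors (assumed metric).
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Made-up stand-ins: 120 image vectors of one label, 400 dimensions each.
rng = np.random.default_rng(1)
images = rng.normal(size=(120, 400))
prototype = images.mean(axis=0)

# Rank images by similarity to the prototype, least to most similar.
sims = np.array([cosine(img, prototype) for img in images])
order = np.argsort(sims)

# Pick the images at the 0, .25, .5, .75, and 1 quantiles of similarity;
# quantile 1.0 would correspond to LION_100Q, quantile 0.0 to LION_000Q.
quantile_idx = {q: order[round(q * (len(order) - 1))]
                for q in (0.0, 0.25, 0.5, 0.75, 1.0)}
```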
All functionalities on the website run on image representations and prototype representations.
Available visual space
Currently, the website loads a default visual space on which all computations are run. This space was created using the pre-trained VGG-F network in the MatConvNet toolbox for MATLAB (Vedaldi & Lenc, 2015). In our large-scale evaluation studies (Günther, Marelli, Tureski, & Petilli, in preparation), we identified the 6th layer of the network as best predicting human behavioral data, and thus extracted all vision-based representations used here from this layer. For computational manageability, we reduced the dimensionality of these representations from their original 4,096 dimensions to 400 dimensions using Singular Value Decomposition (SVD); this has virtually no effect on the measures obtained (in our evaluation set, the similarities are essentially identical, with a correlation of > .999).
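A truncated SVD reduction of this kind can be sketched in a few lines of numpy. This is a hedged illustration under made-up data, not the pipeline actually used to build the space; the number of images (500) is arbitrary:

```python
import numpy as np

# Stand-in activation matrix: rows = images, columns = the original
# 4,096 layer-6 dimensions (values are random, for illustration only).
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4096))

# Thin SVD, then keep the first 400 components as the reduced space.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_reduced = U[:, :400] * S[:400]

print(X_reduced.shape)  # (500, 400)
```

Because SVD keeps the components with the largest singular values, pairwise similarities in the 400-dimensional space closely track those in the original space, which matches the > .999 correlation reported above.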
Menus
Neighbours
This menu allows you to look up the nearest neighbours of individual images or prototypes. Neighbours are defined as the N representations with the highest similarity (or smallest distance) to the search item. For example, to find the nearest neighbours of LION_001, MONKEY_PRO, CASTLE_025Q, and MOUNTAIN_PRO, enter each item on a separate line in the input field like this:
LION_001
MONKEY_PRO
CASTLE_025Q
MOUNTAIN_PRO
and press the Calculate button.
You can choose the metric that is used to compute similarity or distance. Using the Vector space option, you can also choose the set of representations in which the neighbours are searched: you can restrict the search to only prototype representations, to only image representations, or impose no such restriction and search both prototypes and images.
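The underlying neighbour search amounts to ranking all candidate representations by similarity to the target. A minimal sketch, assuming cosine similarity and a toy space of named vectors (the names echo ViSpa's labels, but the vectors are random stand-ins):

```python
import numpy as np

# Toy visual space: a handful of named 400-dimensional stand-in vectors.
rng = np.random.default_rng(3)
space = {name: rng.normal(size=400)
         for name in ("LION_001", "LION_PRO", "MONKEY_PRO", "CASTLE_025Q")}

def nearest_neighbours(target, space, n=2):
    # Rank every other representation by cosine similarity to the target.
    t = space[target]
    sims = {name: (v @ t) / (np.linalg.norm(v) * np.linalg.norm(t))
            for name, v in space.items() if name != target}
    return sorted(sims, key=sims.get, reverse=True)[:n]

print(nearest_neighbours("LION_001", space))
```

Restricting the search to prototypes or to images, as the Vector space option does, would simply filter the candidate set by name before ranking.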
You can read the list of search items into the text field from a file on your disk by clicking the Load from a file button below the input field.
Matrix
If you need all pairwise similarities or distances between a number of images or prototypes, you can use the Matrix menu. The representations for which you need the similarities or distances should be entered in the input form on the left, one search item per line. An example input could look like this:
LION_001
MONKEY_PRO
CASTLE_025Q
MOUNTAIN_PRO
Next, in the dropdown menu you can choose which kind of comparison you want to make. The available options are:
- distances between all pairs of items in the list
- distances between the items in the list and all other image and prototype representations in the loaded visual space
- distances between all items in the left input field and all items in the right input field
When you click on the Calculate button, the ViSpa website will compute the metrics and, once this is finished, initiate the download of a file with the results. The file is in CSV format: it contains a plain-text table with comma-separated columns.
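For the "all pairs" option, the resulting CSV is essentially a square distance table. A sketch of how such a table could be built and written, using Python's csv module, hypothetical random vectors, and cosine distance as an assumed metric:

```python
import csv
import numpy as np

# Made-up stand-in vectors for the four example items.
rng = np.random.default_rng(4)
items = ["LION_001", "MONKEY_PRO", "CASTLE_025Q", "MOUNTAIN_PRO"]
vecs = {name: rng.normal(size=400) for name in items}

def cosine_distance(u, v):
    return 1 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Build the table: header row of item names, then one row per item.
rows = [[""] + items]
for a in items:
    rows.append([a] + [f"{cosine_distance(vecs[a], vecs[b]):.4f}"
                       for b in items])

# Write it out as a comma-separated plain-text file.
with open("matrix.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```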
As for neighbours, you can read the list of search items into the text field from a file on your disk by clicking the Load from a file button below the input field. The Check availability button can be used to check whether representations for all items specified in the input field are present in the visual space. Keep in mind that, if some items are not in the space, the website will ignore them when computing the metrics.
Pairwise
You can use this menu to investigate similarities or distances between individual pairs of images and/or prototypes. Each pair should be entered on a separate row in the input field, with the two elements of a pair separated by a colon (':'). After clicking on the Calculate button, the website will perform the calculation and initiate the download of a CSV file containing the results.
An example input could look like this:
LION_001:LION_PRO
GORILLA_008:MONKEY_PRO
CASTLE_025Q:CATHEDRAL_004
MOUNTAIN_PRO:HILL_PRO
Similarly to the Matrix interface, you can load the list of pairs from a text file or check the availability of the items in the visual space.
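The colon-separated input format above is straightforward to process: split each line on the colon and compute one similarity per pair. A hedged sketch, again with random stand-in vectors and cosine similarity as an assumed metric:

```python
import numpy as np

# Made-up stand-in vectors for the example pair items.
rng = np.random.default_rng(5)
vecs = {name: rng.normal(size=400)
        for name in ("LION_001", "LION_PRO", "MOUNTAIN_PRO", "HILL_PRO")}

# Two example input rows, one pair per line, elements separated by ':'.
input_text = "LION_001:LION_PRO\nMOUNTAIN_PRO:HILL_PRO"

results = []
for line in input_text.splitlines():
    a, b = line.split(":")
    u, v = vecs[a], vecs[b]
    sim = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    results.append((a, b, sim))
```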
References
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531.
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5188–5196).
Petilli, M. A., Günther, F., Vergallito, A., Ciapparelli, M., & Marelli, M. (2021). Data-driven computational models reveal perceptual simulation in word comprehension. Journal of Memory and Language, 117, 104194.
Vedaldi, A., & Lenc, K. (2015). MatConvNet: Convolutional neural networks for MATLAB. In Proceedings of the 23rd ACM International Conference on Multimedia (pp. 689–692).