Built-in data structures

Reading time3 min

In brief

Article summary

When one wants to store more than one value in a variable, a data structure is needed. Here are four built-in data structures commonly provided by high-level programming languages, like Python e.g.:

  • lists,
  • sets,
  • tuples,
  • and dictionaries.

This article briefly recalls the specificities of each of these data types.

Main takeaways

  • A list is used to store multiple indexed values and may evolve,
  • A set stores multiple unique and unchangeable values in an unordered manner and may evolve.
  • A tuple stores multiple unchangeable indexed values.
  • A dictionary is structure to store key/value pairs,

Article contents

1 — What is a List?

The list type is undoubtedly the most commonly used data structure for storing multiple values in a single variable. Its main properties are:

  • Values in a list are ordered (it is a sequence), which means that each value is associated with an index from 0 to size of the list minus one.
  • A list allows for duplicate values.
  • A list is a mutable data type thus allowing for the modification of its values.
  • A list is dynamic meaning that values can be added or deleted.
  • A list allows for slice selection.
"""
Basic operations on lists
"""
from typing import List

T : List[int] = [1, 2, 3, 4, 5]
print(f"Initial list: {T} with id {id(T)}")
#dynamic
T.append("6")
#mutable
T[0]=2
print(f"Modified list: {T} still with id {id(T)}")
#slice selection
print(f"Two last elements: {T[-2:]}")
import java.util.ArrayList;
import java.util.List;

public class ExampleList {
    public static void main(String[] args) {
        List<Integer> T = new ArrayList<>();
        T.add(1);
        T.add(2);
        T.add(3);
        T.add(4);
        T.add(5);

        System.out.println("Initial list: " + T + " with ref " + System.identityHashCode(T));

        T.add(6); // Appending an integer instead of a string
        T.set(0, 2); // Modifying the first element

        System.out.println("Modified list: " + T + " still with ref " + System.identityHashCode(T));
        System.out.println("Two last elements: " + T.subList(T.size() - 2, T.size()));
    }
}

2 — What is a Set?

A set in Python (or Java) has the same properties as those of the mathematical set theory. Its main properties are:

  • Values in a set are unordered, subsequently a value cannot be changed.
  • A set does not allow for duplicate values.
  • A set is a mutable data type thus allowing for the addition or deletion of values.
"""
Basic operations on sets
"""
from typing import Set

T : Set[int] = {1, 3, 5, 5}
print(f"Initial set: {T} with id {id(T)}")
T.add(6)
print(f"Modified set: {T} with id {id(T)}")
print("Set operations available | for union, & for intersection, - for difference, ^ for symmetric difference")
print(f"Contains 4: {4 in T}")
import java.util.HashSet;
import java.util.Set;

public class ExampleSet {
    public static void main(String[] args) {
        Set<Integer> T = new HashSet<>();
        T.add(1);
        T.add(3);
        T.add(5);
        T.add(5); // Duplicate element, will not be added

        System.out.println("Initial set: " + T + " with hash code " + System.identityHashCode(T));
        T.add(6);
        System.out.println("Modified set: " + T + " with hash code " + System.identityHashCode(T));
        System.out.println("Contains 4: " + T.contains(4));
    }
}

3 — What is a Tuple?

A Tuple in Python (or Java) is used to store multiple values that do not evolve. Its main properties are:

  • Vamies in a Tuple are ordered (it is a sequence).
  • A Tuple allows for duplicate values.
  • A Tuple is an imutable data type thus its content cannot evolve.
"""
Basic operations on tuples
"""

from typing import Tuple

T : Tuple[int, str, float] = 1, "3", 3.14
print(f"Initial tuple: {T} with id {id(T)}")
print(f"Second element: {T[1]}")
T=T+("4",)
print(f"New tuple: {T} with id {id(T)}")
import java.util.Arrays;

public class ExampleTuple {
    public static void main(String[] args) {
        Object[] T = {1, "3", 3.14};
        System.out.println("Initial tuple: " + Arrays.toString(T) + " with hash code " + System.identityHashCode(T));
        System.out.println("Second element: " + T[1]);
        
        Object[] newT = Arrays.copyOf(T, T.length + 1);
        newT[newT.length - 1] = "4";
        System.out.println("New tuple: " + Arrays.toString(newT) + " with hash code " + System.identityHashCode(newT));
    }
}

4 — What is a Dictionary?

A dictionary, also called associative array, hash or map, indexes values using keys in order to give a kind of semantics to each stored value. Its main properties are:

  • A dictionnary is used to store key/value pairs.
  • A dictionary does not allow for key duplicate.
  • A dictionary is a mutable data type thus allowing for the modification of its values.
  • A dictionary is dynamic meaning that key/value pairs can be added or deleted at any time.
from typing import Dict, List

def index_words ( text: str
                ) -> Dict[str, List[int]]:

    """
        Indexes all the words in a text.
        The function will return a dictionary where the keys are words, and the values are lists of indexes where the word appears in the text.
        In:
            * text: The text to index.
        Out:
            * indexes: The dictionary containing the indexes of all the words in the text.
    """

    # Debug
    assert isinstance(text, str) # Type check for text

    # Process the text
    words : List[str] = text.split()
    indexes : Dict[str, List[int]]= {}
    for i, word in enumerate(words):
        if word not in indexes:
            indexes[word] = []
        indexes[word].append(i)
    return indexes

# Let's test the function
text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed lacinia, nibh a tincidunt luctus, tortor nunc molestie ante, ut egestas est sapien quis orci. Nunc laoreet eget ipsum id rhoncus. Curabitur vel nibh bibendum, eleifend nibh gravida, tempus massa. Donec id felis blandit, pretium nisl ut, sagittis ligula. Curabitur vitae accumsan ex. Pellentesque interdum condimentum sem, eu aliquet arcu mollis ac. Mauris pellentesque lobortis mi, eget aliquam mauris gravida vel. Nulla quis dui vel arcu malesuada vulputate. Praesent nisi ex, tincidunt fermentum augue eu, cursus vehicula ipsum. Nam vel diam gravida, ultrices eros sed, cursus libero. Etiam maximus tincidunt metus in maximus."
index = index_words(text)
print(index)
// Needed imports
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * To run this code, you need to have Java installed on your computer, then:
 * - Create a file named `Main.java` in a directory of your choice.
 * - Copy this code in the file.
 * - Open a terminal in the directory where the file is located.
 * - Run the command `javac Main.java` to compile the code.
 * - Run the command `java -ea Main` to execute the compiled code.
 * Note: '-ea' is an option to enable assertions in Java.
 */
public class Main {

    /**
     * This function was translated from the Python code.
     * This is also the case for the contents of the "main" function.
     *
     * @param text The text to index.
     * @return     The dictionary containing the indexes of all the words in the text.
     */
    public static Map<String, List<Integer>> indexWords(String text) {
        // Split the text into words
        String[] words = text.split("\\s+");
        // Create a map to store the indexes
        Map<String, List<Integer>> indexes = new HashMap<>();

        // Loop over the words and populate the index map
        for (int i = 0; i < words.length; i++) {
            String word = words[i];
            // If the word is not yet in the map, add it
            if (!indexes.containsKey(word)) {
                indexes.put(word, new ArrayList<>());
            }
            // Append the current index to the list
            indexes.get(word).add(i);
        }

        return indexes;
    }

    /**
     * This is the entry point of your program.
     * It contains the first codes that are going to be executed.
     *
     * @param args Command line arguments received.
     */
    public static void main(String[] args) {
        // Let's test the method
        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed lacinia, nibh a tincidunt luctus, tortor nunc molestie ante, ut egestas est sapien quis orci. Nunc laoreet eget ipsum id rhoncus. Curabitur vel nibh bibendum, eleifend nibh gravida, tempus massa. Donec id felis blandit, pretium nisl ut, sagittis ligula. Curabitur vitae accumsan ex. Pellentesque interdum condimentum sem, eu aliquet arcu mollis ac. Mauris pellentesque lobortis mi, eget aliquam mauris gravida vel. Nulla quis dui vel arcu malesuada vulputate. Praesent nisi ex, tincidunt fermentum augue eu, cursus vehicula ipsum. Nam vel diam gravida, ultrices eros sed, cursus libero. Etiam maximus tincidunt metus in maximus.";
        Map<String, List<Integer>> index = indexWords(text);
        System.out.println(index);
    }

}

When running the previous code, we get the following output:

Output
{'Lorem': [0], 'ipsum': [1, 27], 'dolor': [2], 'sit': [3], 'amet,': [4], 'consectetur': [5], 'adipiscing': [6], 'elit.': [7], 'Sed': [8], 'lacinia,': [9], 'nibh': [10, 32, 35], 'a': [11], 'tincidunt': [12, 80, 98], 'luctus,': [13], 'tortor': [14], 'nunc': [15], 'molestie': [16], 'ante,': [17], 'ut': [18], 'egestas': [19], 'est': [20], 'sapien': [21], 'quis': [22, 71], 'orci.': [23], 'Nunc': [24], 'laoreet': [25], 'eget': [26, 65], 'id': [28, 40], 'rhoncus.': [29], 'Curabitur': [30, 48], 'vel': [31, 73, 88], 'bibendum,': [33], 'eleifend': [34], 'gravida,': [36, 90], 'tempus': [37], 'massa.': [38], 'Donec': [39], 'felis': [41], 'blandit,': [42], 'pretium': [43], 'nisl': [44], 'ut,': [45], 'sagittis': [46], 'ligula.': [47], 'vitae': [49], 'accumsan': [50], 'ex.': [51], 'Pellentesque': [52], 'interdum': [53], 'condimentum': [54], 'sem,': [55], 'eu': [56], 'aliquet': [57], 'arcu': [58, 74], 'mollis': [59], 'ac.': [60], 'Mauris': [61], 'pellentesque': [62], 'lobortis': [63], 'mi,': [64], 'aliquam': [66], 'mauris': [67], 'gravida': [68], 'vel.': [69], 'Nulla': [70], 'dui': [72], 'malesuada': [75], 'vulputate.': [76], 'Praesent': [77], 'nisi': [78], 'ex,': [79], 'fermentum': [81], 'augue': [82], 'eu,': [83], 'cursus': [84, 94], 'vehicula': [85], 'ipsum.': [86], 'Nam': [87], 'diam': [89], 'ultrices': [91], 'eros': [92], 'sed,': [93], 'libero.': [95], 'Etiam': [96], 'maximus': [97], 'metus': [99], 'in': [100], 'maximus.': [101]}
{mollis=[59], ante,=[17], sem,=[55], vel.=[69], adipiscing=[6], est=[20], cursus=[84, 94], eleifend=[34], accumsan=[50], Pellentesque=[52], egestas=[19], augue=[82], nunc=[15], mi,=[64], orci.=[23], mauris=[67], maximus.=[101], id=[28, 40], Nulla=[70], molestie=[16], vel=[31, 73, 88], elit.=[7], ut=[18], lobortis=[63], Etiam=[96], luctus,=[13], in=[100], lacinia,=[9], fermentum=[81], vulputate.=[76], sed,=[93], eu=[56], Sed=[8], pellentesque=[62], vitae=[49], ipsum=[1, 27], amet,=[4], felis=[41], Curabitur=[30, 48], laoreet=[25], tempus=[37], Mauris=[61], ac.=[60], arcu=[58, 74], dolor=[2], bibendum,=[33], metus=[99], rhoncus.=[29], ex,=[79], vehicula=[85], ex.=[51], diam=[89], sit=[3], interdum=[53], a=[11], nisi=[78], tortor=[14], blandit,=[42], tincidunt=[12, 80, 98], nisl=[44], ipsum.=[86], ultrices=[91], condimentum=[54], aliquam=[66], massa.=[38], ut,=[45], Praesent=[77], nibh=[10, 32, 35], gravida,=[36, 90], Donec=[39], libero.=[95], Lorem=[0], aliquet=[57], malesuada=[75], pretium=[43], sapien=[21], Nunc=[24], dui=[72], quis=[22, 71], Nam=[87], eu,=[83], consectetur=[5], eget=[26, 65], sagittis=[46], ligula.=[47], gravida=[68], eros=[92], maximus=[97]}

To go further

Important

The content of this section is optional. It contains additional material for you to consolidate your understanding of the current topic.

To go beyond

Important

The content of this section is very optional. We suggest you directions to explore if you wish to go deeper in the current topic.

A fundamental underlying notion of dictionaries is hash function. A hash function is used to determine how to access the value associated with a key.