Semigroup

Aggregation, data combining

Publish at: 28 Mar 2026

 ____  _____ __  __ ___ ____ ____   ___  _   _ ____  ____
/ ___|| ____|  \/  |_ _/ ___|  _ \ / _ \| | | |  _ \/ ___|
\___ \|  _| | |\/| || | |  _| |_) | | | | | | | |_) \___ \
 ___) | |___| |  | || | |_| |  _ <| |_| | |_| |  __/ ___) |
|____/|_____|_|  |_|___\____|_| \_\\___/ \___/|_|   |____/

Previous chapters were demanding. Understanding context mapping and parallel computations within a context can be challenging. This time we turn to a more fundamental concept that is still simple to understand: semigroups^[1]. I am not sure there is anything simpler that is also as useful in practice.

Algebraic Data Structures #

So far, we have met functors and applicative functors. Putting the programming mindset aside, we can think of them as sets equipped with operations. From this point of view, they are very similar, except that an applicative functor is also a functor. Expanding the idea of a set with operations, where those operations are closed and their results remain inside the same set, opens a wider, more abstract point of view^[2] on the tools we are building.

Algebraic structure^[3] or algebraic system consists of a nonempty set A (called the underlying set, carrier set or domain), a collection of operations on A (typically binary operations such as addition and multiplication), and a finite set of identities (known as axioms) that these operations must satisfy.*

This perfectly correlates with what we've been dealing with so far about objects and morphisms and laws of their application. So algebraic structures are sets with one or more operations that obey specific laws.

When a new problem involves the same laws as such an algebraic structure, all the results that have been proved using only the laws of the structure can be directly applied to the new problem.

Remember functor country mapping example? That was about studying the problem in one country using a different country. Now, it's about the same country and building new bridges using laws for previous ones, so to speak.

Semigroup #

A semigroup is one of the simplest structures in computer science. At its core, a semigroup is simply a set equipped with an associative binary operation.

binary - involves two elements of the set (x, y)
associative - recall that the operation ∘ is associative if the order of grouping does not matter.

Important note: Commutativity is not mandatory.

What makes semigroups particularly powerful is their composability. When you can combine two things of the same type to get another thing of the same type, and when this combination is associative, you unlock powerful patterns for building scalable, parallelizable systems.

Formal definition #

A semigroup is an algebraic structure consisting of:

A set S (the underlying set or carrier)
A binary operation ∘ : S × S -> S (closed under the operation)
Associativity law: For all a, b, c ∈ S, the following must hold: (a ∘ b) ∘ c = a ∘ (b ∘ c)

Examples #

import Prelude hiding (Semigroup, (<>))

-- The Semigroup type class in Haskell
class Semigroup a where
  (<>) :: a -> a -> a

-- List concatenation semigroup
instance Semigroup [a] where
  (<>) = (++)

-- Integer addition semigroup (using Sum wrapper)
newtype Sum a = Sum a deriving (Show)
instance Num a => Semigroup (Sum a) where
  Sum x <> Sum y = Sum (x + y)

-- Integer multiplication semigroup (using Product wrapper)
newtype Product a = Product a deriving (Show)
instance Num a => Semigroup (Product a) where
  Product x <> Product y = Product (x * y)

-- Examples in action:
main :: IO ()
main = do
  -- String concatenation
  print $ "Hello" <> " " <> "World"  -- "Hello World"

  -- List concatenation
  print $ ([1,2] :: [Int]) <> [3,4] <> [5,6]    -- [1,2,3,4,5,6]

  -- Sum semigroup
  print $ (Sum 1 :: Sum Integer) <> Sum 2 <> Sum 3    -- Sum 6

  -- Product semigroup
  print $ (Product 2 :: Product Integer) <> Product 3 <> Product 4  -- Product 24

// Semigroup interface in TypeScript
interface Semigroup<T> {
  combine(a: T, b: T): T;
}

// String concatenation semigroup
const StringSemigroup: Semigroup<string> = {
  combine: (a, b) => a + b
};

// Array concatenation semigroup
const ArraySemigroup = <T>(): Semigroup<T[]> => ({
  combine: (a, b) => [...a, ...b]
});

// Number addition semigroup
const SumSemigroup: Semigroup<number> = {
  combine: (a, b) => a + b
};

// Number multiplication semigroup
const ProductSemigroup: Semigroup<number> = {
  combine: (a, b) => a * b
};

// Generic combine function (associative)
function combineAll<T>(semigroup: Semigroup<T>, items: T[]): T {
  if (items.length === 0) {
    throw new Error("Cannot combine empty array");
  }

  return items.reduce(semigroup.combine);
}

// Examples in action:
console.log(StringSemigroup.combine("Hello", " World")); // "Hello World"
console.log(combineAll(StringSemigroup, ["Hello", " ", "World"])); // "Hello World"
console.log(combineAll(ArraySemigroup<number>(), [[1,2], [3,4], [5,6]])); // [1,2,3,4,5,6]
console.log(combineAll(SumSemigroup, [1, 2, 3, 4])); // 10
console.log(combineAll(ProductSemigroup, [2, 3, 4])); // 24

using System;
using System.Collections.Generic;
using System.Linq;

// Semigroup interface in C#
public interface ISemigroup<T>
{
    T Combine(T a, T b);
}

// String concatenation semigroup
public class StringSemigroup : ISemigroup<string>
{
    public string Combine(string a, string b) => a + b;
}

// List concatenation semigroup
public class ListSemigroup<T> : ISemigroup<List<T>>
{
    public List<T> Combine(List<T> a, List<T> b) => a.Concat(b).ToList();
}

// Number addition semigroup
public class SumSemigroup : ISemigroup<int>
{
    public int Combine(int a, int b) => a + b;
}

// Number multiplication semigroup
public class ProductSemigroup : ISemigroup<int>
{
    public int Combine(int a, int b) => a * b;
}

// Generic combine function (associative)
public static class SemigroupExtensions
{
    public static T CombineAll<T>(this ISemigroup<T> semigroup, IEnumerable<T> items)
    {
        var list = items.ToList();
        if (!list.Any())
        {
            throw new ArgumentException("Cannot combine empty collection");
        }

        return list.Aggregate(semigroup.Combine);
    }
}

// Examples in action:
class Program
{
    static void Main()
    {
        var stringSemigroup = new StringSemigroup();
        var listSemigroup = new ListSemigroup<int>();
        var sumSemigroup = new SumSemigroup();
        var productSemigroup = new ProductSemigroup();

        // String concatenation
        Console.WriteLine(stringSemigroup.Combine("Hello", " World")); // "Hello World"
        Console.WriteLine(stringSemigroup.CombineAll(new[] { "Hello", " ", "World" })); // "Hello World"

        // List concatenation
        var list1 = new List<int> { 1, 2 };
        var list2 = new List<int> { 3, 4 };
        var list3 = new List<int> { 5, 6 };
        Console.WriteLine(string.Join(",", listSemigroup.CombineAll(new[] { list1, list2, list3 }))); // "1,2,3,4,5,6"

        // Sum semigroup
        Console.WriteLine(sumSemigroup.CombineAll(new[] { 1, 2, 3, 4 })); // 10

        // Product semigroup
        Console.WriteLine(productSemigroup.CombineAll(new[] { 2, 3, 4 })); // 24
    }
}

Triangle Semigroup Example #

Let's revisit our geometric example, but this time describe the semigroup in terms of centroids^[4].

The Set and Operation #

To obtain a centroid-based semigroup, we restrict attention to one fixed triangle shape and all of its translations. Fix a reference triangle T_0 whose centroid is at the origin, and let S = {T_c | c ∈ R^2} be the set of all translated copies of T_0, where T_c is the unique translated copy whose centroid is c.

This works because, for a fixed triangle shape, the centroid uniquely determines the translation.

Define the operation by adding centroids:

T_c ∘ T_d = T_{c + d}

If c = (x1, y1) and d = (x2, y2), then:

c + d = (x1 + x2, y1 + y2)

Verification of Semigroup Laws #

Closure

If c, d ∈ R^2, then c + d ∈ R^2, so T_{c + d} is again an element of S.

Associativity

For any c, d, e ∈ R^2:

(T_c ∘ T_d) ∘ T_e = T_{(c + d) + e} = T_{c + (d + e)} = T_c ∘ (T_d ∘ T_e)

This holds because vector addition in R^2 is associative.

CSV Transform example #

Our practical example can benefit from semigroups as well. Let's take the initial one as a base.

With semigroups we can use things like:

👥 Person Combination: Merges person records by combining names and aggregating ages
➕ Age Summation: Uses addition semigroup to total all ages
📋 Name Collection: Concatenates arrays of names
🗂️ Dataset Merging: Combines multiple CSV datasets
📊 Statistics Aggregation: Accumulates counts, sums, and collections in parallel

// Semigroup interface
interface Semigroup<T> {
  combine(a: T, b: T): T;
}

// CSV parsing function (as provided)
function readCSV(data: string): Person[] {
  const [header, ...rows] = data.trim().split('\n');
  const keys = header.split(',');
  return rows.map(row => {
    const values = row.split(',');
    const obj = Object.fromEntries(keys.map((k, i) => [k, values[i]])) as any;
    return {
      name: obj.name,
      age: obj.age,
      totalAge: undefined,
      count: undefined
    } as Person;
  });
}

// Person type for type safety
interface Person {
  name: string;
  age: string;
  totalAge?: number;
  count?: number;
}

// Semigroup for combining person records
const PersonSemigroup: Semigroup<Person> = {
  combine: (p1, p2) => ({
    name: p1.name + " & " + p2.name,  // Combine names
    age: Math.max(parseInt(p1.age), parseInt(p2.age)).toString(), // Take max age
    totalAge: (p1.totalAge || parseInt(p1.age)) + (p2.totalAge || parseInt(p2.age)),
    count: (p1.count || 1) + (p2.count || 1)
  })
};

// Semigroup for aggregating ages (sum)
const AgeSumSemigroup: Semigroup<number> = {
  combine: (a, b) => a + b
};

// Semigroup for collecting names
const NameListSemigroup: Semigroup<string[]> = {
  combine: (a, b) => [...a, ...b]
};

// Generic combine function for arrays
function combineAll<T>(semigroup: Semigroup<T>, items: T[]): T {
  if (items.length === 0) {
    throw new Error("Cannot combine empty array");
 }

  return items.reduce(semigroup.combine);
}

// Example usage
const data = readCSV('name,age\nAlice,25\nBob,40\nCharlie,35');

console.log("Original data:", data);

// 1. Combine all persons into one record
const combinedPerson = combineAll(PersonSemigroup, data);
console.log("Combined person:", combinedPerson);

// 2. Sum all ages using semigroup
const ages = data.map(person => parseInt(person.age));
const totalAge = combineAll(AgeSumSemigroup, ages);
console.log("Total age:", totalAge);

// 3. Collect all names using semigroup
const nameArrays = data.map(person => [person.name]);
const allNames = combineAll(NameListSemigroup, nameArrays);
console.log("All names:", allNames);

// 4. Process multiple CSV datasets and combine them
const data2 = readCSV('name,age\nDave,28\nEve,32');
const data3 = readCSV('name,age\nFrank,45\nGrace,29');

// Combine datasets using array concatenation semigroup
const ArraySemigroup = <T>(): Semigroup<T[]> => ({
  combine: (a, b) => [...a, ...b]
});

const allDatasets = combineAll(ArraySemigroup<Person>(), [data, data2, data3]);
console.log("Combined datasets:", allDatasets);

// 5. Create summary statistics using semigroups
interface Stats {
  totalAge: number;
  count: number;
  names: string[];
}

const StatsSemigroup: Semigroup<Stats> = {
  combine: (s1, s2) => ({
    totalAge: s1.totalAge + s2.totalAge,
    count: s1.count + s2.count,
    names: [...s1.names, ...s2.names]
  })
};

const statsFromData = (persons: Person[]): Stats[] =>
  persons.map(p => ({
    totalAge: parseInt(p.age),
    count: 1,
    names: [p.name]
  }));

const overallStats = combineAll(StatsSemigroup, statsFromData(allDatasets));
console.log("Overall statistics:", overallStats);
console.log("Average age:", overallStats.totalAge / overallStats.count);

import Data.List.Split (splitOn)
import Data.Semigroup (Sum(..), getSum)

-- Person data type
data Person = Person
  { name :: String
  , age :: Int
  , totalAge :: Maybe Int
  , count :: Maybe Int
  } deriving (Show, Eq)

-- Semigroup instance for Person
instance Semigroup Person where
  p1 <> p2 = Person
    { name = name p1 ++ " & " ++ name p2
    , age = max (age p1) (age p2)
    , totalAge = Just $ (maybe (age p1) id (totalAge p1)) + (maybe (age p2) id (totalAge p2))
    , count = Just $ (maybe 1 id (count p1)) + (maybe 1 id (count p2))
    }

-- Stats data type for summary statistics
data Stats = Stats
  { statsTotalAge :: Int
  , statsCount :: Int
  , statsNames :: [String]
  } deriving (Show, Eq)

-- Semigroup instance for Stats
instance Semigroup Stats where
  s1 <> s2 = Stats
    { statsTotalAge = statsTotalAge s1 + statsTotalAge s2
    , statsCount = statsCount s1 + statsCount s2
    , statsNames = statsNames s1 ++ statsNames s2
    }

-- CSV parsing function
readCSV :: String -> [Person]
readCSV csvData =
  let allLines = lines csvData
  in case allLines of
       [] -> []
       (header:rows) ->
         let keys = splitOn "," header
             parseRow row =
               let values = splitOn "," row
                   rowMap = zip keys values
                   n = lookup "name" rowMap
                   a = lookup "age" rowMap >>= readMaybe
               in case (n, a) of
                    (Just nameVal, Just ageVal) ->
                      Just $ Person nameVal ageVal Nothing Nothing
                    _ -> Nothing
         in map (\r -> case parseRow r of
                         Just p -> p
                         Nothing -> error "Malformed row") rows
  where
    readMaybe :: String -> Maybe Int
    readMaybe s = case reads s of
                    [(x, "")] -> Just x
                    _ -> Nothing

-- Combine all elements using semigroup
combineAll :: Semigroup a => [a] -> Maybe a
combineAll [] = Nothing
combineAll (x:xs) = Just $ foldl (<>) x xs

-- Convert Person to Stats
personToStats :: Person -> Stats
personToStats p = Stats (age p) 1 [name p]

-- Main function
main :: IO ()
main = do
  let csvData = "name,age\nAlice,25\nBob,40\nCharlie,35"
  let dataList = readCSV csvData

  putStrLn "Original data:"
  mapM_ print dataList

  -- 1. Combine all persons into one record
  case combineAll dataList of
    Just combinedPerson -> do
      putStrLn "\nCombined person:"
      print combinedPerson
    Nothing -> putStrLn "No data to combine"

  -- 2. Sum all ages using built-in Sum semigroup
  let ages = map age dataList
  let ageSums = map (Sum . fromIntegral :: Int -> Sum Integer) ages  -- Convert to Sum semigroup
  case combineAll ageSums of
    Just totalSum -> putStrLn $ "\nTotal age (using Sum semigroup): " ++ show (getSum totalSum)
    Nothing -> putStrLn "No ages to sum"

  -- 3. Collect all names using semigroup
  let names = map ((: []) . name) dataList  -- Convert to list of lists
  case combineAll names of
    Just allNames -> putStrLn $ "All names: " ++ show allNames
    Nothing -> putStrLn "No names to collect"

  -- 4. Process multiple CSV datasets and combine them
  let csvData2 = "name,age\nDave,28\nEve,32"
  let csvData3 = "name,age\nFrank,45\nGrace,29"
  let data2 = readCSV csvData2
  let data3 = readCSV csvData3

  let allDatasets = dataList ++ data2 ++ data3
  putStrLn "\nCombined datasets:"
  mapM_ print allDatasets

  -- 5. Create summary statistics using semigroups
  let statsFromData = map personToStats allDatasets
  case combineAll statsFromData of
    Just overallStats -> do
      putStrLn "\nOverall statistics:"
      print overallStats
      let avgAge = fromIntegral (statsTotalAge overallStats) / fromIntegral (statsCount overallStats) :: Double
      putStrLn $ "Average age: " ++ show avgAge

      -- Also demonstrate Sum semigroup for total age calculation
      let allAges = map (Sum . fromIntegral . age :: Person -> Sum Integer) allDatasets
      case combineAll allAges of
        Just totalAgeSum -> putStrLn $ "Total age (Sum semigroup): " ++ show (getSum totalAgeSum)
        Nothing -> putStrLn "No ages to sum with Sum semigroup"
    Nothing -> putStrLn "No stats to combine"

using System;
using System.Collections.Generic;
using System.Linq;

// Semigroup interface
public interface ISemigroup<T>
{
    T Combine(T a, T b);
}

// Person class for type safety
public class Person
{
    public string Name { get; set; }
    public string Age { get; set; }
    public int? TotalAge { get; set; }
    public int? Count { get; set; }

    public Person(string name, string age, int? totalAge = null, int? count = null)
    {
        Name = name;
        Age = age;
        TotalAge = totalAge;
        Count = count;
    }
}

// Semigroup for combining person records
public class PersonSemigroup : ISemigroup<Person>
{
    public Person Combine(Person p1, Person p2)
    {
        return new Person(
            name: $"{p1.Name} & {p2.Name}",  // Combine names
            age: Math.Max(int.Parse(p1.Age), int.Parse(p2.Age)).ToString(), // Take max age
            totalAge: (p1.TotalAge ?? int.Parse(p1.Age)) + (p2.TotalAge ?? int.Parse(p2.Age)),
            count: (p1.Count ?? 1) + (p2.Count ?? 1)
        );
    }
}

// Semigroup for aggregating ages (sum)
public class AgeSumSemigroup : ISemigroup<int>
{
    public int Combine(int a, int b) => a + b;
}

// Semigroup for collecting names
public class NameListSemigroup : ISemigroup<List<string>>
{
    public List<string> Combine(List<string> a, List<string> b)
    {
        var result = new List<string>(a);
        result.AddRange(b);
        return result;
    }
}

// Generic array semigroup
public class ArraySemigroup<T> : ISemigroup<List<T>>
{
    public List<T> Combine(List<T> a, List<T> b)
    {
        var result = new List<T>(a);
        result.AddRange(b);
        return result;
    }
}

// Statistics class for aggregating data
public class Stats
{
    public int TotalAge { get; set; }
    public int Count { get; set; }
    public List<string> Names { get; set; }

    public Stats(int totalAge, int count, List<string> names)
    {
        TotalAge = totalAge;
        Count = count;
        Names = names ?? new List<string>();
    }
}

// Semigroup for combining statistics
public class StatsSemigroup : ISemigroup<Stats>
{
    public Stats Combine(Stats s1, Stats s2)
    {
        var combinedNames = new List<string>(s1.Names);
        combinedNames.AddRange(s2.Names);

        return new Stats(
            totalAge: s1.TotalAge + s2.TotalAge,
            count: s1.Count + s2.Count,
            names: combinedNames
        );
    }
}

// Extension methods for semigroup operations
public static class SemigroupExtensions
{
    // Generic combine function for collections
    public static T CombineAll<T>(this ISemigroup<T> semigroup, IEnumerable<T> items)
    {
        var list = items.ToList();
        if (!list.Any())
        {
            throw new ArgumentException("Cannot combine empty collection");
        }

        return list.Aggregate(semigroup.Combine);
    }
}

// CSV parsing utility
public static class CsvParser
{
    public static List<Person> ReadCSV(string data)
    {
        var lines = data.Trim().Split('\n');
        if (lines.Length == 0) return new List<Person>();

        var header = lines[0];
        var rows = lines.Skip(1);
        var keys = header.Split(',');

        return rows.Select(row =>
        {
            var values = row.Split(',');
            var obj = keys.Zip(values, (k, v) => new { Key = k, Value = v })
                         .ToDictionary(x => x.Key, x => x.Value);

            return new Person(
                name: obj["name"],
                age: obj["age"]
            );
        }).ToList();
    }
}

// Example usage
class Program
{
    static void Main()
    {
        // Initialize semigroups
        var personSemigroup = new PersonSemigroup();
        var ageSumSemigroup = new AgeSumSemigroup();
        var nameListSemigroup = new NameListSemigroup();
        var arraySemigroup = new ArraySemigroup<Person>();
        var statsSemigroup = new StatsSemigroup();

        // Parse CSV data
        var data = CsvParser.ReadCSV("name,age\nAlice,25\nBob,40\nCharlie,35");

        Console.WriteLine("Original data:");
        foreach (var person in data)
        {
            Console.WriteLine($"  {person.Name}, {person.Age}");
        }

        // 1. Combine all persons into one record
        var combinedPerson = personSemigroup.CombineAll(data);
        Console.WriteLine($"\nCombined person: {combinedPerson.Name}, " +
                         $"age: {combinedPerson.Age}, total age: {combinedPerson.TotalAge}, " +
                         $"count: {combinedPerson.Count}");

        // 2. Sum all ages using semigroup
        var ages = data.Select(person => int.Parse(person.Age));
        var totalAge = ageSumSemigroup.CombineAll(ages);
        Console.WriteLine($"Total age: {totalAge}");

        // 3. Collect all names using semigroup
        var nameArrays = data.Select(person => new List<string> { person.Name });
        var allNames = nameListSemigroup.CombineAll(nameArrays);
        Console.WriteLine($"All names: [{string.Join(", ", allNames)}]");

        // 4. Process multiple CSV datasets and combine them
        var data2 = CsvParser.ReadCSV("name,age\nDave,28\nEve,32");
        var data3 = CsvParser.ReadCSV("name,age\nFrank,45\nGrace,29");

        var allDatasets = arraySemigroup.CombineAll(new[] { data, data2, data3 });
        Console.WriteLine($"\nCombined datasets ({allDatasets.Count} people):");
        foreach (var person in allDatasets)
        {
            Console.WriteLine($"  {person.Name}, {person.Age}");
        }

        // 5. Create summary statistics using semigroups
        var statsFromData = allDatasets.Select(p => new Stats(
            totalAge: int.Parse(p.Age),
            count: 1,
            names: new List<string> { p.Name }
        ));

        var overallStats = statsSemigroup.CombineAll(statsFromData);
        Console.WriteLine($"\nOverall statistics:");
        Console.WriteLine($"  Total age: {overallStats.TotalAge}");
        Console.WriteLine($"  Count: {overallStats.Count}");
        Console.WriteLine($"  Names: [{string.Join(", ", overallStats.Names)}]");
        Console.WriteLine($"  Average age: {(double)overallStats.TotalAge / overallStats.Count:F2}");
    }
}

Visualizing semigroups #

Understanding semigroups becomes much clearer with visual representations. Let's explore how semigroup operations work through diagrams and illustrations.

1. The fundamental semigroup operation takes two elements and
produces a third element of the same type

Semigroup Operation: S × S → S

   a ∈ S      b ∈ S
     │          │
     └────┬─────┘
          │
          ∘ (binary operation)
          │
          ▼
        c ∈ S

Where: c = a ∘ b

The key property of semigroups is associativity.
The order of grouping doesn't matter:

Associativity: (a ∘ b) ∘ c = a ∘ (b ∘ c)

Left-associative grouping:          Right-associative grouping:
      (a ∘ b) ∘ c                         a ∘ (b ∘ c)

        a   b   c                           a   b   c
        │   │   │                           │   │   │
        └─┬─┘   │                           │   └─┬─┘
          │     │                           │     │
          ∘     │              =            │     ∘
          │     │                           │     │
          └──┬──┘                           └──┬──┘
             │                                 │
             ∘                                 ∘
             │                                 │
             ▼                                 ▼
           result                            result

Both paths produce the same result!

2. Triangle example

Triangles with centroids `c`, `d`, and `c + d`

```txt
        T_c     +      T_d       =       T_(c + d)

         /\            /\                    /\
        /| \          /| \                  /| \
       / •--\        / •--\                / •--\
      /__|___\      /__|___\              /__|___\

    centroid = c    centroid = d         centroid = c + d

The centroid is shown at the intersection of the medians; the triangle shape stays the same, only the centroid position changes.

Parallel Processing #

We mentioned parallel computations before, but have not examined them closely. Associativity is what makes semigroups so useful here. Because regrouping does not change the result, we can split a reduction into independent chunks, process those chunks separately, and combine the partial results later. The following examples compute parallel sums and illustrate a few techniques commonly used in parallel computing:

Divide and Conquer: Breaking large datasets into smaller chunks that can be processed independently.
Tree Reduction: Combining results in a tree-like fashion to minimize synchronization points.
Load Distribution: Spreading work across available CPU cores or worker threads.
Safe Parallelization: Maintaining correctness regardless of how the work is partitioned.

The key insight is that (a + b) + c = a + (b + c). We can safely split sum computations across multiple threads or workers and combine the partial results in any grouping, making parallel processing both safe and efficient.

import Control.DeepSeq (NFData)
import Control.Parallel.Strategies
import Data.List (foldl1')
import Data.Semigroup (Sum(..), getSum)

-- Parallel divide-and-conquer combiner
parallelCombine :: (NFData a, Semigroup a) => [a] -> Maybe a
parallelCombine [] = Nothing
parallelCombine [x] = Just x
parallelCombine xs =
  let mid = length xs `div` 2
      (left, right) = splitAt mid xs
      [leftResult, rightResult] =
        map parallelCombine [left, right] `using` parList rdeepseq
  in case (leftResult, rightResult) of
       (Just l, Just r) -> Just (l <> r)
       (Just l, Nothing) -> Just l
       (Nothing, Just r) -> Just r
       (Nothing, Nothing) -> Nothing

-- Parallel map-reduce using strategies
parallelMapReduce :: (NFData b, Semigroup b) => (a -> b) -> [a] -> Maybe b
parallelMapReduce f xs =
  let mapped = map f xs `using` parList rdeepseq
      chunks = chunksOf 1000 mapped  -- Process in chunks
      chunkResults = map (foldl1' (<>)) chunks `using` parList rdeepseq
  in case chunkResults of
       [] -> Nothing
       _  -> Just $ foldl1' (<>) chunkResults

-- Helper function to split list into chunks
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs =
  let (chunk, rest) = splitAt n xs
  in chunk : chunksOf n rest

-- Example usage
main :: IO ()
main = do
  let numbers = [1..1000000]
  let smallNumbers = [1..100]

  -- Demonstrate parallelCombine with smaller dataset
  let wrappedSmallNumbers = map (Sum . fromIntegral :: Int -> Sum Integer) smallNumbers
  let combineResult = parallelCombine wrappedSmallNumbers
  putStrLn $ "Parallel combine result: " ++ show (getSum <$> combineResult)

  -- Parallel sum calculation using built-in Sum semigroup
  let sumResult = parallelMapReduce (Sum . fromIntegral :: Int -> Sum Integer) numbers
  putStrLn $ "Parallel map-reduce result: " ++ show (getSum <$> sumResult)

  -- Compare with sequential
  let sequentialSum = sum numbers
  putStrLn $ "Sequential sum: " ++ show sequentialSum

  -- Show both approaches give same result for smaller dataset
  let sequentialSmall = sum smallNumbers
  putStrLn $ "Sequential small sum: " ++ show sequentialSmall

// Semigroup interface
interface Semigroup<T> {
  combine(a: T, b: T): T;
}

// Sum semigroup
const SumSemigroup: Semigroup<number> = {
  combine: (a, b) => a + b
};

// Simulate parallel processing with Promise-based chunks
async function parallelReduce<T>(
  semigroup: Semigroup<T>,
  data: T[]
): Promise<T | null> {
  if (data.length === 0) return null;
  if (data.length === 1) return data[0];

  // Split into chunks for parallel processing
  const chunkSize = Math.ceil(data.length / (navigator.hardwareConcurrency || 4));
  const chunks: T[][] = [];

  for (let i = 0; i < data.length; i += chunkSize) {
    chunks.push(data.slice(i, i + chunkSize));
  }

  // Process chunks in parallel (simulated with setTimeout)
  const chunkPromises = chunks.map(chunk =>
    new Promise<T>((resolve) => {
      setTimeout(() => {
        // Sequential reduce within each chunk
        const result = chunk.reduce(semigroup.combine);
        resolve(result);
      }, Math.random() * 100); // Simulate processing time
    })
  );

  // Wait for all chunks to complete
  const chunkResults: T[] = await Promise.all(chunkPromises);

  // Combine chunk results using a simple loop
  if (chunkResults.length === 0) return null;
  if (chunkResults.length === 1) return chunkResults[0];

  let result = chunkResults[0];
  for (let i = 1; i < chunkResults.length; i++) {
    result = semigroup.combine(result, chunkResults[i]);
  }
  return result;
}

// Map-reduce with parallel processing
async function parallelMapReduce<T, U>(
  semigroup: Semigroup<T>,
  data: U[],
  mapper: (item: U) => T
): Promise<T | null> {
  const mapped = data.map(mapper);
  return parallelReduce(semigroup, mapped);
}

// Example usage
async function demonstrateParallelSum() {
  const numbers = Array.from({length: 1000000}, (_, i) => i + 1);

  console.time('Parallel Sum');
  const parallelSum = await parallelMapReduce(SumSemigroup, numbers, x => x);
  console.timeEnd('Parallel Sum');
  console.log('Parallel sum result:', parallelSum);

  // Compare with sequential processing
  console.time('Sequential Sum');
  const sequentialSum = numbers.reduce((a, b) => a + b, 0);
  console.timeEnd('Sequential Sum');
  console.log('Sequential sum result:', sequentialSum);
}

// Run the demonstration
demonstrateParallelSum();

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public interface ISemigroup<T>
{
    T Combine(T a, T b);
}

// Semigroup implementation for sum
public class SumSemigroup : ISemigroup<long>
{
    public long Combine(long a, long b) => a + b;
}

// Parallel processor using TPL
public class ParallelProcessor<T>
{
    private readonly ISemigroup<T> _semigroup;

    public ParallelProcessor(ISemigroup<T> semigroup)
    {
        _semigroup = semigroup;
    }

    // Parallel reduce using divide-and-conquer
    public async Task<T?> ParallelReduceAsync(IEnumerable<T> data)
    {
        var list = data.ToList();
        if (!list.Any()) return default(T);
        if (list.Count == 1) return list[0];

        return await Task.Run(() => ParallelReduceInternal(list));
    }

    private T ParallelReduceInternal(IList<T> data)
    {
        if (data.Count == 1) return data[0];
        if (data.Count == 2) return _semigroup.Combine(data[0], data[1]);

        // Split into chunks for parallel processing
        var mid = data.Count / 2;
        var left = data.Take(mid).ToList();
        var right = data.Skip(mid).ToList();

        // Process both halves in parallel
        var leftTask = Task.Run(() => ParallelReduceInternal(left));
        var rightTask = Task.Run(() => ParallelReduceInternal(right));

        Task.WaitAll(leftTask, rightTask);

        return _semigroup.Combine(leftTask.Result, rightTask.Result);
    }

    // Map-reduce with parallel processing
    public async Task<T?> ParallelMapReduceAsync<U>(
        IEnumerable<U> data,
        Func<U, T> mapper)
    {
        var list = data.ToList();
        if (!list.Any()) return default(T);

        // Determine optimal chunk size
        var processorCount = Environment.ProcessorCount;
        var chunkSize = Math.Max(1, list.Count / processorCount);

        // Split into chunks
        var chunks = list
            .Select((item, index) => new { item, index })
            .GroupBy(x => x.index / chunkSize)
            .Select(g => g.Select(x => x.item).ToList())
            .ToList();

        // Process chunks in parallel
        var chunkTasks = chunks.Select(chunk => Task.Run(() =>
        {
            var mapped = chunk.Select(mapper);
            return mapped.Aggregate(_semigroup.Combine);
        }));

        var chunkResults = await Task.WhenAll(chunkTasks);

        // Combine chunk results
        return chunkResults.Aggregate(_semigroup.Combine);
    }

    // PLINQ-based parallel processing
    public T? PLinqMapReduce<U>(IEnumerable<U> data, Func<U, T> mapper)
    {
        return data.AsParallel()
                  .Select(mapper)
                  .Aggregate(_semigroup.Combine);
    }
}

// Performance demonstration
class Program
{
    static async Task Main(string[] args)
    {
        var numbers = Enumerable.Range(1, 10_000).ToList();
        // Parallel sum processing
        var sumProcessor = new ParallelProcessor<long>(new SumSemigroup());

        var parallelSum = await sumProcessor.ParallelMapReduceAsync(numbers, x => (long)x);

        Console.WriteLine($"Parallel Sum: {parallelSum}");

        // PLINQ comparison
        var plinqSum = sumProcessor.PLinqMapReduce(numbers, x => (long)x);

        Console.WriteLine($"PLINQ Sum: {plinqSum}");

        // Sequential comparison
        var sequentialSum = numbers.Select(x => (long)x).Aggregate((a, b) => a + b);
        Console.WriteLine($"Sequential Sum: {sequentialSum}");

        // Demonstrate associativity in parallel context
        Console.WriteLine("\nDemonstrating Associativity in Parallel Processing:");

        var smallNumbers = new[] { 1, 2, 3, 4, 5, 6, 7, 8 };
        var result1 = await sumProcessor.ParallelReduceAsync(smallNumbers.Select(x => (long)x));
        var result2 = smallNumbers.Select(x => (long)x).Aggregate((a, b) => a + b);

        Console.WriteLine($"Parallel result: {result1}");
        Console.WriteLine($"Sequential result: {result2}");
        Console.WriteLine($"Results match: {result1 == result2}");
    }
}

We can visualize parallel computations as well.

Sequential Processing:

  a -> (∘) -> temp1 -> (∘) -> temp2 -> (∘) -> result
  b ->       -> c ->         -> d ->

Reduction depth: O(n)

Parallel Processing (Divide & Conquer):

                    Level 1 (Parallel)
   a ──┐                              ┌── c
       |--> (∘) --> temp1             │
   b ──┘                              |--> (∘) --> temp2
                                      │
                                  d ──┘

                    Level 2 (Combine)
            temp1 ──┐
                    |--> (∘) --> final_result
            temp2 ──┘

Reduction depth: O(log n) with enough workers

Conclusion #

Semigroups are among the most fundamental and practical algebraic structures. Their simplicity, requiring only an associative binary operation, belies their power and versatility in real-world applications.

Semigroups are useful for:

🔧 Simplicity Minimal requirements: just a set with an associative operation Easy to understand and implement across different programming languages Natural abstraction that models many common operations (concatenation, addition, multiplication, merging)

⚡ Parallel Processing Associativity enables safe parallelization without coordination overhead Support for divide-and-conquer algorithms Scalability across multiple CPU cores and distributed systems MapReduce-style computations

🧩 Composability and Modularity Combine small, simple operations into larger transformations Build larger systems from smaller, testable components

Source code #

Reference implementation (opens in a new tab)

References

Semigroup (opens in a new tab) · Back
Magma (opens in a new tab) · Back
Algebraic structure (opens in a new tab) · Back
Centroid (opens in a new tab) · Back