Process vs Threads, Hardware Threads vs Software Threads, Hyperthreading

In today’s world of high-performance and responsive software, concurrency is no longer a luxury — it’s a necessity. Whether you’re building a real-time trading engine, a game engine, or a simple web server, understanding how to execute tasks in parallel can dramatically improve performance and user experience. This is where threading comes into play.

In our previous blogs, we explored Flynn’s Taxonomy and various Parallel Programming Models — the essential building blocks of multithreading and advanced parallelism. Now that we’ve laid the groundwork, it’s time to move beyond theory and dive into the core question: What exactly is a thread, and what does it do? If you haven’t read those earlier stories yet, I highly recommend checking them out first for better context.

Threading allows programs to do more than one thing at a time: download files while updating the UI, compute results while responding to user input, or process incoming network requests in parallel. Behind the scenes, modern CPUs support multiple hardware threads, and programming languages like C++ provide powerful tools to take advantage of them.

Whether you’re new to multithreading or looking to reinforce your foundations, this story will give you a solid start. Let’s dive into the world of threads and unlock the power of parallel execution.

Highlights: Threads vs processes | Hardware threads vs Software Threads | Hyperthreading | Fork/join threading model

What Is a Thread?

One of the first hurdles for beginners learning about concurrency is understanding the difference between a thread and a process.

A process is an independent program in execution. It has its own memory space, file descriptors, and system resources, isolated from other processes. For instance, when you open a browser or a terminal, you’re launching a new process. The operating system manages processes, and they do not share memory with one another unless explicitly configured to do so (e.g., via shared-memory IPC).

A thread, on the other hand, is the smallest unit of execution within a process. Multiple threads can exist within the same process, running concurrently and sharing the same memory space. This shared environment allows for faster communication between threads, but also opens the door to race conditions and synchronization issues if not managed properly.

Conceptually, you can think of a thread as a lightweight process — an independent stream of execution with its own program counter, registers, and stack, but sharing heap and global memory with other threads in the same process.

When discussing threads, it’s important to distinguish between hardware threads and software threads. Although both refer to “units of execution,” they operate at very different levels in the computing stack.

What is a Hardware Thread?

A hardware thread is an execution stream directly supported by the processor. It is effectively a dedicated control unit within a core that can fetch, decode, and execute a stream of instructions independently.

Traditionally, one processor equaled one hardware thread — in other words, one control unit per physical CPU. On systems with multiple sockets (e.g., server motherboards), there would be one hardware thread per socket. But this model evolved rapidly with the introduction of multi-core and multi-threaded architectures.

In modern CPUs:

- A single socket holds multiple physical cores, each an independent processing unit with its own execution resources.
- Each core can expose one or more hardware threads; with simultaneous multithreading (SMT), a single core presents two (or more) hardware threads to the operating system.
- The total number of hardware threads is therefore sockets × cores per socket × threads per core.

To add to the confusion, many operating systems report hardware threads or logical cores as “processors.” So, when you check your CPU information using system tools, the number of “processors” shown might refer to logical threads, not physical cores or sockets.

How do I check the number of hardware threads I have?

  1. Windows

There are several ways to check the number of hardware threads on Windows, but don’t worry, this isn’t one of those “10 ways to do it” blog posts. For quick reference, the easiest method is to open Task Manager using Ctrl + Shift + Esc. Head to the Performance tab and select CPU.

You’ll see a summary that includes the number of cores and logical processors (i.e., hardware threads).

Other options, if you’d like to explore on your own:

Alternative 1: PowerShell:

(Get-WmiObject -Class Win32_Processor).NumberOfLogicalProcessors

On newer PowerShell versions, where Get-WmiObject is deprecated, the equivalent is:

(Get-CimInstance -ClassName Win32_Processor).NumberOfLogicalProcessors

Alternative 2: Command Prompt (requires wmic):

wmic cpu get NumberOfLogicalProcessors,NumberOfCores

2. Linux

If you’re using any flavor of Linux, you can check your system’s configuration and the number of hardware threads by reading /proc/cpuinfo. The output contains one entry per hardware thread; a single entry looks like this:

~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 183
model name      : 13th Gen Intel(R) Core(TM) i5-13450HX
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 2611.201
cache size      : 20480 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 28
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni vnmi umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
vmx flags       : vnmi invvpid ept_x_only ept_ad ept_1gb tsc_offset vtpr ept vpid unrestricted_guest ept_mode_based_exec tsc_scaling usr_wait_pause
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs retbleed eibrs_pbrsb rfds bhi
bogomips        : 5222.40
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

...

A few things to note from this:

- processor : 0 is the index of this logical processor (hardware thread); the entries run from 0 up to the total thread count minus one.
- siblings : 16 is the number of hardware threads on this physical package, while cpu cores : 8 is the number of physical cores; two threads per core means Hyper-Threading is enabled.
- The ht flag in the flags line also confirms Hyper-Threading support.

There are a couple of alternatives to this command, which also give clearer output:

Alternative 1: lscpu

I get the following output for lscpu:

~$ lscpu | grep -E 'Core|Socket|Thread'
Model name:                           13th Gen Intel(R) Core(TM) i5-13450HX
Thread(s) per core:                   2
Core(s) per socket:                   8
Socket(s):                            1

Alternative 2: hwloc (lstopo)

Another useful tool for inspecting your system’s CPU topology is the popular Linux utility hwloc, which provides both command-line and graphical representations of your hardware layout. It’s especially handy for visualizing cores, hardware threads, cache levels, and NUMA nodes.

If hwloc is already installed, you can generate a visual map of your system’s architecture with the lstopo command; the companion lstopo-no-graphics renders the same topology as text in the terminal.

What is Hyper-Threading?

Hyper-Threading (HT) is Intel’s implementation of Simultaneous Multithreading (SMT), allowing each physical core to run two instruction streams simultaneously. When one thread stalls (e.g., waiting on memory), the other can make use of the execution units. This leads to:

- Better CPU utilization

- Improved performance in I/O-bound or multitasking workloads

But it doesn’t double performance: typical gains are around 15–30%.

Caution for parallel programming:

While HT benefits everyday multitasking (e.g., running multiple programs on a laptop), it can negatively affect performance in HPC or parallel workloads. Running multiple heavy threads on the same core can lead to resource contention and reduce speedup. This is why many HPC centers disable HT by default — careful thread scheduling is critical on SMT systems.

Example (i5-13450HX):

- 6 P-cores with HT → 12 threads
- 4 E-cores without HT → 4 threads

➡️ Total = 16 logical threads

Understanding Software Threads: The Foundation

Unlike hardware threads, which exist at the processor level, software threads are programming abstractions that represent independent streams of execution within a process. A software thread is essentially a lightweight execution unit that exists within a process, sharing the same address space while maintaining its own execution context.

When you create a software thread in your code, the operating system and runtime environment work together to map it onto available hardware threads for actual execution. This mapping is dynamic and depends on the thread scheduler, which determines when and where each thread runs.

The distinction between software and hardware threads is crucial. Hardware threads represent the physical execution units available on your processor, while software threads are the abstraction that programmers work with. A single hardware thread can execute multiple software threads over time through context switching, and modern systems often support thousands of software threads running concurrently.

The Evolution From Processes to Threads

To understand why threading was introduced, we must first examine the traditional process model. Operating systems historically managed processes as the primary unit of execution, where each process had:

- Its own private address space and memory mappings
- Its own open file descriptors and other system resources
- Expensive creation and context-switch costs, since the OS must set up and tear down entire address spaces
- Communication with other processes only through explicit IPC mechanisms such as pipes, sockets, or shared memory

Threading was introduced to address the limitations of this process-centric model by enabling finer-grained concurrency.

Threads provide several key advantages:

- Much cheaper to create and destroy than processes
- Faster context switches, since threads in the same process share an address space
- Direct communication through shared memory instead of explicit IPC
- Better responsiveness: one thread can block on I/O while others keep working

Thread Architecture and Memory Model

Each thread maintains its own private execution context while sharing certain resources with other threads in the same process.

Private Thread Resources

Each thread has its own:

- Program counter, tracking the instruction it is currently executing
- Register set
- Stack, holding its local variables and function call frames
- Thread-local storage
- Scheduling state and priority

Shared Resources

All threads in a process share:

- The heap and other dynamically allocated memory
- Global and static variables
- The code (text) segment
- Open file descriptors and other process-wide resources, such as signal handlers

This shared memory model is both a strength and a challenge. While it enables efficient communication, it also introduces the need for careful synchronization to prevent data races and ensure thread safety.

The Fork/Join Model: Structured Parallelism

The fork/join model represents the most common pattern for structured parallel programming. This model provides a clean abstraction for dividing work among multiple threads and collecting results. The execution flow of a fork/join model looks like this:

  1. Sequential Start: The main thread begins executing sequentially
  2. Fork Phase: When parallel work is needed, the main thread creates (forks) new threads, each starting at a specified function
  3. Parallel Execution: Both main and spawned threads execute concurrently, potentially on different hardware threads
  4. Join Phase: The main thread waits for all spawned threads to complete before continuing
  5. Sequential Continuation: Execution resumes sequentially with results from parallel work

What’s Next?

We’ve now reached the end of the third installment in this multithreading series. So far, we’ve covered the fundamental concepts of threads and processes, giving you a solid foundation to build on. In the next part, we’ll shift gears from theory to practice and explore the world of threading in action. Get ready to dive into POSIX threads (pthreads) and C++ std::thread, where we’ll write real code, analyze outputs, and understand how these libraries bring concurrency to life.

Suggested Reads

[1] Multithreaded Computer Architecture: A Summary of the State of the Art

[2] Distributed Computing: Principles, Algorithms, and Systems — Kshemkalyani and Singhal (uic.edu)