Process vs Threads, Hardware Threads vs Software Threads, Hyperthreading

In today’s world of high-performance and responsive software, concurrency is no longer a luxury — it’s a necessity. Whether you’re building a real-time trading engine, a game engine, or a simple web server, understanding how to execute tasks in parallel can dramatically improve performance and user experience. This is where threading comes into play.

In our previous blogs, we explored Flynn’s Taxonomy and various Parallel Programming Models — the essential building blocks of multithreading and advanced parallelism. Now that we’ve laid the groundwork, it’s time to move beyond theory and dive into the core question: What exactly is a thread, and what does it do? If you haven’t read those earlier stories yet, I highly recommend checking them out first for better context.

Threading allows programs to do more than one thing at a time: download files while updating the UI, compute results while responding to user input, or process incoming network requests in parallel. Behind the scenes, modern CPUs support multiple hardware threads, and programming languages like C++ provide powerful tools to take advantage of them.

Whether you’re new to multithreading or looking to reinforce your foundations, this story will give you a solid start. Let’s dive into the world of threads and unlock the power of parallel execution.

Highlights: Threads vs processes | Hardware threads vs Software Threads | Hyperthreading | Fork/join threading model

What Is a Thread?

One of the first hurdles for beginners learning about concurrency is understanding the difference between a thread and a process.

A process is an independent program in execution. It has its own memory space, file descriptors, and system resources, isolated from other processes. For instance, when you open a browser or a terminal, you’re launching a new process. The operating system manages processes, and they do not share memory with one another unless explicitly configured to do so (e.g., via shared-memory IPC).

A thread, on the other hand, is the smallest unit of execution within a process. Multiple threads can exist within the same process, running concurrently and sharing the same memory space. This shared environment allows for faster communication between threads, but also opens the door to race conditions and synchronization issues if not managed properly.

Conceptually, you can think of a thread as a lightweight process — an independent stream of execution with its own program counter, registers, and stack, but sharing heap and global memory with other threads in the same process.

When discussing threads, it’s important to distinguish between hardware threads and software threads. Although both refer to “units of execution,” they operate at very different levels in the computing stack.

What is a Hardware Thread?

A hardware thread is an execution stream directly supported by the processor. It is effectively a dedicated control unit within a core that can fetch, decode, and execute a stream of instructions independently.

Traditionally, one processor equaled one hardware thread — in other words, one control unit per physical CPU. On systems with multiple sockets (e.g., server motherboards), there would be one hardware thread per socket. But this model evolved rapidly with the introduction of multi-core and multi-threaded architectures.

In modern CPUs:

- A single socket holds multiple physical cores, each an independent processing unit with its own execution resources.
- Each core can expose one or more hardware threads; with simultaneous multithreading (SMT), a single core presents two (or more) hardware threads to the operating system.
- The total number of hardware threads is therefore sockets × cores per socket × threads per core.

To add to the confusion, many operating systems report hardware threads or logical cores as “processors.” So, when you check your CPU information using system tools, the number of “processors” shown might refer to logical threads, not physical cores or sockets.

How do I check the number of hardware threads I have?

  1. Windows

There are several ways to check the number of hardware threads on Windows, but don’t worry, this isn’t one of those “10 ways to do it” blog posts. For quick reference, the easiest method is to open Task Manager using Ctrl + Shift + Esc. Head to the Performance tab and select CPU.

You’ll see a summary that includes the number of cores and logical processors (i.e., hardware threads).

Other options, if you’d like to explore on your own:

Alternative 1: PowerShell:

(Get-WmiObject -Class Win32_Processor).NumberOfLogicalProcessors

On newer PowerShell versions, where Get-WmiObject is deprecated, the equivalent is:

(Get-CimInstance -ClassName Win32_Processor).NumberOfLogicalProcessors

Alternative 2: Command Prompt (requires wmic):

wmic cpu get NumberOfLogicalProcessors,NumberOfCores

2. Linux

If you’re using any flavor of Linux, you can check your system’s configuration and the number of hardware threads by reading /proc/cpuinfo. The output contains one entry per hardware thread; a single entry looks like this:

~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 183
model name      : 13th Gen Intel(R) Core(TM) i5-13450HX
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 2611.201
cache size      : 20480 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 28
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni vnmi umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
vmx flags       : vnmi invvpid ept_x_only ept_ad ept_1gb tsc_offset vtpr ept vpid unrestricted_guest ept_mode_based_exec tsc_scaling usr_wait_pause
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs retbleed eibrs_pbrsb rfds bhi
bogomips        : 5222.40
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

...

A few things to note from this:

- processor : 0 is the index of this logical processor (hardware thread); the entries run from 0 up to the total thread count minus one.
- siblings : 16 is the number of hardware threads on this physical package, while cpu cores : 8 is the number of physical cores; two threads per core means Hyper-Threading is enabled.
- The ht flag in the flags line also confirms Hyper-Threading support.

There are a couple of alternatives to this command, which also give clearer output:

Alternative 1: lscpu

I get the following output for lscpu:

~$ lscpu | grep -E 'Core|Socket|Thread'
Model name:                           13th Gen Intel(R) Core(TM) i5-13450HX
Thread(s) per core:                   2
Core(s) per socket:                   8
Socket(s):                            1

Alternative 2: hwloc (lstopo)

Another useful tool for inspecting your system’s CPU topology is the popular Linux utility hwloc, which provides both command-line and graphical representations of your hardware layout. It’s especially handy for visualizing cores, hardware threads, cache levels, and NUMA nodes.

If hwloc is already installed, you can generate a visual map of your system’s architecture with the lstopo command; the companion lstopo-no-graphics renders the same topology as text in the terminal.

What is Hyper-Threading?

Hyper-Threading (HT) is Intel’s implementation of Simultaneous Multithreading (SMT), allowing each physical core to run two instruction streams simultaneously. When one thread stalls (e.g., waiting on memory), the other can make use of the execution units. This leads to:

- Better CPU utilization

- Improved performance in I/O-bound or multitasking workloads

But it doesn’t double performance: typical gains are around 15–30%.

Caution for parallel programming:

While HT benefits everyday multitasking (e.g., running multiple programs on a laptop), it can negatively affect performance in HPC or parallel workloads. Running multiple heavy threads on the same core can lead to resource contention and reduce speedup. This is why many HPC centers disable HT by default — careful thread scheduling is critical on SMT systems.

Example (i5-13450HX):

- 6 P-cores with HT → 12 threads
- 4 E-cores without HT → 4 threads

➡️ Total = 16 logical threads

Understanding Software Threads: The Foundation

Unlike hardware threads, which exist at the processor level, software threads are programming abstractions that represent independent streams of execution within a process. A software thread is essentially a lightweight execution unit that exists within a process, sharing the same address space while maintaining its own execution context.

When you create a software thread in your code, the operating system and runtime environment work together to map it onto available hardware threads for actual execution. This mapping is dynamic and depends on the thread scheduler, which determines when and where each thread runs.

The distinction between software and hardware threads is crucial. Hardware threads represent the physical execution units available on your processor, while software threads are the abstraction that programmers work with. A single hardware thread can execute multiple software threads over time through context switching, and modern systems often support thousands of software threads running concurrently.

The Evolution From Processes to Threads

To understand why threading was introduced, we must first examine the traditional process model. Operating systems historically managed processes as the primary unit of execution, where each process had:

- Its own private address space and memory mappings
- Its own open file descriptors and other system resources
- Expensive creation and context-switch costs, since the OS must set up and tear down entire address spaces
- Communication with other processes only through explicit IPC mechanisms such as pipes, sockets, or shared memory

Threading was introduced to address the limitations of this process-centric model by enabling finer-grained concurrency.

Threads provide several key advantages:

- Much cheaper to create and destroy than processes
- Faster context switches, since threads in the same process share an address space
- Direct communication through shared memory instead of explicit IPC
- Better responsiveness: one thread can block on I/O while others keep working

Thread Architecture and Memory Model

Each thread maintains its own private execution context while sharing certain resources with other threads in the same process.

Private Thread Resources

Each thread has its own:

- Program counter, tracking the instruction it is currently executing
- Register set
- Stack, holding its local variables and function call frames
- Thread-local storage
- Scheduling state and priority

Shared Resources

All threads in a process share:

- The heap and other dynamically allocated memory
- Global and static variables
- The code (text) segment
- Open file descriptors and other process-wide resources, such as signal handlers

This shared memory model is both a strength and a challenge. While it enables efficient communication, it also introduces the need for careful synchronization to prevent data races and ensure thread safety.

The Fork/Join Model: Structured Parallelism

The fork/join model represents the most common pattern for structured parallel programming. This model provides a clean abstraction for dividing work among multiple threads and collecting results. The execution flow of a fork/join model looks like this:

  1. Sequential Start: The main thread begins executing sequentially
  2. Fork Phase: When parallel work is needed, the main thread creates (forks) new threads, each starting at a specified function
  3. Parallel Execution: Both main and spawned threads execute concurrently, potentially on different hardware threads
  4. Join Phase: The main thread waits for all spawned threads to complete before continuing
  5. Sequential Continuation: Execution resumes sequentially with results from parallel work

What’s Next?

We’ve now reached the end of the third installment in this multithreading series. So far, we’ve covered the fundamental concepts of threads and processes, giving you a solid foundation to build on. In the next part, we’ll shift gears from theory to practice and explore the world of threading in action. Get ready to dive into POSIX threads (pthreads) and C++ std::thread, where we’ll write real code, analyze outputs, and understand how these libraries bring concurrency to life.

Suggested Reads

[1] Multithreaded Computer Architecture: A Summary of the State of the Art

[2] Distributed Computing: Principles, Algorithms, and Systems — Kshemkalyani and Singhal (uic.edu)