Shared memory Parallelism
Subhajit Sahu
Abstract — These notes review shared memory as a mechanism for communication between programs and threads, outline the main shared memory architectures (UMA, NUMA, COMA), compare multi-process and multi-threaded parallelism, and contrast symmetric with asymmetric (heterogeneous) multiprocessing.
Index terms — Symmetric multiprocessing, Asymmetric multiprocessing.
1. Shared memory
Shared memory is memory that may be simultaneously accessed by multiple programs
with an intent to provide communication among them or avoid redundant copies. Shared
memory is an efficient means of passing data between programs. Using memory for
communication inside a single program, e.g. among its multiple threads, is also referred to
as shared memory [REF].
Shared memory systems may use [REF]:
- Uniform Memory Access (UMA): all the processors share the physical memory uniformly;
- Non-Uniform Memory Access (NUMA): memory access time depends on the memory location relative to a processor (a brief allocation sketch follows this list);
- Cache-Only Memory Architecture (COMA): the local memories of the processors at each node are used as cache instead of as actual main memory.
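To make NUMA placement concrete, here is a minimal sketch using the Linux libnuma API (an assumption: libnuma is installed and the machine has a node 0; link with -lnuma). It allocates a buffer whose physical pages live on one NUMA node, so threads running on that node see local, faster accesses.

// numa_alloc.c: minimal NUMA-aware allocation sketch (assumes libnuma).
// Build: gcc numa_alloc.c -lnuma
#include <stdio.h>
#include <numa.h>

int main(void) {
    if (numa_available() < 0) {          // kernel/library NUMA support?
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }
    size_t size = 1 << 20;               // 1 MiB (illustrative)
    // Allocate physical pages on NUMA node 0; threads running on node 0
    // get local (fast) accesses, threads on other nodes get remote ones.
    double *buf = numa_alloc_onnode(size, 0);
    if (!buf) return 1;
    buf[0] = 42.0;                       // touch the memory
    printf("allocated %zu bytes on node 0\n", size);
    numa_free(buf, size);
    return 0;
}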
A shared memory system is relatively easy to program since all processors share a single
view of data and the communication between processors can be as fast as memory
accesses to the same location. The issue with shared memory systems is that many CPUs
need fast access to memory and will likely cache memory, which has two complications
[REF]:
- Access time degradation: when several processors try to access the same memory location, it causes contention, and accessing nearby memory locations may cause false sharing (a short demonstration follows this list). Shared memory computers cannot scale very well; most of them have ten or fewer processors;
- Lack of data coherence: whenever one cache is updated with information that may be
used by other processors, the change needs to be reflected to the other processors,
otherwise the different processors will be working with incoherent data. Such cache
coherence protocols can, when they work well, provide extremely high-performance
access to shared information between multiple processors. On the other hand, they can
sometimes become overloaded and become a bottleneck to performance.
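To make false sharing concrete, here is a minimal pthreads sketch (sizes and iteration counts are illustrative): two threads increment adjacent counters that land on the same cache line, so the line ping-pongs between cores even though no data is logically shared; padding each counter out to a full cache line removes the effect.

// false_sharing.c: two threads update adjacent counters (illustrative sketch).
// Build: gcc false_sharing.c -pthread
#include <pthread.h>
#include <stdio.h>

#define ITERS 100000000L

// Without padding, both counters share one 64-byte cache line and the
// line bounces between the two cores (false sharing). Uncomment the pad
// to give each counter its own cache line and compare the run time.
struct counter {
    long value;
    // char pad[64 - sizeof(long)];      // padding to avoid false sharing
};
static struct counter counters[2];

static void *worker(void *arg) {
    struct counter *c = arg;
    for (long i = 0; i < ITERS; i++)
        c->value++;                      // repeated writes to a shared line
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, worker, &counters[0]);
    pthread_create(&t1, NULL, worker, &counters[1]);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("%ld %ld\n", counters[0].value, counters[1].value);
    return 0;
}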
In the case of a Heterogeneous System Architecture (a processor architecture that integrates
different types of processors, such as CPUs and GPUs, with shared memory), the memory
management unit (MMU) of the CPU and the input–output memory management unit
(IOMMU) of the GPU have to share certain characteristics, like a common address space.
Compared to operating systems with multiple address spaces, memory sharing (especially the sharing of procedures or pointer-based structures) is simpler in single address space operating systems. The alternatives to shared memory are distributed memory and distributed shared memory, each having a similar set of issues [REF].
2. Multi-process parallelism
A crash in a thread takes down the whole process. And you probably wouldn't want it any
other way since a crash signal (like SIGSEGV, SIGBUS, SIGABRT) means that you lost
control over the behavior of the process and anything could have happened to its memory.
So if you want to isolate things, spawning separate processes is definitely better [REF]. Two or more processes can share data efficiently through interprocess shared memory. In most cases, you need interprocess synchronization mechanisms (such as a semaphore or a mutex) [REF]. For iterative algorithms such as PageRank (barrier-free), this synchronization may not be necessary [SAHU]. When a process dies, the shared memory is left as it is. On Linux it is mapped as a file under the /dev/shm/ directory, and it is removed either when the system reboots, or when all processes have unmapped the shared memory file and shm_unlink() has been called [REF]. What if the cause of the crash was a write through a rogue pointer stomping through the shared memory? You'll never be 100% safe [REF].
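The lifecycle described above can be sketched with the POSIX shared memory API (the object name /demo_shm and the size are illustrative): one process creates and maps a segment, a cooperating process would open the same name, and the last user calls shm_unlink() to remove it.

// shm_demo.c: create, map, and unlink a POSIX shared memory object.
// Build: gcc shm_demo.c -lrt (older glibc needs -lrt for shm_open)
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const char *name = "/demo_shm";      // illustrative name
    size_t size = 4096;

    // Create (or open) the shared memory object; on Linux it appears
    // as /dev/shm/demo_shm until shm_unlink() is called.
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, size) < 0) { perror("ftruncate"); return 1; }

    // Map it; another process mapping the same name sees the same bytes.
    char *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);                           // the mapping stays valid

    strcpy(mem, "hello from process A"); // visible to other mappers
    printf("wrote: %s\n", mem);

    munmap(mem, size);                   // unmap our view
    shm_unlink(name);                    // removed once all users unmap
    return 0;
}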
Interprocess synchronization can work differently on different operating systems. What
happens when a process that has locked the shared memory crashes? Windows frees the
locked named mutex after a process crash, whereas Linux doesn't free it [REF]. The advantage of the Windows behavior is that the waiting thread is freed to continue. The disadvantage is that it has no idea what state the shared memory is in: the crashed process may have been part way through its updates. In practice the only safe thing to do is to reset the shared memory in some way (assuming that can be done meaningfully) or to fail [REF].
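On POSIX systems, robust mutexes are one way to handle exactly this crash-while-locked case. A minimal sketch (assuming the structure lives in an interprocess shared mapping, e.g. from shm_open and mmap above): the next locker after a crash gets EOWNERDEAD, repairs the data, and marks the mutex consistent.

// robust_mutex.c: sketch of a process-shared robust mutex in shared memory.
// Assumes 'state' is placed in a shared mapping visible to all processes.
#include <errno.h>
#include <pthread.h>

struct shared_state {
    pthread_mutex_t lock;
    int data;                            // protected by 'lock'
};

void init_lock(struct shared_state *state) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&state->lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

int locked_update(struct shared_state *state) {
    int rc = pthread_mutex_lock(&state->lock);
    if (rc == EOWNERDEAD) {
        // The previous owner crashed while holding the lock: the shared
        // data may be half-updated. Reset it, then mark the mutex usable.
        state->data = 0;
        pthread_mutex_consistent(&state->lock);
    } else if (rc != 0) {
        return rc;                       // e.g. ENOTRECOVERABLE
    }
    state->data++;                       // the protected update
    pthread_mutex_unlock(&state->lock);
    return 0;
}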
Threads require significantly less overhead than processes for switching and for communicating. A process
context switch requires changing the system’s current view of virtual memory, which is a
time-consuming operation. Switching from one thread to another is a lot faster. For two
processes to exchange data, they have to initiate interprocess communication (IPC), which
requires asking the kernel for help. IPC generally involves performing at least one context
switch. Since all threads in a process share the heap, data, and code, there is no need to get
the kernel involved. The threads can communicate directly by reading and writing global
variables [REF].
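A minimal sketch of this direct communication (thread and iteration counts are illustrative): four threads update one global counter under a mutex, with no separate IPC mechanism involved.

// thread_globals.c: threads in one process share globals directly.
// Build: gcc thread_globals.c -pthread
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                 // shared by all threads
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);       // communication is just a
        counter++;                       // load/store on a shared variable
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);  // prints 4000000
    return 0;
}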
At the same time, the lack of isolation between threads in a single process can also lead to
some disadvantages. If one thread crashes due to a segmentation fault or other error, all
other threads and the entire process are killed. This situation can lead to data and system
corruption if the other threads were in the middle of some important task. As an example,
assume that the application is a server program with one thread responsible for logging all
requests and a separate thread for handling the requests. If the request handler thread
crashes before the logging thread has a chance to write the request to its log file, there
would be no record of the request. The system administrators left to determine what went
wrong would have no information about the request, so they may waste a lot of time
validating other requests that were all good [REF].
3. Heterogeneous Multiprocessing
Why do we have CPUs in which all the cores run at the same speed, rather than combinations of different speeds [REF]? Combining cores with different speeds is known as heterogeneous multiprocessing (HMP) and is widely adopted on mobile devices. In ARM-based devices that implement big.LITTLE, the
processor contains cores with different performance and power profiles, e.g. some cores run
fast but draw lots of power (faster architecture and/or higher clocks) while others are
energy-efficient but slow (slower architecture and/or lower clocks). This is useful because
power usage tends to increase disproportionately as you increase performance once you
get past a certain point. The idea here is to get performance when you need it and battery
life when you don't [REF].
On desktop platforms, power consumption is much less of an issue so this is not truly
necessary. Most applications expect each core to have similar performance characteristics,
and scheduling processes for HMP systems is much more complex than scheduling for
traditional SMP systems. (Windows 10 technically has support for HMP, but it's mainly
intended for mobile devices that use ARM big.LITTLE.) Also, most desktop and laptop
processors today are not thermally or electrically limited to the point where some cores
need to run faster than others even for short bursts. We've basically hit a wall on how fast
we can make individual cores, so replacing some cores with slower ones won't allow the
remaining cores to run faster [REF].
While there are a few desktop processors that have one or two cores capable of running
faster than the others, this capability is currently limited to certain very high-end Intel
processors (marketed as Turbo Boost Max Technology 3.0) and only involves a slight gain in
performance for those cores that can run faster. While it is certainly possible to design a
traditional x86 processor with both large, fast cores and smaller, slower cores to optimize
for heavily-threaded workloads, this would add considerable complexity to the processor
design and applications are unlikely to properly support it [REF].
Take a hypothetical processor with two fast Kaby Lake (7th-generation Core) cores and
eight slow Goldmont (Atom) cores. You'd have a total of 10 cores, and heavily-threaded
workloads optimized for this kind of processor may see a gain in performance and efficiency
over a normal quad-core Kaby Lake processor. However, the different types of cores have
wildly different performance levels, and the slow cores don't even support some of the
instructions the fast cores support, like AVX (ARM avoids this issue by requiring both the
big and LITTLE cores to support the same instructions) [REF].
Again, most Windows-based multithreaded applications assume that every core has the
same or nearly the same level of performance and can execute the same instructions, so
this kind of asymmetry is likely to result in less-than-ideal performance, perhaps even
crashes if an application uses instructions not supported by the slow cores. While Intel could modify the
slow cores to add advanced instruction support so that all cores can execute all
instructions, this would not resolve issues with software support for heterogeneous
processors [REF].
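One common software-side mitigation is runtime dispatch: probe instruction set support before choosing a code path. A minimal sketch using GCC/Clang's __builtin_cpu_supports on x86 (the two kernel functions are hypothetical placeholders):

// dispatch.c: pick a code path based on runtime CPU feature detection.
// Build on x86 with GCC or Clang: gcc dispatch.c
#include <stdio.h>

static void sum_avx(void)    { /* AVX-vectorized kernel would go here */ }
static void sum_scalar(void) { /* portable scalar fallback */ }

int main(void) {
    __builtin_cpu_init();                // initialize feature detection
    if (__builtin_cpu_supports("avx")) {
        puts("AVX available: using vector path");
        sum_avx();
    } else {
        puts("no AVX: using scalar path");
        sum_scalar();
    }
    return 0;
}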
A different approach to application design, closer to what you're probably thinking about in
your question, would use the GPU for acceleration of highly parallel portions of
applications. This can be done using APIs like OpenCL and CUDA (a minimal OpenCL sketch follows below). As for a single-chip solution, AMD promotes hardware support for GPU acceleration in its APUs (which combine a traditional CPU and a high-performance integrated GPU on the same chip) under the name Heterogeneous System Architecture, though this has not seen much industry uptake outside of a few specialized applications [REF].
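For completeness, here is a minimal OpenCL vector-add sketch in C (error checking trimmed; it assumes an OpenCL runtime and a GPU device are present, and uses the classic clCreateCommandQueue entry point, which newer OpenCL versions deprecate in favor of clCreateCommandQueueWithProperties):

// vecadd_cl.c: minimal OpenCL vector-add sketch (error checks trimmed).
// Build: gcc vecadd_cl.c -lOpenCL
#include <CL/cl.h>
#include <stdio.h>

static const char *src =
    "__kernel void add(__global const float *a, __global const float *b,"
    "                  __global float *c) {"
    "  int i = get_global_id(0);"
    "  c[i] = a[i] + b[i];"
    "}";

int main(void) {
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0f * i; }

    cl_platform_id plat;  cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    // Device buffers; inputs are copied from host memory at creation.
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    // Compile the kernel source at runtime and bind its arguments.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "add", NULL);
    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    // Launch one work-item per element, then read back the result.
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);
    printf("c[1] = %f\n", c[1]);         // expect 3.0
    return 0;
}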
