Multithreading (computer architecture)


Multithreading is the ability of a processor to handle multiple threads of execution. It lets each processor core appear as two cores, so the processor as a whole appears to have twice as many cores.

To understand how this works, note that a processor consists of one part that controls the flow of data and another part that executes (processes) it.

The control part sends data to and receives data from memory. It can also interrupt an operation that is already being processed, schedule data addresses, and perform some other tasks.

In principle, multithreading can be added to a processor simply by doubling its control units: instead of one control unit there are two, so the processor contains two control units and only one execution unit.

Because the control unit is the processor's interface for handling data, the operating system sees two control units and believes there are two processors. It therefore splits the work into two portions and distributes them between the two processors it thinks it has.

This raises an important question: the processor has only one execution unit, so what happens when the operating system sends it two different portions of data?

The first portion goes to the real processor and the second goes to the virtual processor. Because there is only one execution unit, both portions end up on the same physical processor.

The problem is that the two portions then compete for the single execution unit.

The solution is for the operating system to distinguish between the virtual processor and the real one, and this is what happens in practice. The operating system schedules some operations on the virtual processor (scheduled here means that those operations are waiting and are not being processed at the moment) and sends the ordinary operations to the real processor. When one of those operations fails on a memory access (which happens often), or when the processor requests data from memory and does not find it, the processor waits for the correct or missing data to arrive and may waste valuable clock cycles waiting to no purpose. At this point the second control unit steps in: it suspends the stalled operations and brings in the waiting ones, recovering the cycles that would otherwise be lost.

We can conclude that the benefit of multithreading is only partial: it acts like a standby processor that takes over when the main one hits an error or stall, and it can never be equated with a real core. Even so, processors that support it benefit noticeably, because such stalls occur frequently.


The remedy is to execute another, waiting instruction that does not depend on the first one while the result of the first is being stored.
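The cycle-saving argument above can be made concrete with a small idealized model. The sketch below is not taken from the source: the function name `utilization` and the cycle counts are hypothetical, and it assumes every thread alternates between a fixed burst of useful work and a fixed memory stall, with free thread switches. Under those assumptions it gives an upper bound on how busy the single execution unit can be.

```python
def utilization(compute_cycles, stall_cycles, threads=1):
    """Upper bound on the busy fraction of one execution unit (toy model).

    Assumptions (not from the source): every thread repeats the same pattern
    of `compute_cycles` of useful work followed by `stall_cycles` of waiting
    on memory, and switching between threads costs nothing.
    """
    if compute_cycles <= 0 or stall_cycles < 0 or threads < 1:
        raise ValueError("expected positive work, non-negative stall, >= 1 thread")
    period = compute_cycles + stall_cycles
    # Each thread can supply compute_cycles of work per period; the unit
    # cannot be more than 100% busy.
    return min(1.0, threads * compute_cycles / period)


# Hypothetical numbers: 20 cycles of work, then an 80-cycle memory stall.
for n in (1, 2, 4, 8):
    print(n, "thread(s):", utilization(20, 80, threads=n))
```

In this model a second thread doubles throughput only while stalls dominate; once the execution unit is saturated, extra threads add nothing, which is consistent with the point that a logical processor is not equivalent to a real core.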

Improvements added by Intel to support multithreading in the Core i7:


The Core i7 is a quad-core processor. With multithreading enabled it presents eight cores: four real (physical) processors and four virtual (logical) processors. This can improve performance considerably on suitable workloads (up to double in ideal cases) without a significant increase in power consumption. In this processor the technique is called Hyper-Threading, since it improves the parallel execution of instructions.
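As a small illustration of how logical processors appear to software, the Python standard library function `os.cpu_count()` reports the number of logical CPUs the operating system exposes. On a machine with this kind of multithreading enabled, that number is typically larger than the number of physical cores. The wrapper name `logical_processor_count` is hypothetical.

```python
import os


def logical_processor_count():
    """Return the number of logical CPUs reported by the OS, or None if unknown."""
    return os.cpu_count()


count = logical_processor_count()
print("logical processors visible to the OS:", count)
```

The standard library does not report the physical core count, so comparing the two values requires a platform-specific tool.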



Overview

The multithreading paradigm has become more popular as efforts to further exploit instruction-level parallelism have stalled since the late 1990s. This allowed the concept of throughput computing to return to prominence from the more specialized field of transaction processing:

  • Even though it is very difficult to further speed up a single thread or single program, most computer systems are actually multi-tasking among multiple threads or programs.
  • Techniques that speed up the overall system throughput of all tasks would be a meaningful performance gain.

The two major techniques for throughput computing are multiprocessing and multithreading.


Advantages

Some advantages include:

  • If a thread gets a lot of cache misses, the other thread(s) can continue, taking advantage of the unused computing resources. This can lead to faster overall execution, as these resources would have been idle if only a single thread were executing.
  • If a thread cannot use all the computing resources of the CPU (because instructions depend on each other's results), running another thread keeps those resources from sitting idle.
  • If several threads work on the same set of data, they can share their cache, leading to better cache usage or synchronization on its values.

Disadvantages

Criticisms of multithreading include:

  • Multiple threads can interfere with each other when sharing hardware resources such as caches or translation lookaside buffers (TLBs).
  • The execution time of a single thread is not improved and can be degraded, even when only one thread is executing. This is due to slower frequencies and/or additional pipeline stages that are necessary to accommodate thread-switching hardware.
  • Hardware support for multithreading is more visible to software, and thus requires more changes to both application programs and operating systems than multiprocessing.

Results therefore vary. Intel claims up to a 30 percent benefit from its Hyper-Threading technology [1], while a synthetic program that merely runs a loop of non-optimized, dependent floating-point operations actually gets a 100 percent benefit when run in parallel. On the other hand, assembly-tuned programs that use, for example, MMX or AltiVec extensions and prefetch their data, such as good video encoders, do not suffer from cache misses or idle computing resources. They therefore do not benefit from hardware multithreading and can even see degraded performance because of contention for the shared resources.

Hardware techniques used to support multithreading often parallel the software techniques used for computer multitasking of computer programs.

Types of multithreading

Block multi-threading

Concept

The simplest type of multithreading occurs when one thread runs until it is blocked by an event that would normally create a long-latency stall. Such a stall might be a cache miss that has to access off-chip memory, which might take hundreds of CPU cycles for the data to return. Instead of waiting for the stall to resolve, a threaded processor switches execution to another thread that is ready to run. Only once the data for the previous thread has arrived is that thread placed back on the list of ready-to-run threads.

Example:

  1. Cycle i  : instruction j from thread A is issued
  2. Cycle i+1: instruction j+1 from thread A is issued
  3. Cycle i+2: instruction j+2 from thread A is issued, load instruction which misses in all caches
  4. Cycle i+3: thread scheduler invoked, switches to thread B
  5. Cycle i+4: instruction k from thread B is issued
  6. Cycle i+5: instruction k+1 from thread B is issued

Conceptually, it is similar to cooperative multitasking used in real-time operating systems, in which tasks voluntarily give up execution time when they need to wait for some type of event.

Terminology

This type of multithreading is known as block, cooperative, or coarse-grained multithreading.
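The cycle-by-cycle example above can be reproduced with a short simulation. This is a minimal sketch under stated assumptions, not a model of any real processor: the function name `simulate_block_mt`, the 10-cycle miss latency, and the one-cycle scheduler cost are all hypothetical choices made for illustration.

```python
from collections import deque


def simulate_block_mt(programs, miss_latency=10, switch_cycles=1):
    """Coarse-grained multithreading on a toy single-issue core.

    programs: dict mapping thread name -> list of (label, misses) tuples,
              where misses=True marks a load that misses in all caches.
    Returns a list of (cycle, event) tuples.
    """
    pc = {t: 0 for t in programs}                       # next instruction per thread
    ready = deque(t for t in programs if programs[t])   # ready-to-run list
    blocked = {}                                        # thread -> cycle its data arrives
    trace, cycle, current = [], 0, None

    while current is not None or ready or blocked:
        # Threads whose data has arrived go back on the ready list.
        for t in [t for t, wake in blocked.items() if wake <= cycle]:
            del blocked[t]
            ready.append(t)

        if current is None:
            if ready:
                current = ready.popleft()
            else:
                trace.append((cycle, "idle"))
                cycle += 1
            continue

        label, misses = programs[current][pc[current]]
        pc[current] += 1
        trace.append((cycle, f"{current}: issue {label}"))
        cycle += 1

        finished = pc[current] == len(programs[current])
        if misses and not finished:
            # Long-latency stall: block this thread and invoke the scheduler.
            blocked[current] = cycle + miss_latency
            for _ in range(switch_cycles):
                trace.append((cycle, "scheduler invoked"))
                cycle += 1
            current = None
        elif finished:
            current = None
    return trace


demo = {
    "A": [("j", False), ("j+1", False), ("j+2", True), ("j+3", False)],
    "B": [("k", False), ("k+1", False)],
}
for cycle, event in simulate_block_mt(demo):
    print(cycle, event)
```

With these inputs the first six cycles follow the example: three issues from thread A, one scheduler cycle, then two issues from thread B. Thread A resumes only after its data arrives, and the core idles in between because no other thread is ready.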

Hardware cost

The goal of multithreading hardware support is to allow quick switching between a blocked thread and another thread that is ready to run. To achieve this goal, the hardware cost is to replicate the program-visible registers as well as some processor control registers (such as the program counter). Switching from one thread to another means the hardware switches from using one register set to another.

Such additional hardware has these benefits:

  • The thread switch can be done in one CPU cycle.
  • It appears to each thread that it is executing alone and not sharing any hardware resources with any other threads. This minimizes the amount of software changes needed within the application as well as the operating system to support multithreading.

In order to switch efficiently between active threads, each active thread needs to have its own register set. For example, to quickly switch between two threads, the register hardware needs to be instantiated twice.
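The idea of replicated register sets can be sketched in software as one register bank per hardware thread plus a selector. The class name `RegisterFile` and its methods are hypothetical, and the sizes are arbitrary; the point is only that a switch changes which bank is selected and copies nothing.

```python
class RegisterFile:
    """Toy register file with one bank per hardware thread (illustrative only)."""

    def __init__(self, hw_threads=2, num_regs=8):
        # Each bank holds the program-visible registers and a program counter.
        self.banks = [{"pc": 0, "regs": [0] * num_regs} for _ in range(hw_threads)]
        self.active = 0  # index of the bank currently in use

    def switch_to(self, thread_id):
        # Switching threads only changes the selected bank; no state is saved
        # or restored, which is why it can be modeled as a single step.
        if not 0 <= thread_id < len(self.banks):
            raise IndexError("no such hardware thread")
        self.active = thread_id

    def read(self, reg):
        return self.banks[self.active]["regs"][reg]

    def write(self, reg, value):
        self.banks[self.active]["regs"][reg] = value


rf = RegisterFile(hw_threads=2)
rf.write(0, 111)      # thread 0 writes its own r0
rf.switch_to(1)
rf.write(0, 222)      # thread 1 writes its own r0
rf.switch_to(0)
print(rf.read(0))     # thread 0 still sees the value it wrote
```

Each thread sees only its own bank, which mirrors the point that every thread appears to be executing alone.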

Examples

Interleaved multi-threading


  1. Cycle i  : an instruction from thread A is issued
  2. Cycle i+1: an instruction from thread B is issued
  3. Cycle i+2: an instruction from thread C is issued

The purpose of this type of multithreading is to remove all data dependency stalls from the execution pipeline. Since one thread is relatively independent of the others, there is less chance that an instruction in one pipeline stage will need an output from an older instruction still in the pipeline.

Conceptually, it is similar to pre-emptive multi-tasking used in operating systems. One can make the analogy that the time-slice given to each active thread is one CPU cycle.
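The one-cycle time slice can be sketched as a strict round-robin issue loop. The function name `simulate_interleaved` is hypothetical, and the sketch assumes a fixed rotation in which a thread with nothing left to issue simply leaves a bubble in its slot.

```python
def simulate_interleaved(programs, num_cycles):
    """Issue one instruction per cycle, rotating through threads in fixed order.

    programs: dict mapping thread name -> list of instruction labels.
    Returns a list of (cycle, thread, instruction-or-None) tuples, where None
    marks a bubble because that thread had nothing left to issue.
    """
    names = list(programs)
    if not names:
        return []
    pc = {t: 0 for t in names}
    trace = []
    for cycle in range(num_cycles):
        thread = names[cycle % len(names)]   # time slice of exactly one cycle
        if pc[thread] < len(programs[thread]):
            trace.append((cycle, thread, programs[thread][pc[thread]]))
            pc[thread] += 1
        else:
            trace.append((cycle, thread, None))
    return trace


demo = {"A": ["a0", "a1"], "B": ["b0", "b1"], "C": ["c0"]}
for row in simulate_interleaved(demo, 6):
    print(row)
```

Because consecutive pipeline slots belong to different threads, two back-to-back instructions never come from the same thread when more than one thread is present, which is what removes most data-dependency stalls in this scheme.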

Terminology

This type of multithreading was first called barrel processing, in which the staves of a barrel represent the pipeline stages and their executing threads. Interleaved, pre-emptive, fine-grained, and time-sliced multithreading are more modern terms.

Hardware cost

In addition to the hardware costs discussed in the Block type of multithreading, interleaved multithreading has an additional cost of each pipeline stage tracking the thread ID of the instruction it is processing. Also, since there are more threads being executed concurrently in the pipeline, shared resources such as caches and TLBs need to be larger to avoid thrashing between the different threads.

Simultaneous multi-threading



Concept

The most advanced type of multi-threading applies to superscalar processors. A normal superscalar processor issues multiple instructions from a single thread every CPU cycle. In Simultaneous Multi-threading (SMT), the superscalar processor can issue instructions from multiple threads every CPU cycle. Recognizing that any single thread has a limited amount of instruction level parallelism, this type of multithreading tries to exploit parallelism available across multiple threads to decrease the waste associated with unused issue slots.

Example:

  1. Cycle i  : instructions j and j+1 from thread A; instruction k from thread B all simultaneously issued
  2. Cycle i+1: instruction j+2 from thread A; instruction k+1 from thread B; instruction m from thread C all simultaneously issued
  3. Cycle i+2: instruction j+3 from thread A; instructions m+1 and m+2 from thread C all simultaneously issued
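The slot-filling behavior in the example above can be sketched as follows. This is a simplified illustration, not an account of real issue logic: the function name `simulate_smt`, the issue width of three, and the per-thread cap (a stand-in for the limited instruction-level parallelism of a single thread) are assumptions, and threads are served in a fixed priority order.

```python
def simulate_smt(programs, issue_width=3, per_thread_limit=2):
    """Fill up to `issue_width` issue slots per cycle from several threads.

    programs: dict mapping thread name -> list of instruction labels.
    per_thread_limit: assumed cap on instructions one thread can issue per
                      cycle, standing in for its limited parallelism.
    Returns one list of (thread, instruction) pairs per cycle.
    """
    if issue_width < 1 or per_thread_limit < 1:
        raise ValueError("issue_width and per_thread_limit must be at least 1")
    pc = {t: 0 for t in programs}
    trace = []
    while any(pc[t] < len(programs[t]) for t in programs):
        slots = []
        for thread in programs:  # fixed priority: earlier threads go first
            taken = 0
            while (len(slots) < issue_width
                   and taken < per_thread_limit
                   and pc[thread] < len(programs[thread])):
                slots.append((thread, programs[thread][pc[thread]]))
                pc[thread] += 1
                taken += 1
        trace.append(slots)
    return trace


demo = {"A": ["j", "j+1", "j+2"], "B": ["k"], "C": ["m", "m+1"]}
for cycle, slots in enumerate(simulate_smt(demo)):
    print(cycle, slots)
```

With a single thread and the same cap, at most two of the three slots would ever be used; drawing from several threads in the same cycle is what fills the otherwise wasted slot.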

Terminology

To distinguish the other types of multithreading from SMT, the term Temporal multithreading is used to denote when instructions from only one thread can be issued at a time.

Hardware cost

In addition to the hardware costs discussed for interleaved multithreading, SMT has the additional cost of each pipeline stage tracking the Thread ID of each instruction being processed. Again, shared resources such as caches and TLBs have to be sized for the large number of active threads.

Example

Implementation specifics

A major area of research is the thread scheduler, which must quickly choose which of the ready-to-run threads to execute next, as well as maintain the ready-to-run and stalled thread lists. An important sub-topic is the different thread priority schemes that the scheduler can use. The thread scheduler might be implemented entirely in software, entirely in hardware, or as a hardware/software combination.
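A software-only version of such a scheduler can be sketched with a priority queue for the ready list and a set for the stalled list. The class name `ThreadScheduler`, its method names, and the convention that a lower number means higher priority are all hypothetical; ties are broken in arrival order.

```python
import heapq


class ThreadScheduler:
    """Toy scheduler with a priority-ordered ready list and a stalled list."""

    def __init__(self):
        self._ready = []       # heap of (priority, sequence, thread_id)
        self._stalled = set()  # threads waiting on a long-latency event
        self._priority = {}    # thread_id -> priority (lower runs first)
        self._seq = 0          # tie-breaker that preserves arrival order

    def _push(self, thread_id):
        heapq.heappush(self._ready, (self._priority[thread_id], self._seq, thread_id))
        self._seq += 1

    def add(self, thread_id, priority=0):
        """Register a new thread and place it on the ready list."""
        self._priority[thread_id] = priority
        self._push(thread_id)

    def pick_next(self):
        """Remove and return the best ready thread, or None if none is ready."""
        if not self._ready:
            return None
        return heapq.heappop(self._ready)[2]

    def stall(self, thread_id):
        """Record that a running thread is blocked (for example on a cache miss)."""
        self._stalled.add(thread_id)

    def wake(self, thread_id):
        """Move a stalled thread back to the ready list once its event completes."""
        if thread_id in self._stalled:
            self._stalled.remove(thread_id)
            self._push(thread_id)


sched = ThreadScheduler()
sched.add("A", priority=1)
sched.add("B", priority=0)
running = sched.pick_next()   # "B" runs first because it has the best priority
sched.stall(running)          # B misses in the cache
print(sched.pick_next())      # the scheduler falls back to "A"
```

A hardware scheduler would make the same decision with fixed-function logic rather than a heap, but the bookkeeping of ready and stalled threads is the same in spirit.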

Another area of research is what types of events should cause a thread switch: cache misses, inter-thread communication, DMA completion, etc.

If the multithreading scheme replicates all software-visible state, including privileged control registers, TLBs, etc., then it enables virtual machines to be created for each thread. This allows each thread to run its own operating system on the same processor. On the other hand, if only user-mode state is saved, less hardware is required, which would allow more threads to be active at one time for the same die area and cost.

References

See also
