Stack Computers: the new wave © Copyright 1989, Philip Koopman, All Rights Reserved.

Chapter 6. Understanding Stack Machines


6.5 INTERRUPTS AND MULTI-TASKING

There are three components to the performance of processing interrupts. The first component is the amount of time that elapses between the time that an interrupt request is received by the processor and the time that the processor takes action to begin processing the interrupt service routine. This delay is called interrupt latency.

The second component of interrupt service performance is interrupt processing time. This is the amount of time that the processor spends actually saving the machine state of the interrupted job and diverting execution to the interrupt service routine. Usually the amount of machine state saved is minimal, on the presumption that the interrupt service routine can minimize costs by saving only those additional registers that it plans to use. Sometimes, one sees the term "interrupt latency" used to describe the sum of these first two components.

The third component of interrupt service performance is what we shall call state saving overhead. This is the amount of time taken to save machine registers that are not automatically saved by the interrupt processing logic, but which must be saved in order for the interrupt service routine to do its job. The state saving overhead can vary considerably, depending upon the complexity of the interrupt service routine. In the extreme case, state saving overhead can involve a complete context switch between multi-tasking jobs.

Of course, the costs of restoring all the machine state and returning to the interrupted routine are a consideration in determining overall system performance. We shall not consider them explicitly here, since they tend to be roughly equal to the state saving time (since everything that is saved must be restored), and are not as important in meeting a time-critical deadline for responding to an interrupt.


6.5.1 Interrupt response latency

CISC machines may have instructions which take a very long time to execute, degrading interrupt response latency performance. Stack machines, like RISC machines, can have a very quick interrupt response latency. This is because most stack machine instructions are only a single cycle long, so at worst only a few clock cycles elapse before an interrupt request is acknowledged and the interrupt is processed.

Once the interrupt is processed, however, the difference between RISC and stack machines becomes apparent. RISC machines must go through a tricky pipeline saving procedure upon recognizing an interrupt, as well as a pipeline restoring procedure when returning from the interrupt, in order to avoid losing information about partially processed instructions. Stack machines, on the other hand, have no instruction execution pipeline, so only the address of the next instruction to be executed needs to be saved. This means that stack machines can treat an interrupt as a hardware generated procedure call. Of course, since procedure calls are very fast, interrupt processing time is very low.
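The interrupt-as-procedure-call idea can be pictured with a minimal Python sketch (illustrative names only; this does not model any particular chip): the only state the hardware saves on interrupt entry is the return address, pushed on the return stack exactly as an ordinary subroutine call would do.

```python
# Toy stack-machine model: an interrupt is handled as a hardware
# generated procedure call -- only the return address is saved.
# All names here are illustrative, not taken from any real processor.

class StackMachine:
    def __init__(self):
        self.data_stack = []    # operand stack (no register file to save)
        self.return_stack = []  # subroutine/interrupt return addresses
        self.pc = 0             # address of the next instruction

    def call(self, target):
        """Ordinary subroutine call: push return address, jump."""
        self.return_stack.append(self.pc)
        self.pc = target

    def interrupt(self, isr_address):
        """Interrupt entry: identical to a call -- one push, one jump."""
        self.call(isr_address)

    def ret(self):
        """Return from subroutine or interrupt: one pop."""
        self.pc = self.return_stack.pop()

m = StackMachine()
m.pc = 100                 # program is about to execute address 100
m.data_stack += [1, 2, 3]  # partial results of the interrupted program
m.interrupt(900)           # vector to an ISR at address 900
m.data_stack.append(42)    # ISR works above the program's operands...
m.data_stack.pop()
m.ret()                    # ...and returns; nothing else was saved
assert m.pc == 100 and m.data_stack == [1, 2, 3]
```

The interrupted program's operands are never copied anywhere; the service routine simply works above them on the same stack.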

6.5.1.1 Instruction restartability

There is one possible problem with stack machine interrupt response latency. That is the issue of streamed instructions and microcoded loops.

Streamed instructions are used to repetitively execute an operation, such as writing the top data stack element to memory. These instructions are implemented using an instruction repeat feature on the NC4016 and RTX 2000, an instruction buffer on the M17, and microcoded loops on the CPU/16 and RTX 32P. These features are very useful since they can be used to build efficient string manipulation primitives and stack underflow/overflow service routines. The problem is that, in most cases, these instructions are also non-interruptible.

One solution is to make these instructions interruptible with extra control hardware, which may increase processor complexity quite a bit. A potentially hard problem that non-stack processors have with this solution is the issue of saving intermediate data results. With a stack processor this is not a problem, since intermediate results are already present on a stack, which is the normal mechanism for saving state during an interrupt.
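The point about intermediate results can be made concrete with a sketch (purely illustrative Python, not any processor's actual microcode): a "streamed store" keeps its loop state -- the target address and remaining count -- on the data stack, so the operation can be suspended mid-stream by an interrupt and resumed afterwards with no hidden state.

```python
# Sketch of a restartable "streamed store": the loop state (address,
# count) lives on the data stack, so an interrupt can be taken between
# steps and the instruction resumed afterwards.  Illustrative only.

memory = [0] * 16

def streamed_store(stack, budget):
    """Pop (value, addr, count) and store value into count cells
    starting at addr, but stop after `budget` steps (simulating an
    interrupt) by pushing the remaining state back on the stack."""
    count, addr, value = stack.pop(), stack.pop(), stack.pop()
    while count and budget:
        memory[addr] = value
        addr, count, budget = addr + 1, count - 1, budget - 1
    if count:  # interrupted mid-stream: state goes back on the stack
        stack += [value, addr, count]
    return count == 0

stack = [7, 4, 6]                        # store 7 into memory[4..9]
done = streamed_store(stack, budget=2)   # "interrupt" after 2 stores
assert not done and stack == [7, 6, 4]   # resume state held on the stack
done = streamed_store(stack, budget=99)  # resume after the interrupt
assert done and memory[4:10] == [7] * 6
```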


6.5.2 Lightweight interrupts

Let us examine three different degrees of state saving required by different interrupt categories: fast interrupts, lightweight threads for multi-tasking, and full context switching.

Fast interrupts are the kind most frequently seen at run time. These interrupts do things such as add a few milliseconds to the time-of-day counter, or copy a byte from an input port to a memory buffer. When conventional machines handle this kind of interrupt, they must usually save two or three registers in program memory to create working room in the register file. In stack machines, absolutely no state saving is required. The interrupt service routine can simply push its information on top of the stack without disturbing information from the program that was interrupted. So, for fast service interrupts, stack machines have zero state saving overhead.

Lightweight threads are tasks in a multi-tasking system which have a similar execution strategy as the interrupts just described. They can reap the benefits of multi-tasking without the cost of starting and stopping full-fledged processes. A stack machine can implement lightweight threads simply by requiring that each task run a short sequence of instructions when invoked, then relinquish control to the central task manager. This can be called non-preemptive, or cooperative task management. If each task starts and stops its operation with no parameters on the stack, then there is no overhead for context switches between tasks. The cost for this method of multi-tasking is essentially zero, since a task only relinquishes its control to the task manager at a logical breaking point in the program, where the stack probably would have been empty anyway.

From these two examples, we can see that interrupt processing and lightweight thread multi-tasking are very inexpensive on stack processors. The only issue that remains, then, is that of full-fledged, preemptive multi-tasking accomplished with context switching.
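The cooperative scheme can be sketched with Python generators standing in for lightweight threads (an analogy only -- the names and structure here are assumptions, not stack hardware): each task runs a short burst of work and relinquishes control at a point where it carries no pending operands.

```python
# Cooperative ("non-preemptive") multi-tasking sketch: each task runs
# a short burst and relinquishes control at a clean breaking point, so
# no operand state is carried across the switch.  Python generators
# stand in for lightweight threads here.

def counter_task(name, log, n):
    for i in range(n):
        log.append((name, i))  # a short burst of work
        yield                  # relinquish control; "stack" is empty here

log = []
tasks = [counter_task("A", log, 2), counter_task("B", log, 2)]

# Round-robin task manager: the "context switch" saves and restores
# nothing beyond each task's own resume point.
while tasks:
    task = tasks.pop(0)
    try:
        next(task)
        tasks.append(task)     # still alive: back of the queue
    except StopIteration:
        pass                   # task finished

assert log == [("A", 0), ("B", 0), ("A", 1), ("B", 1)]
```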


6.5.3 Context switches

The perception that stack machines are slow at context switching is usually based on having to save a tremendous amount of stack buffer space into program memory. This idea that stack machines are any worse at multi-tasking than other machines is patently false.

Context switching is a potentially expensive operation on any system. On RISC and CISC computers with cache memory, context switching can be more expensive than the manufacturers would have one believe, as a result of hidden performance degradation caused by increased cache misses after the context switch. To the extent that RISC machines use large register files, they face exactly the same problems that are faced by stack machines. An added disadvantage of RISC machines is that their random access to registers dictates saving all registers (or adding complicated hardware to detect which registers are in use), whereas a stack machine can speedily save only the active area of the stack buffer.

6.5.3.1 A context switching experiment

Table 6.7 shows data gathered from a trace-driven simulation of the number of memory cycles spent saving and restoring data stack elements for Forth programs in a context switching environment. The programs simulated were Queens, Hanoi, and a Quick-sort program. Small values of N were used for Queens and Hanoi in order to keep the running time of the simulator reasonable. Both the effects of stack overflow and underflow as well as context switching were measured, since they interact heavily in such an environment.

Table 6.7. Memory cycles expended for data stack spills for different buffer sizes and context swapping frequencies.



Buffer
Size     timer=100  timer=500  timer=1000  timer=10000 
2           ...         ...         16124       ...
4           ...         9924        9524        ...
8           ...         3150        ...         ...
12          ...         ...         3068        ...
16          11602       2642        ...         632
20          12886       3122        1846        626
24          13120       2876        1518        330
28          14488       3058        1584        242
32          15032       3072        1556        124
36          15458       3108        1568        82

Table 6.7(a) Page-managed buffer management.

Buffer
Size     timer=100  timer=500  timer=1000  timer=10000 
2           26424       24992       24798       24626
4           11628       8912        8548        8282
8           7504        3378        2762        2314
12          6986        1930        1286        630
16          7022        1876        1144        322
20          7022        1852        1084        180
24          7022        1880        1066        124
28          7022        1820        1062        90
32          7022        1828        1060        80
36          7022        1822        1048        80

Table 6.7(b) Demand-fed buffer management.

[Figure 6.4]
Figure 6.4 -- Overhead for page managed stack.

Table 6.7a and Figure 6.4 show the results for a page-managed stack. The notation "xxx CLOCKS/SWITCH" indicates the number of clock cycles between context switches. At 100 clock cycles between context switches, the number of memory cycles expended on managing the stack decreases as the buffer size increases. This is because of the effects of a reduced spilling rate while the program accesses the stack. As the buffer size increases beyond 8 elements, however, the memory traffic increases since the increasingly large buffers are constantly copied in and out of memory on context switches.

Notice how the program behaves at 500 cycles between context switches. Even at this relatively high rate (which corresponds to 20 000 context switches per second for a 10 MHz processor -- an excessively high rate in practice), the cost of context switching is only about 0.08 clocks per instruction for a stack buffer size greater than 12. Since in this experiment each instruction averaged 1.680 clocks without context switching overhead, this only amounts to a 4.7% overhead. At 10 000 cycles between context switches (a millisecond between context switches), the overhead is less than 1%.

How is it possible to have such low overhead? One reason is that the average stack depth is only 12.1 elements during the execution of these three heavily recursive programs. That means that, since there is never very much information on the stack, very little information needs to be saved on a context switch. In fact, compared to a 32-register RISC machine, the stack machine simulated in this experiment actually has less state to save on a context switch.
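The context-switch rates quoted for the 10 MHz processor follow from simple arithmetic, which can be checked directly:

```python
# Quick arithmetic check of the context-switch rates quoted above
# for a 10 MHz (10 000 000 cycles/second) processor.
clock_hz = 10_000_000

switches_per_sec = clock_hz / 500   # one switch every 500 cycles
assert switches_per_sec == 20_000   # "20 000 context switches per second"

period_sec = 10_000 / clock_hz      # one switch every 10 000 cycles
assert period_sec == 0.001          # "a millisecond between switches"
```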

[Figure 6.5]
Figure 6.5 -- Overhead for demand fed managed stack.

Table 6.7b and Figure 6.5 show the results of the same simulation run using a demand-fed stack management algorithm. In these results, the rise on the 100-cycle-interval curve when more than 12 elements are in the stack buffer is almost nonexistent. This is because the stack was not refilled when restoring the machine state, but rather was allowed to refill during program execution in a demand-driven fashion. For reasonable context switching frequencies (less than 1000 per second), the demand-fed strategy is somewhat better than the paged strategy, but not by an overwhelming margin.
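The qualitative difference between the two policies can be sketched with a toy cost model (an illustration of the idea only, not the simulator that produced Table 6.7): the page-managed scheme pays to save and restore every buffered element at each switch, while the demand-fed scheme refills only the elements the program actually pops back down to.

```python
# Toy cost model for the two stack-buffer policies, counting memory
# cycles per context switch (one cycle per element moved).
# Illustrative assumptions, not the experiment's actual cost model.

def page_managed_switch_cost(elements_in_buffer):
    # save every buffered element at switch-out, restore all at switch-in
    return 2 * elements_in_buffer

def demand_fed_switch_cost(elements_in_buffer, elements_popped_later):
    # save every buffered element, but refill only what is later touched
    return elements_in_buffer + elements_popped_later

# With 12 elements buffered but only 3 popped before the task pushes
# fresh data, the demand-fed policy moves fewer elements:
assert page_managed_switch_cost(12) == 24
assert demand_fed_switch_cost(12, 3) == 15
```

When the program later consumes its entire saved stack, the two policies converge, which is consistent with the modest margin seen in the measurements.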

6.5.3.2 Multiple stack spaces for multi-tasking

There is an approach that can be used with stack machines which can eliminate even the modest costs associated with context switching that we have seen. Instead of using a single large stack for all programs, high-priority/time-critical portions of a program can be assigned their own stack space. This means that each process uses a stack pointer and stack limit registers to carve out a piece of the stack for its use. Upon encountering a context switch, the process manager simply saves the current stack pointer for the process, since it already knows what the stack limits are. When the new stack pointer value and stack limit registers are loaded, the new process is ready to execute. No time at all is spent copying stack elements to and from memory.

The amount of stack memory needed by most programs is typically rather small. Furthermore, it can be guaranteed by design to be small in short, time-critical processes. So, even a modest stack buffer of 128 elements can be divided up among four processes with 32 elements each. If more than four processes are needed by the multi-tasking system, one of the buffers can be designated the low priority scratch buffer, which is to be shared using copy-in and copy-out among all the low priority tasks.
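This partitioning scheme can be sketched as follows (illustrative Python; the register names `base`, `limit`, and `sp` and the buffer size are assumptions for the sketch, not a description of any shipped design). A context switch loads three values and copies no stack elements.

```python
# Sketch of partitioning one on-chip stack buffer among processes:
# each process owns a [base, limit) window and a saved stack pointer,
# so a context switch is O(1) -- no elements move to or from memory.

BUFFER_SIZE = 128
buffer = [0] * BUFFER_SIZE

class Process:
    def __init__(self, base, limit):
        self.base, self.limit = base, limit  # this process's window
        self.sp = base                       # saved stack pointer

# Four processes, 32 elements each
processes = [Process(i * 32, (i + 1) * 32) for i in range(4)]

class CPU:
    def switch_to(self, proc):
        """Context switch: just load SP and limit registers."""
        self.proc = proc
        self.sp, self.limit = proc.sp, proc.limit

    def push(self, value):
        assert self.sp < self.limit, "stack overflow for this window"
        buffer[self.sp] = value
        self.sp += 1
        self.proc.sp = self.sp  # modelled eagerly; hardware keeps SP

cpu = CPU()
cpu.switch_to(processes[0]); cpu.push(11)
cpu.switch_to(processes[1]); cpu.push(22)
cpu.switch_to(processes[0]); cpu.push(33)  # resumes where it left off
assert buffer[0:2] == [11, 33] and buffer[32] == 22
```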

From this discussion we can see that the notion that stack processors have too large a state to save for effective multi-tasking is a myth. In fact, in many cases stack processors can be better at multi-tasking and interrupt processing than any other kind of computer. Hayes and Fraeman (1989) have independently obtained results for stack spilling and context switching costs on the FRISC 3 that support these conclusions.
