CS 222/322 - Machine Organization
Take-Home Midterm Exam, Winter 1998
Due Wednesday, 18 February, 1:30 PM
7 problems on pages 2-6

Com Sci 222 and 322

  1. Some computers have hardware for arithmetic in base 10 to support business software. Base 10 is too bulky for an exam, so we'll look at base 3. Each base 3 (trinary) digit requires two binary bits. A 1-digit base 3 adder has five bits of input: tex2html_wrap_inline66 , tex2html_wrap_inline68 , tex2html_wrap_inline70 , tex2html_wrap_inline72 , and tex2html_wrap_inline74 , where

    tabular15

    The base 3 digits are represented as follows

    tabular21

    So, if tex2html_wrap_inline82 , tex2html_wrap_inline84 , tex2html_wrap_inline86 , tex2html_wrap_inline88 , and tex2html_wrap_inline90 , then we are adding the trinary digit 2 to the trinary digit 1, with no carry coming in. We never have tex2html_wrap_inline92 , nor tex2html_wrap_inline94 , since those values don't correspond to a trinary digit.

    Write a boolean formula in sum of products form, expressing the carry out ( tex2html_wrap_inline96 ) in terms of the five input bits. You should get a sum of six products, and each product involves two or three input bits.

  2. The left column below lists some choices that you might make in the design of a computer. The right column lists some types of technology whose qualities might lead to those choices. Make a one-to-one matching between the columns, showing the most likely correspondence of design choices to the types of technology affecting them most strongly.

    tabular28

  3. In problem 1.4 in the text, we found that the proposed change from a pure load/store instruction set to a machine with some register-memory instructions eliminated 10.75 load instructions per 100 load/store instructions, converting 10.75 old register-register ALU instructions to new register-memory ALU instructions, thereby saving 10.75 clock cycles. But, it paid 24 extra clock cycles per 100 load/store instructions because branches take longer, for a net increase of 13.25 cycles. Suppose that we find a way to implement the new register-memory instructions without adding a clock cycle to the execution of branch instructions, but instead by slowing down all clock cycles from 2 nanoseconds to tex2html_wrap_inline98 nanoseconds, for some s>1.

    1. For what values of s do the new register-memory instructions improve performance?
    2. Suppose that we can also add ALU operations that combine two values from registers, but store the result in memory (e.g., add R1 and R2, storing at address 1000 in memory), at the cost of slowing down clock cycles by the same factor of s. The new save-result-to-memory ALU operations allow many store instructions to be eliminated, saving one clock cycle for each. Which kinds of stores can be eliminated, and which cannot? Let n be the fraction of store instructions that can be eliminated ( tex2html_wrap_inline108 ). What relationship between s and n causes the new design to beat the original load/store design? If n=1, for what values of s does the new design win? In a program that does mainly number crunching, do you think n is likely to be nearer to 0 or to 1?

  4. Explain very briefly why a machine with a better MIPS rating may be slower than a machine with a worse MIPS rating. 2 or 3 sentences should be enough.

  5. This is a variation on problem 2.3 in the text, comparing a memory-memory instruction set to a load/store instruction set for total bytes of memory traffic.

    1. Write a program fragment consisting of a sequence of assignments in a high-level language that performs 5 integer operations. (That is, give a slightly larger example similar to the one in the statement of problem 2.3). You may use any number of variables, and you may write 5 assignments with 1 operation each, 1 assignment with 5 operations, or anything in between. Make the memory-memory instruction set beat the load/store instruction set by the largest factor that you can manage, counting total code and data. Report the total number of code bytes and date bytes transmitted for each instruction set. You do not have to write out the assembly code.
    2. Write another program fragment with the same restrictions as in part a, but this time make the load/store instruction set win by the largest factor that you can manage. Report the total number of code bytes and data bytes transmitted for each instruction set.
    3. Clearly, the relative efficiency of memory-memory vs.\ load/store depends on the program. Do loops in programs tend to tip the efficiency balance toward memory-memory, or toward load/store? Explain briefly (3 sentences can be enough).

  6. Look at the table describing the DLX pipeline that I provided on the Web. Compare a nonpipelined implementation of DLX with a pipelined version based on this table. Assume that each stage takes one clock cycle in the pipelined version, and that each nonpipelined load, store, ALU, and branch instruction takes a number of clock cycles equal to the nonidle stages in the pipelined implementation. Assume that the clock cycle is 10 nanoseconds unpipelined and 11 nanoseconds pipelined. For hazards:

    I tried to produce bad pipeline behavior. The following loop decrements R1 from 1000 to 0.

    tabular45

    There are plenty of instructions after the loop, but we don't care what they are.

    1. Compute and compare the running time for this loop in the pipelined and unpipelined machines. The pipeline loses the race by a very small margin.
    2. Point out where data and control values already computed in the simple DLX pipeline could be forwarded to speed up the pipeline on this program, and compute the improved running time.

Com Sci 322 only

  1. Figures 3.22 and 3.23 on pages 163-164 show how to use an adder in the ID stage to reduce stalls due to branches.
    1. How does the resolution of branches in the ID stage change the pipeline performance in Problem 6?
    2. For a very short loop, every instruction may be in the pipeline simultaneously. Propose a forwarding of information that takes advantage of the presence of both loop instructions in the pipeline to save one more machine cycle in the loop above. Be precise about which stage provides the forwarded information and which stage uses it. How long could a loop be and still have a reasonable chance to benefit from this trick?

About this document ...

This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html -split 0 midterm.tex.

The translation was initiated by Mike O'Donnell on Thu Feb 18 21:59:16 CST 1999


Mike O'Donnell
Thu Feb 18 21:59:16 CST 1999