USB 2.0 requires cables of 5 m or less, USB 3.0 requires 3 m or less, and USB 3.2 requires 1 m or less. Why?
What makes a bus? What does it do? What factors differentiate one bus from another?
What is a chipset? What is a bridge? Why north and south?
What kind of transfer speeds are there today? What is the advantage of the lane layout in the PCIe bus?
Modern systems do not have RTC modules. How do you think they maintain the real-time clock necessary for many operations?
If one of the 16-bit counting modules is configured to work in BCD mode, what is the maximum count it can reach?
For a 3-module PTM chip, 16-bit each, what is the maximum count of each module? What is the maximum count of the chip when the 3 modules are cascaded?
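The two counter questions above can be checked with a minimal sketch. It assumes each module counts up from 0, that BCD mode packs one decimal digit into every 4 bits, and that cascading simply concatenates the modules into one wider counter. The function names are illustrative and are not taken from any PTM datasheet.

```python
def max_binary_count(bits: int) -> int:
    """Largest value an n-bit binary counter can hold."""
    return (1 << bits) - 1


def max_bcd_count(bits: int) -> int:
    """Largest value when every 4 bits hold one decimal digit (0-9)."""
    digits = bits // 4
    return 10 ** digits - 1


if __name__ == "__main__":
    print("16-bit binary module:", max_binary_count(16))
    print("16-bit BCD module:   ", max_bcd_count(16))
    print("3 cascaded modules:  ", max_binary_count(3 * 16))
```

If the convention is to count the number of states rather than the largest value, add one to each result.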
What is the minimum sampling frequency of a composite signal in the range 200 Hz to 3000 Hz?
What is the maximum sampling error of 5V peak signal using 8-bit A/D converter? What about 16-bit A/D converter?
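As a hedged illustration of the A/D question above, the sketch below assumes a unipolar converter whose full-scale range is 0 to 5 V and takes the maximum error as half of one least significant bit. If the 5 V peak signal is bipolar (a 10 V span), the results double. The function name is an assumption made for this example.

```python
def max_quantization_error(full_scale_volts: float, bits: int) -> float:
    """Half of one LSB step for an ideal converter with 2**bits levels."""
    lsb = full_scale_volts / (2 ** bits)
    return lsb / 2


if __name__ == "__main__":
    for n in (8, 16):
        err = max_quantization_error(5.0, n)
        print(f"{n}-bit: {err * 1e3:.6f} mV")
```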
Why don't we use the flash A/D converter all the time, since it is the fastest?
In RS232 standard communication, there are straight cables and crossed cables. When do we use each?
Compare software and hardware timing in terms of cost, ease of use, accuracy, programmability, impact on performance, and parallelism.
What is the idea behind memory shadowing? How is it carried out?
What is memory shadowing? Why do we need it? How do we do it?
What is the difference between the two exceptions: Address Error & Bus Error?
What kind of devices use the non-maskable interrupt service of a µP? Give priority levels to the following devices: mouse, keyboard, sound card, and hard disk.
Is it OK to use the CPU RST* pin as the POR* of the shadowing logic? Why or why not?
Assume we have a hard disk drive that rotates at 7200 rpm, and has a head seek latency of 2 ms and controller latency of 1 ms, and 50 MB/s transfer rate.
A 5,400 rpm, 500 GB HDD with a 100 MB/s transfer rate is now 80% full and getting slow. The operating system reports that it has one million files in two million chunks. Assume that the head seek latency is 10 ms and the controller overhead is 1 ms. Assume also that the system memory is ample, say 16 GB, and that the processor is very fast. How long does it take to defragment the disk? Explain your results if they look larger or smaller than what is practically accepted.
Compute the time to duplicate a file that happened to be in three fragments: 20 MB, 15 MB and 5 MB.
Compute the burst length of a processor whose on-die L3 cache is 64 MB, divided into 1 M lines (mega lines), and which connects to SDRAM via a 64-bit data bus.
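One way to approach the burst-length question above is sketched below. It assumes that one burst fills exactly one cache line and that "1 M lines" means 2^20 lines. The function name is hypothetical.

```python
def burst_length(cache_bytes: int, lines: int, bus_bits: int) -> int:
    """Number of bus transfers needed to fill one cache line."""
    line_bytes = cache_bytes // lines   # bytes held by one cache line
    bus_bytes = bus_bits // 8           # bytes moved per bus transfer
    return line_bytes // bus_bytes


if __name__ == "__main__":
    print(burst_length(64 * 2**20, 2**20, 64))
```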
What is the burst length? Why does it differ from one system to another? What is the impact of the burst length on the memory bandwidth?
What is the difference between SRAM and DRAM? DRAM and SDRAM? What is eDRAM? DDR memory? DDR1, DDR2, etc.?
What if we design a system with 7 I/O chips and the DMA has only 4 channels? What can we do?
DDR5 modules use 288 pins, although with 64 GB capacity we need 36 address lines and hence 18 address pins. Why do we have this huge number of pins then?
In a fairly complex system with DMA controllers, we need to design logic that extracts four signals, MEMRD, MEMWE, IORD and IOWE, from the less decoded signals MRQ, IORQ, RD and WE (sometimes R/W*), to match those of the DMA. Why is that?
Why do we use a memory hierarchy in general? Why not use a single memory that is cheap, fast, high-capacity and non-volatile?
What is the idea behind using DMA chips? And how do they work to achieve the goal?
Which is faster for DMA: transferring 1 KB from memory to I/O, 1 KB from I/O to memory, 1 KB from memory to memory within the same chip, or 1 KB from memory to memory between two different chips?
What are the technologies used today for storage (memory)? And what are the factors that differentiate among them?
What do we mean by ECC memory? What is the difference from non ECC memory in price and use?
What do we mean by setup time and hold time in the context of latching data?
How do we know if the processor is in supervisor mode or user mode? And how do we know if the processor is acknowledging an interrupt?
An MC68008 is to run at its full speed of 16 MHz with no wait states. Compute the maximum memory access time that works, assuming only multiples of 10 ns are available: 10, 20, 30, ...
An EPROM with 360 ns access time connects to an MC68008 with 10 MHz max, 50 ns address strobe latency and 10 ns data-in setup time via a 5 ns decoder. Compute the number of wait cycles for full-speed operation.
A 74LS139 2-to-4 decoder is used to partition the memory space of an MC68008 into 4 quadrants: ROM, RAM, PER (for synchronous devices) and a 4th reserved for future provision. Write the address range of the RAM, and compute the addresses per location in ROM if it is 64 KB.
Design a circuit that splits the memory space of an MC68008 into 4 quadrants: ROM, RAMS, RAMU and SYNC. The synchronous subspace SYNC is to be divided into 4 quadrants: PIO, SIO, PTC and RES, where RES is to be reserved for future expansion. The following conditions have to be reported as bus errors:
- Writing to read only memory
- Accessing unused partition
- Accessing RAMS while in the user mode
What is gate propagation delay, fan-in, fan-out, noise margin, signal rise time, signal fall time, data setup time, data hold time, head seek time, rotational latency time?
A 555 timer is used to generate a 30 ms reset pulse using a resistor and a capacitor. Pick three sets from the following discrete values:
R=2.7, 3.0, 3.3, 3.6, 3.9, 4.7, 5.1, 5.6, 6.2, 6.8, 8.2, 9.1 kΩ
C=1.2, 1.5, 1.8, 2.2, 2.7, 3.3, 3.9, 4.7, 5.6, 6., 8.2 µF
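A brute-force search is one possible way to pick the R and C sets above. The sketch assumes the standard 555 monostable relation T = 1.1·R·C and a 5% tolerance chosen arbitrarily for illustration. The truncated "6." capacitor entry is left out because its value cannot be recovered from the list. All names are illustrative.

```python
from itertools import product

R_KOHM = [2.7, 3.0, 3.3, 3.6, 3.9, 4.7, 5.1, 5.6, 6.2, 6.8, 8.2, 9.1]
# The truncated "6." entry of the capacitor list is deliberately omitted.
C_UF = [1.2, 1.5, 1.8, 2.2, 2.7, 3.3, 3.9, 4.7, 5.6, 8.2]


def pulse_ms(r_kohm: float, c_uf: float) -> float:
    """555 monostable pulse width; kOhm times uF gives milliseconds."""
    return 1.1 * r_kohm * c_uf


def candidates(target_ms: float = 30.0, tolerance: float = 0.05):
    """Return (R, C, T) tuples within the tolerance, closest first."""
    hits = [
        (r, c, pulse_ms(r, c))
        for r, c in product(R_KOHM, C_UF)
        if abs(pulse_ms(r, c) - target_ms) <= tolerance * target_ms
    ]
    return sorted(hits, key=lambda h: abs(h[2] - target_ms))


if __name__ == "__main__":
    for r, c, t in candidates():
        print(f"R = {r} kOhm, C = {c} uF -> T = {t:.2f} ms")
```

In practice component tolerances (often 5% to 20% for capacitors) dominate, so any of the close pairs is a reasonable answer.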
An MC68K (15MHz, 60ns Tclav & 10ns Tdicl) connects to an 80ns SRAM via a 2ns decoder. Compute the no wait and 1-wait frequency
An MC68K-16 is to run at full speed with zero wait, what are the worst case RAM and ROM access times?
To reset the MC68K µP, we need to assert both RST* & HLT* for some minimal time. This time requirement is 100 ms on power-up and only 10 µs if it is already powered. Why that much time on power-up?
On a complex printed circuit board (PCB), every chip has to have two capacitors between the power lines (VCC and GND) to filter ripples due to high-frequency switching and to act as a fast power supply when needed: typically, a large electrolytic capacitor of 1 microfarad and a small ceramic capacitor of 1 nanofarad. Since these two capacitors are in parallel, the total is equal to the sum; 1 microfarad plus 1 nanofarad is almost 1 microfarad (1.001 microfarad to be exact). This means that it is not the total capacitance that we are after. Explain why we do this, then.
A fellow engineer claims that he could build a Z80 system with 4 memory chips without using a decoder (or discrete decoding logic, of course). Can you figure out whether that is possible?
Design a decoding circuit for a Z80 system with 32 KB ROM, 32 KB RAM, a single 8-bit input port and a single 8-bit output port. Check the one in the slides and propose a minimal design.
What happens when the processor executes CHK #138, D0? Where does it go if it acknowledges a level 3 interrupt and VPA* gets asserted?
Assume that we execute the instruction: SUB.L D0, D1 then: BHI Target or BGT Target. What is the difference between the two outcomes ?
How does the processor know where to go if it acknowledges level 3 interrupt and the VPA* gets asserted?
How does the processor know where to go if it acknowledges level 5 interrupt and the DTACK* gets asserted while the lower data bus byte is $17 placed by the device being acknowledged ?
Some memory chips, SRAM in particular, assert DTACK* immediately when selected for a transaction, although they have access times from a few to several tens of nanoseconds. Isn't there a risk of false reading or writing?
The MC68000 asserts both AS* and DS* at the same time in reading, but delays DS* in writing. Why?
If the MC68000 is running a code with some time critical tasks, what can the code do to minimize the interruptions?
Compare the reset procedures of the Z80 and MC68000 processors
What is the main advantage of MC68008 over the MC68000? And which is more complex in terms of transistor count?
In an MC68000 based system, how does the external logic differentiate between data and code transactions? User and supervisor mode operation?
Name sources of interrupt that must be handled by the non-maskable interrupt service of the processor
In systems with RS232 serial communications, we find drivers with -12V and +12 V inputs. Why is that? And in some systems we have drivers but without those high voltage inputs. Explain
Explain why some systems have crystals with frequencies that look strange, like 32.768 kHz, 1.8432 MHz, 2.4576 MHz or 14.7456 MHz.
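A quick numeric check hints at why these crystal values are less strange than they look. The sketch below only tests whether a frequency is a power of two (convenient for dividing down to 1 Hz) or an integer multiple of a standard serial baud rate. The pairing of each crystal with a particular baud rate is an assumption made for illustration.

```python
def power_of_two_exponent(hz: int):
    """Return k if hz == 2**k, otherwise None."""
    if hz > 0 and hz & (hz - 1) == 0:
        return hz.bit_length() - 1
    return None


def divider_for(crystal_hz: int, rate_hz: int):
    """Return the integer divider from crystal to rate, or None if inexact."""
    quotient, remainder = divmod(crystal_hz, rate_hz)
    return quotient if remainder == 0 else None


if __name__ == "__main__":
    print("32768 Hz = 2 **", power_of_two_exponent(32_768))
    print("1.8432 MHz / 115200 =", divider_for(1_843_200, 115_200))
    print("2.4576 MHz / 9600 =", divider_for(2_457_600, 9_600))
    print("14.7456 MHz / 115200 =", divider_for(14_745_600, 115_200))
    print("16 MHz / 115200 =", divider_for(16_000_000, 115_200))
```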
Figure out how to make the 555 timer work as a monostable multivibrator, to generate a 100 ms reset signal for the MC68000
What do we mean by read contention hazard ? What is the necessary condition for such a problem to occur ? And how to resolve it ?
When accessing a byte in memory, the MC68000 asserts UDS* if A0=0 and LDS* if A0=1. Why is that? Why not the other way around?
Why did Motorola phase out the synchronous data transfer support in later processors? And why did they use it in the early versions of MC68000 in the first place?
Discuss the conditions that may lead to read contention and explain how to resolve it, stating the advantages and disadvantages of each solution.
A system with a 16 MHz MC68008 µP, 100 ns 16 KB RAM, 300 ns 8 KB ROM and 500 ns 3-port GIO is to run at full speed. Given that Address Strobe Latency is 60 ns, Data Latch Setup is 10 ns and Decoder Delay is 5 ns, compute the number of wait cycles for each device if the GIO is connected to the synchronous bus. What if it is connected to the asynchronous bus? Compute the maximum zero-wait frequency.
With the MC68000 in mind, state the order of the signals activation in synchronous and asynchronous read and write operations
If we have chips with varying access times, such that one is fast and others need 1 and 2 wait cycles, what is the alternative to designing a DTACK* delay circuit? And what are the advantages and disadvantages of such an alternative?
For an MC68000 running at 10 MHz, what is the average synchronous transaction time?
Draw an OR gate with two inputs, X and Y, and output O. Connect Y to O and X to a switch that makes it high or low (1 or 0).
Set X=0, power up the gate and record the output as you toggle X; 0-1-0-1-0 ...
Set X=1, power up the gate and record the output as you toggle X; 1-0-1-0-1 ...
Explain the situation, and how one can make use of this
What is the purpose of each of the following devices or subsystems: PCB, CPU, GPU, MCU, SoC, SBC, SoM, MCM, ROM, PROM, EPROM, RAM, SRAM, DRAM, eDRAM, SDRAM, NVRAM, HDD, SSD, L1/L2/L3 Cache, DMA, VDRAM, DDR, RTC, VRM, PPM, PIO, PPI, VIA, PIA, SIO, DART, USART, ACIA, PTM, PCT, PCM, A/D converter, D/A converter, USB, CAN, PCI, PCIe, Oscillator, Crystal, Decoder, Encoder, Buffer
An MC68008 with a 4 ns decoder delay accesses a 300 ns ROM and a 100 ns RAM. The Address Strobe Latency is 60 ns and the Data Latch Setup is 10 ns. If the processor is to run at 16 MHz, how many wait cycles do the ROM and RAM need? What is the maximum frequency we can use if we are to use no wait cycles at all?
Compare the reset procedures of the two processors Z80 & MC68000
Name 6 conditions one can report as erroneous via the BERR* signal
How do we know if the current transaction is for data or code? If it is for user or supervisor?
ROM and RAM chips have CS* and OE* control inputs, and both have to be asserted to read or write, we can keep CS* asserted and control OE* or the other way. What is the difference between these two methods?
Given one 3-to-8 decoder, three 2-input OR gates and two 2-input AND gates, design a decoding logic circuit that splits the memory space of an MC68008 processor such that ROM, RAM and PIO occupy the upper half of the lower quadrant. Make sure to report writing to ROM or accessing any unused space as a bus error. Show the address map, partition size, and number of addresses per memory location assuming sizes of 8 KB, 16 KB and 32 KB respectively.
Write code that tests the 32-bit signed number in D0 and writes into D1 the value 0, 1, 2 or 3 to reflect even positive, even negative, odd positive and odd negative respectively, without changing the content of D0.
Which of the following instructions cannot be used to switch to user mode, and why?
MOVE #data, SR
ANDI #data, SR
ORI #data, SR
EORI #data, SR
Assume we have the code segment below, where FARLOC is a label that is 66000 bytes ahead of this branch instruction. How do you fix the assembly error you are going to get?
NXTLBL MOVEQ #$80, D0
Write a subroutine that returns the absolute value of a 32-bit signed number in D0
Describe precisely the function of the subroutine below
MULU D0, D0
LSL.L #2, D0
ADDQ.L #7, D0
How many cycles does this code take?
MOVEQ #118, D1
ANDI.W #$7F, D0
MULU D0, D1
What are the possible addresses for the next fetch after the first instruction?
800 $6600 0084 BEQ INP
804 $4E71 NOP
Given A1=$001248, D1=$8421 and the instruction MOVE.W #$6A, 6(A1, D1.W). Show the effective addresses and the outcome of the execution
Given A1=$001240, D1=$8420 and the instruction MOVE.L #$64, 6(A1, D1.W). Show the effective addresses and the outcome of the execution
If D0=$10010, find the outcome of DIVU D0, D0 and MULU D0, D0. What if the initial state was D0=$10001?
Is it safe to say that the content of D0 is 0 after executing SUB.W D0, D0? Why or why not?
What is the outcome when SUB.X D0, D0 is executed, where X is .B, .W or .L? What is the outcome if the initial value of D0 is $123456?
MOVE, ADD and SUB have a Quick mode. Why do the AND, OR and EOR instructions not have one?
How does the MC68000 processor differentiate between the two addressing modes: Absolute Long (2-word address) and Absolute Short (1-word address) in JMP and JSR instructions, although they do not use the 2-bit size field used in other instructions like MOVE?
Start with a small value in D0 and execute a loop that multiplies D0 by itself several times, let's say 10. Explain why the value of D0 grows steadily and then starts going up and down.
What does this 2-line code do?
MOVEQ #-1, D0
LOOP DBRA D0, LOOP
Suppose we need to rotate right a long word in D0 26 bits, this takes 8+2n=8+2x26=60 cycles. Is there a way to do it faster? How fast?
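The rotate question above can be explored with a small model. It assumes the 8+2n cycle formula quoted in the question applies equally to left and right rotates with an immediate count, and it uses the fact that rotating a 32-bit value right by 26 gives the same result as rotating it left by 6. The helper names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def rol32(value: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((value << n) | (value >> (32 - n))) & MASK32


def ror32(value: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK32


def rotate_cycles(n: int) -> int:
    """Cycle count from the 8+2n formula quoted in the question."""
    return 8 + 2 * n


if __name__ == "__main__":
    x = 0x12345678
    print(hex(ror32(x, 26)), hex(rol32(x, 6)))
    print(rotate_cycles(26), "cycles versus", rotate_cycles(6), "cycles")
```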
Write a code that performs multiplication on two single precision floating point numbers in D1 & D2 and delivers the result in D3
There is only one conditional trap which is TRAPV. How do you turn TRAP #7 into a conditional trap? Like if it is TRAPZ, which executes if the outcome of the last operation was zero.
Why does it take the same amount of time to perform MULU #$N, Di for N=40 and N=48, but more time for any value in between?
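The MULU timing questions can be explored with the commonly documented MC68000 figure of 38 + 2n cycles, where n is the number of 1 bits in the 16-bit source operand. The sketch below assumes that formula and ignores the effective-address fetch time, which is the same for every immediate source. The function name is hypothetical.

```python
def mulu_cycles(source: int) -> int:
    """Approximate MULU execution cycles: 38 + 2 * (ones in the source word)."""
    ones = bin(source & 0xFFFF).count("1")
    return 38 + 2 * ones


if __name__ == "__main__":
    for n in range(40, 49):
        print(f"N = {n:2d} ({n:06b}): {mulu_cycles(n)} cycles")
```

Under the same assumption, `mulu_cycles(524)` and `mulu_cycles(507)` can be compared to see how the bit pattern, not the magnitude, drives the timing.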
What is the outcome of the two code segments below?
MOVEA.L #$1200, A0
MOVE.W #$40, (A0)+
MOVE.W #$60, (A0)+
MOVEA.L #$1200, A0
MOVE.W #$40, (A0)
MOVE.W #$60, (A0)
The code segment below is part of a program that runs fine, but when we remove the third line we get a runtime error. What kind of error is that? And what is the secret of that line? Could the same thing happen if we remove the second line? What about removing both the second and third?
MOVEA.L #$1200, A0
MOVE.B #$40, (A0)+
MOVE.B #$50, (A0)+
MOVE.W #$60, (A0)+
Assume that the four 16-bit unsigned integers at address $1200 are X, Y, P, and A. Where X & Y are the sides of a rectangle, and P & A are the perimeter & area. Write a code at address $1000 to compute P & A.
If the MC68000 processor is re-designed to implement 32-bit multiplication, how many cycles does it take if the source is data register? how many cycles does it take if the source is immediate longword?
Compare the instruction LSL.L #5, D0 with the sequence LSL.L #3, D0 then LSL.L #2, D0 . What are the outcomes? What are the differences ?
What do we mean by runtime errors and compile time errors? Give examples of both. Give example of an instruction that causes a runtime error but no compile time error?
Typically, when we need to do something many times, we use the loop mechanism, but using flat or unrolled sequences is sometimes better. Discuss both methods from the spatial and temporal performance points of view.
It is improper to place our code in the first bytes of the memory space in a complex system, while it is ok in a simple controller. Why?
What happens when the processor fetches the instruction word $4AFC?
Explain what happens if we use RTS where we have to use RTR and vice versa ?
Explain what is wrong with this subroutine
FUNC MOVEQ #20, D0
MOVEA #400, A0
PEA (8, A0, D0.L)
Which of the following is the slowest: JMP $7200 , JMP $18200 , JMP $FF8200 and why ?
Analyze the code segment below, then run it on the simulator to see if it does what you think
MOVE.L #$60001FFE, $2000
MOVE.L #$6000DFFE, $4000
Analyze the code segment below, then run it on the simulator to see if it does what you think
MOVE.L #$4EF84000, $2000
MOVE.L #$4EF82000, $4000
Why is BCLR #0, $1240 faster than BCLR #7, $12400 ?
Why is BCLR #0, D0 faster than BCLR #7, $1240 ?
There are instructions that load to/from USP, but none that does the same for SSP. How do we change its value then?
For MOVE.S d(Ai, Xi.s), M, assume that d=$48 and that M is either $1200 or $12000 (absolute short and absolute long), and discuss the instruction length and execution time in cycles for all possible values of S and s, each being W or L.
For what range of values of D0 does the instruction LSL.L D0, D1 have the same outcome?
Given: D0=$91234. Show the state of the machine after executing CMPI.W #$1200, D0
Given: D0=$123456 & D1=$121244. Show the state of the machine after executing ABCD D1, D0
Given: D0=$123456 & D1=$121244. Show the state of the machine after executing ADD.L D1, D0? after executing SUB.L D1, D0? after executing SUB.L D0, D1
Given: D0=$12346789. Trace the changes in D0 after each instruction in the sequence: MOVEQ #49, D0 then LSR.L D0, D0 then SWAP D0
The instruction word %0110xxxx10100101 is for Bcc and it is fetched from address $12A4. The x's stand for the branch condition. Show all the outcomes of this instruction.
Assume that M($1200)= $60004000, and we executed JMP $1200. Explain what happens
Assume a single word BRA is fetched from address $FFFF90. Show the range of addresses it can reach
Why are the instruction words $6017 and $60FA impossible to find in a code listing?
Why do we have ADDQ, SUBQ and MOVEQ when the ADD, SUB and MOVE can handle all the situations and literal values?
BSET and TAS can both test and set a bit, and being single instructions they are atomic and cannot be interrupted before completion. Can you find the difference for which TAS was implemented? Think of an application.
JMP with absolute short addressing, using one word, can transfer control from $7900 to $1100 but cannot transfer control from $7900 to $8000, although the distance is much shorter. But BRA can do both easily using one word. Explain.
Given: D1=$25000 & D2=$340000. What is the outcome of MULU D1, D2?
Given: D1=$5000 & D2=$34003. What is the outcome of executing DIVU D2, D1?
Consider the following current addresses and the targets next to them, and show all the possible instructions and sizes that can be used to achieve the transfer
Consider the code segments below, and explain why the BRA instruction of the first has two words while that of the second has only one.
TAR2 BRA TAR2
There are four ways (instructions) to transfer control from $FFA860 to $FFA820. Show all of them, and compare in terms of the instruction size, execution speed, along with portability
Given: A0=$9200 & D1=$8A40. Compute the source effective address of MOVE.L -64(A0, D1.W), D2
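For the indexed effective-address questions, a small calculator may help with checking hand results. The sketch assumes the usual MC68000 rule that the 8-bit displacement and a word-sized index are sign-extended to 32 bits before being added to the base register, and that unspecified upper register bits are zero. The function names are illustrative.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def indexed_ea(base: int, index: int, disp8: int, index_is_long: bool = False) -> int:
    """Effective address of d8(An, Xn.W) or d8(An, Xn.L), as a 32-bit value."""
    idx = sign_extend(index, 32 if index_is_long else 16)
    return (base + idx + sign_extend(disp8, 8)) & 0xFFFFFFFF


if __name__ == "__main__":
    # MOVE.L -64(A0, D1.W), D2 with A0=$9200 and D1=$8A40
    print(hex(indexed_ea(0x9200, 0x8A40, -64)))
```

The MC68000 only drives 24 address lines, so the upper byte of a result is not visible on the bus; an odd result for a word or longword access would raise an address error.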
Trace this code with M($1200) = 0, 1, 2, 3, 4 and 5 to find what the final outcome is, and explain why it fails to deliver the right answer when M($1200) = 5
MOVE.W $1200, D0
MOVEQ #2, D1
LOOP MULU D0, D0
ADDQ.L #$2, D0
LSL.L #1, D0
DBRA D1, LOOP
MULU $1200, D0
MOVE.L D0, $1200
Given: D0=$1287A9. Show the content of D0 after executing ASR.W #3, D0
Given: MULU #524, D1 or MULU #507, D6, which one executes faster and by how much?
It does not make sense to have the EXT instruction following the MOVEQ instruction. Why?
The MC68000 is currently executing at address $6400 and wants to transfer control to one of three locations: $A640, $12800 or $FF9840. Compare how the JMP & BRA instructions handle each target address, i.e. capability and number of words and cycles to complete.
Arithmetic shift right and left are like dividing by 2 and multiplying by 2, so if we execute them as a sequence on some data, we should get the same value back. Try it on D0 while it has $6542, $6543 and $6540, using a single bit shift and 2 bit shifts. Explain any discrepancy
Estimate the MC68000's CPI when running a code that computes 128 dot products of 16-entry arrays of 16-bit unsigned numbers
Perform the following multiplications for any value in D0, and compare the speeds and its relationship with the 0's and 1's patterns in each source
MULS #$1357, D0
MULU #$1357, D0
MULS #$7531, D0
MULU #$7531, D0
Consider: MULS #2048, D0 & MULS #2047, D0. Which is faster? By how much?
Consider: MULU #2048, D0 & MULU #2047, D0. Which is faster? By how much?
Why does SUB.L Di, Dj take 8 clock cycles while SUB.W Di, Dj and SUB.B Di, Dj take only 4 clock cycles each? Note that all are single-word instructions and need only a single bus cycle to fetch.
We want to multiply the 16-bit integer in D0 by 32767. Show a spatial and temporal comparison between the two code segments below: using a single instruction to multiply directly by 32767, or using a trick that multiplies by 32767+1 = 32768 and then subtracts to adjust.
MULU #32767, D0
MOVEQ #0, D1
MOVE.W D0, D1
MULU #32768, D0
SUB.L D1, D0
If D0=#$8420 and D1=#$7531, which is faster for finding the product: executing MULU D0, D1 or MULU D1, D0? By how much?
Which of the following multiplications executes fastest? By how much?
MULU #128, D0
MULU #87, D1
MULU #36, D2
What is the size of the displacement of the branch in the following piece of code? Think carefully, this is tricky. Execute with D0 and D1 equal and not equal.
SUB.W D0, D1
GOX MOVEQ #0, D0
In MC68K, JMP is unconditional but reaches anywhere in the space, Bcc is conditional but limited reach. Think of how to implement conditional jump like JEQ Label or JNE Label
Assume that SP=$12806, D0=$86 and the instruction PEA -8(PC, D0.W), or PEA (-8, PC, D0.W), is fetched from address $12400, show the outcome by stating the values of SP, D0, PC and any change in memory state
The 16-bit string 1101011101101001 is read from the address $124600. Is it Code or Data?
Why do we get divide-by-zero errors at run time when the instruction DIVU D0, D1 is executed? Can assemblers do something to prevent them?
Show the content of D1 after executing the sequence of instructions:
MOVEQ #84, D1
Assume that the variable X is at $1200 and the variable Y is at $1202. Analyze the code below to answer the questions that follow.
MOVEA.L #$1200, A0
MOVE.W (A0)+, D0
MOVEQ #1, D1
LOOP MULU D0, D0
ADD.L #3, D0
DBRA D1, LOOP
LSR.L #2, D0
ADD.L #11, D0
MOVE.L D0, (A0)+
State the function Y(X). What is the final state of the registers? What is the content of $1202 if M($1200)=3?
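Before running the simulator, the code above can be traced with a minimal Python model. It assumes MULU uses only the low 16 bits of both operands, that DBRA decrements the low word of D1 and falls through when it reaches $FFFF, and that memory is big-endian. The function and variable names are illustrative and do not come from any simulator API.

```python
MASK32 = 0xFFFFFFFF


def trace(x: int):
    """Model the listing for a 16-bit X stored at $1200."""
    memory = {}
    a0 = 0x1200
    d0 = x & 0xFFFF                      # MOVE.W (A0)+, D0 (upper word assumed 0)
    a0 += 2
    d1 = 1                               # MOVEQ #1, D1
    while True:
        d0 = (d0 & 0xFFFF) ** 2          # MULU D0, D0 uses the low words only
        d0 = (d0 + 3) & MASK32           # ADD.L #3, D0
        d1 = (d1 & 0xFFFF0000) | ((d1 - 1) & 0xFFFF)   # DBRA decrements D1.W
        if d1 & 0xFFFF == 0xFFFF:        # counter reached -1: leave the loop
            break
    d0 >>= 2                             # LSR.L #2, D0
    d0 = (d0 + 11) & MASK32              # ADD.L #11, D0
    for offset, byte in enumerate(d0.to_bytes(4, "big")):
        memory[a0 + offset] = byte       # MOVE.L D0, (A0)+ stores 4 bytes
    a0 += 4
    return d0, d1, a0, memory


if __name__ == "__main__":
    d0, d1, a0, memory = trace(3)
    print(hex(d0), hex(d1), hex(a0))
    print({hex(addr): hex(val) for addr, val in sorted(memory.items())})
```

Note that the store is a longword, so the word at $1202 holds only the upper half of the result.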
Write code that adds the 8 decimal digits in D0 and D1 and stores the result in D1, then compare the space and time requirements with longword binary addition.
Write a function that computes the volume of a sphere whose radius is in the lower byte of D0 and returns the result in D0 after proper rounding
Show the content of D0 & D1 after executing DIVU D0, D1 for the following sets of values. Use the simulator to verify your answers
D0=$12340009 & D1=$72
D0=$1234FFF7 & D1=$79
D0=$1234000A & D1=$FFFFFF72
D0=$1234FFF8 & D1=$FFFFFE94
Given: D0=$1240000 & D1=$9569321. Show the state of the flags Z, N and X after executing TST.W D0 & TST.L D1
If the Bcc is fetched from address $48, show all the possibilities of the next address to fetch from
Given: D0=$985674. Show the content of D0 after executing ASR.W D0 (i.e. Arithmetic Shift Right).
We want to transfer control from $124680 to $124698. How many words do JMP and BRA need to do it?
Check out the binary code the simulator generates for the following instructions, and verify the instruction encoding by looking at the instruction layout in the slides:
MOVEQ #$80, D7
MOVE.B A0, D5
MOVE.L #16, D4
MOVE.W #$126, (A0)+
MOVEA.L #1200, A1
MOVEA.L #12000, A1
Try ADD.B D0, D1 & ABCD D0, D1 with the initial state D0=$98 & D1=$67 in each case. Comment on the results.
How does MOVEM move a list of up to 15 registers to memory and restore it? What happens if the list is re-ordered?
Why does it take the same amount of time to multiply a number by 1, 2, 4, 8, 16, 32, etc. and more time to multiply by 3, 7, 15, 31, 63, etc.? How does Booth's algorithm solve this issue?
Give an example that sets the V flag after the DIVU instruction, and explain what can be done based on it.
Why are the quick flavors of instructions implemented although they are very limited in range or addressing modes? And how do they achieve higher performance?
Write a code that evaluates the function Y = 8X^4 -7 assuming X and Y are locations in memory
Write compact code that compares two 64-entry arrays of 16-bit numbers at addresses 1200 and 1600, and sets a byte at 12800 if they are equal or resets it otherwise.
List all the privileged instructions, and the instructions that can switch to the user mode?
State the advantages and disadvantages of unrolling loops in code.
Show various ways of clearing a data register and comment on the space and time requirements of each.
How could the designers increase the range of the branch instruction without encoding more bits for the displacement?
The lower order byte of the branch and move quick instructions is a signed number, why is the range -128 to 127 in the move quick and only -128 to 126 (actually excluding the odd numbers) in the branch?
Later processors included support for 32-bit displacement for relative jump instructions like branch and branch to subroutine. What is the advantage in the case of a 32-bit address processor, since it takes the same space and time that absolute jumps do?
Processors that support branch and branch to subroutine with d32 have two ways to figure out the displacement size. The first: if the lower byte is nonzero then it is d8, or else it is d16, unless it is $00, in which case it is d32. The second: the lower byte is d8 unless it is $00, in which case it is d16, or $FF, in which case it is d32 (this is used by the MC68020 and later processors). Comment on the space and time requirements of both methods.
Why is it impossible to find the BRA instruction word $6084 at address $66?
Why is the single-word Bcc range -126 to 128, although the d8 signed displacement value is in fact -128 to 127, and becomes -128 to 126 if we reject the odd values? Explain.
Consider the two instructions BSET & TAS, both test a bit in memory and set a bit. BSET can do what TAS can do and even more; works on registers not memory only, and tests and sets any bit. What is the advantage of TAS over BSET?
Quick immediates encode 3 bits in the instruction to represent 1-8, where 000 stands for 8. Is there any advantage in doing the same thing for the shift instructions, i.e. instead of the modulo-64 register value representing the number of bit shifts, making 00000 represent 32? Any disadvantage?
Use the MC68K Simulator to compare four ways to compute the product X * Y, where X and Y are two 16-bit integers: X= $8421 & Y= $7EBF, in terms of number of cycles required
- Using the instruction MULU
- Using repetitive addition; Add X to an accumulator Y times
- Using Binary Shift/Add algorithm
- Using Booth's algorithm
Exchange; use various values with various patterns of 1's and 0's and repeat the study, and explain discrepancies in the execution times
What is the final outcome of the instruction sequence: MOVEQ #$E6, D0, LSL.W #3, D0, SWAP D0?
Discuss the advantages and disadvantages of having smaller or larger instruction sets, smaller and larger number of addressing modes.
What makes the MOVEQ and ADDQ instructions shorter and faster?
Why is the move instruction with data register direct mode faster than that with PC relative with index?
The MC68000 does not have a barrel shifter, and yet we can perform any number of shifts from 1 to 32 in one instruction. Explain.
Compare the two MC68000 instructions: MOVEQ #$64, D0 & MOVE.L #$64, D0 spatially and temporally; i.e. in terms of the space they consume and the times they need to execute, although they do exactly the same thing.
Using Assembly language is tedious and time consuming. It is hardly good for even small projects. Why is it good to learn then? Name situations where using Assembly pays off.
After executing the two instructions below, what does the address $1841 have?
MOVEQ #-97, D0
MOVE.L D0, $1840 ; where X is your student number
The following three bytes, $C3 $80 $12, were found at address $1200 in the memory of a Z80 system. Given that the Z80 is little endian and $C3 is the opcode of the absolute jump instruction, describe the outcome of fetching this 3-byte instruction.
Can we perform image processing on a Z80 based computer system? Explain
How does the Z80 signal interrupt acknowledgment? What is the impact of this design on input/output requests? And how do we resolve this issue?
The Z80 µP implements block copy instructions: LDDR and LDIR. What is the advantage of such instructions when one can use a simple loop mechanism to achieve the task
One of the bits in the F register (Flags) in the Z80 µP has dual meaning P & V. How do we know what it means when we check it
What is the benefit of using The Z80 LDIR instruction since we can implement this in a loop mechanism?
Why do processors have many addressing modes? State the advantages and disadvantages
Compare processors in terms of alignment and endianness and explain what each means.
Why do processors have a limited number of named registers, like 32 at most, although some have 192 registers inside?
Name the addressing modes that are not allowed for the destination, along with the reasons.
Why the C flag gets set or reset even after some non arithmetic operations?
Why do modern processors have two states; system state and user state?
Explain the following terms: alignment, endianness, byte and word indexing
Some addressing modes are not allowed as destination for some reasons. What modes and what reasons?
How many address bits does a processor need to access 64 TB? Assume byte indexing, word indexing, and longword indexing.
Intel's AURORA processor can access up to 10 PB of byte indexed memory. How many address bits does the processor have?
What is the address space of a processor with 32-bit data bus & 32-bit address bus with whole word indexing?
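The three address-width questions above follow one pattern, sketched below. The sketch assumes binary prefixes (1 TB = 2^40 bytes, 1 PB = 2^50 bytes), a 16-bit word and a 32-bit longword. The function names are hypothetical.

```python
def address_bits(capacity_bytes: int, bytes_per_location: int = 1) -> int:
    """Minimum number of address bits to reach every location."""
    locations = capacity_bytes // bytes_per_location
    return (locations - 1).bit_length()


def address_space_bytes(address_bits_count: int, bytes_per_location: int) -> int:
    """Total bytes reachable with the given address width and location size."""
    return (1 << address_bits_count) * bytes_per_location


if __name__ == "__main__":
    TB, PB = 2**40, 2**50
    print("64 TB, byte indexed:    ", address_bits(64 * TB, 1))
    print("64 TB, word indexed:    ", address_bits(64 * TB, 2))
    print("64 TB, longword indexed:", address_bits(64 * TB, 4))
    print("10 PB, byte indexed:    ", address_bits(10 * PB, 1))
    print("32-bit address, 32-bit words:", address_space_bytes(32, 4) // 2**30, "GB")
```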
Each of the following instructions takes a single word in memory, but they vary in execution time: 4, 8 and 12 clock cycles. Explain briefly the reason behind the similarity and the differences.
MOVE.W A0, D0
MOVE.L A0, D0
MOVE.B (A0), D0
MOVE.W (A0), D0
MOVE.L (A0), D0
Which addressing mode is faster to execute, the Post-Increment or Pre-Decrement? Why?
Choose the charge/discharge resistors to use with 2.2 µF capacitor to generate a 35 ms reset signal in a 5V system. Assume that the capacitor max current rating is 10V/20mA. Available: 1, 1.5, 2.7, 3.3, 4.7, 6.8, 9.1, 15, 18, 20, 27, 56 KΩ
Typically, MCUs & SoCs integrate CPUs and some other components. What is the difference then?
Processors require a power up reset signal for proper start, it suffices to have few microseconds, but we typically extend it to milliseconds in our designs. Why?
Complex microprocessor systems have so many buffers while basic ones do not. Why?
What does PIC mean? What does XIP mean? What does Cache block or line mean?
What is the purpose of using a few address lines with peripheral chips? And why do some chips use more or fewer of those lines?
The input/output lines of an I/O NMOS chip are driven by Darlington pairs and hence are capable of sinking and sourcing 10 mA. If one of the output lines is used to drive a blue 3V LED, compute the value of the current limiting resistor
A Schmitt inverter is used to generate a 30 ms reset signal instead of the 555 Timer chip. Find the charge and discharge resistor values if the capacitor used is 2.2 µF. Resistors are available with fixed values: 1, 1.2, 1.5, 3.3, 4.7, 6.8, 8.2, 10, 11, 12, 13, 15, 16, 18, 20 kΩ
To reset the MC68000, its RST* and HLT* controls must be asserted for 100 ms, using a simple RC circuit or a monostable like the popular 555 Timer. Why do we connect them separately? And why do we use the 74LS05 and not the 74LS04 or 74LS14, which are also inverters?
A 50% duty cycle 3.6 GHz clock signal appeared non-inverted at the output of an inverter. Explain
Given a violet 4.4V LED and a 300 Ω current limiting resistor in an 8V system, compute the power consumption of the LED if it is ON one tenth of the time
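One way to approach this kind of LED calculation, as a sketch only: the resistor drops whatever voltage the LED does not, which fixes the current, and the average LED power is the ON power scaled by the duty cycle. The function name is hypothetical, and the LED forward voltage is assumed constant.

```python
def led_average_power(v_supply, v_led, r_ohms, duty):
    """Average power dissipated in the LED itself (not the resistor), in watts."""
    current = (v_supply - v_led) / r_ohms  # series current while the LED is ON
    return v_led * current * duty

# Values from the question: 8 V supply, 4.4 V LED, 300 ohm resistor, ON 10% of the time
p = led_average_power(8.0, 4.4, 300.0, 0.1)
print(f"LED current while ON: {(8.0 - 4.4) / 300.0 * 1e3:.1f} mA")
print(f"Average LED power:    {p * 1e3:.2f} mW")
```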
Flash memory wear leveling can be dynamic or static. Discuss the two mechanisms
An old DRAM chip has 19 address pins and 2 data pins. Compute the min and max capacity in MB
Calculate the annual time shift of an RTC module, making reasonable assumptions regarding the crystal stability (accuracy or precision)
Compute the maximum annual time shift in seconds for an RTC with 4 ppm crystal accuracy.
A clock generator with 100 MHz / 5 ppm crystal is used in a controller. How many missing or extra cycles could there be in a second?
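The two crystal-accuracy questions above follow the same pattern: a tolerance in parts per million is a fraction of 10^-6 applied to whatever is being counted. A small sketch, assuming a 365-day year and treating the ppm figure as a worst-case bound (function names are illustrative):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year

def max_drift_seconds(ppm, interval_s):
    """Worst-case time error accumulated over interval_s seconds."""
    return interval_s * ppm * 1e-6

def max_cycle_error(freq_hz, ppm):
    """Worst-case number of extra or missing cycles in one second."""
    return freq_hz * ppm * 1e-6

print(f"4 ppm RTC over a year: +/- {max_drift_seconds(4, SECONDS_PER_YEAR):.1f} s")
print(f"100 MHz / 5 ppm clock: +/- {max_cycle_error(100e6, 5):.0f} cycles per second")
```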
Multi is a prefix often used to describe computer capabilities; it seems appealing as it means doing more: Multi-programming, Multi-tasking, Multi-processing, and Multi-threading. Explain what each one is about
Two LEDs in series with a resistor are driven by an output pin of a chip. It was noticed that they never light, whether the output is made low or high. What could be the reason?
When an application edits data stored in flash memory, writing back takes a long time because the controller has to erase before writing. Doing this over and over causes these blocks to wear faster than others. How does the flash memory controller handle these two issues: faster wear of certain blocks and the erase/write latency?
List the components of a typical microprocessor system, make sure it works.
Have you ever thought why flash memories have limited erase/write cycles and hence wear out?
Why can one connect two red, yellow, or green LEDs in series in a 5V system, but cannot do that with white LEDs?
The performance of hard disk drives in accessing large files, like videos, is better than small files, like text files, while that of flash drives is nearly the same for large and small files. Explain
What is Fan-in? Fan-out? What is the significance of such numbers? And how do we address problems related to each?
Why do we use buffers to interface with the processor bus?
We connect LEDs through resistors in a common anode configuration. Why do we use resistors? and why common anode not common cathode?
If a chip has 26 VCC and 31 GND contacts, can we supply power to the chip through any subset of those contacts, or do we have to connect all?
There are SRAMs and DRAMs with two ports instead of one. Two parties can read and write almost concurrently. Find what kind of applications would need such memory types
There are many types of processing units around today: Central Processing Unit (CPU), Graphics Processing Unit (GPU), Tensor Processing Unit (TPU), Accelerated Processing Unit (APU), General Purpose Graphics Processing Unit (GPGPU), Digital Signal Processor (DSP), Neural Network Processor (NNP), Input/Output Processor (IOP), etc. Find out what each one is about
LEDs are commonly used in design to indicate a status. Why do we drive LEDs using a TTL gate and not directly by VLSI chip? Why do we put them in common anode mode instead of common cathode? How do we calculate the current limiting resistor to be used?
Typically, computers connect to peripherals like printers, scanners, etc. via cables with few to several meters at most. What would happen if we extend the cables to run hundreds of meters?
Wear leveling is a technique used in Flash memory controllers that always writes to fresh pages; even when editing an existing page, it writes to another page instead of the same one. Why is that? What are the advantages?
Are the names ROM and RAM proper? Why or why not?
What is the advantage of designing with an MCU instead of a CPU? And what is the advantage of designing with an SoC instead of an MCU?
A 900 mWh / 6V battery supplies current to a 4V / 25mA white LED. How long does it take the battery to run out of juice? What is the value of the current limiting resistor to be used?
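A rough sketch of the arithmetic behind the battery question, assuming an ideal battery that holds 6 V until empty and counting the power lost in the resistor as part of the load (both are simplifying assumptions; the function names are illustrative):

```python
def runtime_hours(energy_mwh, v_batt, i_ma):
    """Hours until the stored energy is used up at a constant current draw."""
    load_mw = v_batt * i_ma  # V * mA = mW drawn from the battery (LED + resistor)
    return energy_mwh / load_mw

def limiting_resistor(v_batt, v_led, i_ma):
    """Series resistance in ohms that sets the LED current."""
    return (v_batt - v_led) / (i_ma / 1000.0)

print(f"Runtime:  {runtime_hours(900, 6, 25):.1f} h")
print(f"Resistor: {limiting_resistor(6, 4, 25):.0f} ohm")
```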
Why do we cover the glass window of the EPROM with a sticker after programming it and placing it in the socket?
SYSTEM STRUCTURE EXTRA
In digital logic, the consensus theorem states that XY+X'Z+YZ=XY+X'Z, which means the product term YZ is redundant and can be removed. The right-hand side is less costly, so we would prefer it to save gates. If the gates are ideal, with no propagation delay, this is fine, but in real life the two implementations are not exactly alike; they are logically equivalent but not physically, differing not only in gate count but also in the actual behavior at the signal level. Comment on the hazard and on when we can or cannot employ the reduced version
A 1 GHz square-wave signal, call it X, is fed to a logic gate L along with its complement X'. Assume that all gates have propagation delay of 0.2 ns. Sketch the output of the logic gate L if it is OR gate, AND gate, XOR gate, XNOR gate. Give a meaning for every case to reflect what it does. Repeat all cases assuming the propagation delay is 2 ns.
Can we run code in NOR and NAND Flash memories? Can we use any of them to replace the computer's main memory? Explain why or why not.
Using more NAND Flash chips to build a higher capacity SSD increases the speed and longevity. Explain
How many contacts does a processor chip have? Given …
64-bit data,40-bit address, 24 input control signals & 28 output control signals
0.5 A ampacity wires to the well, consumes 20 W at 2.8 V and 3.8 GHz
How long does it take the battery of a laptop to run out of juice? Given …
3.6 V and 5.2 A 2-cell battery, 3.6 GHz, 2 V, 12 W and 1.4 Billion transistor processor
Compare the speeds of duplicating a 1 MB file on SSD: 0.02 ms & 4 GB/s, HDD: 20 ms & 150 MB/s
Compare the speeds of duplicating a 16 KB file on SSD: 0.03 ms & 3 GB/s, HDD: 15 ms & 150 MB/s
Compare the performance of duplicating files on an SSD and HDD, assuming Ls and Lh latencies and Ss and Sh transfer speeds. Assume a range of very small to very large files
Consider an HDD with 20 ms latency and 100 MB/s transfer rate. How long does it take to duplicate 5 files 100KB each, compared to duplicating a single file that is 500 KB?
Consider an SSD with 0.2 ms latency and 100 MB/s transfer rate. How long does it take to duplicate 5 files 100KB each, compared to duplicating a single file that is 500KB?
Compare the time required to duplicate a single 100 MB file and to duplicate ten 10 MB files on a hard disk with 100 MB/s transfer rate, 4 ms rotational latency and 3 ms head seek latency.
My storage requirement will never exceed 1 TB for 10 years to come, yet I prefer to buy 4 TB SSD over 2 TB SSD although it is more expensive. Why?
Name three reasons why the number of contacts in a high core count µP chip is in the 1000s
Compare the speed of duplicating small, medium and large files (100 KB, 1 MB and 1 GB, respectively) on SSD (0.03 ms & 3 GB/s) and HDD (15 ms & 150 MB/s)
The HDD access time consists of three components: rotational latency, head seek latency and data transfer time. Discuss what each means, what factors determine those values, and what ranges are available today for commercial systems and state of the art systems
Cache L1 is split (Code & Data) and typical implementations today are 64KB 2-way set associative each. However, the code section is single ported and the data section is typically dual ported. Why?
A µP with 4 classes of instructions; class A takes 3 cycles, B takes 4 cycles, C takes 6 cycles and D takes 8 cycles. In a typical mix we found these instruction counts: class A has 500, B has 200 instructions, C has 800 instructions and D has 100 instructions. Compute the CPI, IPC and MIPS at 100 MHz
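As a sketch of how the CPI question can be set up: CPI is the cycle count weighted by the instruction mix, IPC is its reciprocal, and MIPS is the clock rate divided by CPI. The function names are illustrative; the numbers are those given in the question.

```python
def cpi(mix):
    """Average cycles per instruction for a list of (cycles, count) pairs."""
    total_cycles = sum(cycles * count for cycles, count in mix)
    total_instructions = sum(count for _, count in mix)
    return total_cycles / total_instructions

def mips(clock_hz, cpi_value):
    """Millions of instructions per second at the given clock."""
    return clock_hz / cpi_value / 1e6

# (cycles per instruction, instruction count) for classes A, B, C, D
mix = [(3, 500), (4, 200), (6, 800), (8, 100)]
c = cpi(mix)
print(f"CPI  = {c:.4f}")
print(f"IPC  = {1 / c:.4f}")
print(f"MIPS = {mips(100e6, c):.2f}")
```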
CISC µPs have instructions that can perform operations on two operands using absolute mode, while RISC µPs do not, and have to load operands in register then write back the result. Does this mean CISC are much faster in doing such operations? Explain
A single core µP takes 3 ns to multiply 2 numbers. Which option is better to enhance the performance, to use a dual core version of this µP or to raise the clock by 10%? Justify
Why does it take a single core µP running at 5 GHz around 3 ns to compute the dot product of two 2-element vectors of single precision floating point (SPFP) numbers, while it takes only 6 ns to compute the dot product of two 16-element vectors? That is 8 times the computational effort done in twice the amount of time
How much time does it take a commercial µP to multiply a pair of single precision floating point numbers? 10 pairs one after another? 100 pairs one after another?
We want to design a controller for traffic lights on a busy intersection. What kind of specs will such a microprocessor likely have regarding: data bus, address bus, number of cores, number of pins, power consumption, clock frequency and cost?
Compare the speeds of SSD and HDD in reading various file sizes: 3KB, 30KB, 300KB, 3MB, 30MB and 3GB, assuming the specs are 3GB/s and 10 µs latency for the SSD, and 150MB/s and 10 ms latency for the HDD. You had better use an Excel sheet
Instructions Per Cycle (IPC) is an architectural performance metric. Today, microprocessors can achieve IPC > 100 with fewer than 10 cores, which means IPC > 10 per core. But the theoretical limit of any pipeline is IPC=1 no matter how deep or efficient it is. How do you think the cores achieve those numbers then?
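Instead of a spreadsheet, the SSD versus HDD comparison can also be tabulated with a short script. This is a sketch under a simple model, access time = one latency hit + size / transfer rate, with decimal units (1 KB = 1000 bytes) and each file assumed contiguous; the names are illustrative.

```python
KB, MB, GB = 1e3, 1e6, 1e9  # assumption: decimal units

def access_time(size_bytes, latency_s, rate_bytes_per_s):
    """One latency hit plus the streaming time for the whole file."""
    return latency_s + size_bytes / rate_bytes_per_s

SSD = (10e-6, 3 * GB)    # 10 µs latency, 3 GB/s
HDD = (10e-3, 150 * MB)  # 10 ms latency, 150 MB/s

for size in (3 * KB, 30 * KB, 300 * KB, 3 * MB, 30 * MB, 3 * GB):
    t_ssd = access_time(size, *SSD)
    t_hdd = access_time(size, *HDD)
    print(f"{size:>14,.0f} B  SSD {t_ssd:.6f} s  HDD {t_hdd:.6f} s  "
          f"HDD/SSD {t_hdd / t_ssd:.0f}x")
```

For small files the latency term dominates, so the ratio approaches the latency ratio; for large files it approaches the transfer-rate ratio.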
Compare the price of discrete transistors with integrated transistors (in a modern processor chip for example)?
Consider a high end and a low end workstation class processor, and give reasonable estimates for the following factors: number of cores, number of contacts, number of memory channels, power consumption, L1/L2/L3 cache size, and price
What happens if a 4 GHz processor is run at 5 GHz for a long time?
Give a price range estimate for the following classes of processors: Low cost controller, low end desktop, High end desktop and Workstation/Server
Early processors were clocked at 1 MHz, and then went up steadily. The rate at which clocking increased has slowed down and we are almost stuck at 3 to 5 GHz. Why not clock at 20 GHz as the roadmaps once proposed?
The Von Neumann architecture (or stored program machine), sometimes called the Princeton model, has a problem pipelining the execution. Why? What is the solution?
What is the problem of having a single memory channel in a processor with 8 cores? And what is the solution?
How can a microSD card integrate 1 TB of storage in such a small footprint, only a few millimeters on each side and less than one millimeter thick?
Modern processors consume more power and yet they are far more efficient than the old ones. Compare Intel's 8080 and i7 to figure out
Find the address space of a µP with a 52-bit address bus, assuming it is byte-indexed, and assuming it is longword-indexed (each longword, 32-bit word, has an index or address)
How long does it take a 20 GIPS µP class to get to the 100 GIPS performance?
Compute the number of power contacts in 300W/1.5V µP with 0.75A Ampacity well-die wires.
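A hedged sketch of the reasoning for the power-contact count: the supply current is power over voltage, and each wire can carry at most its ampacity. The sketch assumes the ground return needs as many wires as the supply side and ignores any safety margin; the function name is illustrative.

```python
import math

def supply_wires(power_w, voltage_v, ampacity_a):
    """Minimum number of wires needed to carry the supply current."""
    current_a = power_w / voltage_v
    return math.ceil(current_a / ampacity_a)

vcc = supply_wires(300, 1.5, 0.75)
print(f"Supply current: {300 / 1.5:.0f} A")
print(f"VCC wires:      {vcc}")
print(f"VCC + GND:      {2 * vcc}  (assumption: equal number of ground returns)")
```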
If it takes a single core µP 200 ns to find the dot product, how long would it take a similar but dual core µP ?
Why is Moore's law on Silicon about to cease? What are the alternatives?
A symmetrical clock signal of 1 GHz appeared non inverted at the output of an inverter. Explain
Processors get power via two input contacts, VCC/GND (or VDD/GND). Modern processors have tens or even hundreds of such pairs. Why?
Transistors in processors are used as switches; when a transistor is ON the voltage is quite low and hence the power consumption is low, and when it is OFF the current is quite low and hence the power consumption is low. Why then does power consumption go up with increasing frequency?
Processors communicate with the other components like memory using pins or contacts: Address, Data and Control. The Intel 8080, for example, has 40 pins (16 for Address, 8 for Data and the rest for Control, besides 2 for Power). Today's processors use 50 or fewer bits for Address, 64 bits for Data and 10's of bits for Control, so why do they have 2000 to 4000 pins or contacts?
Doubling the clock frequency does not double the performance, and doubling the number of cores in a processor or the number of processors in a system does not double the performance either. Why?
Make a list of advantages and disadvantages of SSD over HDD, and use numbers when possible
A high level language source code with 124 KLines was compiled for a CISC machine and generated an object code with 96,135 instructions. Give an estimate for the object code size if the same source code is compiled for another CISC machine
If your desktop processor is a high end one then it might be rated at 100W, and knowing that processors operate at 2V means it draws nearly 50A. How come then the circuit breaker of your home does not break even if rated at 20A?
Why do we connect a 100 nF ceramic capacitor between the power and ground of every IC on the motherboard?
Today, the cost of GFLOPS performance is only 2 cents. Use Moore's law to predict the cost in the mid 80's of the last century and then search to verify the answer
An input/output-intensive code runs to completion in a minute. How long will it take if we quadruple the clock?
When we say 32-bit or 64-bit processor, we refer to the data bus, data path, registers, and execution unit ports, but when we say 32-bit or 64-bit application, we refer to the address used in coding; 32-bit means 32-bit address and 64-bit means higher implementations: 36-bit, 40-bit or 50-bit (as no processor has 64-bit address so far). In both cases going to 64-bit is faster. Explain
Aside from size, cost and power consumption, which has better performance in a real world system, a single sophisticated processor with 270 GFLOPS or 16 less complex processors with 20 GFLOPS each?
Processors have several registers as part of the data path, and they are faster than the cache memory. Why limit the number of registers to 8, 16, 32 or 64? Why not use 1024, 2048 or even more?
A program runs to completion in 10 minutes on a system with a 3 GHz dual core processor. How long does it take to complete if we replace the processor with a quad core of the same model and clock it at 4 GHz?
A processor design is implemented with three performance levels for notebooks, desktops and workstations; the dual core has 1.2 billion transistors, the quad core has 1.8 billion transistors and the octa core has 2.6 billion. Explain why the transistor count is not in proportion to the number of cores
How long does it take a specific class of processors to upgrade performance by one order of magnitude?
4K monitors are becoming popular. The resolution of such a monitor is 2x2 times Full High Definition (FHD), which is 1920x1080, i.e. 3840x2160 pixels. To display colors with good accuracy they use the true color system, i.e. 24 bits per pixel (8-bit RED, 8-bit GREEN and 8-bit BLUE). And for comfortable viewing they use a refresh rate of 60 frames per second. Compute the bandwidth of the channel that transfers data to the monitor
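A minimal sketch of the display bandwidth calculation, counting raw pixel data only. It assumes no blanking intervals, no line encoding overhead and no compression, so a real link would need somewhat more; the function name is illustrative.

```python
def raw_video_bandwidth_bps(width, height, bits_per_pixel, frames_per_s):
    """Uncompressed pixel data rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_s

bps = raw_video_bandwidth_bps(3840, 2160, 24, 60)
print(f"{bps:,} bit/s")
print(f"about {bps / 1e9:.2f} Gbit/s or {bps / 8 / 1e9:.2f} GB/s")
```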
A bank has 100 TB of data on a magnetic tape at the headquarters in Amman, and the branch in Irbid wants a copy of it. The two offices are connected to the Internet via a fast 1000 Mbps fiber connection (1000 Mbps uplink and 1000 Mbps downlink). The manager asked the computer center to send it over the link, but the engineer in charge said he had a replica of this data and he would send it on a motorcycle. Whose idea is better?
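To put a number on the link option, here is a sketch of the best-case transfer time. It assumes decimal units (1 TB = 10^12 bytes), the full 1000 Mbps sustained for the whole transfer and no protocol overhead; the motorcycle time is left to the reader since the question gives no travel figures. The function name is illustrative.

```python
def transfer_days(size_bytes, link_bits_per_s):
    """Best-case time in days to push size_bytes through the link."""
    seconds = size_bytes * 8 / link_bits_per_s
    return seconds / 86_400  # seconds per day

days = transfer_days(100e12, 1000e6)
print(f"100 TB over 1000 Mbps: about {days:.2f} days")
```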
Today, the 7nm process is in use by many fabs, with nearly 10 billion transistors per square centimeter. What is the expected density after 5 years? After 10 years from now?
How long does it take to copy a 600 MB file from an HDD (150 MB/s, 5 ms latency) to an SSD (3 GB/s, 20 µs latency)? Consider the two scenarios: reading all into memory then writing, and writing while reading. What if the file consists of three chunks in different locations (track, sector, platter) on the HDD?
How long does it take to copy a 3 GB video from one folder to another on a high speed SSD? How about an HDD?
Copying two large video files from a DVD to an HDD at the same time takes much longer than performing the copy and paste one file at a time. Explain. Who is to blame for this mess?
Various flash memory media form factors (SD, miniSD, microSD cards, internal 2.5" SSD and M.2 SSD add-on card) have a range of data transfer speeds, from 2 MB/s to 4 GB/s. Why is that?
A 4 GHz, 24 W, 16-core processor maintains an IPC of 48. Compute the power efficiency in terms of MIPJ
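A sketch of the MIPJ arithmetic, assuming the IPC of 48 is the aggregate over all 16 cores (the question does not say whether it is per core or total) and that the 24 W is drawn continuously. MIPJ is millions of instructions per joule, which is numerically MIPS per watt. The function name is illustrative.

```python
def mipj(ipc_total, clock_hz, power_w):
    """Millions of instructions executed per joule of energy."""
    instructions_per_s = ipc_total * clock_hz
    return instructions_per_s / power_w / 1e6

# assumption: IPC = 48 is the whole-chip figure, not per core
print(f"{mipj(48, 4e9, 24):,.0f} MIPJ")
```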
If we decide to build the A13 Bionic chip (8.5 billion transistors on a 1cm x 1cm die) using discrete transistors, resistors, wires, etc., how much would it take to do that in space, money, time, etc.?
Today's chips use a 7 nm process to build the transistors and connections of chips. Compare the size of such a transistor with a virus. And estimate the number of atoms in such a transistor, and the gate thickness in atoms
If you look into a microprocessor spec, you may find sources reporting various CPI values. How is that, since we are talking about the same processor?
Instruction sets vary in size, RISC can go as low as 30 instructions although some are 200 and CISC generally exceeds 200. The question is: what is the minimum? Can we design a computer with just 1 instruction? And hence no operation code and no decoding time and easy control logic. This is something to investigate and not a Yes/No question.
Today, we can pack ten billion transistors on a small chip, and 10 such chips will match the number of neurons in a human brain, and yet this is far from reaching its mental power. Why?
The first µP (2300 transistors on a 3x4 mm die) was fabricated using a 10 µm process, while today's µPs are fabricated using a 5 nm process. Assume the process or technology node is a good measure of the transistor features. What should the current number of transistors be for a chip with a 120 square mm die? Are we meeting this today?
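One way to frame the scaling estimate, as a sketch only: if the node name really tracked linear feature size, density would grow with the square of the linear shrink. That is the question's own simplifying assumption, not a statement about how modern node names relate to physical dimensions. The function name is illustrative.

```python
def scaled_transistor_count(base_count, base_area_mm2, base_node_nm,
                            new_node_nm, new_area_mm2):
    """Ideal transistor count if density scales with (linear shrink)^2."""
    linear_shrink = base_node_nm / new_node_nm
    density_gain = linear_shrink ** 2
    return base_count / base_area_mm2 * density_gain * new_area_mm2

# 2300 transistors on a 3 mm x 4 mm die at 10 µm (10,000 nm), scaled to 5 nm and 120 mm^2
estimate = scaled_transistor_count(2300, 3 * 4, 10_000, 5, 120)
print(f"Ideal estimate: {estimate:.2e} transistors")
```

Comparing this ideal figure with published transistor counts for chips of similar die area answers the second part of the question.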
Replacing the air in a hard disk drive with Helium, which is 7 times lighter, allows the spindle to run faster with less power, and also allows packing more platters and hence increasing the capacity. If this is the case, why not let the spindle and platters work in vacuum?