Software Engineering 4F03 is one of the optional electives available to 4th-year students across the software engineering, computer science, and mechatronics streams. 4F03 is an introduction to distributed computing systems, and isn't really preceded by any particular course in any of these programs.
The course begins with an introduction to MPI, which stands for Message Passing Interface; the implementation used in this particular course is the open source OpenMPI. MPI is used for passing data between processes running anywhere from separate cores on the same CPU to entirely different computers within a single cluster. This allows separate processes to each compute part of a problem, then combine their partial results back into a single answer - effectively distributing the problem (hence the name of the course).
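To give a flavour of the API, here's a minimal sketch of a two-process exchange - my own illustrative example, not course material (the value 42 and the tags are arbitrary):

```c
/* Minimal MPI sketch: rank 0 sends an integer to rank 1.
   Compile with mpicc and run with `mpirun -np 2 ./a.out`. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```

Every rank runs the same program; branching on the rank is how work gets divided up.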
MPI is a fairly simple tool for distributing problems, and can be learned in a relatively short amount of time. The majority of the course is about learning how to distribute a problem well enough that it scales. Two types of scaling are studied, weak and strong, defined as follows:
- Strong - Analyzing computation time for a fixed problem size, while increasing the number of processes working on it. In other words, a problem is strongly scalable if throwing more workers at it makes it finish faster.
- Weak - Analyzing computation time for a fixed per-processor problem size, increasing the number of processes and the total problem size together. If the time to finish stays roughly the same as the problem size and the number of processes are scaled up together, the problem is weakly scalable.
These are known respectively as Amdahl's and Gustafson's laws of scaling/parallelizing processes. Some time is spent analyzing unique cases where the speedup of adding more processes is greater than the number of processes themselves (superlinear speedup). This is usually due to advantages gained from things like more effective caching as the data a single process has to deal with becomes smaller or more homogeneous.
The majority of the assignments in this course have to do with parallelizing problems, and then analyzing the results and determining their respective strong and weak scalabilities (compliance with Amdahl and Gustafson's laws).
When I took this course, it was taught by Ned Nedialkov, and the breakdown was as follows:
- Assignment #1 : 8%
- Assignment #2 : 12%
- Assignment #3 : 15%
- Midterm : 20%
- Project : 45%
The first assignment had to do with parallelizing the multiplication of a matrix by a vector. Each process took a block of n rows, multiplied them by the column vector, and the results were collected on the root process. It was relatively easy, but somewhat difficult to test, since we were only allowed to use machines we had access to (laptops, desktops, etc.).
The second assignment was all about analyzing our algorithms from assignment #1 for strong and weak scalability and commenting on whatever we found. For this assignment, we had access to the SHARCNET supercomputing cluster in the ABB physics building. It too was not overly difficult, provided one carefully studied how to produce speedup/efficiency graphs for the strong and weak scaling tests.
The third assignment involved using the GNU Multiple Precision (GMP) library. This library provides integers larger than a single machine word on a standard computer (32/64 bits), which automatically grow as the numbers get bigger. The operation of finding the largest gap between consecutive prime numbers up to a certain bound was parallelized, and analyzed as in assignment #2 for bonus marks.
The midterm for this course was difficult. Many questions were asked about the specifics of buffering in the MPI library, in relation to the deadlocks that can occur when it is used improperly. There were also questions about how broadcasting between processes works (not exactly intuitive), as well as a question about designing a parallel algorithm for distributing an image processing computation. Results were extremely poor, so the professor allowed bonus work on the third assignment and the project to make up for it.
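The buffering pitfall the midterm kept probing can be reproduced in a few lines. This is my own illustrative example, not an actual exam question: if two ranks both call a blocking MPI_Send before either posts a receive, the program only survives if MPI happens to buffer the messages internally - past the implementation's eager limit, both sends block forever.

```c
/* Deadlock-prone exchange: both ranks MPI_Send first, then MPI_Recv.
   For small messages MPI buffers the send and this "works"; for large
   ones each MPI_Send blocks waiting for a receive that is never posted.
   Safe fixes: reverse send/recv order on one rank, or use MPI_Sendrecv. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    enum { N = 1 << 20 };            /* large enough to exceed eager limits */
    static double out[N], in[N];
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = 1 - rank;             /* assumes exactly 2 ranks */
    MPI_Send(out, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);  /* may block */
    MPI_Recv(in, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    printf("rank %d done (only reached if the sends were buffered)\n", rank);
    MPI_Finalize();
    return 0;
}
```

The nasty part, and presumably why it was on the exam, is that this bug is invisible in small tests and only bites once the data gets big.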
The project was interesting. It involved creating a parallel algorithm for counting the number of unique values in an n by n multiplication grid of arbitrary size (e.g., a 10 by 10 grid contains 42 unique numbers). We were supposed to, in groups, do everything we could possibly think of to make this computation happen as quickly as possible. There is no known closed-form formula for the count, so it has to be computed. Some students chose to approximate it using advanced mathematical techniques they didn't really understand, which could calculate it in a second. They didn't do as well, since they just implemented existing algorithms rather than thinking of their own. It was fun.
Overall, I thought this course was okay. After the midterm, the only thing left was the project, so class attendance dropped to almost nothing immediately. Topics covered later included OpenMP (a shared-memory parallelization library) and some interesting algorithms for parallelizing common problems like the Travelling Salesman Problem. Personally, I think it would have been more beneficial for the course to focus on GPU programming, since that's something generally more accessible to the common developer and of interest outside of just the scientific community. I recommended it to the prof, and he seemed interested in adding it to the curriculum for future years.
If you really learn what little material there is, and put a lot of thought into how you do your assignments, you should do just fine.
P.S. The course is entirely in C. Make sure you know pointers and all that C stuff well.