I just finished taking the course Software Security from the University of Maryland via Coursera. It was a relatively easy course (at least if you know C) that gave an overview of the following areas: buffer overflows and other memory attacks, web security (including SQL injection, CSRF and XSS), secure design, static analysis, symbolic execution, fuzzing and penetration testing. The instructor, professor Michael Hicks, was one of the more pedagogical lecturers I have listened to, and the whole course was quite enjoyable.
The first part of the course deals with buffer overflows and related memory exploits, as well as various defences. For this part, it really helps if you know C, since all the examples are in C. Professor Hicks does a really good job explaining how buffer overflows work, including how the memory is laid out, the difference between the stack and the heap etc. I was familiar with buffer overflows, but format string attacks were new to me.
In week two, various schemes to protect against memory-based attacks are explained. These include stack canaries (magic numbers in certain positions on the stack to help detect overwritten memory), data execution prevention (only code in certain regions of memory is actually executable), address space layout randomization (to make it harder for an attacker to know where to jump to) and control flow integrity (to make sure a jump to an instruction comes from where you expect it to come from). Almost all of these techniques were new to me, and it was quite fascinating to learn about this continuing arms race of attacks and countermeasures.
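To make the stack canary idea a bit more concrete, here is a toy simulation in Python. It is only a sketch and not taken from the course material: real canaries are inserted by the C compiler, and the frame layout, the names (run_frame, BUF_SIZE, RETURN_ADDRESS) and the "SAFE"/"EVIL" return addresses are all made up for illustration.

```python
import os

BUF_SIZE = 8               # size of the local buffer in the simulated frame
CANARY_SIZE = 4            # size of the canary placed right after the buffer
RETURN_ADDRESS = b"SAFE"   # stand-in for the saved return address


def run_frame(user_input, canary=None):
    """Simulate one function call and describe what happened.

    Simulated frame layout: [buffer][canary][return address].
    Passing canary=None simulates a build without canary protection.
    """
    guard = canary if canary is not None else b""
    frame = bytearray(BUF_SIZE) + bytearray(guard) + bytearray(RETURN_ADDRESS)

    # Unchecked copy in the spirit of strcpy(): it ignores BUF_SIZE and
    # overwrites whatever follows the buffer. The copy is clamped to the
    # frame only so that the simulation itself stays in bounds.
    n = min(len(user_input), len(frame))
    frame[:n] = user_input[:n]

    # Before "returning", compare the canary slot with the expected value.
    if canary is not None and bytes(frame[BUF_SIZE:BUF_SIZE + len(guard)]) != guard:
        return "aborted: canary was overwritten"
    return "returned to " + bytes(frame[-len(RETURN_ADDRESS):]).decode("latin-1")


# Fills the buffer, then runs over the canary slot and the return address.
overflow = b"A" * BUF_SIZE + b"BBBB" + b"EVIL"

print(run_frame(b"hello"))                    # returned to SAFE
print(run_frame(b"A" * BUF_SIZE + b"EVIL"))   # returned to EVIL
print(run_frame(overflow, canary=os.urandom(CANARY_SIZE)))
```

Without the canary, the overlong input silently replaces the return address. With a random canary in between, the same kind of overflow has to trample the canary first, so the check before returning catches it unless the attacker happens to guess the canary value.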
The third week of the course is about security on the web, and this felt like the most relevant part for me. First SQL injection attacks are explained, then some countermeasures, for example the use of prepared statements. Next is a section on the risks of using hidden fields and cookies. Finally professor Hicks explains session hijacking, Cross-site Request Forgery (CSRF) and Cross-site Scripting (XSS). As with the section on memory attacks, professor Hicks does an excellent job explaining all the attacks – very pedagogical, with many helpful diagrams and animations.
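As an illustration of SQL injection and of why prepared statements help, here is a minimal sketch using Python's built-in sqlite3 module. It is not from the course; the users table, the credentials and the two login functions are hypothetical.

```python
import sqlite3

# In-memory database with a single, made-up user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")


def login_vulnerable(name, password):
    # User input is pasted straight into the SQL text, so it can change
    # the structure of the query.
    query = (
        "SELECT name FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone() is not None


def login_safe(name, password):
    # Placeholders keep the input as data; it is never parsed as SQL.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone() is not None


attack = "' OR '1'='1"

print(login_vulnerable("admin", attack))  # True: logged in without the password
print(login_safe("admin", attack))        # False: the attack string is just a wrong password
print(login_safe("admin", "s3cret"))      # True: the real password still works
```

In the vulnerable version the injected quote ends the password literal early, and the trailing OR '1'='1' makes the WHERE clause true for every row. With placeholders the same string is compared literally against the stored password.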
Week four starts with threat modeling. For example, an adversary may be able to access the system via the internet, snoop on the traffic, or be co-located (for example as malware reading keystrokes). Next, some design principles are presented: favor simplicity, trust with reluctance, defend in depth, and monitor and trace. There is a risk of getting too abstract when describing design principles, but I think professor Hicks did a good job by giving examples of what they mean in practice. I also really liked the case study at the end, about design decisions taken when developing the Very Secure FTP Daemon.
Static Analysis and Symbolic Execution
I was familiar with static analysis, but symbolic execution was new to me. Both are ways to analyze programs without running them. In the context of this course it is to avoid security vulnerabilities, but of course the techniques are equally applicable as ways to simply find bugs (whether security vulnerabilities or not).
In the section on static analysis, different concepts such as flow-, context- and path-sensitivity are explained. Symbolic execution attempts to find all possible inputs that can cause various parts of a program to be executed, and then tries to see if any of those values could cause an error. Some programs (like a parser) are easier to analyze in this way than others (for example, long-running programs with lots of external interactions), but it is quite impressive what can be achieved with this technique. A number of symbolic execution tools, like KLEE (used in project 3), are mentioned.
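The core idea of symbolic execution can be sketched in a few lines of Python. This is a deliberately tiny toy and not how KLEE is implemented: the "program" is a hypothetical tree of branches on a single integer input x, and the path constraints are simple intervals rather than formulas handed to a constraint solver.

```python
# A node is "ok", "error", or ("if", op, constant, then_branch, else_branch),
# where op is "<" or ">" and the condition reads "x op constant".
PROGRAM = (
    "if", ">", 100,
    ("if", "<", 103, "error", "ok"),
    ("if", "<", 50,
        ("if", ">", 60, "error", "ok"),   # unreachable: x < 50 and x > 60
        "ok"),
)


def explore(node, lo, hi, found):
    """Walk every path, keeping the range [lo, hi] of x that reaches node."""
    if lo > hi:
        return                      # path constraints are unsatisfiable
    if node == "ok":
        return
    if node == "error":
        found.append((lo, hi))      # every x in this range triggers the error
        return
    _, op, k, then_branch, else_branch = node
    if op == "<":
        explore(then_branch, lo, min(hi, k - 1), found)   # x < k
        explore(else_branch, max(lo, k), hi, found)       # x >= k
    else:
        explore(then_branch, max(lo, k + 1), hi, found)   # x > k
        explore(else_branch, lo, min(hi, k), found)       # x <= k


errors = []
explore(PROGRAM, 0, 255, errors)   # treat x as an unknown byte value
for lo, hi in errors:
    print(f"error reachable for x in [{lo}, {hi}], for example x = {lo}")
# prints: error reachable for x in [101, 102], for example x = 101
```

Instead of running the program on one concrete value, the explorer follows both sides of every branch and remembers which values of x are consistent with the decisions taken so far. The second "error" is never reported, because no value satisfies both x < 50 and x > 60.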
Fuzzing and Penetration Testing
Finally, penetration testing and fuzz testing are explained. I have been impressed with fuzzing ever since I used it when I developed embedded VoIP systems. We used a homemade fuzzer to test our implementation of the H.323 protocol. The protocol was binary encoded, and we simply sent random bytes to it, which caused it to crash in several different ways. Even though we had tested the implementation before, we had not come up with all the weird cases that the random input data produced.
Even though I was familiar with both topics, it was nevertheless interesting to hear about the different tools that are available today.
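The "just send random bytes" style of fuzzing described above can be sketched as follows. This is a minimal illustration, not the fuzzer we used and not radamsa: the message format, parse_message and fuzz are all hypothetical, and the parser has a bug planted on purpose.

```python
import random


def parse_message(data):
    """Toy parser for a made-up binary format: [type][length][payload...]."""
    msg_type = data[0]    # planted bug: assumes the two header bytes exist
    length = data[1]
    payload = data[2:2 + length]
    if len(payload) != length:
        # A truncated payload is rejected gracefully.
        raise ValueError("truncated payload")
    return msg_type, payload


def fuzz(target, iterations=1000, max_len=8, seed=0):
    """Feed random byte strings to target and collect unexpected exceptions."""
    rng = random.Random(seed)     # seeded, so a run can be reproduced
    crashes = {}
    for _ in range(iterations):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            continue              # expected rejection of malformed input
        except Exception as exc:
            # Anything else counts as a crash; keep one input per kind.
            crashes.setdefault(type(exc).__name__, data)
    return crashes


crashes = fuzz(parse_message)
for name, data in crashes.items():
    print(f"{name} triggered by input {data!r}")
```

The fuzzer knows nothing about the format. It only separates errors the parser raises on purpose from exceptions nobody planned for, which here means inputs too short to contain the header.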
There are six quizzes, one per week, with about 15 multiple choice questions in each (some with several correct answers). The questions test your understanding of the material from the video lectures. Some questions were quite easy, and some, especially where there were code snippets to answer questions about, were more difficult.
The quizzes are untimed, and you get two attempts on each quiz. There were always some things I got wrong, so I think I used up my two attempts on all of them. For some questions, the alternatives varied a bit between attempts, so you need to be careful. I like this method of allowing more than one attempt, since it makes you engage more with the material, which hopefully leads to you learning it better.
There are three projects, and for all of them you use virtual machines. You need to download VirtualBox and machine images. The instructions on how to do this are detailed and easy to follow.
The first project was the most difficult, but also the most fun. The virtual machine has a small C program that has memory vulnerabilities, and the task is to exploit those. You have access to the source code and GDB, and the instructions guide you quite a bit on what to do. Nevertheless, you need to be proficient with GDB, know C, and understand memory layout and endianness. In the end, by supplying the right input, you manage to extract “secret” information. It was a lot of fun, and a great learning experience.
The second project uses BadStore.net, a website with numerous vulnerabilities running in the virtual machine. Here, the task is to create an SQL injection, and to log in as the administrator of the site. Again, there are detailed instructions to help you, as well as links to more documentation. While not as hard as the first project, it still took some effort for me, but it was also a lot of fun.
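Endianness matters here because an address you read in GDB has to be written into the input in the byte order the machine uses. A small sketch with Python's struct module shows the difference; the address is a made-up example, not one from the project.

```python
import struct

address = 0x0804A02C   # hypothetical 32-bit address, as you might read it in GDB

little = struct.pack("<I", address)   # byte order on little-endian machines such as x86
big = struct.pack(">I", address)      # big-endian, the order you read it in

print(little.hex())   # 2ca00408
print(big.hex())      # 0804a02c

# Reading the bytes back with the matching byte order recovers the address.
assert struct.unpack("<I", little)[0] == address
```

On a little-endian machine the least significant byte comes first in memory, so the bytes in the input appear reversed compared to how the address is printed.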
In the third project, the objective is to use a fuzzer (radamsa) and a symbolic executor (KLEE). While it was good to get some hands-on experience with those tools, the project itself was not much of a challenge. It mostly consists of running the tools and checking the output.
What Was Good?
The course is very well put together. Professor Hicks's presentation style is calm and methodical (great for learning), and the videos alternate between regular presentation and voice-over segments. Each video is between 5 and 20 minutes, and the intro part on each is only a few seconds long (good when you watch many in a row).
Even though the course follows a schedule (which keeps me going), all the material is available from the beginning. This is good if you want to finish it in less than the allotted six weeks.
What Can Be Improved?
I would have liked to have the material in written form as well. It is much easier to review text than to re-watch videos. Also, a more detailed syllabus would have been good, so you know what you can expect to learn. At least this course had a written syllabus, though. Some other Coursera courses don’t have any, which is obviously worse (an introduction video is no substitute in my opinion).
There were also some interviews with people in the industry. In theory this is a great idea, but unfortunately I didn’t find the ones I watched very interesting. I have had the same experience for other courses (for example Financial Markets), so this problem is not unique to this course.
I watched all the videos on the train to and from work, using the Android Coursera app. It works quite well (and you can download the videos beforehand), but sometimes the in-video quizzes got repeated in other videos. Also, the app hung several times. These seem like problems that should have been solved by now.
All in all, a good course that didn’t take too much time to complete. Even though there is a lot more to learn on the subject of software security, it is still a good introduction that pretty much all developers would benefit from.
It also made me remember all the different kinds of problems you can have in C programs that simply disappear in other languages, and it made me happy that I have moved on from C/C++ to Java and then Python instead!