mihalis's blog

mihalis's picture

What is Systems Programming?

Systems programming is a special area of programming on UNIX machines. Most commands that have to do with System Administration tasks such as disk formatting, network interface configuration, module loading, kernel performance tracking, etc. are implemented using the techniques of Systems Programming. Additionally, the /etc directory, which can be found on all UNIX systems, contains plain text files that deal with the configuration of a UNIX machine and its services and are also manipulated using systems software.

You can group the various areas of systems software and related system calls in the following sets:

  • File I/O: this area deals with file reading and writing operations, which are among the most important tasks of an operating system. File input and output must be fast and efficient but, above all, reliable.
  • Advanced File I/O: apart from the basic input and output system calls, there are also more advanced ways to read from or write to a file, including asynchronous I/O and non-blocking I/O.
  • System Files and Configuration: this group of systems software includes functions that allow you to handle system files such as /etc/passwd and get system specific information such as system time and DNS configuration.
  • Files and Directories: this cluster includes functions and system calls that allow the programmer to create and delete directories and get information such as the owner and the permissions of a file or a directory.
  • Process Control: this group of software allows you to create and interact with UNIX processes.
  • Threads: when a process has multiple threads, it can perform multiple tasks. However, threads must be created, terminated and synchronized, which is the purpose of this collection of functions and system calls.
  • Server Processes: this set includes techniques that allow you to develop server processes, which are processes that get executed in the background without the need for an active terminal.
  • Interprocess Communication: this set of functions allows processes that run on the same UNIX machine to communicate with each other using features such as pipes, FIFOs, message queues, semaphores and shared memory.
  • Signal Processing: signals offer processes a way of handling asynchronous events, which can be very handy. Almost all server processes have extra code that allows them to handle UNIX signals using the system calls of this group.
  • Network Programming: this is the art of developing applications that work over computer networks with the help of TCP/IP and is not Systems programming per se. However, most TCP/IP servers and clients deal with system resources, users, files and directories, so most of the time you cannot create network applications without doing some kind of Systems programming.
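
As a small taste of the Files and Directories area from the list above, here is a minimal Go sketch that prints the type, permissions and size of a path. The describe() function name is made up for this illustration, and the sketch leaves out the owner because reading it in Go requires platform-specific code.

```go
package main

import (
	"fmt"
	"os"
)

// describe is a hypothetical helper: it returns the name, kind,
// permission bits and size of path as a single line of text.
func describe(path string) (string, error) {
	info, err := os.Stat(path)
	if err != nil {
		return "", err
	}
	kind := "file"
	if info.IsDir() {
		kind = "directory"
	}
	return fmt.Sprintf("%s: %s, mode %s, %d bytes",
		info.Name(), kind, info.Mode().Perm(), info.Size()), nil
}

func main() {
	// Describe the current working directory.
	line, err := describe(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(line)
}
```

Returning the error instead of exiting inside describe() keeps the decision about what to do with a missing or unreadable path with the caller.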

    The challenging thing with Systems Programming is that you cannot afford to have an incomplete program; you can either have a fully working, secure program that can be used on a production system or nothing at all. This mainly happens because you cannot trust end users and hackers!

    Want to learn about Systems Programming using Go?
    Get my book Go Systems Programming from Packt or from Amazon.com.

    Concurrency and Parallelism

    It is a very common misconception that Concurrency and Parallelism are the same thing, which is far from true! Parallelism is the simultaneous execution of multiple things whereas Concurrency is a way of structuring your components so that they can be independently executed when possible.
    It is only when you build things concurrently that you can safely execute them in parallel, when and if your operating system and your hardware permit it. The Erlang programming language did this a long time ago, long before CPUs had multiple cores and computers had lots of RAM.
    In a valid concurrent design, adding concurrent entities makes the whole system run faster because more things can run in parallel. So, the desired parallelism comes from a better concurrent expression and implementation of the problem. The developer is responsible for taking concurrency into account during the design phase of a system and for benefiting from a potential parallel execution of the components of the system. So, the developer should not think about parallelism but about breaking things into independent components that solve the initial problem when combined.
    Even if you cannot run your functions in parallel on a UNIX machine, a valid concurrent design will still improve the design and the maintainability of your programs. In other words, Concurrency is better than Parallelism!
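
To make the idea of independent components a bit more concrete, here is a minimal Go sketch. It assumes a made-up task (summing squares) and a made-up function name, sumSquares(). The work is split into chunks that do not depend on each other; whether those chunks actually run in parallel is left to the Go runtime and the hardware.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares splits nums into at most `workers` independent chunks, sums the
// squares of each chunk in its own goroutine and combines the partial sums.
func sumSquares(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(nums) + workers - 1) / workers // ceiling division
	partial := make(chan int)
	var wg sync.WaitGroup

	for start := 0; start < len(nums); start += chunk {
		end := start + chunk
		if end > len(nums) {
			end = len(nums)
		}
		wg.Add(1)
		go func(part []int) {
			defer wg.Done()
			sum := 0
			for _, n := range part {
				sum += n * n
			}
			partial <- sum
		}(nums[start:end])
	}

	// Close the channel once every goroutine has sent its partial sum.
	go func() {
		wg.Wait()
		close(partial)
	}()

	total := 0
	for s := range partial {
		total += s
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4, 5, 6}, 3)) // prints 91
}
```

The result is the same on one core or on many, which is the point: the concurrent structure is correct on its own, and parallelism is a bonus when it is available.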

    Want to learn more about the Go Concurrency model?
    Get my book Go Systems Programming from Packt or from Amazon.com.
    Or get my other book Mastering Go from Packt or from Amazon.com.

    Copying a file in Go

    This blog post will show some of the ways that you can copy a file in Go.

    Using io.Copy()
    The simplest way to copy a file is by using the io.Copy() function. You can find the entire Go code at https://github.com/mactsouk/fileCopyGo/blob/master/ioCopy.go.
    The most important part of the utility is the next Go code:

    nBytes, err := io.Copy(destination, source)

    So, with just a single call, you can copy a file. Although this is fast, it does not give you any flexibility or any control over the whole process.
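
For context, here is a minimal sketch of how that call might sit inside a complete program. It is not the code of ioCopy.go: the copyFile() name and the temporary demo files are assumptions made for this example.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// copyFile is a hypothetical helper that copies src to dst with io.Copy()
// and returns the number of bytes copied.
func copyFile(src, dst string) (int64, error) {
	source, err := os.Open(src)
	if err != nil {
		return 0, err
	}
	defer source.Close()

	destination, err := os.Create(dst)
	if err != nil {
		return 0, err
	}

	nBytes, err := io.Copy(destination, source)
	// Report a failed Close() too, because it can mean lost data.
	if cerr := destination.Close(); err == nil {
		err = cerr
	}
	return nBytes, err
}

func main() {
	// Demo on throwaway files inside a temporary directory.
	dir, err := os.MkdirTemp("", "iocopy")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "source.txt")
	if err := os.WriteFile(src, []byte("hello, io.Copy\n"), 0644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	nBytes, err := copyFile(src, filepath.Join(dir, "destination.txt"))
	fmt.Println("bytes copied:", nBytes, "error:", err)
}
```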

    Using ioutil.WriteFile() and ioutil.ReadFile()
    You can copy a file in Go by using ioutil.WriteFile() and ioutil.ReadFile(). You can find the entire source file at https://github.com/mactsouk/fileCopyGo/blob/master/readWriteAll.go.
    The most important part of readWriteAll.go is the next two Go statements:

    input, err := ioutil.ReadFile(sourceFile)
    err = ioutil.WriteFile(destinationFile, input, 0644)

    The first statement reads the entire source file whereas the second statement writes the contents of the input variable to a new file.
    Notice that reading the entire file and storing its contents to a single variable might not be very efficient when you want to copy huge files. Nevertheless, it works!
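
The two statements can be wrapped in a small function, as in the following sketch. The copyWhole() name is an assumption of this example, not something taken from readWriteAll.go. The sketch uses os.ReadFile() and os.WriteFile(), because as of Go 1.16 the ioutil functions simply call them.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// copyWhole is a hypothetical helper that reads all of src into memory
// and writes it to dst, creating dst with 0644 permissions if needed.
func copyWhole(src, dst string) error {
	input, err := os.ReadFile(src)
	if err != nil {
		return err
	}
	return os.WriteFile(dst, input, 0644)
}

func main() {
	// Demo on throwaway files inside a temporary directory.
	dir, err := os.MkdirTemp("", "readwriteall")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "source.txt")
	dst := filepath.Join(dir, "destination.txt")
	if err := os.WriteFile(src, []byte("read once, written once\n"), 0644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if err := copyWhole(src, dst); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	copied, err := os.ReadFile(dst)
	fmt.Printf("%q %v\n", copied, err)
}
```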

    Using the Read() and Write() methods of os.File
    The last technique uses the Read() method of os.File for reading small portions of the input file into a buffer and the Write() method of os.File for writing the contents of that buffer to the new file. Notice that the size of the buffer is given as a command line argument, which makes the process very flexible.
    You can find the entire code at https://github.com/mactsouk/fileCopyGo/blob/master/cpBuffer.go.
    The most important statements of the implementation of the Copy() function are the next:

    buf := make([]byte, BUFFERSIZE)
    n, err := source.Read(buf)
    _, err = destination.Write(buf[:n])

    The first statement creates a byte slice with the desired size. The second statement reads from the input file into buf and stores the number of bytes read in n, whereas the third statement writes only those n bytes of the buf buffer to the destination file.
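
Put together, the three statements form a read and write loop like the one in the next sketch. It is an approximation under stated assumptions, not the code of cpBuffer.go: the copyBuffered() and copyString() names are invented here, and the function works on the io.Reader and io.Writer interfaces so that the demo can run on in-memory data. Since *os.File satisfies both interfaces, the same function can be handed open files.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// copyBuffered copies src to dst through a buffer of bufSize bytes and
// returns the number of bytes written.
func copyBuffered(dst io.Writer, src io.Reader, bufSize int) (int64, error) {
	if bufSize <= 0 {
		return 0, fmt.Errorf("invalid buffer size %d", bufSize)
	}
	buf := make([]byte, bufSize)
	var total int64
	for {
		n, err := src.Read(buf)
		if n > 0 {
			// Write only the n bytes that were actually read.
			w, werr := dst.Write(buf[:n])
			total += int64(w)
			if werr != nil {
				return total, werr
			}
		}
		if err == io.EOF {
			return total, nil // end of input: the copy is complete
		}
		if err != nil {
			return total, err
		}
	}
}

// copyString is a hypothetical demo helper that pushes a string
// through copyBuffered and returns what arrived on the other side.
func copyString(s string, bufSize int) (string, error) {
	var out strings.Builder
	_, err := copyBuffered(&out, strings.NewReader(s), bufSize)
	return out.String(), err
}

func main() {
	out, err := copyString("copied in small chunks", 4)
	fmt.Printf("%q %v\n", out, err)
}
```

Handling the n bytes before checking the error matters, because a Read() call may return data together with io.EOF.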

    Want to learn more about File I/O in Go?
    Get my book Go Systems Programming from Packt or from Amazon.com.

    Want to be able to benchmark File I/O operations?
    Get my book Mastering Go from Packt or from Amazon.com.

    The Go Garbage Collector (GC)

    Garbage Collection is the process of freeing memory space that is not being used. In other words, the garbage collector sees which objects are out of scope and cannot be referenced any more and frees the memory space they consume. This process happens in a concurrent way while a Go program is running, not before or after the execution of a Go program. The operation of the Go GC is based on the tricolor algorithm.

    Strictly speaking, the official name for the algorithm used in Go is the tricolor mark-and-sweep algorithm. It can work concurrently with the program and uses a write barrier. This means that when a Go program runs, the Go scheduler is responsible for the scheduling of the application and the garbage collector as if the Go scheduler had to deal with a regular application with multiple goroutines!

    The core idea behind this algorithm belongs to Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten and E. F. M. Steffens and was first illustrated in a paper named On-the-fly garbage collection: an exercise in cooperation.

    The primary principle behind the tricolor mark-and-sweep algorithm is that it divides the objects of the heap into three different sets according to their color, which is assigned by the algorithm. The objects of the black set are guaranteed to have no pointers to any object of the white set. However, an object of the white set can have a pointer to an object of the black set because this has no effect on the operation of the GC! The objects of the grey set might have pointers to some objects of the white set. Last, the objects of the white set are the candidates for garbage collection.

    So, when a garbage collection begins, all objects are white. The garbage collector visits all the root objects and colors them grey. The roots are the objects that can be directly accessed by the application, which includes global variables and other things on the stack; which objects these are depends mostly on the Go code of a particular program. After that, the garbage collector picks a grey object, makes it black and checks whether that object has pointers to objects of the white set. In other words, a grey object is colored black while it is being scanned for pointers to other objects. If the scan discovers that the object has one or more pointers to white objects, it puts those white objects in the grey set. This process keeps going for as long as there are objects in the grey set. After that, the objects in the white set are unreachable and their memory space can be reused. Therefore, at this point the elements of the white set are said to be garbage collected.
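
The marking steps described above can be sketched as a toy Go program. This is only a model for illustration and not how the Go runtime is implemented: it stops the world, has no write barrier, and represents the heap as a map from a made-up object name to the names of the objects it points to. The collect() function name is also an assumption of this example.

```go
package main

import (
	"fmt"
	"sort"
)

type color int

const (
	white color = iota // not reached yet: candidate for collection
	grey               // reached, but its pointers are not scanned yet
	black              // reached and fully scanned
)

// collect runs a toy tricolor marking pass and returns the names of the
// objects that are still white at the end, sorted alphabetically.
func collect(heap map[string][]string, roots []string) []string {
	colors := make(map[string]color, len(heap)) // the zero value is white
	var greySet []string

	// Color the roots grey.
	for _, r := range roots {
		if colors[r] == white {
			colors[r] = grey
			greySet = append(greySet, r)
		}
	}

	// Pick a grey object, make it black and move the white objects
	// it points to into the grey set, until the grey set is empty.
	for len(greySet) > 0 {
		obj := greySet[len(greySet)-1]
		greySet = greySet[:len(greySet)-1]
		colors[obj] = black
		for _, ref := range heap[obj] {
			if colors[ref] == white {
				colors[ref] = grey
				greySet = append(greySet, ref)
			}
		}
	}

	var garbage []string
	for obj := range heap {
		if colors[obj] == white {
			garbage = append(garbage, obj)
		}
	}
	sort.Strings(garbage)
	return garbage
}

func main() {
	heap := map[string][]string{
		"a": {"b"},
		"b": {"c"},
		"c": {"a"},      // a cycle among reachable objects
		"d": {"e", "a"}, // a white object pointing to a black one
		"e": nil,
	}
	fmt.Println(collect(heap, []string{"a"})) // prints [d e]
}
```

Object d points to the reachable object a, yet d is still collected, which matches the observation that a pointer from the white set to the black set has no effect on the result.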

    Go allows you to manually initiate a garbage collection by putting a runtime.GC() statement in your Go code. However, keep in mind that runtime.GC() will block the caller and it might block the entire program, especially if you are running a very busy Go program with many objects. This mainly happens because you cannot perform garbage collections while everything else is rapidly changing, as this would not give the garbage collector the opportunity to clearly identify the members of the white, black and grey sets! This garbage collection status is also called a garbage collection safe-point.
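
A minimal sketch of a manual collection follows. The forceGC() function name and the sink variable are assumptions of this example; runtime.GC(), runtime.ReadMemStats() and the NumGC field of runtime.MemStats are part of the standard runtime package.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the allocations below on the heap.
var sink []byte

// forceGC triggers a collection with runtime.GC() and returns how many
// garbage collection cycles completed while the call was in progress.
func forceGC() uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	runtime.GC() // blocks the caller until the collection is complete
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	// Every new slice makes the previous one unreachable.
	for i := 0; i < 1000; i++ {
		sink = make([]byte, 1<<16)
	}
	fmt.Println("completed GC cycles:", forceGC())
}
```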

    Want to learn more about the Go Garbage Collector? Get my book Mastering Go at https://www.packtpub.com/networking-and-servers/mastering-go.
    Want to start writing UNIX system tools? Get my book Go Systems Programming at https://www.packtpub.com/networking-and-servers/go-systems-programming or from Amazon.com (https://www.amazon.com/Go-Systems-Programming-Master-programming/dp/1787...).

    Top 20 words in my Go code

    I wanted to find out the most popular words in my Go code, so I decided to use the UNIX shell to do so.

    $ find dir -name "*.go" -exec cat {} \; | tr ' ' '\n' | sed '/^$/d' | sed 's/[ \t]*//' | sed 's/ //g' | sort | uniq -c | sort -rn | head -20

    At the time of writing this, the output of the previous command was the following:

    1815 {
    1368 :=
    831 }
    628 err
    608 }
    573 =
    552 func
    488 if
    434 nil
    368 !=
    298 package
    298 import
    295 )
    292 (
    290 main()
    290 main
    284 "fmt"
    267 ==
    251 }
    241 for

    So, if you ignore braces and operators such as { and :=, the most popular word is err, which makes perfect sense! Then come func, if and nil, as well as package and import.
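
The same kind of count can also be done in Go itself. The following is a rough sketch rather than a port of the shell pipeline: the topWords() function and the wordCount type are made up for this example, and strings.Fields() splits on any whitespace, so the numbers it produces can differ from the ones above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// wordCount pairs a token with the number of times it was seen.
type wordCount struct {
	word  string
	count int
}

// topWords splits text on whitespace and returns the n most frequent
// tokens, most frequent first; ties are ordered alphabetically.
func topWords(text string, n int) []wordCount {
	freq := make(map[string]int)
	for _, w := range strings.Fields(text) {
		freq[w]++
	}
	counts := make([]wordCount, 0, len(freq))
	for w, c := range freq {
		counts = append(counts, wordCount{w, c})
	}
	sort.Slice(counts, func(i, j int) bool {
		if counts[i].count != counts[j].count {
			return counts[i].count > counts[j].count
		}
		return counts[i].word < counts[j].word
	})
	if n < 0 {
		n = 0
	}
	if n < len(counts) {
		counts = counts[:n]
	}
	return counts
}

func main() {
	src := "err := f() if err != nil { return err }"
	for _, wc := range topWords(src, 3) {
		fmt.Println(wc.count, wc.word)
	}
}
```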

    After that I used Excel to make a simple graph.


