Pipes and intro to unix utilities

We just learned about redirecting output to files using the > operator. In addition to redirecting a data stream to a file, we can also intercept that stream of information and perform another operation on it.

To do this we use the | operator which we call a pipe.

Pipes allow a user to string together a series of commands, a "command pipeline", and there are many useful utilites that are commonly installed on UNIX systems.

The use of these many small programs is only clear when we use it in concert with pipes, so we're going to learn about them at the same time.

cat

In the redirection exercise we wrote the contents of the command ls -F into a file called filelist. When we checked to see if it worked, we opened the file up in nano. That didn't take very long, but it can be a pain if you need to look through the contents of a number of files.

Now, we didn't need to edit filelist, right? We just wanted to look at it. This is the perfect job for cat!

cat dumps the contents of a file into stdout (by default).

Try it out on filelist to see what happens.

$ cat filelist
Desktop/
Documents/
Downloads/
Music/
Pictures/
Public/
Templates/
Videos/

Time to pipe!

Remember wc -l? We used it to count the lines in filelist. We did:

$ wc -l filelist
8 filelist

But instead of doing it this way, we can also pipe the contents of filelist to wc.

Try it out!

$ cat filelist | wc -l
8

What just happened?

We used cat to dump the contents of filelist to the screen (stdout). But then, instead of printing the contents, we intercepted them with the pipe and instead fed them into wc.

Skip filelist

We used > to redirect the contents of ls -F, then used cat to dump the contents of filelist and then piped those contents to wc. Are all of these steps necessary?

No! How about:

ls -F | wc -l
9

Any output can be piped to (nearly) any other program.

grep

grep is your best friend, you just don't know it yet. grep does stand for something, but it's long and confusing, so just accept that grep is grep.

grep searches through text files and streams for matches. It is one of the most powerful tools in the UNIX toolbox. It's also 42 years old. And we still use it. It's that good.

Try it out by piping the contents of ls -F and grepping for "Do"

$ ls -F | grep Do
Documents/
Downloads/

Exercise

There are obviously two files/folders that contain Do that grep has matched. But what if there were hundreds? How can we count the number of results from a grep?

Use ls, grep and any tools we've already learned about to get the command line to spit out the number of files/folders that contain Do in their title.

sort

In order to learn about sort, we need something to sort. We could download a file using the web browser, but why would we? Simpler to use wget on the command line!

wget https://raw.githubusercontent.com/barbagroup/essential_skills_RRC/master/resources/copa_america_goals

After wget finishes, use ls to check and make sure that the file has downloaded.

Ok! We have downloaded a list of goals scored in the 2016 Copa America, let's take a look at what the file contains:

$ cat copa_america_goals 
1 Miku
1 Neymar
1 Robinho
3 Sergio Aguero
2 Charles Aranguiz
3 Lucas Barrios
1 Edgar Benitez
2 Miller Bolanos
1 Andrew Carrillo
1 Douglas Costa
1 Christian Cueva
2 Angel Di Maria
1 Roberto Firmino
1 Jose Gimenez
1 Derlis Gonzalez
4 Paolo Guerrero
1 Nelson Haedo Valdez
2 Gonzalo Higuain
1 Mauricio Isla
2 Raul Jimenez
2 Marcelo Martins Moreno
1 Gary Medel
1 Lionel Messi
1 Jeison Murillo
1 Javier pastore
1 Claudio Pizarro
1 Ronald Raldes
1 Cristian Rodriguez
1 Marcos Rojo
1 Salomon Rondon
1 Alexis Sanchez
1 Thiago Silva
1 Martis Smedberg-Dalence
2 Enner Valencia
4 Eduardo Vargas
3 Arturo Vidal
2 Matias Vuoso

The first column is goals, then first names, then last names. And of course, some players only have one name. How many players scored 4 goals? We can grep for that, which will definitely work, but we can also sort the list easily using the sort command. Try it out!

$ cat copa_america_goals | sort
1 Alexis Sanchez
1 Andrew Carrillo
1 Christian Cueva
1 Claudio Pizarro
1 Cristian Rodriguez
1 Derlis Gonzalez
1 Douglas Costa
1 Edgar Benitez
1 Gary Medel
1 Javier pastore
1 Jeison Murillo
1 Jose Gimenez
1 Lionel Messi
1 Marcos Rojo
1 Martis Smedberg-Dalence
1 Mauricio Isla
1 Miku
1 Nelson Haedo Valdez
1 Neymar
1 Roberto Firmino
1 Robinho
1 Ronald Raldes
1 Salomon Rondon
1 Thiago Silva
2 Angel Di Maria
2 Charles Aranguiz
2 Enner Valencia
2 Gonzalo Higuain
2 Marcelo Martins Moreno
2 Matias Vuoso
2 Miller Bolanos
2 Raul Jimenez
3 Arturo Vidal
3 Lucas Barrios
3 Sergio Aguero
4 Eduardo Vargas
4 Paolo Guerrero

And we see that at the bottom of the sorted list there are two players who scored 4 goals in the Copa.

Now, sorting goal scorers by last name seems a little strange if we care about the number of goals scored. Let's save the list of goals but sort it by the number of goals. How should we do that?

$ cat copa_america_goals | sort > copa_goals_sorted

And remember, there's no output to the screen (stdout) because we redirected it to a new file. We can cat the new file to make sure it worked as we expect.

$ cat copa_goals_sorted 
1 Alexis Sanchez
1 Andrew Carrillo
1 Christian Cueva
1 Claudio Pizarro
1 Cristian Rodriguez
1 Derlis Gonzalez
1 Douglas Costa
1 Edgar Benitez
1 Gary Medel
1 Javier pastore
1 Jeison Murillo
1 Jose Gimenez
1 Lionel Messi
1 Marcos Rojo
1 Martis Smedberg-Dalence
1 Mauricio Isla
1 Miku
1 Nelson Haedo Valdez
1 Neymar
1 Roberto Firmino
1 Robinho
1 Ronald Raldes
1 Salomon Rondon
1 Thiago Silva
2 Angel Di Maria
2 Charles Aranguiz
2 Enner Valencia
2 Gonzalo Higuain
2 Marcelo Martins Moreno
2 Matias Vuoso
2 Miller Bolanos
2 Raul Jimenez
3 Arturo Vidal
3 Lucas Barrios
3 Sergio Aguero
4 Eduardo Vargas
4 Paolo Guerrero
$ cat copa_goals_sorted | grep Alexis
1 Alexis Sanchez