A The Shell

A.1 Why do you need to know shell commands?

The command line offers many advantages that can make you a more efficient and productive data scientist. Janssens (2014) summarizes its strengths in five points:

  • The command line is agile
  • The command line is augmenting
  • The command line is scalable
  • The command line is extensible
  • The command line is ubiquitous

A.2 Shebang Line

In R, you can either run commands line by line in the console or collect them in a script file. The command line works the same way: you can type shell commands one at a time or combine several of them into a script file.

A shell script file should begin with a shebang line: the character sequence #! (a number sign followed by an exclamation mark) at the very start of the script, followed by the path to an interpreter. This line tells the system which interpreter (language environment) should be used to run the script file. A shell script typically starts with one of the two lines below.

#!/bin/sh
#!/bin/bash
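
The following is a minimal sketch of how a shebang line is used in practice, assuming a Unix-like system where /bin/sh exists. The file name hello.sh, the variable names, and the temporary directory are hypothetical choices made only for illustration.

```shell
# Create a scratch directory so the example does not touch existing files.
demo_dir=$(mktemp -d)

# Write a two-line script; its first line is the shebang.
printf '%s\n' '#!/bin/sh' 'echo "Hello from a shell script"' > "$demo_dir/hello.sh"

# Mark the file as executable, then run it by its path.
# The system reads the shebang and hands the file to /bin/sh.
chmod +x "$demo_dir/hello.sh"
hello_output=$("$demo_dir/hello.sh")
echo "$hello_output"
```

Without the chmod +x step the file could still be run as sh hello.sh, but in that case the interpreter is chosen on the command line and the shebang line is treated as an ordinary comment.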

A.3 Basic Shell Commands

The most basic commands are listed below:

  • pwd (print working directory). Shows the directory or “folder” you are currently operating in. This is not necessarily the same as the R working directory you get from getwd().
  • ls (list files). Shows the files in the current working directory. This is equivalent to looking at the files in your Finder/Explorer/File Manager. Use ls -a to also list hidden files, such as .Rhistory and .git.
  • cd (change directory). Allows you to navigate through your directories by changing the shell’s working directory. You can navigate like so:
    • go to subdirectory foo of current working directory: cd foo
    • go to parent of current working directory: cd ..
    • go to your “home” directory: cd ~ or simply cd
    • go to directory using absolute path, works regardless of your current working directory: cd /home/my_username/Desktop. Windows uses a slightly different syntax with the slashes between the folder names reversed, \, e.g. cd C:\Users\MY_USERNAME\Desktop.
      • Pro tip 1: Dragging and dropping a file or folder into the terminal window will paste the absolute path into the window.
      • Pro tip 2: Use the tab key to autocomplete unambiguous directory and file names. Hit tab twice to see all ambiguous options.
  • Use arrow-up and arrow-down to repeat previous commands. Or search for previous commands with CTRL + r.
  • which Show the full path of the executable behind a shell command
    • which python: Show which python executable your system uses
    • which R: Show which R executable your system uses (note the capital R, which is how the R executable is normally named)
  • cp Copy files and directories
  • rm Remove files and directories
  • mv Move files and directories
  • mkdir Make directories
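
The commands above can be tried together in a short session. This is only a sketch, assuming a Unix-like shell; the directory name foo comes from the list above, while the file names notes.txt, copy.txt, and renamed.txt and the variables sandbox and remaining are hypothetical. Everything happens inside a temporary directory, so no existing files are affected.

```shell
# Work inside a throwaway directory so nothing important is touched.
sandbox=$(mktemp -d)
cd "$sandbox"
pwd                      # print the current working directory

mkdir foo                # make a subdirectory named foo
cd foo                   # move into it
echo "some text" > notes.txt
cp notes.txt copy.txt    # copy a file
mv copy.txt renamed.txt  # rename (move) the copy
ls -a                    # list files, including the hidden entries . and ..
rm notes.txt             # remove a file

cd ..                    # go back up to the parent directory
remaining=$(ls foo)      # capture the listing of foo
echo "$remaining"        # prints: renamed.txt

which sh                 # full path of the sh executable
rm -r foo                # remove the directory and its contents
```

Note that rm deletes files immediately rather than moving them to a trash folder, which is why the sketch works in a scratch directory.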

A.5 References

If you want to learn more about what shell commands can do, I highly recommend the book Data Science at the Command Line (Janssens, 2014).

References

Janssens, J. (2014). Data science at the command line: Facing the future with time-tested tools. O’Reilly Media, Inc.