Chapter 15 Organizing Files

This unit shows how to use Python to organize files on the hard drive automatically, e.g., traversing directories and copying, renaming, moving, or compressing files.

15.1 shutil

The shutil (shell utility) module helps us copy, move, rename, and delete files in Python.

To copy files and directories:

  • shutil.copy(): to copy a single file
  • shutil.copytree(): to copy an entire folder and every folder and file contained in it
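
The two copy functions above can be tried safely in a temporary directory. The following is a minimal sketch; the folder and file names (src, a.txt, sub/b.txt, src_backup) are made up for illustration and are not part of the demo data used in this unit.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so that no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Build a small hypothetical source folder: src/a.txt and src/sub/b.txt
    src = root / 'src'
    (src / 'sub').mkdir(parents=True)
    (src / 'a.txt').write_text('hello')
    (src / 'sub' / 'b.txt').write_text('world')

    # Copy a single file; the destination can be a folder or a new file name.
    copied_file = shutil.copy(src / 'a.txt', root / 'a_copy.txt')

    # Copy the whole folder tree; by default the destination must not exist yet.
    copied_dir = shutil.copytree(src, root / 'src_backup')

    # List everything that ended up in the copied folder.
    copied_names = sorted(
        p.relative_to(copied_dir).as_posix() for p in Path(copied_dir).rglob('*')
    )

print(Path(copied_file).name)
print(copied_names)
```

Note that shutil.copytree() copies subfolders as well, so the nested file sub/b.txt appears in the backup.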

To remove files and directories:

  • os.unlink(): to delete a single file
  • os.rmdir(): to delete an empty folder
  • shutil.rmtree(): to delete a non-empty folder and all files and folders it contains
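
The difference between the three removing functions can be seen in a short sketch that only works inside a temporary directory. The names empty_dir, full_dir, note.txt, and single.txt are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'empty_dir').mkdir()
    (root / 'full_dir').mkdir()
    (root / 'full_dir' / 'note.txt').write_text('x')
    (root / 'single.txt').write_text('y')

    os.unlink(root / 'single.txt')   # delete one file
    os.rmdir(root / 'empty_dir')     # delete an empty folder

    # os.rmdir() refuses to delete a folder that still has content.
    try:
        os.rmdir(root / 'full_dir')
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    shutil.rmtree(root / 'full_dir')  # delete the folder and everything in it
    remaining = sorted(p.name for p in root.iterdir())

print(rmdir_failed)
print(remaining)
```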

To move or rename files and directories:

  • shutil.move(): to move (or rename) the file or folder at the path source to the path destination
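
As a minimal sketch, the same function covers both moving and renaming, depending on whether the destination is an existing folder or a new file name. The folder and file names below (inbox, archive, draft.txt, final.txt) are hypothetical, and the paths are converted to strings only as a precaution for older Python versions.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'inbox').mkdir()
    (root / 'archive').mkdir()
    (root / 'inbox' / 'draft.txt').write_text('text')

    # Destination is an existing folder: the file is moved into it, name kept.
    moved = shutil.move(str(root / 'inbox' / 'draft.txt'), str(root / 'archive'))

    # Destination is a new file name: the file is renamed.
    renamed = shutil.move(str(root / 'archive' / 'draft.txt'),
                          str(root / 'archive' / 'final.txt'))

    inbox_files = sorted(p.name for p in (root / 'inbox').iterdir())
    archive_files = sorted(p.name for p in (root / 'archive').iterdir())

print(Path(moved).name, Path(renamed).name)
print(inbox_files, archive_files)
```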

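shutil.copy(), shutil.copytree(), and shutil.move() return the path of the new file or directory. This path is absolute only if the destination was given as an absolute path, and it may be a string or a Path object. shutil.rmtree() returns nothing.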

import shutil

Ways to get the user's home directory or the current working directory:

from pathlib import Path
print(Path.home())
/Users/alvinchen
import os
print(os.getcwd())
/Users/alvinchen/Library/CloudStorage/Dropbox/NTNU/Programming_Linguistics/Programming_Linguistics_bookdown
t = shutil.copy('demo_data/corp-alice.txt', Path.home())
os.unlink(t) # clean up
t = shutil.copytree('demo_data', Path.home()/'demo_data')
t
PosixPath('/Users/alvinchen/demo_data')
os.rmdir(t)
[Errno 66] Directory not empty: '/Users/alvinchen/demo_data'
#shutil.rmtree(t)

Be very careful when using these “removing” functions. It is often a good idea to first run the program with the deletion calls commented out and with print() calls added, so that you can double-check the names of the files/directories to be deleted.

for f in (Path.home()/'demo_data').glob('*.txt'):
  print(f)
  # os.unlink(f)
/Users/alvinchen/demo_data/data-chinese-poem-big5.txt
/Users/alvinchen/demo_data/corp-alice.txt
/Users/alvinchen/demo_data/chinese_big5.txt
/Users/alvinchen/demo_data/data-chinese-poem-utf8.txt
/Users/alvinchen/demo_data/dict-ch-idiom.txt
/Users/alvinchen/demo_data/chinese_utf8.txt
/Users/alvinchen/demo_data/data-sentences.txt
/Users/alvinchen/demo_data/chinese_gb2312.txt
/Users/alvinchen/demo_data/data-chinese-poem-gb2312.txt
## clean up the earlier copied folder
shutil.rmtree(Path(Path.home()/'demo_data'))
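
The print-before-delete precaution can also be packaged as a function with a dry-run switch. This is only a sketch: delete_matching() and its dry_run parameter are hypothetical names, not part of os or shutil, and the demo files are created in a temporary directory.

```python
import os
import tempfile
from pathlib import Path

def delete_matching(folder, pattern, dry_run=True):
    """Delete files in `folder` that match `pattern`.

    With dry_run=True (the default) nothing is deleted; the
    matching paths are only printed so they can be checked first.
    """
    targets = sorted(p for p in Path(folder).glob(pattern) if p.is_file())
    for p in targets:
        if dry_run:
            print('Would delete:', p)
        else:
            os.unlink(p)
    return targets

with tempfile.TemporaryDirectory() as tmp:
    for name in ['a.txt', 'b.txt', 'keep.csv']:
        (Path(tmp) / name).write_text('demo')

    # First pass: only list the candidates.
    listed = [p.name for p in delete_matching(tmp, '*.txt')]
    after_dry_run = sorted(p.name for p in Path(tmp).iterdir())

    # Second pass: actually delete, once the list looks right.
    delete_matching(tmp, '*.txt', dry_run=False)
    after_delete = sorted(p.name for p in Path(tmp).iterdir())

print(listed)
print(after_delete)
```

Making the dry run the default means a file is deleted only when dry_run=False is passed explicitly.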

Because os.unlink(), os.rmdir(), and shutil.rmtree() irreversibly delete files and folders, they can be dangerous to use. A third-party module, send2trash, is much safer because it sends files and folders to the computer’s trash or recycle bin instead of permanently deleting them. For beginners, this can save important files/folders from accidental deletion.

Exercise 15.1 Combine the first page of each PDF in a directory into one new PDF.