Script programming Task 1
Consider the following situation:
You are given a directory from a webserver that contains multiple HTML files as well as multiple files of other types (e.g. images). The directory may contain subdirectories. You are asked to clean up this directory and its subdirectories by removing all unused files. A file is considered unused if it is neither linked from any of the HTML files nor explicitly specified as being in use. A file can be linked in an HTML file by either <a href="..."> or <img src="...">. It does not matter whether the link is within a commented-out section of the HTML file. Do not remove subdirectories.
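One possible way to collect link targets is a plain regular-expression scan over the file contents. The sketch below is only an illustration: the helper name extract_links and the sample HTML are made up, and a regex scan is a simplification compared with a real HTML parser. It does have the side effect that links inside commented-out sections are found as well, which matches the requirement above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every href/src target found in a chunk of HTML.
# The text is scanned without interpreting comments, so links inside
# <!-- ... --> sections are picked up too.
sub extract_links {
    my ($html) = @_;
    my @links;
    # Match <a ...href="..."> or <img ...src="..."> with single or double quotes.
    while ($html =~ /<(?:a|img)\b[^>]*?\b(?:href|src)\s*=\s*(["'])(.*?)\1/gis) {
        push @links, $2;
    }
    return @links;
}

# Made-up sample input for illustration only.
my $sample = <<'HTML';
<a href="pages/about.html">About</a>
<img class="img-fluid" src="img/logo.PNG">
<!-- <a href='old/draft.html'>Draft</a> -->
HTML

print "$_\n" for extract_links($sample);
```

In a full script, each target would still have to be resolved relative to the directory of the HTML file that contains it before it is compared with the files on disk.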
The files you remove must be stored in a user-specified directory. You need to maintain the directory structure when storing the removed files (e.g. if you remove two files a/b/a.txt and a/a.txt, each file must be stored under its original relative path, a/b/a.txt and a/a.txt, inside that directory).
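Keeping the directory structure could be sketched as follows, using only core Perl modules. The helper name move_to_bin is hypothetical, and the demo runs in temporary directories rather than in a real web directory.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy     qw(move);
use File::Path     qw(make_path);
use File::Spec;
use File::Temp     qw(tempdir);

# Hypothetical helper: move $file (located below $cleanup_dir) to the same
# relative location below $bin_dir, creating directories as needed.
# Returns the new path on success, or nothing after printing a warning.
sub move_to_bin {
    my ($file, $cleanup_dir, $bin_dir) = @_;
    my $relative   = File::Spec->abs2rel($file, $cleanup_dir);
    my $target     = File::Spec->catfile($bin_dir, $relative);
    my $target_dir = dirname($target);

    if (!-d $target_dir) {
        # With the 'error' option, make_path collects problems instead of dying.
        make_path($target_dir, { error => \my $errors });
        if ($errors && @$errors) {
            warn "Cannot create $target_dir\n";
            return;
        }
    }
    if (!move($file, $target)) {
        warn "Cannot move $file to $target: $!\n";
        return;
    }
    return $target;
}

# Demo in throw-away directories; the file names mirror the example above.
my $cleanup = tempdir(CLEANUP => 1);
my $bin     = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($cleanup, 'a', 'b'));
for my $rel ('a/b/a.txt', 'a/a.txt') {
    my $path = File::Spec->catfile($cleanup, split m{/}, $rel);
    open my $fh, '>', $path or die "Cannot create $path: $!";
    print {$fh} "dummy\n";
    close $fh;
    my $moved = move_to_bin($path, $cleanup, $bin);
    print "stored as ", File::Spec->abs2rel($moved, $bin), "\n" if defined $moved;
}
```

Only files are moved, so the now-empty subdirectories stay in place in the cleanup directory.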
You also need to create statistics output for the files that have been removed. The statistics output must be in text form and must contain, for each file ending, the number of files and the overall size of files with that ending. In addition, the total number and total size of all removed files must be reported. For this purpose, file endings are not case-sensitive (i.e. .gif and .GIF are considered the same for statistics purposes). The statistics must also contain the name of the directory on which the cleanup script has been run.
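The grouping by file ending could be sketched with a hash keyed on the lower-cased ending. Everything specific here is an assumption made for illustration: the helper name collect_stats, the sample data, the "(none)" bucket for files without a dot, and the output layout. The expected layout is defined by the report.txt in the provided archive and is not reproduced here.

```perl
use strict;
use warnings;

# Made-up input: [path, size in bytes] for every removed file.
# A real script would obtain the size with the -s file test.
my @removed = (
    [ 'img/old.gif',    1200 ],
    [ 'img/LOGO.GIF',    800 ],
    [ 'notes/todo.txt',  150 ],
    [ 'README',           50 ],
);

# Hypothetical helper: count files and sum sizes per lower-cased file ending.
sub collect_stats {
    my (@files) = @_;
    my %stats;
    for my $entry (@files) {
        my ($path, $size) = @$entry;
        my ($name) = $path =~ m{([^/]*)$};    # last path component
        # Ending = text after the last dot; "(none)" is an assumed bucket name.
        my $ending = $name =~ /\.([^.]+)$/ ? lc(".$1") : '(none)';
        $stats{$ending}{count}++;
        $stats{$ending}{size} += $size;
    }
    return %stats;
}

my %stats = collect_stats(@removed);
my ($total_count, $total_size) = (0, 0);

print "Cleanup directory: www/\n";    # directory name assumed for the demo
for my $ending (sort keys %stats) {
    printf "%-8s %3d file(s) %8d bytes\n",
        $ending, $stats{$ending}{count}, $stats{$ending}{size};
    $total_count += $stats{$ending}{count};
    $total_size  += $stats{$ending}{size};
}
printf "%-8s %3d file(s) %8d bytes\n", 'total', $total_count, $total_size;
```

Lower-casing the ending before it is used as a hash key is what makes .gif and .GIF land in the same bucket.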
T1.1: Write a Perl script to automate this task. The script must be run as follows:
cleanup.pl [CleanupDir] [RubbishBinDir] [ListOfUsedFiles]
where
[CleanupDir] is the name of the directory to clean up.
[RubbishBinDir] is the name of the directory where removed files are to be stored.
[ListOfUsedFiles] is a list of 0 or more files that are in use and should not be deleted.
Your script must handle errors such as missing parameters, missing files, missing directories, missing access permissions, or input/output errors gracefully, that is, it must not crash on encountering them. Depending on what is appropriate, your script may handle errors by displaying a warning and continuing, by displaying an error message and aborting, or by taking other actions (e.g. creating a missing directory). Your script should try to fix problems, if this is possible.
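As a rough illustration of this kind of error handling, the sketch below validates the parameters, aborts with a message when the cleanup directory is unusable, and repairs a missing rubbish-bin directory by creating it. The helper name check_arguments and the exact message texts are assumptions, not part of the task description.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: validate the command-line parameters.
# Dies with a message on unrecoverable problems and fixes what it can
# (here: a missing rubbish-bin directory is created with a warning).
sub check_arguments {
    my ($cleanup_dir, $bin_dir, @used_files) = @_;

    die "Usage: cleanup.pl CleanupDir RubbishBinDir [ListOfUsedFiles]\n"
        unless defined $cleanup_dir && defined $bin_dir;
    die "Error: '$cleanup_dir' is not a directory\n" unless -d $cleanup_dir;
    die "Error: insufficient access permissions for '$cleanup_dir'\n"
        unless -r $cleanup_dir && -w $cleanup_dir && -x $cleanup_dir;

    if (!-d $bin_dir) {
        warn "Warning: creating missing directory '$bin_dir'\n";
        # With the 'error' option, make_path collects problems instead of dying.
        make_path($bin_dir, { error => \my $errors });
        die "Error: cannot create '$bin_dir'\n" if $errors && @$errors;
    }
    return ($cleanup_dir, $bin_dir, @used_files);
}

# Demo: one call without parameters, one valid call in a temporary directory.
my $base = tempdir(CLEANUP => 1);
my $bin  = "$base/rubbish-bin";
for my $args ([], [$base, $bin, 'index.html']) {
    # eval catches the die, so the caller decides how to report and exit.
    my @checked = eval { check_arguments(@$args) };
    if ($@) {
        print "aborted: $@";
        next;
    }
    print "ok: ", scalar(@checked) - 2, " used file(s) listed\n";
}
```

In a real script the same call would receive @ARGV, and the abort branch would print to STDERR and exit with a non-zero status instead of continuing.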
T1.2: Discuss, in at most 300 words, why script programming is particularly suited for this problem, and describe two possible extensions of your script and how they could be achieved.
Assessment criteria for Task 1.1
1. The script will be run against the directory www/ in the provided archive example.zip. The script will be run in the directory example/ as follows:
rm -Rf report.txt rubbish-bin/
cleanup.pl www/ rubbish-bin/ index.html > report.txt
After the script has run, the contents of the directory www/ should be exactly like the contents of the directory www-after/, the directory rubbish-bin/ should exist and have the same contents as the directory has now, and the file report.txt must exist and contain the same text as it contains now.
2. Your script will be tested against several other directories, using different combinations of parameters that are allowed according to the description.
3. Your script will be manually inspected for code quality.