Programming with Perl in a Unix Environment

Write a set of Perl/shell scripts to perform the following computations:

- First of all, download the following files into a directory of your choice
	(hint: create a new directory in your home directory)

http://www.di.unito.it/~botta/ribosome_biogenesis_fasta.txt
http://www.di.unito.it/~botta/Hs.data

	(hint: use the wget command; type 'man wget' to see options and syntax)

- Write a Perl script that reads the file ribosome_biogenesis_fasta.txt and
  generates a new file ribosome.txt in which the data of each sequence are
  written on a single line
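
The unwrapping step can be prototyped in the shell before writing the Perl
version. The sketch below is only one possible approach: the function name
unwrap_fasta is hypothetical, and it assumes that header lines start with '>'
and that every other line is sequence data. The same logic (remember the
current sequence, flush it when the next header arrives) carries over to Perl.

```shell
# unwrap_fasta FILE
# Join the wrapped sequence lines of each FASTA record into a single line.
# Assumption: header lines start with ">"; all other lines are sequence data.
unwrap_fasta() {
    awk '
        /^>/ {                        # header: flush the previous sequence first
            if (seq != "") print seq
            print
            seq = ""
            next
        }
        { seq = seq $0 }              # sequence line: append to the current record
        END { if (seq != "") print seq }
    ' "$1"
}

# Usage: unwrap_fasta ribosome_biogenesis_fasta.txt > ribosome.txt
```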

- From the ribosome.txt file extract all sequences that 
  contain a TATA box
	(hint: split the file into many files, each containing a single sequence,
	 then use grep to select the files containing a sequence with a TATA
	 box, and concatenate the selected files)

- From the ribosome.txt file extract all sequences that 
  do not contain a TATA box
	(hint: split the file into many files, each containing a single sequence,
	 then use grep to select the files containing a sequence without a TATA
	 box, and concatenate the selected files)
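
The split / grep / concatenate hint used in the two TATA box exercises could
be sketched as follows. Several things here are assumptions: ribosome.txt
holds exactly two lines per sequence (header, then sequence), a TATA box is
approximated by the literal string TATA in either case, and the function name
split_by_tata and the output file names are hypothetical.

```shell
# split_by_tata FILE OUTDIR
# Write the records with a TATA motif to OUTDIR/with_tata.txt and the
# others to OUTDIR/without_tata.txt.
# Assumption: FILE has two lines per record (a ">" header, then the sequence).
split_by_tata() {
    src=$1
    out=$2
    mkdir -p "$out/parts"
    rm -f "$out"/parts/seq_*                    # drop pieces from an earlier run
    split -a 4 -l 2 "$src" "$out/parts/seq_"    # one record (2 lines) per file
    : > "$out/with_tata.txt"
    : > "$out/without_tata.txt"
    for f in "$out"/parts/seq_*; do
        [ -e "$f" ] || continue                 # no pieces if the input was empty
        # The pattern cannot match a header line, because it is anchored at
        # the start of the line and a header begins with ">".
        if grep -qi '^[^>]*TATA' "$f"; then
            cat "$f" >> "$out/with_tata.txt"
        else
            cat "$f" >> "$out/without_tata.txt"
        fi
    done
}

# Usage: split_by_tata ribosome.txt tata_out
```

The loop tests one file at a time with grep -q; listing the matching files
with grep -l and passing them to cat is an equivalent way to follow the hint.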

- From the ribosome.txt file extract all sequences that 
  are longer than 300 bases

- From the ribosome.txt file extract all sequences that 
  are shorter than 300 bases
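
The two length filters differ only in the comparison, so a single helper can
serve both. This is a sketch under assumptions: the function name
filter_by_length and its gt/lt argument are hypothetical, and ribosome.txt is
assumed to hold each sequence on the single line that follows its header.
Note that a sequence of exactly 300 bases is selected by neither filter.

```shell
# filter_by_length FILE OP N
# Print the records whose sequence is longer (OP = gt) or shorter (OP = lt)
# than N bases.
# Assumption: each ">" header is followed by its sequence on one line.
filter_by_length() {
    awk -v op="$2" -v n="$3" '
        /^>/ { header = $0; next }    # remember the header of the current record
        {
            len = length($0)
            if ((op == "gt" && len > n) || (op == "lt" && len < n)) {
                print header
                print $0
            }
        }
    ' "$1"
}

# Usage: filter_by_length ribosome.txt gt 300 > longer.txt
#        filter_by_length ribosome.txt lt 300 > shorter.txt
```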

- From the Hs.data file extract all the information about gene Hs.4. The
  file format is as follows: the record for a gene starts on a line with
  the Unigene name (ID Hs.## where ## is a number) and ends on a line
  containing only two slash characters (//). Write a script that prints
  all the lines about gene Hs.4, and only those lines

- Modify the previous script in such a way that it can be used to extract
  info about any of the genes in the file.
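
Record extraction with the gene name passed as a parameter could look like
the sketch below. It assumes the layout described above (an ID line whose
second whitespace-separated field is the Unigene name, and a closing line
containing only //); the function name extract_gene is hypothetical. The name
is compared as a whole field, so asking for Hs.4 does not also select Hs.40.

```shell
# extract_gene ID FILE
# Print every line of the record whose ID line names the given gene.
# Assumption: records start with "ID <name>" and end with a line "//".
extract_gene() {
    awk -v id="$1" '
        $1 == "ID" { keep = ($2 == id) }    # start of a record: is it the wanted one?
        keep       { print }                # inside the wanted record: print the line
        /^\/\/$/   { keep = 0 }             # end of record: stop printing
    ' "$2"
}

# Usage: extract_gene Hs.4 Hs.data
```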

- Modify the previous script in such a way that it extracts only the
  ID, TITLE, GENE, EXPRESS and PROTSIM lines
	(hint: use grep to filter out non-interesting lines or to keep only
	 interesting lines)
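
The grep filter from the hint can be sketched as a small helper that keeps
only the wanted lines of a record. The function name keep_fields is
hypothetical, and the pattern assumes that each line starts with its field
name followed by whitespace. Requiring whitespace after the name keeps a
field such as GENE from also matching a longer name that merely begins with
GENE.

```shell
# keep_fields [FILE...]
# Keep only the ID, TITLE, GENE, EXPRESS and PROTSIM lines.
# Reads the named files, or standard input when no file is given.
# Assumption: every line begins with its field name followed by whitespace.
keep_fields() {
    grep -E '^(ID|TITLE|GENE|EXPRESS|PROTSIM)[[:space:]]' "$@"
}

# Usage: keep_fields record.txt
#   or:  some_command_printing_a_record | keep_fields
```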


- Write a Perl program that asks the user for a Unigene ID (e.g., Hs.10) and
  prints the GENE name as output
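
Before writing the Perl program, the lookup itself can be tried in the shell.
The sketch below is not the requested Perl solution: the function name
gene_name is hypothetical, and it assumes that the gene name is the second
whitespace-separated field of the GENE line inside the record. A record
without a GENE line produces no output.

```shell
# gene_name ID FILE
# Print the GENE name stored in the record of the given Unigene ID.
# Assumption: records start with "ID <name>", end with "//", and hold the
# gene name as the second field of a "GENE" line.
gene_name() {
    awk -v id="$1" '
        $1 == "ID"              { in_rec = ($2 == id) }    # entering a record
        in_rec && $1 == "GENE"  { print $2; exit }         # found it: print and stop
        /^\/\/$/                { in_rec = 0 }             # leaving a record
    ' "$2"
}

# Interactive usage:
#   printf 'Unigene id: '; read -r id; gene_name "$id" Hs.data
```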