[K12OSN] OT -- Not LTSP, but Linux Scripting Question

Tue Mar 22 19:41:33 UTC 2005

Dear Petre,

Thank you greatly for your time.  The script for checking for errors
will be VERY helpful.  I know that when I wrote the orig. post, I had
some ideas, but thought it would be a bit clunky.  (And it was, I
finished working on it last night -- that itch that had to be scratched)

I did add two lines to your error checker, for the line counter thing. 
At the very beginning I just added:
   $linecount = 0;
   while (<>) {
      $linecount++;
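
For comparison, the same checks with a line counter could be sketched in
bash.  This is only a rough sketch: the function name check_parsefile is
made up for illustration, and it tests for exactly three digits rather
than just a length of 3 as the Perl checker does.

```shell
#!/bin/bash
# Hypothetical helper (not part of either script in this thread):
# report spaces and non-three-digit page fields, with line numbers.
check_parsefile() {
    local file=$1 lineno=0 errors=0 line i
    local -a fields
    # "|| [ -n "$line" ]" also handles a last line with no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        case $line in
            *" "*)
                echo "line $lineno: '$line' has a space in it"
                errors=$((errors + 1))
                ;;
        esac
        # Split the line on commas; field 0 is the student name
        IFS=, read -r -a fields <<< "$line"
        for ((i = 1; i < ${#fields[@]}; i++)); do
            if [[ ! ${fields[i]} =~ ^[0-9]{3}$ ]]; then
                echo "line $lineno: field $i ('${fields[i]}') is not three digits"
                errors=$((errors + 1))
            fi
        done
    done < "$file"
    if [ "$errors" -gt 0 ]; then
        echo "There were $errors errors"
        return 1
    fi
    return 0
}
```

It would be called as 'check_parsefile squire.txt' and returns non-zero
when it finds anything, so a calling script can stop before touching
pdftk.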

It looks like I am going to have to look into perl scripting some more. 
I will definitely be using your scripts as a starting point.  Thanks
again.

And just as a comparison, here is what I found to work as a bash script.
The Perl script is so much nicer.  (I removed all the code for the "error
checks".)

#!/bin/bash
# File to split Progress Report into multiple files then combine them
# back together using a provided text file as the guide.

# Variables that I may need to change by hand,
# but don't want to have to do on the command line
outputsuffix="04-05_Mid"
basedir="/home/kevin/Documents/PAVCS/Progress_Reports"

# assign better names to the arguments provided on the command line
teachername=$1
outputdirname=$1
inputfile=$2
parsefile=$1.txt
outputdir=$basedir/$outputdirname
tempdir=$outputdir/temp

mkdir -p "$tempdir"
echo "  Splitting $inputfile now please wait ...."
echo " "
pdftk "$inputfile" burst output "$tempdir/$teachername"_%03d.pdf
echo "  $inputfile split. Now combining files ..."
echo " "
# Find the number of lines in $parsefile
noline=`wc -l < "$parsefile"`
# Need to repeat the combine step for each row of the $parsefile
x=1   # initialize x for my counter
while [ "$x" -le "$noline" ]; do
   studentname=`sed -n "${x}p" "$parsefile" | cut -d"," -f1`
   stupg1=`sed -n "${x}p" "$parsefile" | cut -d"," -f2`
   stupg2=`sed -n "${x}p" "$parsefile" | cut -d"," -f3`
   stupg3=`sed -n "${x}p" "$parsefile" | cut -d"," -f4`
   stupg4=`sed -n "${x}p" "$parsefile" | cut -d"," -f5`
   echo "  Preparing report for $studentname ..."
   # Need to run a different pdftk command based on the number of pages
   if [ "$stupg4" != "" ]; then
       pdftk "$tempdir/$teachername"_"$stupg1".pdf \
       "$tempdir/$teachername"_"$stupg2".pdf \
       "$tempdir/$teachername"_"$stupg3".pdf \
       "$tempdir/$teachername"_"$stupg4".pdf \
       cat output "$outputdir/$studentname"_"$outputsuffix".pdf
     elif [ "$stupg3" != "" ]; then
       pdftk "$tempdir/$teachername"_"$stupg1".pdf \
       "$tempdir/$teachername"_"$stupg2".pdf \
       "$tempdir/$teachername"_"$stupg3".pdf \
       cat output "$outputdir/$studentname"_"$outputsuffix".pdf
     elif [ "$stupg2" != "" ]; then
       pdftk "$tempdir/$teachername"_"$stupg1".pdf \
       "$tempdir/$teachername"_"$stupg2".pdf \
       cat output "$outputdir/$studentname"_"$outputsuffix".pdf
     elif [ "$stupg1" != "" ]; then
       echo "  Every student should have at least 2 pages."
       echo "  $studentname has only 1 page, I will stop now."
       echo " "
       exit 1
     else
       echo "  $studentname does not have any pages, I will stop now"
       echo " "
       exit 1
   fi 
   echo "      $studentname completed."
   echo " "
   x=`expr $x + 1` # Increase counter by 1
done
# Need to clean-up the temp directory
rm -rf "$tempdir"
# This is just here for troubleshooting
echo " "
echo "$0 script ran successfully for $teachername class"
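
A possible tidier version of the loop above, as a sketch only: it reads
each row with 'read' and collects the page files in an array, so the
if/elif ladder is not needed.  The function name build_commands and its
argument order are my own invention, and it only prints the pdftk
commands (a dry run) instead of running them.

```shell
#!/bin/bash
# Hypothetical rewrite of the combine loop (dry run: prints commands only).
# Arguments: parsefile tempdir teachername outputdir outputsuffix
build_commands() {
    local parsefile=$1 tempdir=$2 teachername=$3 outputdir=$4 outputsuffix=$5
    local -a fields inputs
    local i
    # Split each row on commas; field 0 is the student, the rest are pages
    while IFS=, read -r -a fields || [ "${#fields[@]}" -gt 0 ]; do
        # Every student should have a name plus at least 2 pages
        if [ "${#fields[@]}" -lt 3 ]; then
            echo "  ${fields[0]} has fewer than 2 pages, I will stop now." >&2
            return 1
        fi
        inputs=()
        for ((i = 1; i < ${#fields[@]}; i++)); do
            inputs+=("$tempdir/${teachername}_${fields[i]}.pdf")
        done
        echo pdftk "${inputs[@]}" cat output \
            "$outputdir/${fields[0]}_${outputsuffix}.pdf"
    done < "$parsefile"
}
```

Once the printed commands look right, dropping the 'echo' in front of
pdftk would make it do the real merge.  The same loop handles 2, 3, 4 or
more pages per student without any extra branches.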

On Tue, 22 Mar 2005 08:44:01 -0600
Petre Scheie <petre at maltzen.net> wrote:

> I suggest you create two scripts: one to check the text file for any
> errors--spaces, non-three digit numbers, whatever--and the other to
> actually do the pdftk stuff once you have a clean text file.  I do
> a lot of this kind of stuff and I've found that it is much more
> efficient to make sure you've got a clean file to work with, fix any
> problems beforehand, and then do your 'batch' process, than it is to
> try to write one script that will do merging for 50 lines, discover a
> problem, skip that line but be able to tell you it had a problem, or
> bail/die, in which case you have to fix the problem, start all over
> with the merging except you don't want to do the lines that you got
> through successfully on the first run, so you have to figure out where
> the error was... and so on; you get the idea.  So, with that in mind,
> I wrote two quick & dirty perl scripts that should do most of what you
> want. It could probably be done in a shell script, but it would be
> harder (which is how perl came about).
> 
> Script 1 looks for errors in the text file, as you described:
> 
> #!/usr/bin/perl -w
> # script written for Kevin Squire on the K12LTSP list
> # make sure no lines have any spaces
> 
> $errors = 0;
> while (<>) {
>    chomp;
>    if ($_ =~ / /) {                      # check for spaces
>       print "$_ has a space in it\n";
>       $errors++;
>    }
>    (@array) = split(',',$_);
>    for ($i=1; $i <= $#array; $i++) {
>      if (length($array[$i]) != 3) {      # check for numbers that aren't 3 digits
>        print "Field $i on line $_ is not three digits\n";
>        $errors++;
>      }
>    }
>    # Uncomment the next line if you want to display a running tally of the errors
>    # print "Errors is $errors\n";
> }
> ($errors > 0) && print "There were $errors errors\n";
> 
> ########### end script1 ################
> 
> Run this against your text file ('script1 textfile.txt') and it will
> tell you of any errors it finds and where.  You could put in a line
> counter to make locating the errors a bit easier to get to.  Fix the
> errors, run this script again, and repeat until you get no errors. 
> Then run script 2:
> 
> #!/usr/bin/perl -w
> 
> while (<>) {
>    $sourcefiles = "";
>    chomp;
>    (@array) = split(',',$_);
>    for ($i=1; $i <= $#array; $i++) {
>      # append <prefix>_<page>.pdf, taking the page number from the field
>      $sourcefiles = $sourcefiles." ".$array[0]."_".$array[$i].".pdf";
>    }
>    print "The input string will be $sourcefiles\n";
>    # system("/path/to/pdftk $sourcefiles cat output $array[0].pdf");
> }
> 
> For each line in the text file, this will split up the fields, create
> the pdf file names, and put it all into one string for use with the
> pdftk command.  I have the last line commented out because you should
> do a dry run with this first to make sure the $sourcefiles string will
> be what you want.  I don't have pdftk so I couldn't really test it,
> but the print command on the penultimate line will show what will be
> passed to pdftk.  HTH
> 
> Petre
> 
> 
> Kevin Squire wrote:
> > First, I apologize for the OT nature of the post, but I am sure many
> > of you will know / have done something like this.  Also, I really
> > did not know where else to post the question.  If you know somewhere
> > better, feel free to let me know. :-)
> > 
> > The Asst. Prin. has asked me do something very tedious (I did set
> > myself up for it, but I could use the "brownie points"), and I need
> > some help with the script that I am writing to make it less tedious.
> > I have done a fair bit of scripting, but nothing this advanced, so I
> > need some help.
> > 
> > Some general info:  Each teacher right now has a single MS Word
> > document with every one of his/her students' progress reports.  (i.e.
> > I have one file called squire_pr.doc that is 116 pgs for 56
> > students).  The AP wants them to be a single document (2 or 3 pgs)
> > per student. He does not care if they stay in .doc format or not, as
> > long as they still look the same.  
> > 
> > So I have taken my squire_pr.doc and printed it to PDF
> > (squire_pr.pdf) so that I could use a program called pdftk (
> > http://freshmeat.net/projects/pdftk/ ) and split it up into 116
> > single page documents (each one called squire_###.pdf). Then I can
> > use the same program to join the appropriate pages back together
> > again (so squire_001.pdf and squire_002.pdf becomes
> > smithJ_04-05_mid.pdf).
> > 
> > I want to put together a script that will automate this stuff (to a
> > certain point).  The teacher sends me two files, the 1 large pdf
> > file and a text file with student name and the page numbers of the
> > PDF file that make that student's report.  Usually it will be 2
> > pages, but sometimes it will be 3 or maybe even 4.  The text file
> > would look something like:
> > 
> > smithJ,001,002
> > mouseM,003,004
> > gatesW,005,006,007
> > 
> > I have already done the basics on the script -- setting up
> > variables, assigning directories, making sure the correct files
> > exist already, etc.
> > But I don't know how to (1) get the script to read from the text
> > file, (2) verify that the text file has no spaces and all numbers
> > are in ### format, (3) assign a variable to each field in the text
> > file, and (4) repeat for every line in the text file.
> > 
> > Some info/example/notes from my script:
> > =============================================
> >   $inputfile is the 1 big PDF file
> >   $tempdir/$teachername_%03d.pdf part creates a bunch of single PDFs
> >        with the name squire_001.pdf, squire_002.pdf, etc.
> >   $parsefile is the text file with student names, and page numbers
> >        from the PDF that make up their report
> > ==========
> > pdftk "$inputfile" burst output "$tempdir/${teachername}_%03d.pdf"
> > 
> > # Now the hard part :-)
> > # Need to read the $parsefile and verify that:
> > #   there are no spaces and that all numbers are in ### format
> > #   if not just give an error of $parsefile has error (adding a line
> > #   number would be nice but not necessary)
> > # and then assign the following:
> > #   $studentname from field 1
> > #   $stupg1      from field 2
> > #   $stupg2      from field 3
> > #   $stupg3      from field 4 for those that have 4 fields
> > #   $stupg4      from field 5 for those that have 5 fields
> > 
> > # then run the command 'pdftk INPUTFILES cat output COMBINEDFILE'
> > # for every single line in the text file
> > # where INPUTFILES would be $tempdir/$teachername_$stupgN.pdf where
> > # N could be 1,2,3 or 4 depending on what was found in text file
> > 
> > NOTE -- sorry this got so long, I hope it all makes sense.  And
> > thank you in advance for your effort.
> > 
> > 
> 
> _______________________________________________
> K12OSN mailing list
> K12OSN at redhat.com
> https://www.redhat.com/mailman/listinfo/k12osn
> For more info see <http://www.k12os.org>

--