February 15, 2012

Well, today I found a bunch of PDF documents on my disk that I wanted converted to JPEG. A while ago Debian swapped ImageMagick out for GraphicsMagick, which is supposedly a bit faster and leaner. So first you need to install the graphicsmagick package, or rewrite the script to use ImageMagick's /usr/bin/convert instead.
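If you don't want to hard-code either tool, a small check can pick whichever one is installed. This is only a sketch, assuming bash; pick_converter is a hypothetical helper, not part of the original script:

```shell
# Hypothetical helper: print the conversion command to use,
# preferring GraphicsMagick and falling back to ImageMagick.
# Returns 1 if neither tool is on the PATH.
pick_converter() {
        if command -v gm >/dev/null 2>&1; then
                echo "gm convert"
        elif command -v convert >/dev/null 2>&1; then
                echo "convert"
        else
                return 1
        fi
}
```

You would call it once, e.g. converter=$(pick_converter) || exit 1, and then use $converter (deliberately unquoted, so "gm convert" splits into two words) where the script says gm convert.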

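The script takes every .pdf file in your current working directory, creates a subdirectory named after it, and extracts each page of the PDF into its own JPEG image in that subdirectory. It also writes a ComicInfo.xml file there, with the series, number and title taken from the dot-separated parts of the file name.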

#!/bin/bash

# Needs graphicsmagick
if [ ! -x /usr/bin/gm ]; then
        echo "gm not found, install graphicsmagick first" >&2
        exit 1
fi

for file in "$PWD"/*.pdf; do
        [ -e "$file" ] || continue          # no PDFs: the glob stays literal
        name="$(basename "$file" .pdf)"     # e.g. Series.Name.001.Title
        dir="$PWD/$name"

        sudo mkdir -p "$dir"
        # One JPEG per page, numbered with two-digit padding (%02d)
        sudo gm convert "$file" \
           JPEG:"$dir/$name%02d.jpg"

        # Fields 1-2 are the series, 3 the number, 4 onwards the title;
        # dots inside the series and title become spaces
        number="$( echo "$name" | cut -d. -f3 )"
        title="$( echo "$name" | cut -d. -f4- | sed 's/\./ /g' )"
        series="$( echo "$name" | cut -d. -f1-2 | sed 's/\./ /g' )"

        # Create the ComicInfo.xml file
        cat << EOF | sudo tee "$dir/ComicInfo.xml" >/dev/null
<?xml version="1.0"?>
<ComicInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Series>$series</Series>
  <Number>$number</Number>
  <Title>$title</Title>
</ComicInfo>
EOF

        # Hand the directory over once all files exist
        sudo chown -R nobody:users "$dir"
done
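The series, number and title come from splitting the file name on dots. Assuming names like Series.Name.001.Some.Title.pdf (my reading of the naming scheme the cut field numbers in the script above imply), the same split can be done with bash's read alone, without cut and sed. parse_name is a hypothetical helper for illustration:

```shell
# Hypothetical helper: split a name like "Series.Name.001.Some.Title"
# (extension already removed) into the three ComicInfo fields.
# The naming scheme is an assumption. Sets the globals series,
# number and title.
parse_name() {
        local s1 s2
        # read puts the first three dot-separated parts into s1, s2 and
        # number; everything left over (dots included) goes into title.
        IFS=. read -r s1 s2 number title <<< "$1"
        series="$s1 $s2"
        title="${title//./ }"   # turn the remaining dots into spaces
}
```

For example, parse_name "Foo.Bar.003.The.Big.Day" leaves series="Foo Bar", number="003" and title="The Big Day".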

What I had to google for was how to zero-pad the page number in the output file names. According to the gm man page, you just put %02d (or %03d, depending on how many pages your largest PDF has) in the output file name.
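The %02d is a printf-style conversion, so the shell's own printf shows what it does and why it matters: padded names sort in page order, unpadded ones do not. The pad_format helper below is hypothetical; it takes a page count you already know (getting that count out of the PDF is left out) and errs on the wide side:

```shell
# Hypothetical helper: build a printf-style pattern with as many
# digits as the page count has, but never fewer than two.
pad_format() {
        local width=${#1}
        [ "$width" -lt 2 ] && width=2
        echo "%0${width}d"
}

# Zero-padded names sort in page order...
printf 'page%02d.jpg\n' 1 2 10 | LC_ALL=C sort
# ...unpadded names do not: page10.jpg sorts before page2.jpg.
printf 'page%d.jpg\n' 1 2 10 | LC_ALL=C sort
```

For a 120-page PDF, pad_format 120 gives %03d, which you could put into the output file name in place of %02d.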

Convert a bunch of PDF documents to JPEG