Google Drive Backup using Rclone

For 8 TB of data storage on Google Drive, plus my own Google organization, I am paying £30/month, which is a pretty good deal.

I wanted to use this space for backing up my NAS, but it was proving difficult. The program I was recommended for Linux backup, Duplicati, was not the best for this purpose. My backup runs would not complete, they would be slow, and full of syncing errors.

Until I discovered rclone.

rclone was much better than Duplicati at backing up my NAS to Google Drive.

This is the script I use to sync the NAS content to Google Drive: https://github.com/wordswords/dotfiles/blob/568a8768154de8609a01b26560373ec5ca0eab85/bin/rclone-backup.sh

It works just like rsync, but with a progress indicator. I’ve added it to my crontab and it syncs everything up weekly.

Sleep Tracking

I really like my Withings Mattress-Based Sleep Tracker.

I keep unusual sleeping hours a lot of the time, as I’m undergoing treatment, and a lot of the medication I’m taking effect my sleep. Even when I’m relatively healthy, some of the medication I’m on for a long-term condition also affects my sleep.

What’s more, I enjoy a non-standard sleeping schedule. I find I get more work done in the early morning or late at night. As I work from home 100% this is not as problematic as it used to be.

It is still good to monitor your sleep though, when you are ill or like me, when you have odd sleeping hours.

Like a lot of these smart health things, it is not perfect. Sometimes it will create erroneous data, and then correct itself a few days later, such as when I was in hospital, and it registered someone sleeping in my bed 😀 My wife was quite surprised when I told her. Thankfully for both of us it corrected itself a day later.

Also it comes up with all this wake-up time and time spend in bed not sleeping, data which I don’t really trust. I don’t think it’s reliable enough. The core metric, amount of time asleep, is fairly accurate though, within a few hours. When averaged over 1-2 weeks it does give you a good general idea of how much sleep you’re getting.

Sleep is so important for recovery and feeling well. You can compare the statistics of two fortnights, one when I was being pumped full of steroids and hardly sleeping at all, and one when I was relatively medication free and relaxed:

Not Relaxed:

As you can see, I was getting max about 3 hours a night.

Relaxed:

Although the latest week is not full, it’s a much more healthy 6 hours or so on average.

I find if I am logging 2-3 hours a night sleep for more than a couple of weeks, something is seriously wrong, however I might feel at the time, and I should see a medical professional about it. It is easy to forget this when you might be energised and “doing all the things” but there is a danger that you might be manic or hyperactive and that really isn’t good long term. So this is why I recommend the sleep tracker.

Home Network Design

Did some network design for my home network. This will ensure that my new Virgin Media 1100mbps/50mps speed is fully utilised without any bottlenecks and will futureproof things up to at least 2.5Gbs/speed connections.

I should be able to load balance both internet connections to utilise them both simultaniously in the future, but for now I want to be able to get greater than 1000mbit/s speed on a single device otherwise there is not much point in speed increases past 1000mbits. Also there is a 200-300mbits speed increase possible from my existing VM hardware that is not utilised in the current network.

I will be using this method to extract the full speed from the VM hub: https://tech.msh100.uk/virgin/networking/2020/10/17/virgin-media-greater-than-gbit/

Since I have drawn up the diagram, there have been a few improvements suggested – using fibre and fibre modules instead of CAT 7 and more expensive modules to convert to CAT 7.

Working with Jenkinsfiles in VIM

Using Jenkinsfiles in repositories for declarative pipelines, which is an example as infrastructure as code, is absolutely the right thing to do, in my opinion.

However the tooling for debugging and error checking Jenkinsfiles is currently quite difficult.

Setup Jenkinsfile linter

First of all you need to make sure to be able to access the Jenkinsfile linter. This linter is processed on the Jenkins server itself, unusually. There is a configuration option that requires adminstrator access to the Jenkins server.

There are a number of options to access the linter, but Jenkins officially recommends the ssh server option with preshared ssh public keys as outlined here:

https://www.jenkins.io/doc/book/pipeline/development/

Once you have set it up, you can pipe Jenkinsfiles through ssh to the linter and it will lint the Jenkinsfile for you:

ssh jenkins.dev.mycompany.co.uk -p1337 declarative-linter < Jenkinsfile

(change jenkins.dev.mycompany.co.uk and the port number to match your configuration of course)

And it will output the errors in the file.

Jenkinsfile syntax highlighting in VIM

VIM won’t automatically syntax highlight Jenkinsfiles because they don’t have an extension that it recognises. This is easily fixed with the following autocommand added to your .vimrc though:

autocmd BufRead,BufNewFile Jenkinsfile set filetype=groovy

This will ensure that Jenkinsfiles have some measure of syntax highlighting when loaded, as they are valid Groovy files.

Jenkinsfile linting on save in VIM

Once you have the linter running on a ssh port (see above) you can add this line to your .vimrc to have the linter run whenever a Jenkinsfile is saved, allowing you to debug your changes when you save them:

autocmd BufWritePost Jenkinsfile !ssh jenkins.dev.myompany.co.uk -p1337 declarative-linter < %

(change jenkins.dev.mycompany.co.uk and the port number to match your configuration of course)

This setup should make it significantly easier to spot and fix errors in your Jenkinsfile before it gets to the git commit stage.

The Persuit of Happiness and the Hill-Climbing Problem

It seems that for a lot of people, perhaps the vast majority, they cannot expect to be 100% happy all the time. Even those incredibly successful celebrities and incredibly rich CEOs and politicians are not happy 100% of the time.

So is it a good idea to seek 100% happiness right away? Or should we always try and select the ‘least worst’ situation? If we keep making progress and optimse our happiness levels through constantly selecting the ‘least worst’ situation whenever we have a descision then it seems to me we will maximise our happiness potential in the long run.

Once we have reached the plateau of happiness for our chosen path though, and we are not getting any happier or there is no way to continue improving our happiness, then the only way of potentially improving our happiness is to do something radical and leap to another approach.

For those with training in A.I. or optimisation algorithms, I see this conceptually as see this as a hill-climbing local optimsation plateau problem. You can optimise locally your happiness level, but you need to take a risk when you have plateau’d in your happiness along your current path, to do something completely different and jump to another hill which might offer a higher level of happiness in the long run, although it may not. The amount of risk you take is dependant on the knowledge you have (the heuristics) that the next ‘hill’ or path you take will lead to a better outcome. But.. if it won’t, and all the paths available to you have been analysed, and you will not make yourself happier, you have won at the happiness problem. You are the happiest you can make yourself and no amount of seeking happiness improvements in your life will help.

This can be extended to job hopping, relationships.. of course there is an amount of risk as we never have a perfect heuristic but the concept is the same. Through constantly hopping hills and constantly seeking 100% happiness you can make yourself less happy than you thought. Maybe you were never born to be 100% happy? Maybe none of us ever were, we can just do the best with the oppertunities we were given.

I don’t know if I explained it well but this does a better job:

https://en.wikipedia.org/wiki/Hill_climbing

Setting up Enlightenment window manager on Ubuntu 21.10

Enlightenment is a very underrated window manager choice, in my opinion. It still looks very pretty and is very configurable. There is a packaging bug with the latest version of Enlightement in the 21.10 repo which means it doesn’t work correctly. The fix is to set the suid bit on the ‘enlightenment_system’ binary. So:

  1. sudo apt install enlightenment
  2. sudo find /usr/lib -type f -name enlightenment_system -exec sudo chmod 4755 {} \;
  3. Log out of Ubuntu
  4. Click the settings cog in the bottom right to change your Window manager to enlightenment
  5. Log on and the settings wizard will help you configure Enlightenment

How to fix Plex Media Servers ‘Various Artists’ problem using Beets.io and other tools

The Problem

As of the day of posting, Plex Media Server has a suboptimal handling of Various Artists compilations. Unless you follow BOTH of these rules, described clearly on the Plex support here under ‘Various Artists’, then they won’t appear collected under a particular album, they will be spread out filed under the different artists that created them. I think it should only require one of these rules, but it is not up to me. The problem is that release metadata available from the online metadata databases, often doesn’t follow these rules – lots of DJ mix compilations for example, do not set albumartist tag to Various Artists from the online metadata databases.

Also you may want to adopt a different file structure for your music than what is recommended and essential for Plex. I would agree that ONE of these measures should be required, and I would argue that setting album artist would be enough for Plex to figure out this problem, but you also need to structure your music also under the way they suggest, you won’t get the desired result of compilations being collected as a single album and in order.

My Solution

Firstly, you’re going to have to manually add the ‘Various Artists’ content to the ‘albumartist’ tag for all tracks in a compilation that doesn’t have them. Currently I’ve only found a way to do this by hand, would be very interested in any automated approaches, or if someone wants to write a beets.io plugin that would be amazing. You can do this with any tag editor, but I wrote my own solution which is faster for what I want to do and easy to use on my existing Beets.io setup and database which has a lot of these problem compilations already. Because I am using Windows I wrote a batch file which uses several third party tools to fix the metadata fast. You will have to download and setup the tools seperately, details are in the batch file comments.

Secondly, you’re going to have to make sure that Beets.io moves your files into the way that Plex Media Server recommends. This is my current Beets.io config file, the important bits here are the file rename and move structure, as per the support page, it has to be something along the lines of:


$albumartist/$album/$artist - $title

My Process

  1. ‘beet ls <query string> to identify tracks that need fixing. ‘beet ls fabric’ for example, got almost all of my dozens of Fabric mix CD tracks, which sped up the renaming significantly than using a GUI editor to go in and manually change each album. You have to make sure that the query string selects only the tracks you want to modify, unless you want to go back and edit those few tracks that got caught in the process later.
  2. I send that query string to my batch file which goes through the ‘beets ls -p ‘ full path results iteratively and runs a couple of command line tag editors on them, the first works for all MP3s and the second for all FLACs, e.g. ‘.\tag-various-artists.bat fabric’
  3. ‘beet update’ to update my Beets.io database with the new metadata I’ve changed with the batch script.
  4. ‘beet dup -d’ to delete any duplicates that have arisen.
  5. ‘beet move -p’ to preview the file moves into the new directory structure that is required by Plex.
  6. When I’ve scanned through and am happy with the file moves – ‘beet move’ to do the moving.
  7. I then backup my files through robocopying to an external drive, but this is not essential.
  8. Make sure you have checked ‘prefer existing metadata’ in the Plex options for your music library.
  9. Set Plex to rescan your libraries metadata.

Tools

The batch file I use, my beets.io config and other tools I use to organise my music collection on Windows 10.

The Results

All my Fabric mix CDs are now properly collected under one ordered album each on Plex:

Home Cinema Setup

I have been gradually adding parts to my PC so that it functions better for watching movies and gaming. Here are the components that I have used:

Monitor – BenQ EX3501R 35″ Ultra WQHD Curved Monitor

This monitor is great for watching movies and gaming as it is ultra-wide at an aspect radio of 21:9, 3440×1440, a VA panel with high contrast, and the monitor is slightly curved so that you get a cinematic experience.

Surround Sound Speakers – Logitech Z906 5.1 Speakers

This is a good set of 5.1 surround sound speakers with a separate subwoofer. I connect it from my motherboard’s optical out via ADAT for the best connection. I have the rear speakers mounted on the wall behind me, and the front speakers on the desk in front of me angled towards my seat.

Phillips Hue Bulbs and Dynamic Lighting

I have three Phillips Hue bulbs near the monitor which I have added to a ‘Hue Entertainment Group’ which means their colours can be rapidly changed with applications that use the Phillips Hue Entertainment API. One such application is ‘hueDynamic for Hue’ which is a Windows 10 application that monitors the colours on the pixels at the edges of the monitor, and changes the lights next to each edge according to the colours on that edge. This means I can have some dynamic lighting when watching movies.

Source of 4K Video

Even though my monitor doesn’t quite support 4K, using VLC means I can adjust the aspect ratio so that it will re scale the 4K video onto the monitor’s 3440×1440 dimensions, making 4K the best fit for my monitor. However my wireless connection is not currently fast enough to stream 4K video, so I use the ‘4K Video Downloader’ app to download free 4K videos from YouTube so I can play them on VLC.

Putting It All Together

(Click on diagram to enlarge)

So, I download 4K videos from Youtube, play them on VLC which adjusts the 4K content to fit my monitor. The audio goes to my surround sound setup which creates a virtualised surround sound output to my speakers, and the hueDynamic app running in the background on my PC samples the colours are the left and right areas of my monitor, and dynamically changes the colour of the Hue bulbs as the video plays. This creates an immersive ‘Home Cinema’ experience.

 

Node.js HTTPS server with self-signed certificate creation on openssl 0.9.8zh with node.js 7.10.0

I couldn’t find a concise guide to setting this up quickly so thought it was worth a post. To quickly get something working and create a https server using the above versions of openssl and node.js, do the following:

 Generate self-signed server certificate with no password

sudo openssl req -x509 -newkey rsa:2048 -keyout ./csr.pem -out server.crt -days 3001 -nodes

Use this node.js code to setup a server quickly

const https = require('https');
const fs = require('fs');

const options = {
key: fs.readFileSync('csr.pem'),
cert: fs.readFileSync('server.crt')
};

https.createServer(options, (req, res) => {
res.writeHead(200);
res.end('hello world\n');
}).listen(8000);

Go to https://localhost:8000 and accept the certificate, you should see ‘hello world’

Switching Audio between two Soundcards

headphones_accept

If you have two soundcards – maybe a normal soundcard for your speakers and maybe a headset with its own audio interface – you will want some way of switching all audio between the two. This is an great little open-source tool to do just that. It works on Mac, and Windows 7 up to 10:

https://soundswitch.codeplex.com/

Get rid of the ‘most visited sites’ Grid on Chrome

Not that I particularly look at anything weird, but when I’m showing someone my computer I don’t want to have my ‘most visited sites’ pop up when I fire up Chrome. The following Chrome extension will get rid of the default loading page, and just display a blank page:

https://chrome.google.com/webstore/detail/empty-new-tab-page/dpjamkmjmigaoobjbekmfgabipmfilij

A* Algorithm implementation in Python

pathfinding

Lately I’ve had the idea of creating a text-based Roguelike in C++. This lead me on to think about the game AI experiments that I worked during my degree in Computer Science and A.I.. Essential to game AI is the notion of pathfinding, or finding a path from ‘A’ to ‘B’, past any obstacles that get in the way. One way to do this is to use the A* algorithm. I decided to implement an A* pathfinding algorithm for possible use in a Roguelike later, and chose the pseudocode from the Wikipedia example to implement.

The program finds a path from the top right hand corner to the top left, avoiding impassable ‘7’ obstacles. The ‘*’ are the steps along the path. The algorithm is guaranteed to find the shortest path between the goal and the start, which means it can optimally solve any solvable maze, given time.

This is a sample board with obstacles setup:

00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000777000000000000000
00000000000000000000000000000000777000000000000000
00000000000000000000000000000000777000000000000000
00000000000000000000000000000000777000000000000000
00000000000000000000000000000000777000000000000000
00000000000000000000000000000000777000000000000000
00000000000000000000000000000000777000000000000000
00000000000000000000000000000000777000000000000000
00000000000000000000000000000000777000000000000000
00000000000000000000000000000000777000000000000000
00000000000000000000000000000007777000000000000000
00000000000000077777777777777777700000000000000000
00000000077777777777777777777777700000000000000000
00000077777777777700000000000000000000000000000000
77777777777000000000000000000000000000000000000000
77777777000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
77777777777777777707777777777777777777777777777777
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
70777777777777777777777777777777777777777777777777
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
77777777777777777777700000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000
00000000000000000000777777777777777777777707777777
00000000000000000000700000000000000000000000000000
00000000000000000000700000000000000000000000000000
00000000000000000000700000000000000000000000000000
00000000000000000000700000000000000000000000000000
00000000000000000000700000000000000000000000000000
00000000000000000000700000000000000000000000000000
00000000000000000000700000000000000000000000000000
00000000000000000000700000000000000000000000000000

This is the path found (the ‘*’s):

**000000000000000000000000000000000000000000000000
0***********************************00000000000000
00000000000000000000000000000000777*00000000000000
00000000000000000000000000000000777*00000000000000
00000000000000000000000000000000777*00000000000000
00000000000000000000000000000000777*00000000000000
00000000000000000000000000000000777*00000000000000
00000000000000000000000000000000777*00000000000000
00000000000000000000000000000000777*00000000000000
00000000000000000000000000000000777*00000000000000
00000000000000000000000000000000777*00000000000000
00000000000000000000000000000000777*00000000000000
00000000000000000000000000000007777*00000000000000
00000000000000077777777777777777700*00000000000000
00000000077777777777777777777777700*00000000000000
00000077777777777700000000000000000*00000000000000
77777777777000000000000000000000000*00000000000000
77777777000000000000000000000000000*00000000000000
00000000000000000000000000000000000*00000000000000
00000000000000000000000000000000000*00000000000000
00000000000000000000000000000000000*00000000000000
00000000000000000000000000000000000*00000000000000
00000000000000000000000000000000000*00000000000000
000000000000000000******************00000000000000
777777777777777777*7777777777777777777777777777777
000000000000000000*0000000000000000000000000000000
0******************0000000000000000000000000000000
7*777777777777777777777777777777777777777777777777
0*****************************00000000000000000000
00000000000000000000000000000**0000000000000000000
000000000000000000000000000000**000000000000000000
0000000000000000000000000000000*000000000000000000
0000000000000000000000000000000***0000000000000000
000000000000000000000000000000000*0000000000000000
000000000000000000000000000000000**000000000000000
0000000000000000000000000000000000**00000000000000
77777777777777777777700000000000000***000000000000
0000000000000000000000000000000000000**00000000000
00000000000000000000000000000000000000*00000000000
00000000000000000000000000000000000000**0000000000
000000000000000000000000000000000000000****0000000
000000000000000000007777777777777777777777*7777777
000000000000000000007000000000000000000000**000000
0000000000000000000070000000000000000000000*000000
0000000000000000000070000000000000000000000***0000
000000000000000000007000000000000000000000000**000
0000000000000000000070000000000000000000000000*000
0000000000000000000070000000000000000000000000***0
000000000000000000007000000000000000000000000000*0
000000000000000000007000000000000000000000000000**

Here is source code

Amit’s A* pages were incredibly useful in developing this.

(Perhaps one day I will do a flashy JavaScript version!)

Setting up Kindlefire HDX for Development under Ubuntu 12.04

amazon_kindle_fire_hdx1

I wanted to get a Kindlefire HDX running under Ubuntu 12.04 with adb.

First I needed to setup the udev rules:

1. Edit /etc/udev/rules.d/51-android.rules as root, and add the following line (create this file if it does not exist):

SUBSYSTEM=="usb", ATTRS{idVendor}=="1949", MODE="0666"

2. Change the permission of this file by executing the following command as root:

chmod a+r /etc/udev/rules.d/51-android.rules

3. Reload the rules by executing the following command as root:

udevadm control --reload-rules

4. Run these commands to restart adb:

adb kill-server
adb start-server

5. Now when I run

lsusb

I can see the device listed.

6. Next I needed to enable adb access on the Kindlefire HDX device itself by going to Settings -> Device -> Enable ADB.

7. Finally I could run:

adb devices

within Ubuntu and have it recognise the Kindlefire HDX.

Android Debug Bridge failing to detect emulators under OSX

Android

I’ve been working on a project at the BBC where we are using the Android command-line tools from the Android Developer Tools, to spin up and terminate series of emulators. I noticed a big problem under OSX where ‘adb devices’ was failing to register emulators occasionally when we started them up, without any error message, even though they were loaded and quite clearly running in a window on OSX. This was a real problem for our project because we needed absolute parity between emulator process being launched and subsequently being detected by adb.

We switched to using adb with emulators in an Ubuntu 12.04 VM running under OSX, and we had no further problems with our setup. Emulators will now be programatically launched and torn down by our monitoring application. We now have an array of emulators which we can deploy to at will, which is very useful.

I don’t know what has caused this problem, my only hunch is that the Android toolkit was probably developed in a very Linux-heavy environment, and so adb on Linux was probably their first testing platform. All I can say is that Linux is much more stable than OSX, even as a VM, for Android emulation.

256 Color VIM on Crunchbang Waldorf

256 Colours in VIMTo get 256 colors working within terminator in Crunchbang Waldorf, I had to do the following:

  1. Add to ~/.bashrc
    export TERM=xterm-256color
  2. Install a 256 color VIM colorcheme, see desert256 for example.
  3. Add the following to ~/.vimrc:
    set t_Co=256
    set t_AB=^[[48;5;%dm
    set t_AF=^[[38;5;%dm
    

    ‘t_Co’ specifies exactly how many colours VIM can use. The other two lines seem to be Debian-specific color code escape sequences.

  4. If you want 256 color VIM for your root user when you sudo edit, then edit /usr/share/vim/vimrc and copy across your settings from your local ~/.vimrc and ~/.vim to this global environment.

Converting a single M2V frame into JPEG under OSX

I needed to view a single frame of a m2v file that had been encoded by our designers for playing out on TV. The file name was .mpg but in actuality it was a single .m2v frame renamed to be a .mpg. Windows Media Player classic used to display the frame fine when I opened the file normally, under Windows XP. However now I have switched to a Mac, I have found that Quicktime and VLC refused to display the single frame. I couldn’t find a video player that would open the single frame. So I resorted to the command line version of ffmpeg, which I installed via macports, to convert this single frame to a jpg file to view as normal. This line worked a treat:

ffmpeg -i north.mpg -ss 00:00:00 -t 00:00:1 -s 1024x768 -r 1 -f mjpeg north.jpg

Where ‘north.mpg’ was the m2v file, and ‘north.jpg’ was the output jpeg.

And this:

find -name *.mpg -exec ffmpeg -i {} -ss 00:00:00 -t 00:00:1 -s 1024x768 -r 1 -f mjpeg {}.jpg ;

Will go through all the mpg files in the current directory and below, and create their jpeg single frame equivalents, ie: for north.mpg it will create north.mpg.jpg.

Java 1.6 on RHEL4

After I wrote a Java application in JDK 1.6, I was stuck for a while when I realised that the target deployment machine was Red Hat Enterprise Linux 4. RHEL4 does not support Java 1.6 in its default configuration.

Luckily I found this article on the CentOS wiki which included instructions on how to install Java 1.6 on CentOS 4. Remembering that RHEL4 and CentOS 4 are almost identical, I tried the method supplied, and it worked. This is the page with the method:

http://wiki.centos.org/HowTos/JavaOnCentOS

JSoup Method for Page Scraping

Soup bowl

I’m currently in the process of writing a web scraper for the forums on Gaia Online. Previously, I used to use Python to develop web scrapers, with the very handy Python library BeautifulSoup. Java has an equivalent called JSoup.

Here I have written a class which is extended by each class in my project that wants to scrape HTML. This ‘Scraper’ class deals with the fetching of the HTML and converting it into a JSoup tree to be navigated and have the data picked out of. It advertises itself as a ‘web spider’ type of web agent and also adds a 0-7 second random wait before fetching the page to make sure it isn’t used to overload a web server. It also converts the entire page to ASCII, which may not be the best thing to do for multi-language web pages, but certainly has made the scraping of the English language site Gaia Online much easier.

Here it is:

import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.text.Normalizer;
import java.util.Random;
import org.apache.commons.io.IOUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

/**
* Generic scraper object that contains the basic methods required to fetch
* and parse HTML content. Extended by other classes that need to scrape.
*
* @author David
*/
public class Scraper {

        public String pageHTML = ""; // the HTML for the page
        public Document pageSoup; // the JSoup scraped hierachy for the page


        public String fetchPageHTML(String URL) throws IOException{

            // this makes sure we don't scrape the same page twice
            if(this.pageHTML != ""){
                return this.pageHTML;
            }

            System.getProperties().setProperty("httpclient.useragent", "spider");

            Random randomGenerator = new Random();
            int sleepTime = randomGenerator.nextInt(7000);
            try{
                Thread.sleep(sleepTime); //sleep for x milliseconds
            }catch(Exception e){
                // only fires if topic is interruped by another process, should never happen
            }

            String pageHTML = "";

            HttpClient httpclient = new DefaultHttpClient();
            HttpGet httpget = new HttpGet(URL);

                HttpResponse response = httpclient.execute(httpget);
                HttpEntity entity = response.getEntity();

                if (entity != null) {
                    InputStream instream = entity.getContent();
                    String encoding = "UTF-8";

                    StringWriter writer = new StringWriter();
                    IOUtils.copy(instream, writer, encoding);

                    pageHTML = writer.toString();
                    
                    // convert entire page scrape to ASCII-safe string
                    pageHTML = Normalizer.normalize(pageHTML, Normalizer.Form.NFD).replaceAll("[^\p{ASCII}]", "");

                }

                return pageHTML;
        }

        public Document fetchPageSoup(String pageHTML) throws FetchSoupException{
            
            // this makes sure we don't soupify the same page twice
            if(this.pageSoup != null){
                return this.pageSoup;
            }
            
            if(pageHTML.equalsIgnoreCase("")){
                throw new FetchSoupException("We have no supplied HTML to soupify.");
            }

            Document pageSoup = Jsoup.parse(pageHTML);

            return pageSoup;
        }
}

Then each class subclasses this scraper class, and adds the actual drilling down through the JSoup hierachy tree to get what is required:

...
this.pageHTML = this.fetchPageHTML(this.rootURL);
this.pageSoup = this.fetchPageSoup(this.pageHTML);

// get the first  section on the page
Element forumPageLinkSection = this.pageSoup.getElementsByAttributeValue("id","forum_hd_topic_pagelinks").first();
// get all the links in the above 
section Elements forumPageLinks = forumPageLinkSection.getElementsByAttribute("href"); ...

I’ve found that this method provides a simple and effective way of scraping pages and using the resultant JSoup tree to pick out important data.

Disabling Control-Enter and Control-B shortcut keys in Outlook 2003

At work, I still have to use Windows XP and Outlook 2003. I don’t particually mind this, except when I draft an email to someone and accidently I press Control-B instead of Control-V. Control-B will go ahead and send your partially composed email, resulting in some embarassment as you have to tell everyone to disregard it.

So I wanted to remove the ‘send email’ shortcut keys in Outlook 2003. There are two ways of doing this, one involves editing your group policy, which is something only my IT administration team can do, and I didn’t want to have to involve them. The other way is by making a change to your registry, which I will describe here.

  1. Open up regedit, and browse to the following registry key: HKEY_CURRENT_USER -> Software -> Policies -> Microsoft -> office -> 11.0 -> outlook
  2. Then create a new key called: “DisabledShortcutKeysCheckBoxes”.
  3. Under that key, create two new String Values:
    Name: CtrlB Data: 66,8
    Name: CtrlEnter Data: 13,8
  4. Then restart Outlook and those keys will be disabled.

Click on the thumbnail below to see what the finished edit should look like:

Directory names not visable under ls? Change your colours.

There is a problem I frequently encouter on Redhat/Fedora/CentOS systems with the output of the ls command. Under those distributions, the default setup is to display directories in a very dark colour. If you usually use a white foreground and a black background on your terminal client (such as Putty) then you will struggle to read the names of the directories under Redhat-based distributions.

There are two soloutions that I have used:

1. Change the colour settings in Putty

If you use Putty, ticking ‘Use System Colours’ here changes the “white foreground, black background” default into a “white background, black foreground”. This way you can at least read the console properly, good for a quick fix. You can also save these settings in putty to be the default for the host that you are connecting to, or even all hosts.

2. Change the LS_COLORS directive temporarily in the shell.

Alternatively, you can ask the ls command to display directories and other entries in colours that you specify. You could add these lines to the bottom of your .bashrc to make these changes permanent, or if you are using a shared machine, just copy and paste the following lines into the terminal and they will change the colours to a reddish more visable set, until you logout. :

alias ls='ls --color' # just to make sure we are using coloured ls
LS_COLORS='di=94:fi=0:ln=31:pi=5:so=5:bd=5:cd=5:or=31:mi=0:ex=35:*.rpm=90'
export LS_COLORS

(Original source for this particular LS_COLORS combo: http://linux-sxs.org/housekeeping/lscolors.html)

Find large files by using the OSX commandline

To quickly find large files to delete if you have filled your startup disk, enter this command on the OSX terminal:

sudo find / -size +500000 -print

This will find and print out file paths to files over 500MB. You can then go through them and delete them individually by typing rm “<file path>”, although there is no undelete so make sure you know you won’t miss them.

Finding files in Linux modified between two dates

You use the ‘touch’ command to create two blank files, with a last modified date that you specify – one with a date of the start of the range you want to specify, and the second with a date at the end of the range you want to specify. Then you reference to those two files in your find command:

touch /tmp/temp -t 200604141130
touch /tmp/ntemp -t 200604261630
find /data/ -cnewer /tmp/temp -and ! -cnewer /tmp/ntemp

Writing simple email alerts in PHP with MagpieRSS

I wrote an email alerter that sends me an email whenever the upcoming temperature may dip below freezing. It uses the Magpie RSS reader to pull down a 3 day weather forecast that is provided for my area in RSS form by the BBC weather site. It then parses this forecast and determines if either today’s or tomorrow’s weather may dip below freezing. If it might, it sends an email to my email address to warn me.

I scheduled this script to run every day by adding it as a daily cron job on my web host. You can set this up for any web hosts that support cron jobs.

items) != 3){
                $message .= 'Error: problem parsing BBC weather feed';
        }
        $i=0;
        foreach ($rss->items as $item) {
                $href = $item['link'];
                $title = $item['title'];
                preg_match('/Min Temp:.+?-*d*/',$title,$mintemp);
                preg_match('/Max Temp:.+?-*d*/',$title,$maxtemp);
                $mintemp[0] = str_replace('Min Temp: ','',$mintemp[0]);
                $maxtemp[0] = str_replace('Max Temp: ','',$maxtemp[0]);
                $mins[$i] = (int)$mintemp[0];
                $maxs[$i] = (int)$maxtemp[0];
                $i++;
        }

        // freezing warnings

        if($mins[0] < 0){
                $message .= "Today's temperature in W3 may go below freezing, anything down to ".$mins[0];
        }
        if($mins[1] 

You can right click on this link and ‘save as’ to download the script.

Converting week numbers to dates

Here is some python code I adapted from this stackoverflow post to get the first day of a week specificed by a week number. This method includes leap year and summer time differences.

import time
def weeknum(num,year):
	instr = str(year)+" "+str(num-1)+" 1"
	print time.asctime(time.strptime(instr,'%Y %W %w'))

Here is me exectuting the code in Python’s IDLE shell:

See that the first week of 2009 actually started in 2008, but by the end of that week we are in 2009.

MediaMonkey allows you to transfer music from any computer onto your guest iPhone

MediaMonkey is a popular free media player for Windows. It has a great feature that allows you to transfer to and from an iPhone that is not registered with your computer. Normally only one iTunes install can be associated with your iPhone, but MediaMonkey allows you another way to transfer music and audio files with a ‘guest’ iPhone. Check it out, it works:

MediaMonkey

Recording Game Videos on Windows 7

This is just a quick note to remind myself how I did this.

  • Hypercam2 is a good, free, video recorder that can cope with recording game videos. It’s freely available from http://www.hyperionics.com/hc/downloads.asp – just make sure when you install it you don’t tick on the spyware toolbar installation options.
  • My motherboard has a 5.1 digital soundcard built in. However the only way I can record off the soundcard is to plug in a standard audio cable from the speaker out (green) to the microphone in (orange).
  • The soundcard switches off the headphone output when it detects a speaker attached to the speaker out, so you have to go to the recording options in Windows 7 and right click on the microphone in. It will give you an option to ‘Monitor this input using the headphones’ – which will allow you to listen to anything coming into the microphone socket through the headphone socket on the front on my PC.
  • In hypercam, set the sound to record from the default input device, set the frame rate to 10/10
  • Record using the ‘select window to record from’ option, select the game window, and use the F2 button to start and stop the recording.
  • The video will be output in AVI format, but you can transcode or convert it into a quicktime MOV file for editing in iMovie, or you can use windows movie editor, which is free and quite good.

Restoring Ubuntu 10.4’s Bootloader, after a Windows 7 Install

I installed Windows 7 after I had installed Ubuntu 10.4. Windows 7 overwrote the Linux bootloader “grub” on my master boot record. Therefore I had to restore it.

I used the Ubuntu 10.4 LiveCD to start up a live version of Ubuntu. While under the LiveCD, I then restored the Grub bootloader by chrooting into my old install, using the linux command line. This is a fairly complex thing to do, and so I recommend you use this approach only if you’re are confident with the linux command line:

# (as root under Ubuntu's LiveCD)

# prepare chroot directory

mkdir /chroot
d=/chroot

# mount my linux partition

mount /dev/sda1 $d   # my linux partition was installed on my first SATA hard disk, on the first parition (hence sdA1).

# mount system directories inside the new chroot directory

mount -o bind /dev $d/dev
mount -o bind /sys $d/sys
mount -o bind /dev/shm $d/dev/shm
mount -o bind /proc $d/proc

# accomplish the chroot

chroot $d

# proceed to update the grub config file to include the option to boot into my new windows 7 install

update-grub

# install grub with the new configuration options from the config file, to the master boot record on my first hard disk

grub-install /dev/sda

# close down the liveCD instance of linux, and boot from the newly restored grub bootloader

reboot

Windows 7 Gaming on my Macbook

I have a 2006/2007 Core 2 Duo 2.6ghz white macbook, that I use regularly for internet, music, watching films, itunes and integration with my iPhone.

I wanted to turn my desktop PC into a ‘work only’ Ubuntu Linux machine, so that I don’t get distracted when I’m supposed to be doing something else.

But I still have a lot of PC games that I wanted to play on the Macbook, so I decided to try and setup a windows environment to play games on using Bootcamp 2.0 to create a dual-boot OSX/Windows 7 configuration.

It turns out it works really well. The Macbook runs Windows 7 64-bit edition fine, and although the integrated graphics card isn’t designed to run modern games very well, you can get a good gaming experience from small indie games and the older type of PC RPGs that I tend to play. My macbook got a 3.5 rating on the windows experience index for graphics, which is sufficient for many PC games.

First you need to partition your macbook’s HD using the Bootcamp assistant, in the OSX utilities section. Make sure you have your first OSX installation DVD to hand, the one that came with your Macbook. I chose to split the hard drive into two equally sized partitions. Then just place your W7 DVD in the drive, and Bootcamp takes care of the rest.

Once W7 is installed, you can access the Bootcamp menu on startup by holding down the option key. This brings up a menu where you can select to boot into OSX or Windows.

When you start W7 for the first time, you can install the windows driver set for your Macbook that Bootcamp provides you. Insert your OSX installation DVD 1, and run the setup.exe that is located in the Bootcamp folder. This will install native windows drivers for your Macbook hardware.

The only change I needed to make for my macbook, was to install the latest 64bit Realtek drivers for Vista/Windows 7, which are located on the Realtek website. This will fix any sound problems you might have while playing games.

Now don’t expect to run the latest 3D games, but if you’re happy enough with slightly older, classic, indie or retro games, you can get a good gaming experience on Windows 7 from your macbook. It does well with plenty of the indie games available on Value’s Steam distribution network.

WordPress HTML edit mode inserts BR tags sometimes when you add a carriage return..

This is something that was quite annoying today, as I was struggling to use WordPress 2.9.2 to align some pictures in the HTML mode of editing a page, on a client’s website.

It turns out that WordPress was adding BR tags sometimes when I hit return.. and sometimes not. The annoying thing was, although the BRs were outputted in the resultant WordPress site, the BRs were not visible in the WordPress HTML edit mode itself.. meaning they were invisible and undetectable until I viewed the resultant website source and finally figured it out.

WordPress does insert some formatting tags now and then, it seems, but I would have thought it would tell you about the tags that would change the page layout! Apparently not. Anyway, something to be aware of for WordPress gurus..

Edit:

I don’t have time to report this as a bug, but this is the stack I’m using for anyone interested:

Browser: Google Chrome for Mac (5.0.342.9 beta)
TinyMCE Advanced Editor Plugin for WP (3.2.7)
Wordpress 2.9.2

The beta of Google Chrome is a bit unstable, although it may not be the source of the problem.

Forkbombs and How to Prevent Them

A forkbomb is a program or script that continually creates new copies of itself, that create new copies of themselves. It’s usually a function that calls itself, and each time that function is called, it creates a new process to run the same function.

You end up with thousands of processes, all creating processes themselves, with an exponential growth. Soon it takes up all the resources of your server, and prevents anything else running on it.

Forkbombs are an example of a denial of service attack, because it completely locks up the server it’s run on.

More worryingly, on a lot of Linux distributions, you can run a forkbomb as any user that has an account on that server. So for example, if you give your friend an account on your server, he can crash it/lock it up whenever he wants to, with the following shell script forkbomb:

:(){ :|:& };:

Bad, huh?

Ubuntu server 9.10 is vulnerable to this shell script forkbomb. Run it on your linux server as any user, and it will lock it up.

This is something I wanted to fix right away on all my linux servers. Linux is meant to be multiuser, and it has a secure and structured permissions system allowing dozens of users to log in and do their work, at the same time. However when any one user can lock up the entire server, this is not good for a multiuser environment.

Fortunately, fixing this on ubuntu server 9.10 is quite simple. You limit the maximum number of running processes that any user can create. So the fork bomb runs, but hits this ceiling, and eventually stops without the administrator having to do anything.

As root, edit this file, and add the following line:

/etc/security/limits.conf

...
...
*               soft    nproc   35

This sets the maximum process cap for all users, to be 35. The root user isn’t affected by this limit. This limit of 35 should be fine for remote servers that are not offering users gnome, kde, or any other graphical X interface. If you are expecting your users to be able to run applications like that, you may want to increase the limit to 50, and although this will increase the time forkbombs will take to exit, they should still exit without locking up your server.

Alternatively, you can setup an ‘untrusted’ and ‘trusted’ user groups, and assign that 35 limit to the untrusted users, giving trusted users access to the trusted group, which does not have that limit. Use these lines:

/etc/security/limits.conf

...
...
@untrusted               soft    nproc   35
@trusted               soft    nproc   50

I’ve tested these nproc limits on 8.10 and 9.10 ubuntu-server installs, but you should really test your own servers install, if possible, by forkbombing it yourself as a standard user, using the bash forkbomb above, once you’ve applied the fix. The fix is effective as soon as you’ve edited that file, but please note that you have to logout, and log back in again as a standard user before the new process cap is applied to your user account.

How to remove nano, vim and other editors’ backup files out of a directory tree

gardening for science..

Linux command-line editors such as nano and vim often, by default, create backup files with the prefix of “~”. I.e, if I created a file called /home/david/myfile, then nano would create a backup in /home/david/myfile~. Sometimes it doesn’t delete them either, so you’re left with a bunch of backup files all over the place, especially if you’re editing a lot on a directory tree full of source code.

Those stray backup files make directory listings confusing, and also add unnecessary weight to the commits on source control systems such as svn, cvs, git.. etc. If you’re working on a programming team with other people, then it causes further problems and confusion, because person A’s editor can accidentally load person B’s backup file.. etc etc. Nightmare.

So instruct your editor, or the programming team you’re working with, not to drop these backup files. You can configure most editors to change the place where the editor drops its backup files, so you could store all your backup files in a subdirectory of your home directory, for example, if needed. However I always set my editors not to leave backup files about.

Once you know that new backup files will not be created, view the current list of backup files, along with the user that created them.. so you know who’s been creating the backup files and when, etc:

find . -name '*~' -type f -exec ls -al {}  ;

Then archive the stray backup files, with this command:

find . -name '*~' -type f -exec mv -i {} ./archived-backups ;

That will find all backup files in the current directory and below, and move them all to a subdirectory in the current directory called ‘archived-backups’. This is a fairly safe find/exec command, because with the -i switch, mv will not ‘clobber’. This means If you have two backup files, one in /opt/code/index~ and one in /opt/code/bla/bla/index~, they will not ‘clobber’, or overwrite each other automatically when moved into the new directory. You will be informed of any conflicts present so you can resolve them yourself.

However in practice I usually omit the ‘-i’ switch and let them clobber each other, because I usually end up deleting the ./archived-backups/ directory very quickly after that anyway.

Changing the default “From:” email address for emails sent via PHP on Linux

I’ve had to solve this problem a couple of times at least, and it’s quite a common task, so I thought I’d document it here.

When you send emails to users of your site through using the PHP mail() function, they will sometimes turn up in the mailbox of customers of your site with the following from address:

From: Root <root@apache.ecommercecompany.com>

This makes absolutely no sense to your customers, and often they will think it is spam and delete it. Often, the decision will be made for them by their web mail host, such as hotmail.com or googlemail.com, and they will never even see the email. You don’t want this to happen.

Writing email templates that appear “trustworthy” and have a low chance of being mislabled as spam by the webmail companies, is quite a difficult task, and there’s quite a bit to know about it. However it is quite easy to change the default “From:” email address that PHP sends your emails on as, and that will definitely help.

Assuming you’re running a linux server using sendmail, all you have to do is this.

First create an email address that you would want the customers to see, through editing the /etc/aliases files and running the command newaliases. I created an email address called customer-emails@ecommercecompany.com.

Then change the following sendmail_path line in your php.ini file to something like this:

/etc/php.ini
...
sendmail_path = /usr/sbin/sendmail -t -i -F 'customer-emails' -f 'Customer Emails <customer-emails@ecommercecompany.com>'
...

Broken down, those extra options are:
-F 'customer-emails' # the name of the sender
-f 'Customer Emails <customer-emails@ecommercecompany.com>' # the email From header, which should have the name matching the email address, and it should be the same email address as above

Then restart apache, and it should load the php.ini file changes. Test it by sending a couple of emails to your email address, and you should see emails sent out like this:

From: Customer Emails <customer-emails@ecommercecompany.com>

Shell scripts for converting between Unix and Windows text file formats

I’ve been using these shell scripts I wrote to convert between unix and windows text file formats. They seem to work well without any problems. If you put them in the /usr/sbin/ directory, they will be accessible on the path of the linux admin account root.

/usr/sbin/unix2win
#!/bin/bash
# Converts a unix text file to a windows text file.
# usage: unix2win <text file to convert>
# requirements: sed version 4.2 or later, check with sed --version
sed -i -e 's/$/r/' $1

/usr/sbin/win2unix
#!/bin/bash
# Converts a windows text file to a unix text file.
# usage: win2unix <text file to convert>
cat $1 | tr -d '15' | tee $1 >/dev/null

I use these scripts with the combination of find and xargs to convert lots of log files into windows format with the following command. However this type of command can be dangerous, so don’t use it if you don’t know what you’re doing:

find sync-logs/ -name '*.log' -type f | xargs -n1 unix2win

PHP Sample – HTML Page Fetcher and Parser

Back in 2008, I wrote a PHP class that fetched an arbitary URL, parsed it, and coverted it into an PHP object with different attributes for the different elements of the page. I recently updated it and sent it along to a company that wanted a programming example to show I could code in PHP.

I thought someone may well find a use for it – I’ve used the class in several different web scraping applications, and I found it handy. From the readme:

This is a class I wrote back in 2008 to help me pull down and parse HTML pages I updated it on
14/01/10 to print the results in a nicer way to the commandline.

- David Craddock (contact@davidcraddock.net)

/// WHAT IT DOES

It uses CURL to pull down a page from a URL, and sorts it into a 'Page' object
which has different attributes for the different HTML properties of the page
structure. By default it will also print the page object's properties neatly
onto the commandline as part of its unit test.

/// FILES

* README.txt - this file
* page.php - The PHP Class
* LIB_http.php - a lightweight external library that I used. It is just a very light wrapper around CURL's HTTP functions.
* expected-result.txt - output of the unit tests on my development machine
* curl-cookie-jar.txt - this file will be created when you run the page.php's unit test

/// SETUP

You will need CURL installed, PHP's DOMXPATH functions available, and the PHP 
command line interface. It was tested on PHP5 on OSX.

/// RUNNING

Use the php commandline executable to run the page.php unit tests. IE:
$ php page.php

You should see a bunch of information being printed out, you can use:
$ php page.php > result.txt

That will output the info to result.txt so you can read it at will.

Here’s an example of one of the unit tests, which fetches this frontpage and parses it:

**++++
*** Page Print of http://www.davidcraddock.net ***
**++++


** Transfer Status
+ URL Retrieved:
http://www.davidcraddock.net
+ CURL Fetch Status:
Array
(
    [url] => http://www.davidcraddock.net
    [content_type] => text/html; charset=UTF-8
    [http_code] => 200
    [header_size] => 237
    [request_size] => 175
    [filetime] => -1
    [ssl_verify_result] => 0
    [redirect_count] => 0
    [total_time] => 1.490972
    [namelookup_time] => 5.3E-5
    [connect_time] => 0.175803
    [pretransfer_time] => 0.175812
    [size_upload] => 0
    [size_download] => 30416
    [speed_download] => 20400
    [speed_upload] => 0
    [download_content_length] => 30416
    [upload_content_length] => 0
    [starttransfer_time] => 0.714943
    [redirect_time] => 0
)

** Header
+ Title: Random Eye Movement  
+ Meta Desc:
Not Set
+ Meta Keywords:
Not Set
+ Meta Robots:
Not Set
** Flags
+ Has Frames?:
FALSE
+ Has body content been parsed?:
TRUE

** Non Html Tags
+ Tags scanned for:
Tag Type: script tags processed: 4
Tag Type: embed tags processed: 1
Tag Type: style tags processed: 0


+ Tag contents:
Array
(
    [ script ] => Array
        (
            [0] => Array
                (
                    [src] => http://www.davidcraddock.net/wp-content/themes/this-just-in/js/ThemeJS.js
                    [type] => 
                    [isinline] => 
                    [content] => 
                )

            [1] => Array
                (
                    [src] => http://www.davidcraddock.net/wp-content/plugins/lifestream/lifestream.js
                    [type] => text/javascript
                    [isinline] => 
                    [content] => 
                )

            [2] => Array
                (
                    [src] => 
                    [type] => 
                    [isinline] => 1
                    [content] => 
                 var odesk_widgets_width = 340;
                var odesk_widgets_height = 230;
                
                )

            [3] => Array
                (
                    [src] => http://www.odesk.com/widgets/v1/providers/large/~~8f250a5e32c8d3fa.js
                    [type] => 
                    [isinline] => 
                    [content] => 
                )

            [count] => 4
        )

    [ embed ] => Array
        (
            [0] => Array
                (
                    [src] => http://www.youtube-nocookie.com/v/Fpm0m6bVfrM&hl=en&fs=1&rel=0
                    [type] => application/x-shockwave-flash
                    [isinline] => 
                    [content] => 
                )

            [count] => 1
        )

    [ style ] => Array
        (
            [count] => 0
        )

)




**----
*** Page Print of http://www.davidcraddock.net Finished ***
**----

If you want to download a copy, the file is below. If you find it useful for you, a pingback would be appreciated.
code-sample.tar.gz

Config files for the Windows version of VIM

Today I encountered problems configuring the windows version of the popular text editor VIM, so I thought I’d write up a quick post talk about configuration files under the Windows version, if anyone becomes stuck like I did. I use Linux, OSX and Windows on a day-to-day basis, and VIM as a text editor for a lot of quick edits on all three platforms. Here’s a quick comparison:

Linux

Linux is easy because that’s what most people who use VIM run, and so it is very well tested.

~/.vimrc – Configuration file for command line vim.
~/.gvimrc – Configuration file for gui vim.

OSX

OSX is simple also, as it’s based on unix:

~/.vimrc – Configuration file for command line vim.
~/.gvimrc – Configuration file for gui vim.

Windows

Windows is not easy at all.. it doesn’t have a unix file structure, and doesn’t have support for the unix hidden file names, that start with a ‘.’, ie: ‘.vimrc’, ‘.bashrc’, and so on. Most open-source programs like VIM that require these hidden configuration files, and have been ported over to windows, seem to adopt this naming convention: ‘_vimrc’, ‘_bashrc’.. and so forth. So:

_vimrc – Configuration file for command line vim.
_gvimrc – Configuration file for gui vim.

Renaming configuration files from “.” to “_” wouldn’t make much difference on its own. You’d have to rename your files, but.. big deal. It’s not much of a problem.

Another, more tricky, problem you may encounter however, is that there’s no clear home directory on windows systems. Each major incarnation of windows seems to have a slightly different way of dealing with user’s files.. from 2000 to XP, a change, from XP to Vista, there is a change. I haven’t tried VIM on W7 yet, but it seems similar to Vista in structure, so this information may actually be consistent to W7.

The Vista 64 version of VIM I have, looks in another place for configuration files. For a global configuration file, it looks in “C:Program Files”. Yes.. “C:Program Files”. According to Vista 64’s version of VIM.. that’s the exact directory where I installed VIM. This is clearly not right. What’s happening is that the file system on windows is different to the unix-type file systems, and the VIM port is having problems adapting. The real VIM install directory is C:Program Filesvim72. Because VIM is looking for a global configuration file in “C:Program Files_vimrc”, it’ll never find it.

Now you could override this with a batch file that sets the right environmental variables on startup, or you could change the environmental variables exported in windows, but I prefer to have a user-specified configuration file in my personal files directory, as it’s easier to backup and manage. If you wanted to specify the environmental variables yourself, which I’m guessing many will, the two environmental variables to override are:

$VIM = the VIM install directory, not always set properly, as I mentioned.
$HOME = the logged in user’s documents and settings directory, in windows speak this is also where the ‘user profile’ is stored, which is a collection of settings and configurations for the user. The exact directory will depend on which version of Windows you’re running, and if you override the HOME folder, you may have problems with other programs that rely on it being static.

On my Windows Vista 64 install:

$VIM = “C:Program Files”
$HOME = “C:UsersDave”

You can see what files VIM includes by running the handy command

vim -V

at a command prompt; it will go through the different settings and output something similar to this:

Searching for "C:UsersDave/vimfilesfiletype.vim"
Searching for "C:Program Files/vimfilesfiletype.vim"
Searching for "C:Program Filesvim72filetype.vim"
line 49: sourcing "C:Program Filesvim72filetype.vim"
finished sourcing C:Program Filesvim72filetype.vim
continuing in C:UsersDave_vimrc
Searching for "C:Program Files/vimfiles/afterfiletype.vim"
Searching for "C:UsersDave/vimfiles/afterfiletype.vim"
Searching for "ftplugin.vim" in "C:UsersDave/vimfiles,C:Program Files/vimfiles,C:Program Filesvim72,C:Program Files/vimfiles/after,C:UsersDave/vimfiles/after"
Searching for "C:UsersDave/vimfilesftplugin.vim"
Searching for "C:Program Files/vimfilesftplugin.vim"
Searching for "C:Program Filesvim72ftplugin.vim"
line 49: sourcing "C:Program Filesvim72ftplugin.vim"
finished sourcing C:Program Filesvim72ftplugin.vim
continuing in C:UsersDave_vimrc
Searching for "C:Program Files/vimfiles/afterftplugin.vim"
Searching for "C:UsersDave/vimfiles/afterftplugin.vim"
finished sourcing $HOME_vimrc
Searching for "plugin/**/*.vim" in "C:UsersDave/vimfiles,C:Program Files/vimfiles,C:Program Filesvim72,C:Program Files/vimfiles/after,C:UsersDave/vimfiles/after"
Searching for "C:UsersDave/vimfilesplugin/**/*.vim"
Searching for "C:Program Files/vimfilesplugin/**/*.vim"
Searching for "C:Program Filesvim72plugin/**/*.vim"
sourcing "C:Program Filesvim72plugingetscriptPlugin.vim"
finished sourcing C:Program Filesvim72plugingetscriptPlugin.vim
sourcing "C:Program Filesvim72plugingzip.vim"
finished sourcing C:Program Filesvim72plugingzip.vim
sourcing "C:Program Filesvim72pluginmatchparen.vim"
finished sourcing C:Program Filesvim72pluginmatchparen.vim
sourcing "C:Program Filesvim72pluginnetrwPlugin.vim"
finished sourcing C:Program Filesvim72pluginnetrwPlugin.vim
sourcing "C:Program Filesvim72pluginrrhelper.vim"
finished sourcing C:Program Filesvim72pluginrrhelper.vim
sourcing "C:Program Filesvim72pluginspellfile.vim"
finished sourcing C:Program Filesvim72pluginspellfile.vim
sourcing "C:Program Filesvim72plugintarPlugin.vim"
finished sourcing C:Program Filesvim72plugintarPlugin.vim
sourcing "C:Program Filesvim72plugintohtml.vim"
finished sourcing C:Program Filesvim72plugintohtml.vim
sourcing "C:Program Filesvim72pluginvimballPlugin.vim"
finished sourcing C:Program Filesvim72pluginvimballPlugin.vim
sourcing "C:Program Filesvim72pluginzipPlugin.vim"
finished sourcing C:Program Filesvim72pluginzipPlugin.vim
Searching for "C:Program Files/vimfiles/afterplugin/**/*.vim"
Searching for "C:UsersDave/vimfiles/afterplugin/**/*.vim"
Reading viminfo file "C:UsersDave_viminfo" info
Press ENTER or type command to continue

Notice how it does pull in all the syntax highlighting macros and other extension files correctly, which are specified in the .vim files above.. but it doesn’t pull in the global configuration files that I’ve copied also to C:Program Filesvim72_gvimrc and C:Program Filesvim72_vimrc. However, it does pickup the files I copied to C:UsersDave.. both the C:UsersDave_vimrc and C:UsersDave_gvimrc are picked up, although VIM will normally read ‘_gvimrc’ when the gui version of VIM is run (called gvim).

To see exactly what those environmental variables are being set to, when you’re inside the editor, issue these two commands, and their values will be show in the editor:

:echo $HOME
:echo $VIM

It seems to make sense for me – and perhaps you, if you’re working with VIM on windows – to place my _vimrc and _gvimrc files configuration files in $HOME in Vista. They are then picked up without having to worry about explicitly defining any environmental variables, creating a batch file, or any other hassle.

You can do this easily by the following two commands:

:ed $HOME_vimrc
:sp $HOME_gvimrc

That will open the two new configuration files, side by side, and you can paste in your existing configuration that you’ve used in Linux, and windows will pick them up the next time you start VIM.

Bacula Scheduling

Bacula Logo

Bacula is a great open-source distributed backup program for Linux/UNIX systems. It is separated into three main components:

  • One ‘Director’ – which sends messages to the other components and co-ordinates the backup
  • One or more ‘File Demons’ – which ‘pull’ the data from the host they are installed from.
  • One or more ‘Storage Demons’ – which ‘push’ the data taken from the file demons into a type of archival storage, IE: backup tapes, a backup hard disc, etc

I found it extremely versatile yet very complicated to configure. Before you configure it you have to decide on a backup strategy; what you want to backup, why you want to back it up, how often you want to back it up, and how you are going to off-site/preserve the backups.

I can’t cover everything about Bacula here, so I thought I’d concentrate on scheduling. You will need to understand quite a lot about the basics of Bacula before you’ll be able to understand scheduling, so I recommend reading up on the basics first.

I had the most problems with the scheduling. In the end I chose to adopt this schedule:

  • Monthly Full backup, with a retention period of 6 months, and a maximum number of backup volumes of 6.
  • Weekly Differential backup against the monthly full backup, with a retention period of 1 month, and a maximum number of backup volumes of 4.
  • Daily Incremental backup against the differential backup, with a retention period of 2 weeks, and a maximum number of backup volumes of 14.

This means that there will always be 2 weeks of incremental backups to fall back on which depend on a weekly differential which depends on a monthly full. This strategy aims to save as much space as possible – there is no redundancy. This means that if a backup fails, especially a monthly or weekly, it will have to be re-run immediately.

The backup volumes will cycle using this method; they will be reused once the maxium volume limits are hit. Also, if you run a backup job from the console, it will revert to the ‘Default’ pool, so you will have to explicitly define either the daily incremental, weekly differential or the monthly full pools.

Here is my director configuration:

Job {
  Name = "Backup def"
  Type = Backup
  Client = localhost-fd
  FileSet = "Full Set"
  Storage = localhost-sd
  Schedule = "BackupCycle"
  Messages = Standard
  Pool = Default
  Full Backup Pool = Full-Pool
  Incremental Backup Pool = Inc-Pool
  Differential Backup Pool = Diff-Pool
  Write Bootstrap = "/var/bacula/working/Client1.bsr"
  Priority = 10
}
Schedule {
  Name = "BackupCycle"
  Run = Level=Full Pool=Full-Pool 1st mon at 1:05
  Run = Level=Differential Pool=Diff-Pool mon at 1:05
  Run = Level=Incremental Pool=Inc-Pool mon-sun at 1:05
}
# This is the default backup stanza, which always gets overridden by one of the other Pools, except when a manual backup is performed via the console.
Pool {
  Name = Default
  Pool Type = Backup
  Recycle = yes                     # Bacula can automatically recycle Volumes
  AutoPrune = yes                   # Prune expired volumes
  Volume Retention = 1 week         # one week
}
Pool {
  Name = Full-Pool
  Pool Type = Backup
  Recycle = yes           # automatically recycle Volumes
  AutoPrune = yes         # Prune expired volumes
  Volume Retention = 6 months
  Maximum Volume Jobs = 1
  Label Format = "Monthly-Full-${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}-${Hour:p/2/0/r}-${Minute:p/2/0/r}" 
  Maximum Volumes = 6
}
Pool {
  Name = Inc-Pool
  Pool Type = Backup
  Recycle = yes           # automatically recycle Volumes
  AutoPrune = yes         # Prune expired volumes
  Volume Retention = 1 month
  Maximum Volume Jobs = 1
  Label Format = "Weekly-Inc-${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}-${Hour:p/2/0/r}-${Minute:p/2/0/r}" 
  Maximum Volumes = 4
}
Pool {
  Name = Diff-Pool
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 14 days
  Maximum Volume Jobs = 1
  Label Format = "Daily-Diff-${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}-${Hour:p/2/0/r}-${Minute:p/2/0/r}" 
  Maximum Volumes = 7
}

I also recommend the excellent O’Reilly book – “Backup and Recovery” which comprehensively covers backups, and has a chapter on Bacula.

Automated Emails on Commiting to a Subversion Repository Using Python

At work I’ve written a couple of scripts that send out emails to the appropriate project team when someone checks in a commit to the project subversion repository. Here are the details.

Firstly, you will need a subversion hook setup on post-commit. The post-commit hook needs to be located in SVNROOT/YOURPROJECT/hooks where YOURPROJECT is your svn project name, and SVNROOT is the root directory where you are storing the data files for your subversion repository.

Substitute projectmember1,projectmember2 etc.. in the post-commit script below for the email addresses of the people to be notified when someone commits a change to the project.

#!/bin/sh

# POST-COMMIT HOOK
#
# The post-commit hook is invoked after a commit.  Subversion runs
# this hook by invoking a program (script, executable, binary, etc.)
# named 'post-commit' (for which this file is a template) with the
# following ordered arguments:
#
#   [1] REPOS-PATH   (the path to this repository)
#   [2] REV          (the number of the revision just committed)
#
# The default working directory for the invocation is undefined, so
# the program should set one explicitly if it cares.
#
# Because the commit has already completed and cannot be undone,
# the exit code of the hook program is ignored.  The hook program
# can use the 'svnlook' utility to help it examine the
# newly-committed tree.
#
# On a Unix system, the normal procedure is to have 'post-commit'
# invoke other programs to do the real work, though it may do the
# work itself too.
#
# Note that 'post-commit' must be executable by the user(s) who will
# invoke it (typically the user httpd runs as), and that user must
# have filesystem-level permission to access the repository.
#
# On a Windows system, you should name the hook program
# 'post-commit.bat' or 'post-commit.exe',
# but the basic idea is the same.
#
# The hook program typically does not inherit the environment of
# its parent process.  For example, a common problem is for the
# PATH environment variable to not be set to its usual value, so
# that subprograms fail to launch unless invoked via absolute path.
# If you're having unexpected problems with a hook program, the
# culprit may be unusual (or missing) environment variables.
#
# Here is an example hook script, for a Unix /bin/sh interpreter.
# For more examples and pre-written hooks, see those in
# the Subversion repository at
# http://svn.collab.net/repos/svn/trunk/tools/hook-scripts/ and
# http://svn.collab.net/repos/svn/trunk/contrib/hook-scripts/

REPOS="$1"
REV="$2"
EMAILS="projectmember1email,projectmember2email,projectmember3email,projectmember4email"
LOG="svnlook log $REPOS -r$REV"
/usr/bin/python /usr/local/scripts/svncommitemail.py $EMAILS $REPOS `whoami` $REV "`$LOG`" > /tmp/svncommitemail.log

Secondly you will need this python script:

Edit the ‘fromaddr’ variable to equal your configuration manager’s email address (probably your own!).

#!/usr/bin/env python

import smtplib
import sys
import getopt
import time
import datetime
import string

def sendMail(subject, body, TO, FROM):
    HOST = "localhost"
    BODY = string.join((
        "From: %s" % FROM,
        "To: %s" % TO,
        "Subject: %s" % subject,
        "",
        body
        ), "rn")
    server = smtplib.SMTP(HOST)
    server.set_debuglevel(1)
    server.sendmail(FROM, [TO], BODY)
    server.quit()

def send(alias,rev,username,repo,changelog):

        today = datetime.date.today()
        fromaddr = 'Configuration.Management@YOURCOMPANY.COM'
        subject = """Subversion repository """+repo+""" changed by """+username+""" on """+str(today)

        aliases = alias.split(',')
        for alias in aliases:
                body = """
Hello,

        This is an automated email to let you know that subversion user: '"""+username+"""' has updated repository """+repo+""" to version """+rev+""". The changelog (might be empty) is recorded as:

"""+changelog+"""

Please contact subversion user: '"""+username+"""' in the first instance if you have any questions about this commit.

Thanks,
Configuration Management
        """
                sendMail(subject,body,alias,fromaddr)


argv = sys.argv
argc = len(sys.argv)
if argc == 6:
        alias = argv[1]
        repo = argv[2]
        username = argv[3]
        rev = argv[4]
        changelog = argv[5]
        send(alias,rev,username,repo,changelog)
else:
        print "Usage: "+argv[0]+"     "

Now once you have this all in place, test it by creating a a test file in the repository, and commiting it. If you issue a “tail -f /tmp/svncommitemail.log” on the box where your subversion project repository is located, you should be able to see what happens when people commit to the project repository.

If it is setup correctly, you will see emails being fired off to all interested parties with information about the svn commit.

Character encoding fix with PHP, MySQL 5 and ubuntu-server

For some reason, under ubuntu-server, my default MySQL 5 character encoding was latin1. This caused no end of problems with grabbing data from the web, which was not necessarily in latin1 characterset.

If you are ever in this situation, I suggest you handle everything as UTF-8. That means setting the following lines in my.cnf:

[mysqld]
..
default-character-set=utf8
skip-character-set-client-handshake

If you already have tables in your database that you have created, and they have defaulted to the latin1 charset, you’ll be able to tell by looking at the mysqldump SQL:

DROP TABLE IF EXISTS `ARTISTS`;
CREATE TABLE `ARTISTS` (
.. some col declarations..
) ENGINE=MyISAM AUTO_INCREMENT=4519 DEFAULT CHARSET=latin1;

See here this artists table has been set to default charset of latin1 by mysql. This is bad. So what I recommend is:

1. Dump the full database structure + data to a file using mysqldump
2. Substitute ‘latin1’ for ‘utf8’ universally on that file using your favourite text editor
3. Import the resultant file into mysql using the mysql -uroot -p -Dyourdb < dump.sql method

Then everything will be in utf8, and your character encoding issues will be solved 🙂

NVIDIA GeForce4 MX 420 under Ubuntu Dapper Drake

My GeForce4 MX 420 didn’t work properly with OpenGL under a fresh install of Dapper Drake. Fixing it, however, proved to be really easy:

1) Install the nvidia package:

$ apt-get install nvidia

2) Edit /etc/X11/xorg.conf and replace the ‘driver’ where it says ‘nv’ with “nvidia”:

Section “Device”
Identifier “NVIDIA Corporation NV17 [GeForce4 MX 420]”
#Driver “nv”
Driver “nvidia”
BusID “PCI:1:0:0”
EndSection

3) Kill X with control-alt-backspace

4) Login again and test if it works by running glxgears and glxinfo on the console