Saturday, 19 June 2010

Synchronize your Foobar2000 playlists with your Android device (or any other MP3 player)

I recently decided I wanted to synchronize some of my music with my Nexus One. I quickly discovered that the Nexus One didn't support WMA audio files. I also discovered there weren't any out-of-the-box solutions for syncing audio files and playlists with the flexibility I required.

So I decided to write my own between two World Cup matches. I felt the script might be useful to other people and I couldn't find a suitable place to put it. So I am dumping it here.

You will need the following software installed to use this script.

Foobar2000
COM Automation Server for Foobar2000
Python 2.6.*
Win32 Extensions for Python
FFmpeg

All my music files are currently in WMA format. This will change in the future, but until then the script converts the files to MP3 using FFmpeg. If you need different behaviour, adjust the function "convert_file".

Copy and paste the code below into a file called syncplayer.py. You may need to edit some of the settings at the top of the file. If you have Python files correctly associated on your PC then you should be able to double-click the file and the sync will run whilst displaying a log on the console.

Enjoy!


"""
SyncPlayer.py - A Python script for synchronizing music files between
Foobar2000 and an MP3 device.

Copyright (C) 2010 Blair Sutton

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see .
"""

########################
#### SETTINGS BELOW ####
########################

# this is an array of foobar2000 playlist names you want synched
#
playlists = [ "NexusOne", "Driving Music" ]

# this is the path to your android/mp3 player music folder once mounted.
# the converted files will be placed here.
#
destination_root = r"f:\Music"

# this is the path to your android/mp3 player playlist folder. m3u files will
# be placed here.
#
playlist_root = r"f:\Music\Playlists"

# this is your target conversion format.
#
destination_ext = ".mp3"

# this is how many path levels including your source audio file are synced to the
# device. i.e. artist/album/audio file
#
path_depth = 3

# change these paths to reflect where your converters are installed
#
ffmpeg_path = r"C:\Program Files\FFmpeg-0.6-svn-23607\bin\ffmpeg.exe"

####################
#### CODE BELOW ####
####################

tag_list = ["Author","Title","UserRating","UserServiceRating","WM/AlbumArtist","WM/AlbumTitle",
"WM/Category","WM/Composer","WM/Conductor","WM/ContentDistributor","WM/ContentGroupDescription",
"WM/EncodingTime","WM/Genre","WM/GenreID","WM/InitialKey","WM/Language","WM/Lyrics","WM/MCDI",
"WM/MediaClassPrimaryID","WM/MediaClassSecondaryID","WM/Mood","WM/ParentalRating","WM/Period",
"WM/ProtectionType","WM/Provider","WM/ProviderRating","WM/ProviderStyle","WM/Publisher",
"WM/SubscriptionContentID","WM/SubTitle","WM/TrackNumber","WM/UniqueFileIdentifier",
"WM/WMCollectionGroupID","WM/WMCollectionID","WM/WMContentID","WM/Writer","WM/Year"]

from win32com.client.gencache import EnsureDispatch
from os.path import basename, splitext, exists
from os import sep, makedirs
from subprocess import call
from urlparse import urlparse
from logging import StreamHandler, Formatter, DEBUG, INFO, getLogger

log = getLogger("Foobar.MP3PlayerSync")
log.setLevel(DEBUG)
lh = StreamHandler()
lh.setFormatter(Formatter("%(levelname)s|%(asctime)s|%(filename)s:%(lineno)d|%(message)s"))
log.addHandler(lh)

log.info("Connecting to foobar2000 COM server...")

prog_id = "Foobar2000.Application.0.7"
fb2k = EnsureDispatch(prog_id)

def main():
    fb2k_playlists = [ i for i in fb2k.Playlists if i.Name in playlists ]
    if fb2k_playlists:
        for pl in fb2k_playlists:
            sync_playlist(pl)
    log.info("Completed Sync")

def sync_playlist(sync_playlist):
    log.info("Syncing playlist '%s'..." % sync_playlist.Name)
    tracks = sync_playlist.GetTracks()

    m3u_lines = ["#EXTM3U"]
    for t in tracks:
        m3u_lines.append(t.FormatTitle("#EXTINF:%length_seconds%, %artist% - %title%"))
        source_path = urlparse(t.Path).netloc
        m3u_lines.append(sync_file(source_path))

    create_m3u(sync_playlist.Name, m3u_lines)

def sync_file(source_path):
    parts_all = source_path.split(sep)
    parts = parts_all[-path_depth:]

    filenameext = parts[path_depth-1]
    (filename, ext) = splitext(filenameext)

    parts_new_path = [destination_root]
    parts_new_path.extend(parts[0:path_depth-1])
    destination_folder = sep.join(parts_new_path)
    parts_new_path.append(filename + destination_ext)
    destination_path = sep.join(parts_new_path)

    if not exists(destination_folder):
        log.info("Creating folder: '%s'..." % destination_folder)
        makedirs(destination_folder)

    if not exists(destination_path):
        convert_file(source_path, destination_path)

    return destination_path

# For my purposes I needed to convert my WMA files (without DRM!) to MP3 so
# my Nexus could understand them. You could skip this bit if your source and
# destination codecs are the same on your devices.
#
def convert_file(input_file, output_file):
    log.info("Synching: '%s' -> '%s'" % (input_file, output_file))
    command = """"%s" -i "%s" -ac 2 -ar 22050 -ab 192k "%s" """ % (ffmpeg_path, input_file, output_file)
    log.debug("Converter command line: '%s'" % command)
    try:
        retcode = call(command, shell=False)
    except OSError, e:
        # the message needs a %s placeholder for the exception to be logged
        log.critical("Converter execution failed: %s", e)
        # nothing was converted, so there is no output file to tag
        return

    log.info("Copying media tags over..")
    wmp = EnsureDispatch("WMPlayer.OCX")
    if_m = wmp.newMedia(input_file)
    of_m = wmp.newMedia(output_file)
    for tag in tag_list:
        if not of_m.isReadOnlyItem(tag) and if_m.getItemInfo(tag) != "":
            of_m.setItemInfo(tag, if_m.getItemInfo(tag))


def create_m3u(playlist_name, m3u_lines):
    if not exists(playlist_root):
        log.info("Creating folder: '%s'..." % playlist_root)
        makedirs(playlist_root)

    m3u_path = "%s\\%s.m3u" % (playlist_root, playlist_name)
    log.info("Creating m3u playlist: '%s'..." % m3u_path)
    f = open(m3u_path, "w+")
    f.write("\n".join(m3u_lines))
    f.close()


if __name__ == "__main__":
    main()
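To illustrate how the path_depth setting maps a source file onto the device, here is a small stand-alone sketch. The helper name map_to_device and the sample path are made up for illustration; the logic mirrors what sync_file does, but it uses a literal backslash separator so it can be tried on any platform.

```python
import ntpath


def map_to_device(source_path, destination_root, path_depth=3, destination_ext=".mp3"):
    """Hypothetical helper: keep the last `path_depth` levels and swap the extension."""
    # e.g. ["Artist", "Album", "01 Track.wma"] for path_depth = 3
    parts = source_path.split("\\")[-path_depth:]
    stem, _ext = ntpath.splitext(parts[-1])
    return "\\".join([destination_root] + parts[:-1] + [stem + destination_ext])


if __name__ == "__main__":
    # Sample path is an assumption, not taken from a real library.
    print(map_to_device(r"C:\Users\me\Music\Artist\Album\01 Track.wma", r"f:\Music"))
    # prints f:\Music\Artist\Album\01 Track.mp3
```

With path_depth = 2 only the album folder and file name would be kept, so the same track would land in f:\Music\Album.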

Friday, 5 September 2008

Powershell Performance

Recently, I have grown fond of Powershell. As someone who is also responsible for a little administration from time to time, I was quickly drawn to it as a language that could solve many mundane problems quickly and succinctly. Having originally come from a UNIX background, I could see the Powershell Development Team had taken the best features from Korn Shell and Perl and combined them with the .NET framework to provide a very powerful tool.

However, it's not all a bed of roses and this will become clear as you read on.

I have recently been analysing various files containing financial tick data. Typically there are around two million lines in a file and each contains a comma delimited string with a date, time, price and amount traded of a particular stock. For this analysis I needed to extract a single column from this file and save it in a new file, which is then used as input for GNU Octave.

The Problem

This task is typical and can easily be done with a tool such as Perl or Awk. However, since I am trying to use more Powershell recently, I felt obliged to see how well it could do the job.

To start, I created a sample data file, testdata.csv, by using the following: -

PS> Set-Content -Path testdata.csv -Value ("1/1/2008, 09:00:00, 100, 1`n" * 2000000)

What I need to extract is the third column. In each case, I used Measure-Command to also measure script execution time: -

PS> Get-Content testdata.csv | % { ($_.Split(","))[2].Trim(" ") } > testdata.out
TotalSeconds : 1896.8136657

Clearly, I kept busy whilst the command ran but it was initially quite a surprise. One should ask how much of this time is spent on simply sending the data down Powershell's object pipeline?

PS> Get-Content testdata.csv > testdata.out
TotalSeconds : 1292.2079663

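So simply moving the lines through the pipeline and out to a file accounts for roughly two thirds of the running time. Perhaps this is a deficiency of Powershell; one they might improve with V2. Fortunately, in this case we are dealing with a CSV file so we can improve performance using Import-Csv. Here is another attempt: -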

PS> Set-Content testdata.out (Import-Csv .\testdata.csv -Header D,T,P,V | % { $_.P })
TotalSeconds : 333.3965842

The Perl Way

This is better but still seems poor, and what do you do if your input file is delimited with more than a single character? I thought I should test the same problem using Perl.

PS> Copy-Item testdata.csv testdata.out; perl -nibak -e 's/[ \t]+//g; print \"\".(split(/,/, $_))[2].\"\n\"' testdata.out
TotalSeconds : 42.8753894

A considerable improvement! Then I began wondering whether it would be possible to include Perl within the Powershell pipeline. Unfortunately, this is not a simple case of placing Perl after the pipe character '|' since Powershell will not connect to the STDIN and STDOUT of a normal process. One needs to open a stream to STDIN of a Perl process and feed it Powershell's pipeline '$_' object. Furthermore, the STDOUT of the Perl process needs to be collected and sent back through to Powershell as a pipeline object.

A Powershell Function

My first attempt produced the following Powershell function: -

Function Perl-Filter() {
    BEGIN {
        $si = New-Object System.Diagnostics.ProcessStartInfo
        $si.FileName = "C:\perl\bin\perl.exe"
        $si.Arguments = @'
-ne "s/[ \t]+//g; print ''.(split(/,/, $_))[2].\"\n\""
'@
        $si.UseShellExecute = $false
        $si.RedirectStandardOutput = $true
        $si.RedirectStandardInput = $true
        $p = [System.Diagnostics.Process]::Start($si)
    }
    PROCESS {
        $p.StandardInput.WriteLine($_)
        $p.StandardInput.Flush()
    }
    END {
        $p.StandardInput.Close()
        Write-Output $p.StandardOutput.ReadToEnd()
        $p.WaitForExit();
    }
}

This Powershell function starts by spawning a Perl process and redirecting its STDIN and STDOUT streams. During the processing stage, data is flushed into Perl's STDIN and finally all data from the STDOUT stream is sent back down Powershell's pipeline via Write-Output (the echo command). Note that the following will not work: -

PS> Get-Content .\testdata.csv | Perl-Filter > testdata.out

One problem with this function is that all of Perl's output is kept in memory until the process ends. It would be nice to read data from the Perl process during the processing stage and send it down Powershell's pipeline as it becomes ready. Sadly, due to a known problem with .NET's StreamReader implementation, a Read or Peek will block if the Stream has not had any data sent through it. The only workaround I know of is to start a separate thread to manage the Stream, and this is where Powershell V1 has its limitations.

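Another problem is that the function simply hangs. The PROCESS block keeps writing to Perl's STDIN, but nothing reads Perl's STDOUT until the END block, so Perl is blocked from writing to the STDOUT stream once the pipe buffer is full. This buffer is usually set to around 8KB.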
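The separate-thread workaround can be sketched outside Powershell. The following is a minimal Python 3 illustration, not the Cmdlet itself: the names run_filter and CHILD are made up, and a small Python child process stands in for the Perl one-liner so the example does not depend on Perl being installed. A reader thread drains the child's STDOUT into a queue while the main thread keeps writing to its STDIN, so the output pipe never fills up.

```python
import queue
import subprocess
import sys
import threading

# Stand-in child process (an assumption): prints the third comma-separated
# field of every line it reads, much like the Perl one-liner.
CHILD = "import sys\nfor line in sys.stdin:\n    print(line.split(',')[2].strip())\n"


def run_filter(lines):
    """Feed `lines` to the child and return its output lines as a list."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            universal_newlines=True)
    results = queue.Queue()

    def drain():
        # Runs on its own thread so the child never blocks on a full STDOUT pipe.
        for out_line in proc.stdout:
            results.put(out_line.rstrip("\n"))

    reader = threading.Thread(target=drain)
    reader.start()

    for line in lines:
        proc.stdin.write(line + "\n")
    proc.stdin.close()  # end of input; the child finishes and closes STDOUT

    reader.join()
    proc.wait()

    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


if __name__ == "__main__":
    rows = ["1/1/2008, 09:00:00, 100, 1"] * 20000
    print(len(run_filter(rows)))
```

In a real pipeline the queue would be emptied while input is still arriving rather than only at the end; collecting everything here just keeps the sketch short.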

A Powershell Cmdlet

So, how does one allow a pipeline to access another process's STDIN and STDOUT streams? Well, the answer appears to be that one has to write a Cmdlet using C# or VB.Net.

The general layout is similar to that of the function above. One must override three main methods, BeginProcessing, ProcessRecord and EndProcessing, which correspond to the BEGIN, PROCESS and END blocks above. Building a new Powershell Cmdlet is made very simple by using David Aiken's Visual Studio Template.

I chose to follow a similar structure to the Powershell Function above spawning my process in a begin block but also starting a special thread that monitors the STDOUT of this process. The thread looks for a line delimiter sequence in the stream and as these are discovered records are broken off and pushed into an ObjectQueue. Here is an excerpt from the thread: -


List<char> dataQueue = new List<char>();
char[] separatorCharArray = Separator.ToCharArray();
char[] buffer = new char[16 * 1024];
int totalBytesRead = 0;
int totalObjectsQueued = 0;
int bytesRead;

while ((bytesRead = p.StandardOutput.Read(buffer, 0, buffer.Length)) > 0)
{
    totalBytesRead += bytesRead;

    for (int i = 0; i < bytesRead; i++)
        dataQueue.Add(buffer[i]);

    // pump out any complete objects
    int searchFrom = 0;
    int index;
    while ((index = dataQueue.IndexOf(separatorCharArray[0], searchFrom)) >= 0)
    {
        // not enough chars to complete a match yet; wait for the next read
        // (a 'continue' here would spin forever on the same index)
        if (dataQueue.Count < index + separatorCharArray.Length)
            break;

        // check it's a complete match, add if it is;
        // the flag must be reset for every candidate position
        bool completeMatch = true;
        for (int i = 1; i < separatorCharArray.Length; i++)
            completeMatch &= dataQueue[index + i] == separatorCharArray[i];

        if (completeMatch)
        {
            oq.Enqueue(new string(dataQueue.GetRange(0, index).ToArray()));
            totalObjectsQueued++;
            dataQueue.RemoveRange(0, index + separatorCharArray.Length);
            searchFrom = 0;
        }
        else
        {
            // false alarm: keep looking after this character
            searchFrom = index + 1;
        }
    }
}

// enqueue any remaining chars
if (dataQueue.Count > 0)
{
    oq.Enqueue(new string(dataQueue.ToArray()));
    dataQueue.Clear();
}

The ObjectQueue is then de-queued to the Powershell pipeline during the ProcessRecord and EndProcessing stages. Very simply as: -

while (ObjectQueue.Count != 0)
    WriteObject(ObjectQueue.Dequeue());

The advantage of this structure is that one can choose processes that output data in other forms; perhaps even binary files.
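The record-splitting idea used by the monitoring thread can also be sketched in a few lines of Python. This is only an illustration of the technique, with a made-up function name (split_records), and it is not the Cmdlet's code: text arrives in arbitrary chunks, complete records are emitted as soon as a full separator has been seen, and whatever is left at the end is emitted as a final record.

```python
def split_records(chunks, separator="\n"):
    """Yield complete records from an iterable of text chunks."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        while True:
            index = pending.find(separator)
            if index < 0:
                break  # separator not fully received yet; wait for more data
            yield pending[:index]
            pending = pending[index + len(separator):]
    if pending:
        yield pending  # trailing data without a final separator


if __name__ == "__main__":
    # The separator "\r\n" is deliberately split across two chunks.
    print(list(split_records(["100\r", "\n200\r\n30", "0"], separator="\r\n")))
    # prints ['100', '200', '300']
```

Because the separator is just a string, the same loop works for delimiters longer than one character.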

Finally, measuring the command's output: -

PS> Get-Content .\testdata.csv | Get-ProcessPipe -ProcessPath perl.exe -Arguments '-ne "s/\s//g; print q().(split(/,/, $_))[2].qq(\n)" ' > testdata.out
TotalSeconds : 1683.0998587

This is only a small improvement on the original approach using .NET's string methods above.

Conclusion

Powershell has many advantages for administration tasks. However, when processing more than 100,000 objects through its pipeline your task's performance will take a big hit. This is perhaps where it is worth falling back on more tried and tested tools such as Perl or Python and using traditional techniques. Powershell's object layer is very powerful; however, it would be nice if it could detect whether it was processing an object stream or a text stream and behave accordingly.
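For reference, the column extraction itself could look something like this in Python. This is a minimal sketch that was not part of the timings above; the function name extract_column is made up, and it assumes the same comma-delimited layout as testdata.csv.

```python
import io


def extract_column(source, sink, column=2, delimiter=","):
    """Write one stripped field per input line; return the number of lines written."""
    count = 0
    for line in source:
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) <= column:
            continue  # skip blank or short lines
        sink.write(fields[column].strip() + "\n")
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO("1/1/2008, 09:00:00, 100, 1\n" * 3)
    out = io.StringIO()
    extract_column(sample, out)
    print(out.getvalue())
```

For a real file, the two StringIO objects would be replaced with open("testdata.csv") and open("testdata.out", "w"), so lines are streamed one at a time rather than held in memory.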

I felt having access to a non-Powershell process as a Cmdlet might be useful so I have posted my code here on Google if you would like to play with it. Please note it has not been tested thoroughly and is bound to have many bugs.