Decoding Morse, the computer way

Jeremy Keith’s been sent a Morse message and is wondering what it says. Well, I now know what it says, cos I wrote a program to decode it.

First, I went and got the mp3 of the message from Odeo and loaded it into Audacity. (No, not Jokosher; I’m at work, and there’s no Jokosher port for Windows. Audacity is still a free software audio editor, though, so being able to just download it is very handy indeed.) Audacity can read mp3s; I resampled it to 48000Hz (which made the peaks clearer) and made it mono (rather than stereo). At this point I could theoretically have decoded it by looking at the waveform: as you can see from this Audacity screenshot, that’s clearly the Morse code .... .. .-.-.- (short sound is a dot, long sound is a dash; that decodes to “HI.”).

[Screenshot: the Morse message's waveform in Audacity]

However, that’d be boring and laborious, and what if the message was five hours long, eh? You wouldn’t want to transcribe that by hand. So, get the computer to do it! I didn’t want to write an mp3 parser, though, and because I was on a Windows box I didn’t have GStreamer available, which would have helped with this. So, I exported the sound from Audacity as a WAV file, and then went off and got sox, the Swiss Army knife of audio converters and another open source program. Sox can convert a sound sample to its “data” format, which is a load of lines that look like this:

               0  -6.1035156e-005
  2.2675737e-005  -3.0517578e-005
  4.5351474e-005                0
  6.8027211e-005  -3.0517578e-005
  9.0702948e-005   3.0517578e-005
   0.00011337868  -3.0517578e-005
   0.00013605442                0
   0.00015873016  -3.0517578e-005
    0.0001814059  -3.0517578e-005

where each line represents one sound frame; the first number is “number of seconds since the beginning of the sample”, and the second is “loudness of this frame”. So, that’s the answer! The loudness will be high where there was a sound and low where there wasn’t. I needed to go through the data and find all the loud bits and all the quiet bits; a long quiet bit is a space, a long loud bit is a dash, and a short loud bit is a dot. No problem: quick bit of Python scripting:

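fp = open("jezza4.dat")
# skip the first two (header) lines, then mark each frame 1 (loud) or 0 (quiet)
data = [int(abs(float(x.split()[1])) > 0.01) for x in fp.readlines()[2:]]
fp.close()

# count all the runs, as (value, length) pairs
counts = []
current = -1
count = 0
for i in data:
  if i != current:
    counts.append((current, count))
    current = i
    count = 0
  count += 1
counts.append((current, count))  # don't lose the final run

# now remove all short runs, which also removes the -1 row!
counts = [x for x in counts if x[1] >= 15]

# and reaggregate everything
counts2 = []
current = -1
count = 0
for i in counts:
  if i[0] != current:
    counts2.append((current, count))
    current = i[0]
    count = 0
  count += i[1]
counts2.append((current, count))  # again, keep the final run

# a long quiet run is a space; a loud run is a dash if long, a dot otherwise
mystr = ""
for x in counts2:
  if x[0] == 0:
    if x[1] > 350:
      mystr += " "
  elif x[0] == 1:
    if x[1] > 500:
      mystr += "-"
    else:
      mystr += "."
print(mystr)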

and that printed out the Morse code for the message in dots and dashes! Nice and easy. Quick trip over to a Morse code converter to read it, and there we go. Message decoded. Good one, Tom. Oh, you want to know what it said?

- --- ..- --. .... .-.. ..- -.-. -.- .-.-.- - .-. .- -. ... -.-. .-. .. -... . .. - -.-- --- ..- .-. ... . .-.. ..-. .-.-.- .-- . .-.. .-.. -.. --- -. . - --- -- .- -. - .... --- -. -.-- --..-- - .... --- ..- --. .... .-.-.-
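The trip to an online converter could also be done in a few lines of Python. What follows is only a minimal sketch of that last step, not the converter used above: the `MORSE` table and the `decode_morse` function are names made up for this illustration, the table covers just the letters plus the full stop and comma, and it assumes letters are separated by single spaces, which is what the script prints.

```python
# Hypothetical helper, not part of the original script: a lookup table from
# Morse symbols to characters (letters, full stop and comma only).
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z", ".-.-.-": ".", "--..--": ",",
}


def decode_morse(message):
    """Turn space-separated dots and dashes into text.

    Symbols missing from the table come out as '?'.
    """
    return "".join(MORSE.get(symbol, "?") for symbol in message.split())


# The short example read off the waveform earlier
print(decode_morse(".... .. .-.-.-"))  # prints HI.
```

Passing it the string the script printed would give the message as plain text. The script doesn't tell word gaps from letter gaps, so the words would all run together.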

