From noobs to nerds, everyone has heard about neural networks. But have you ever played with one?
Let’s skip the MNIST tutorial (handwritten digit recognition), too boring, and choose something more exciting: music recognition! Style, instrument, notes, effects: all of this would be nice to extract from an mp3!
Step 1 : find data
We need a lot of data for any kind of recognition. Of course we need music, and also a massive pack of instrument samples. Let’s start by classifying our data, each instrument in a folder named after it.
Step 2 : generate more data
1500 samples is not too bad, but what about getting a lot more? In visual recognition, we would slightly distort our sample images; let’s do the same with our musical samples.
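A minimal sketch of that distortion step, assuming SoX is installed and the samples are WAV files. The function names, the output naming scheme and the shift values are illustrative choices, not the settings used for this project. SoX’s pitch effect takes a shift in cents (100 cents is one semitone).

```sh
# SOX can be overridden (e.g. SOX=echo) for a dry run that only prints the calls.
SOX="${SOX:-sox}"

# Hypothetical helper: write one pitch-shifted copy next to the original.
# $1 = input WAV, $2 = shift in cents (may be negative).
pitch_variant() {
    src="$1"; cents="$2"
    out="${src%.wav}_p${cents}.wav"
    "$SOX" "$src" "$out" pitch "$cents"
}

# Hypothetical driver: a few small detunings for every sample of one folder.
# $1 = folder holding the samples of one instrument.
augment_folder() {
    for f in "$1"/*.wav; do
        [ -e "$f" ] || continue
        for cents in -50 -25 25 50; do
            pitch_variant "$f" "$cents"
        done
    done
}
```

Calling augment_folder on one instrument folder would turn each sample into five (the original plus four detuned copies).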
Now apply chorus, reverb and then a filter to get more realistic samples.
Step 3 : prepare data
As working on wav files directly would be painful, let’s transform our samples into representative images, say spectrograms. As this project has never been documented, we will have to try and combine different visualisation parameters.
Using SoX, we will get a DFT spectrogram, use a Hamming window for the frequency analysis, and try the Dolph window for dynamic analysis. To the human eye, the « peak spectrometer » also seems representative, let’s keep an eye on it.
I’m currently using a sh script to crawl the music folder; I’ll release it here soon.
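Until that script is released, here is a rough sketch of what such a crawl could look like. It is not the author’s script: the function names and the output naming are assumptions. It relies on SoX’s spectrogram effect, whose -w option selects the window (Hann by default, with Hamming and Dolph among the alternatives) and whose -r option drops axes and legend so that only signal pixels reach the network.

```sh
# SOX can be overridden (e.g. SOX=echo) for a dry run that only prints the calls.
SOX="${SOX:-sox}"

# Render one spectrogram PNG next to the sample.
# $1 = input WAV, $2 = window name (defaults to Hamming).
spectrogram() {
    src="$1"; window="${2:-Hamming}"
    "$SOX" "$src" -n spectrogram -w "$window" -r -o "${src%.wav}_${window}.png"
}

# Walk a sample tree and render every WAV with both windows under comparison.
crawl() {
    find "$1" -type f -name '*.wav' | while IFS= read -r f; do
        spectrogram "$f" Hamming
        spectrogram "$f" Dolph
    done
}
```

Keeping the window name in the file name makes it easy to build one dataset per window and compare how each one trains.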
Step 4 : Build a brain I : Skull
Now we have to set up the tools to contain and manage our neural network.
I tried TensorFlow, which works fine but lacks a user interface, in combination with mnisten, to package custom data in MNIST format.
But even geeks need some kindness, so we will use NVIDIA’s DIGITS and their Caffe fork, which implement multiple sample neural networks and tools for data formatting, the whole thing managed via a local web page.
Building DIGITS and Caffe wasn’t so easy; here’s a short list of commands to get it working on a fresh Ubuntu 14.04.
Warning: training new models on Caffe/DIGITS without CUDA is very slow. Our case took more than a week on ImageNet!
If you don’t own a CUDA-compatible NVIDIA graphics card, you’ll have to spend from $50 to $25,000 to go further, or use pretrained models.
Step 5 : Build a brain II : external wiring
Let’s look at our goals again: take an mp3 file as input, output a MIDI file. Meanwhile, Caffe takes images as input and outputs a raw log.
We will basically need 2 programs. The first, let’s say inputter, will turn an mp3 into spectrograms and launch our tests. The second, outputter, will handle the conversion to MIDI format.
Step 6 : Build a brain III, or 470 !
Take a look at the parameters we want to extract for each instrument note :
– Note duration (attack – sustain – release events)
– Pitch (or note)
– Velocity / Volume
– Instrument sub-category / style
Many of these parameters can easily be calculated on monophonic tracks, but in our polyphonic case we will rely on our neural networks from A to Z, that’s what we came for! Sadly, the number of combinations still exceeds what a poor computer like ours can handle (it would take a 47-million-category model to achieve the resolution aimed for below).
// hypothetical way
//Let’s split our job.
//First, what about using one model per instrument ? Better, let’s split their job in 3 :
// Score, instrumentation and effects.
//Score will detect around : 40 notes * 3 events * 8 volumes = 960 cats #may vary for polyphonic instrument
//Instrumentation : 20 sub instruments * 50 filters = 1000 cats
//Effects : 12 delays * 5 reverbs * 15 filters = 900 cats
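The category counts above are plain products of the resolution chosen for each parameter. A small sketch to redo the arithmetic when a resolution changes (the function name is made up for the example):

```sh
# Multiply all arguments: the number of classes a model needs in order to
# tell apart every combination of the given parameter resolutions.
count_categories() {
    total=1
    for n in "$@"; do
        total=$(( total * n ))
    done
    echo "$total"
}

count_categories 40 3 8     # score: notes * events * volumes
count_categories 20 50      # instrumentation: sub-instruments * filters
count_categories 12 5 15    # effects: delays * reverbs * filters
```

This prints 960, 1000 and 900. Split this way, the three models add up to 2860 output classes, whereas a single model would have to cover the product of every factor at once.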
Let’s first try to recognize drum and percussion hits. Results in 15 hours ^^
Still training, and already at 92% accuracy… Looks promising!
Save & reboot.
Result : slow fps. Could it have only 4 KiB of DMA buffer memory?
Second attempt : What was that default driver ?
It seems that wheezy already provided us with a mz61581-overlay.dtb file.
Let’s try the official guide again, but this time omit the rm / wget part.
Maybe this old driver would fit better? (Quick answer: no.)
Maybe the mz-overlay.dtb provided by the new Raspbian would be better? (No again.)
Third attempt : Notro our savior !
Working on the previous version of the MZ61581-PI-EXT, I had poor performance until I used notro’s work. Let’s take a look at his wiki.
fbtft_device route : I tried « sudo modprobe fbtft_device name=mz61581 », but mz61581 is only an overlay, not an official fbtft device.
« sudo modprobe fbtft_device name=tontec35_9481 » and « sudo modprobe fbtft_device name=tontec35_9486 » only get the backlight working.
The MZ61581 is not one of these!
Fourth attempt : woohoo!
dtoverlay route : to use the new spi-bcm2835 driver with DMA support, let’s disable the old SPI setting in /boot/config.txt :
Be sure to use a 32 KiB buffer version of mz61581-overlay.dtb!
2) Rpi Driver : With config.txt : dtparam=mz61581,debug=32
Eureka, we now have a good fps. Sadly, there are some weird effects on the display: colour distortion, and strange transparent waves drawn over the image.
[ 3.809940] gpiomem-bcm2835 3f200000.gpiomem: Initialised: Registers at 0x3f200000
[ 3.859648] spi spi0.1: setting up native-CS1 as GPIO 7
[ 3.953568] spi spi0.0: setting up native-CS0 as GPIO 8
[ 4.233484] ads7846 spi0.1: touchscreen, irq 484
[ 4.243750] fbtft: module is from the staging directory, the quality is unknown, you have been warned.
[ 4.247379] fb_s6d02a1: module is from the staging directory, the quality is unknown, you have been warned.
[ 4.278306] input: ADS7846 Touchscreen as /devices/platform/soc/3f204000.spi/spi_master/spi0/spi0.1/input/input1
[ 4.297086] fbtft_of_value: width = 320
[ 4.308640] fbtft_of_value: height = 480
[ 4.320960] fbtft_of_value: buswidth = 8
[ 4.333180] fbtft_of_value: debug = 32
[ 4.350623] fbtft_of_value: rotate = 270
[ 4.361306] fbtft_of_value: fps = 30
[ 4.371700] fbtft_of_value: txbuflen = 32768
[ 4.797830] fb_s6d02a1 spi0.0: Display update: 12591 kB/s (23.825 ms), fps=0 (0.000 ms)
[ 4.812592] graphics fb1: fb_s6d02a1 frame buffer, 480x320, 300 KiB video memory, 32 KiB DMA buffer memory, fps=33, spi0.0 at 128 MHz
[ 324.706973] fb_s6d02a1 spi0.0: Display update: 12671 kB/s (23.673 ms), fps=20 (49.978 ms)
[ 324.766897] fb_s6d02a1 spi0.0: Display update: 12662 kB/s (23.692 ms), fps=16 (59.906 ms)
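The figures in this log are enough to estimate the frame rate ceiling of the SPI link. A quick sketch of the arithmetic (function names are made up; 2 bytes per pixel is an assumption, which matches the 300 KiB of video memory reported for 480x320):

```sh
# Bytes in one full frame: width * height * bytes per pixel (default 2).
frame_bytes() { echo $(( $1 * $2 * ${3:-2} )); }

# Upper bound on full-frame updates per second, given the duration of one
# update in microseconds (integer division, so the result is rounded down).
max_fps() { echo $(( 1000000 / $1 )); }

frame_bytes 480 320    # one frame, in bytes
max_fps 23825          # 23.825 ms per update, as logged above
```

This gives 307200 bytes per frame and a ceiling of 41 full updates per second, so asking the driver for more than roughly 40 fps cannot be honoured at this transfer rate.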
2) Rpi Driver : With config.txt : dtparam=mz61581,fps=50,debug=32
Let’s see how to install a very quick splashscreen / bootsplash on your Raspberry Pi (or almost any distro with initrd support compiled into the kernel).
For this tutorial we will work on a Raspberry Pi 2 running Debian wheezy, with a 3.5″ TFT (Tontec MZ61581) supported by notro’s « God-Like » firmware.
The process will be similar for other LCDs, as long as they are supported by notro’s firmware. You can even adapt this tutorial to a setup without an LCD, but mirroring a single screen can be hazardous!
Principle : to get something on screen as quickly as possible, we first write a raw image to the framebuffer(s). As soon as the LCD is loaded, plymouth takes over to give a nicer, evolving display. While plymouth is displayed, we start mirroring our screens with fbcp.
To save some precious seconds, all of this will be launched through initramfs.
Bootsplash and splashscreen are quite redundant; you may want to use only one of them!
It’s up to you to install your LCD and the wonderful fbcp !
First, let’s install the required programs :
sudo apt-get install plymouth-drm ffmpeg
Initramfs will allow us to preload some devices and programs before the normal startup operations. In practice, it’s a compressed file loaded at boot as a file system. Let’s all praise JAH and the Raspbian devs: we no longer have to compile a kernel to use it!
Make initramfs image
sudo update-initramfs -c -k $(uname -r)
Tell the bootloader to use the generated image. Add this to /boot/config.txt:
Use the filename reported by the update-initramfs command.
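As a sketch, the config.txt line can be derived from the kernel release. This assumes update-initramfs kept its default file name, initrd.img-&lt;release&gt;, in /boot; the function name is made up, and the file name actually reported by the command always wins.

```sh
# Print the config.txt line for a kernel release.
# $1 = release string (defaults to the running kernel).
# Assumption: the image is named initrd.img-<release>, the update-initramfs default.
initramfs_config_line() {
    release="${1:-$(uname -r)}"
    echo "initramfs initrd.img-${release} followkernel"
}
```

Note that the initramfs entry of config.txt takes no equals sign; followkernel asks the firmware to load the image right after the kernel in memory.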
Setup loading of fbtft modules
Next, let’s try to start our LCD display sooner by including its modules in the initramfs.
First, we must list the modules that should be included and loaded, in the file /etc/initramfs-tools/modules :
You may have to change fb_s6d02a1 to match your LCD. See your /etc/modules for inspiration!
To be sure that the SPI master is available before the fbtft module is loaded, let’s script the loading.
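A possible shape for that script, written here as a small generator so it can be dropped into /etc/initramfs-tools/scripts/init-top/. The generator name and the default module name are assumptions: spi-bcm2835 is the newer SPI driver, older kernels use spi-bcm2708. The module also has to be present in the image, for instance by listing it in the modules file.

```sh
# Write a hypothetical init-top script to the path given as $1.
# $2 = SPI master module to load (assumption: spi-bcm2835).
write_spi_script() {
    dest="$1"; module="${2:-spi-bcm2835}"
    cat > "$dest" <<EOF
#!/bin/sh
# Standard initramfs-tools preamble: report prerequisites when asked.
PREREQ=""
prereqs() { echo "\$PREREQ"; }
case \$1 in
    prereqs) prereqs; exit 0 ;;
esac

# init-top scripts run before the modules listed in
# /etc/initramfs-tools/modules are loaded, so the SPI master comes up first.
modprobe $module
EOF
    chmod +x "$dest"
}
```

For example, write_spi_script /etc/initramfs-tools/scripts/init-top/spi (run as root) would create the file used in the rest of this tutorial.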
It’s now time to prepare your image with your favorite image editor, and save 2 versions of it as PNG, one in 640×480, one in your LCD’s native resolution, 480×320 in our case.
(note that the aspect ratios don’t match, deal with that!)
Now we need to translate them into a format that matches /dev/fb*; RGB565 seems fine for our color depth. As no graphics editor seemed to offer this format, let’s use ffmpeg to convert these PNGs to raw RGB565.
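A sketch of the conversion, assuming an ffmpeg build that knows the rgb565 pixel format; the function names and file names are placeholders. The second helper gives the size the raw file should have (2 bytes per pixel), a cheap way to check that the image really has the framebuffer’s resolution.

```sh
# FFMPEG can be overridden (e.g. FFMPEG=echo) for a dry run.
FFMPEG="${FFMPEG:-ffmpeg}"

# Convert a PNG to headerless RGB565 pixels. $1 = input PNG, $2 = output file.
png_to_rgb565() {
    "$FFMPEG" -y -i "$1" -f rawvideo -pix_fmt rgb565 "$2"
}

# Expected size in bytes of a raw RGB565 image. $1 = width, $2 = height.
rgb565_size() { echo $(( $1 * $2 * 2 )); }
```

For the two images of this tutorial, the raw files should weigh 614400 bytes (640×480) and 307200 bytes (480×320).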
Including files in our initramfs is a bit tricky at first sight, as we have to create a hook script to do it.
We will also use this script to include fbcp in our initramfs.
Create « /etc/initramfs-tools/hooks/cpimg.sh » with this content, and make it executable :
As you can see, we are copying our image into the initramfs /etc directory (maybe not the best location, but we can rely on its existence).
If you don’t want screen mirroring, remove all fbcp related lines !
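For reference, a hook along these lines could look like the sketch below, again written as a generator. It is not the original cpimg.sh: the image path and the fbcp location are assumptions to adapt. copy_exec comes from initramfs-tools’ hook-functions and also copies the libraries the binary needs.

```sh
# Write a hypothetical hook script to $1.
# $2 = raw image to embed, $3 = path of the fbcp binary (assumed location).
write_hook() {
    dest="$1"; image="$2"; fbcp="${3:-/usr/local/bin/fbcp}"
    cat > "$dest" <<EOF
#!/bin/sh
PREREQ=""
prereqs() { echo "\$PREREQ"; }
case \$1 in
    prereqs) prereqs; exit 0 ;;
esac

# hook-functions provides copy_exec; DESTDIR is the initramfs staging tree.
. /usr/share/initramfs-tools/hook-functions
cp "$image" "\${DESTDIR}/etc/"
copy_exec "$fbcp" /bin
EOF
    chmod +x "$dest"
}
```

Dropping the copy_exec line is enough to leave fbcp out of the image.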
Last step, tell the system to display our images. Let’s edit our previous « /etc/initramfs-tools/scripts/init-top/spi » script and add at the bottom :
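Displaying a raw image boils down to copying its bytes into the framebuffer device. A minimal sketch, where the function name, the file names and the fb0/fb1 assignment are assumptions to check on your own setup:

```sh
# Copy a raw RGB565 image straight into a framebuffer device.
# $1 = raw image, $2 = framebuffer; does nothing if the device is missing.
show_raw() {
    if [ -e "$2" ]; then
        cat "$1" > "$2"
    fi
}

# Possible calls from the init-top script (paths are assumptions):
#   show_raw /etc/splash_640x480.raw /dev/fb0   # main video output
#   show_raw /etc/splash_480x320.raw /dev/fb1   # the SPI LCD
```

The existence check matters this early in the boot: a framebuffer whose driver is not loaded yet simply gets skipped instead of aborting the script.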
You may also want to write « logo.nologo » in cmdline.txt to avoid the raspberry logo display at boot.
If you previously included fbcp via our hook script, the only thing left to do is to launch fbcp at the right time. Too early, and the LCD won’t be initialized; too late, and plymouth will first appear on only one screen before being duplicated, which may not be enough to quench our thirst for perfection!
A correct way of launching fbcp is to create a launcher script in the /etc/initramfs-tools/scripts/init-bottom directory. So let’s do it !
case $1 in
Give this script the name of your choice, and make it executable.
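The core of such a launcher can be as small as the sketch below, to be placed after the usual prereqs preamble. The function name, the fbcp location inside the initramfs and the LCD device are assumptions.

```sh
# Path of fbcp inside the initramfs (assumption: the hook copied it to /bin).
FBCP="${FBCP:-/bin/fbcp}"

# Launch fbcp in the background, but only if the LCD framebuffer exists
# and the binary is available. $1 = LCD framebuffer (assumption: /dev/fb1).
start_fbcp() {
    fb="${1:-/dev/fb1}"
    if [ -e "$fb" ] && [ -x "$FBCP" ]; then
        "$FBCP" &
    fi
}
```

Running fbcp in the background lets the rest of the boot carry on while the mirroring goes on.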
Putting it all together
Update initramfs (add -v to get more info)
sudo update-initramfs -u
Reboot, and tadaaa !
The transition between splashscreen and bootsplash is quite rough in the demo video. In practice you may achieve a soft transition using a splashscreen derived from your chosen plymouth theme.
You may get a smaller initramfs size by compressing your raw image. Might be useful for larger resolutions.
Give money to Notro and Tasanakorn, they deserve it !