Monthly Archives: February 2016

Neural Network – Let’s dub

From noobs to nerds, everyone has heard about neural networks. But have you ever played with one?

Let’s skip the MNIST tutorial (handwritten digit recognition), which is too boring, and choose something more exciting: music recognition! Style, instrument, notes, effects: all of this would be nice to extract from an MP3!

Step 1: find data
We need a lot of data for any kind of recognition. Of course we need music, and also a massive pack of instrument samples. Let’s start by classifying our data, with each instrument in a folder named after it.
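A minimal sketch of that sorting step, assuming the samples are WAV files named <instrument>_<anything>.wav. That naming scheme and the sort_samples function are assumptions made for illustration; a real sample pack will probably need its own rule.

```shell
# Sort WAV samples into one folder per instrument.
# Assumption: files are named <instrument>_<anything>.wav (hypothetical convention).
sort_samples() {
    src_dir=$1
    dest_dir=$2
    for f in "$src_dir"/*_*.wav; do
        [ -e "$f" ] || continue        # nothing matched: the glob stays literal
        name=$(basename "$f")
        instrument=${name%%_*}         # text before the first underscore
        mkdir -p "$dest_dir/$instrument"
        mv "$f" "$dest_dir/$instrument/"
    done
}

# Usage: sort_samples ./raw_samples ./dataset
```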
Step 2: generate more data
1500 samples is not too bad, but what about getting a lot more? In visual recognition, we would slightly distort our sample images; let’s do the same with our musical samples.
Then apply chorus, reverb and a filter to get more realistic samples.
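One possible way to do this with SoX, as a sketch rather than the exact processing used here. The effect values (a 50-cent pitch shift, the chorus settings, the 3000 Hz low-pass) and the augment_sample name are assumptions picked for illustration.

```shell
# SOX can be overridden (e.g. SOX=echo) to preview the commands without running them.
SOX=${SOX:-sox}

# Write two altered copies of one WAV sample:
#   <name>_pitch.wav : slightly distorted (pitch shifted by 50 cents)
#   <name>_fx.wav    : chorus, then reverb, then a low-pass filter
augment_sample() {
    in=$1
    base=${in%.wav}
    "$SOX" "$in" "${base}_pitch.wav" pitch 50
    "$SOX" "$in" "${base}_fx.wav" chorus 0.7 0.9 55 0.4 0.25 2 -t reverb 50 lowpass 3000
}

# Usage: augment_sample kick/kick_01.wav
```

Varying the values per copy (different pitch shifts, reverb amounts, cutoff frequencies) would multiply the number of samples further.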
Step 3: prepare data
As working on WAV files directly would be painful, let’s transform our samples into representative images, say spectrograms. As this kind of project has never been documented, we will have to try out and combine different visualisation parameters.
Using SoX, we will get a DFT spectrogram, with a Hamming window for frequency analysis, and we will also try the Dolph window for dynamic analysis. To the human eye, the “peak spectrometer” also seems representative, so let’s keep an eye on it.
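A sketch of the SoX call for both windows. The raw, monochrome 256 x 257 output is an assumption chosen to give a small, axis-free image for training (SoX is faster when the height is one more than a power of two); make_spectrograms is a hypothetical helper name.

```shell
# SOX can be overridden (e.g. SOX=echo) to preview the commands without running them.
SOX=${SOX:-sox}

# Render one spectrogram per window type for a WAV sample.
# -w: window, -r: raw (no axes or legend), -m: monochrome,
# -x / -y: image width / height in pixels, -o: output PNG.
make_spectrograms() {
    in=$1
    base=${in%.wav}
    for window in Hamming Dolph; do
        "$SOX" "$in" -n spectrogram -w "$window" -r -m -x 256 -y 257 \
            -o "${base}_${window}.png"
    done
}

# Usage: make_spectrograms kick/kick_01.wav
```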

I’m currently using a sh script to crawl the music folder; I’ll release it here soon.

Step 4: Build a brain I: Skull
Now we have to set up the tools to contain and manage our neural network.
I tried TensorFlow, which works fine but lacks a user interface, in combination with mnisten to package custom data in MNIST format.
But even geeks need some kindness, so we will use NVIDIA’s DIGITS and their Caffe fork, which implement multiple sample neural networks and tools for data formatting, the whole thing managed via a local web page.

Building DIGITS and Caffe wasn’t so easy, so here’s a short list of commands to get them working on a fresh Ubuntu 14.04.

 Without CUDA

cd $HOME
# System packages needed by DIGITS and Caffe
sudo apt-get install python-pil python-numpy python-scipy python-protobuf \
python-gevent python-flask python-flaskext.wtf gunicorn python-h5py \
libgflags-dev libgoogle-glog-dev libopencv-dev \
libleveldb-dev libsnappy-dev liblmdb-dev libhdf5-serial-dev \
libprotobuf-dev protobuf-compiler libatlas-base-dev \
python-dev python-pip python-numpy gfortran
sudo apt-get install --no-install-recommends libboost-all-dev
# DIGITS and its Python requirements
git clone https://github.com/NVIDIA/DIGITS.git digits
export DIGITS_HOME=$HOME/digits
wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
cd $DIGITS_HOME
sudo pip install -r requirements.txt
# NVIDIA's Caffe fork and its Python requirements
cd $HOME
git clone --branch caffe-0.14 https://github.com/NVIDIA/caffe.git
export CAFFE_HOME=${HOME}/caffe
sudo apt-get install
cd $CAFFE_HOME
cat python/requirements.txt | xargs -n1 sudo pip install
cd $CAFFE_HOME

# Makefile.config does not exist in a fresh clone: start from the example,
# then append CPU_ONLY := 1 to build without CUDA
cp Makefile.config.example Makefile.config
sed -i '$aCPU_ONLY := 1' Makefile.config
make all
make test
make runtest
sudo pip install --upgrade Flask-WTF
cd $DIGITS_HOME

With CUDA

CUDA_REPO_PKG=cuda-repo-ubuntu1404_7.5-18_amd64.deb && wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/$CUDA_REPO_PKG && sudo dpkg -i $CUDA_REPO_PKG

ML_REPO_PKG=nvidia-machine-learning-repo_4.0-2_amd64.deb && wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1404/x86_64/$ML_REPO_PKG && sudo dpkg -i $ML_REPO_PKG

sudo apt-get update
sudo apt-get install digits

Warning: Training new models on Caffe/DIGITS without CUDA is very slow. In our case, it took more than a week on ImageNet!
If you don’t own a CUDA-compatible NVIDIA graphics card, you’ll have to spend anywhere from $50 to $25,000 to go further, or use pretrained models.

Step 5: Build a brain II: external wiring
Let’s look at our goals again: take an MP3 file as input, output a MIDI file. Meanwhile, Caffe takes images as input and outputs a raw log.
We will basically need two programs. The first, let’s call it inputter, turns an MP3 into spectrograms and launches our tests. The second, outputter, handles the conversion to MIDI format.

Step 6: Build a brain III, or 470!
Take a look at the parameters we want to extract for each instrument note:
– Note duration (attack – sustain – release  events)
– Pitch (or note)
– Velocity / Volume
– Instrument sub-category / style
– Filter
– Reverb
Many of these parameters can easily be calculated when we use monophonic tracks, but in our polyphonic case we will rely on our neural networks from A to Z; that’s what we came for! Sadly, the number of combinations still exceeds what a poor computer like ours can handle (it would take a 47-million-category model to achieve the resolution aimed for below).

// hypothetical way
// Let’s split the job.
// First, what about using one model per instrument? Better, let’s split each one’s job in 3:
// score, instrumentation and effects.
// Score will detect around: 40 notes * 3 events * 8 volumes = 960 categories  # may vary for polyphonic instruments
// Instrumentation: 20 sub-instruments * 50 filters = 1000 categories
// Effects: 12 delays * 5 reverbs * 15 filters = 900 categories
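The counts above are plain products of the number of values per parameter. A small sketch to recompute them when a resolution changes (count_categories is a hypothetical helper name):

```shell
# Multiply the number of possible values of each parameter
# to get the number of categories one model has to tell apart.
count_categories() {
    total=1
    for n in "$@"; do
        total=$((total * n))
    done
    echo "$total"
}

score=$(count_categories 40 3 8)              # notes * events * volumes
instrumentation=$(count_categories 20 50)     # sub-instruments * filters
effects=$(count_categories 12 5 15)           # delays * reverbs * filters

echo "score: $score"                          # score: 960
echo "instrumentation: $instrumentation"      # instrumentation: 1000
echo "effects: $effects"                      # effects: 900
echo "total across the three models: $((score + instrumentation + effects))"   # 2860
```

Because each model only covers its own parameters, the three sizes add up instead of multiplying, which is the point of splitting the job.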

Let’s first try to recognize drum and percussion hits. Results in 15 hours ^^
Still training, and already at 92% accuracy... Looks promising!

The MZ61581 PI EXT 2015.12.12 case

Hi mates !

Let’s try to get this LCD screen working on Linux (Raspbian Wheezy). Symptoms in the first tests were slow fps or image distortion.

First attempt: let’s follow the official setup guide

sudo apt-get update
sudo apt-get upgrade
* coffee time *

sudo reboot
cd /boot/overlays
sudo mv mz61581-overlay.dtb old-mz61581-overlay.dtb
sudo wget http://www.itontec.com/mz61581-overlay.dtb
sudo nano /boot/config.txt

Add to the bottom :

dtparam=spi=on
dtoverlay=mz61581,debug=32

Save & reboot.
Result: slow fps. Might it only have 4 KiB of DMA buffer memory?

Second attempt: what was that default driver?
It seems that Wheezy already provided us with a mz61581-overlay.dtb file.
Let’s try the official guide again, but this time omit the mv / wget part.
Maybe this old driver would fit better? (Quick answer: no.)
Maybe the mz-overlay.dtb provided by the new Raspbian would be better? (No again.)

Third attempt: notro, our savior!

Working on the previous version of the MZ61581-PI-EXT, I had poor performance until I used notro’s work. Let’s take a look at his wiki.
fbtft_device route: I tried
“sudo modprobe fbtft_device name=mz61581”, but mz61581 is only an overlay, not an official fbtft device.
“sudo modprobe fbtft_device name=tontec35_9481” and “sudo modprobe fbtft_device name=tontec35_9486” only get the backlight working.
The MZ61581 is not one of these!

Fourth attempt: woohoo!

Dtoverlay route:
To use the new spi-bcm2835 driver with DMA support, let’s disable the old SPI setting in /boot/config.txt:
#dtparam=spi=on
dtoverlay=mz61581,speed=64000000
Be sure to use a 32 KiB buffer version of mz61581-overlay.dtb!
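A quick way to check which buffer size is actually in use after a reboot: the fbtft driver prints it in the kernel log (see the "graphics fb1" lines in the results further down). This is a small sketch; dma_buffer_kib is a hypothetical helper name.

```shell
# Read kernel log lines on stdin and print the DMA buffer size (in KiB)
# reported by the fbtft "graphics fb1" line.
dma_buffer_kib() {
    sed -n 's/.* \([0-9][0-9]*\) KiB DMA buffer memory.*/\1/p'
}

# Usage: dmesg | dma_buffer_kib
# With the overlay that works, this should print 32; the laggy setups report 4.
```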

This works !


Test protocol :

wget http://fredrik.hubbe.net/plugger/test.mpg
sudo apt-get install mencoder mplayer2
mencoder test.mpg -ovc lavc -lavcopts vcodec=mpeg4 -vf scale=480:320 -o test_480_320.mpg
mplayer -nolirc -vo fbdev2:/dev/fb1 test_480_320.mpg

Results:
The new Tontec drivers perform poorly, but they work.
The old drivers perform better, but are not fully compatible with the new MZ61581-PI-EXT 2015.12.12.

1) Tontec driver, with config.txt: dtoverlay=mz61581,debug=32
Laggy display

lsmod
Module                  Size  Used by
cfg80211              419759  0
rfkill                 16659  1 cfg80211
snd_bcm2835            19739  0
snd_pcm                74833  1 snd_bcm2835
snd_seq                53470  0
snd_seq_device          3650  1 snd_seq
snd_timer              18164  2 snd_pcm,snd_seq
snd                    52116  5 snd_bcm2835,snd_timer,snd_pcm,snd_seq,snd_seq_device
joydev                  9047  0
fb_s6d02a1              3540  0
fbtft                  27517  1 fb_s6d02a1
syscopyarea             2789  1 fbtft
sysfillrect             3313  1 fbtft
sysimgblt               1837  1 fbtft
fb_sys_fops             1149  1 fbtft
ads7846                11340  0
hwmon                   2927  1 ads7846
spi_bcm2708             5149  0
bcm2835_gpiomem         3023  0
uio_pdrv_genirq         2966  0
evdev                  10226  2
uio                     8228  1 uio_pdrv_genirq

[    3.863542] bcm2708_spi 3f204000.spi: master is unqueued, this is deprecated
[    3.888126] bcm2708_spi 3f204000.spi: SPI Controller at 0x3f204000 (irq 80)
[    4.013842] ads7846 spi0.1: touchscreen, irq 484
[    4.027389] input: ADS7846 Touchscreen as /devices/platform/soc/3f204000.spi/spi_master/spi0/spi0.1/input/input1
[    4.034035] fbtft: module is from the staging directory, the quality is unknown, you have been warned.
[    4.038912] fb_s6d02a1: module is from the staging directory, the quality is unknown, you have been warned.
[    4.076110] fbtft_of_value: width = 320
[    4.085907] fbtft_of_value: height = 480
[    4.092665] fbtft_of_value: buswidth = 8
[    4.099748] fbtft_of_value: debug = 3
[    4.105815] fbtft_of_value: rotate = 270
[    4.112181] fbtft_of_value: fps = 30
[    4.126520] fb_s6d02a1 spi0.0: fbtft_request_one_gpio: 'reset-gpios' = GPIO15
[    4.136612] fb_s6d02a1 spi0.0: fbtft_request_one_gpio: 'dc-gpios' = GPIO25
[    4.146366] fb_s6d02a1 spi0.0: fbtft_request_one_gpio: 'led-gpios' = GPIO18
[    4.155397] fb_s6d02a1 spi0.0: fbtft_verify_gpios()
[    4.162032] fb_s6d02a1 spi0.0: fbtft_init_display_dt()
[    4.169080] fb_s6d02a1 spi0.0: fbtft_reset()
[    4.297532] fb_s6d02a1 spi0.0: init: write_register: B0 00
[    4.304769] fb_s6d02a1 spi0.0: init: write_register: 11
[    4.311458] fb_s6d02a1 spi0.0: init: msleep(255)
[    4.582945] fb_s6d02a1 spi0.0: init: write_register: B3 02 00 00 00
[    4.590654] fb_s6d02a1 spi0.0: init: write_register: C0 13 3B 00 02 00 01 00 43
[    4.599359] fb_s6d02a1 spi0.0: init: write_register: C1 08 16 08 08
[    4.606934] fb_s6d02a1 spi0.0: init: write_register: C4 11 07 03 03
[    4.614425] fb_s6d02a1 spi0.0: init: write_register: C6 00
[    4.621006] fb_s6d02a1 spi0.0: init: write_register: C8 03 03 13 5C 03 07 14 08 00 21 08 14 07 53 0C 13 03 03 21 00
[    4.633602] fb_s6d02a1 spi0.0: init: write_register: 35 00
[    4.640307] fb_s6d02a1 spi0.0: init: write_register: 36 A0
[    4.647034] fb_s6d02a1 spi0.0: init: write_register: 3A 55
[    4.653747] fb_s6d02a1 spi0.0: init: write_register: 44 00 01
[    4.660693] fb_s6d02a1 spi0.0: init: write_register: D0 07 07 1D 03
[    4.668133] fb_s6d02a1 spi0.0: init: write_register: D1 03 30 10
[    4.675296] fb_s6d02a1 spi0.0: init: write_register: D2 03 14 04
[    4.682411] fb_s6d02a1 spi0.0: init: write_register: 29
[    4.688689] fb_s6d02a1 spi0.0: init: write_register: 2C
[    4.694942] fb_s6d02a1 spi0.0: set_var()
[    4.707942] random: nonblocking pool is initialized
[    4.901920] fb_s6d02a1 spi0.0: Display update: 1484 kB/s (202.020 ms), fps=0 (0.000 ms)
[    4.912125] fb_s6d02a1 spi0.0: fbtft_register_backlight()
[    4.919447] graphics fb1: fb_s6d02a1 frame buffer, 480x320, 300 KiB video memory, 4 KiB DMA buffer memory, fps=33, spi0.0 at 128 MHz
[    4.933895] fb_s6d02a1 spi0.0: fbtft_backlight_update_status: polarity=0, power=0, fb_blank=0

1) Tontec driver, with config.txt: dtoverlay=mz61581,fps=50,debug=32
Laggy display too
[ 5.174915] graphics fb1: fb_s6d02a1 frame buffer, 480x320, 300 KiB video memory, 4 KiB DMA buffer memory, fps=50, spi0.0 at 128 MHz
[   66.306436] fb_s6d02a1 spi0.0: Display update: 1475 kB/s (203.191 ms), fps=4 (239.998 ms)
[   66.481634] fb_s6d02a1 spi0.0: Display update: 1893 kB/s (158.398 ms), fps=4 (219.991 ms)
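A rough check of why asking for fps=50 cannot help here. It assumes 2 bytes per pixel, which matches the 300 KiB of video memory reported for a 480 by 320 display, and uses the throughput figures from the logs in this post; frame_time_ms is a hypothetical helper name.

```shell
# Time needed to push one full frame over SPI, in milliseconds.
# Arguments: width height bytes_per_pixel throughput_in_kB_per_s
# (the log's "kB/s" is treated as KiB/s, which reproduces the logged timings)
frame_time_ms() {
    awk -v w="$1" -v h="$2" -v b="$3" -v r="$4" \
        'BEGIN { printf "%.1f\n", (w * h * b / 1024) / r * 1000 }'
}

frame_time_ms 480 320 2 1484     # 4 KiB buffer setup:  202.2
frame_time_ms 480 320 2 12591    # 32 KiB buffer setup: 23.8
```

At about 200 ms per full frame, the display tops out near 5 fps whatever fps value is requested, which is consistent with the fps=4 readings above.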

2) RPi driver, with config.txt: dtoverlay=mz61581,debug=32
Eureka, we now have a good fps. Sadly, there are some weird effects on the display: color distortion and strange transparent waves drawn over the image.

[ 3.809940] gpiomem-bcm2835 3f200000.gpiomem: Initialised: Registers at 0x3f200000
[ 3.859648] spi spi0.1: setting up native-CS1 as GPIO 7
[ 3.953568] spi spi0.0: setting up native-CS0 as GPIO 8
[ 4.233484] ads7846 spi0.1: touchscreen, irq 484
[ 4.243750] fbtft: module is from the staging directory, the quality is unknown, you have been warned.
[ 4.247379] fb_s6d02a1: module is from the staging directory, the quality is unknown, you have been warned.
[ 4.278306] input: ADS7846 Touchscreen as /devices/platform/soc/3f204000.spi/spi_master/spi0/spi0.1/input/input1
[ 4.297086] fbtft_of_value: width = 320
[ 4.308640] fbtft_of_value: height = 480
[ 4.320960] fbtft_of_value: buswidth = 8
[ 4.333180] fbtft_of_value: debug = 32
[ 4.350623] fbtft_of_value: rotate = 270
[ 4.361306] fbtft_of_value: fps = 30
[ 4.371700] fbtft_of_value: txbuflen = 32768
[ 4.797830] fb_s6d02a1 spi0.0: Display update: 12591 kB/s (23.825 ms), fps=0 (0.000 ms)
[ 4.812592] graphics fb1: fb_s6d02a1 frame buffer, 480x320, 300 KiB video memory, 32 KiB DMA buffer memory, fps=33, spi0.0 at 128 MHz
[ 324.706973] fb_s6d02a1 spi0.0: Display update: 12671 kB/s (23.673 ms), fps=20 (49.978 ms)
[ 324.766897] fb_s6d02a1 spi0.0: Display update: 12662 kB/s (23.692 ms), fps=16 (59.906 ms)

2) RPi driver, with config.txt: dtoverlay=mz61581,fps=50,debug=32

The fps is better, but the glitches remain.
[ 65.625644] fb_s6d02a1 spi0.0: Display update: 13373 kB/s (22.432 ms), fps=25 (39.992 ms)
[ 65.665652] fb_s6d02a1 spi0.0: Display update: 13370 kB/s (22.438 ms), fps=24 (40.003 ms)

I also tried fps=15 in config.txt: the same glitches happen.
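To compare runs without reading the log by eye, the "Display update" lines printed with debug=32 can be averaged. A minimal sketch, assuming the line format shown in the logs above; summarize_updates is a hypothetical helper name, and the first update after boot (reported with fps=0) is skipped.

```shell
# Read kernel log lines on stdin and average the throughput and fps
# reported by fbtft "Display update" lines.
summarize_updates() {
    awk '/Display update:/ && !/fps=0 / {
        for (i = 1; i <= NF; i++) {
            if ($i == "kB/s") kbs += $(i - 1)           # number just before the unit
            if ($i ~ /^fps=/) fps += substr($i, 5)      # digits after "fps="
        }
        n++
    }
    END {
        if (n) printf "%d updates, %.1f kB/s, %.1f fps\n", n, kbs / n, fps / n
    }'
}

# Usage: dmesg | summarize_updates
```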

After update

[ 4.592106] bcm2835-rng 3f104000.rng: hwrng registered
[ 4.603414] gpiomem-bcm2835 3f200000.gpiomem: Initialised: Registers at 0x3f200000
[ 4.630936] bcm2708_spi 3f204000.spi: master is unqueued, this is deprecated
[ 4.651956] bcm2708_spi 3f204000.spi: SPI Controller at 0x3f204000 (irq 80)
[ 4.701122] ads7846 spi0.1: touchscreen, irq 484
[ 4.706262] fbtft: module is from the staging directory, the quality is unknown, you have been warned.
[ 4.709738] fb_s6d02a1: module is from the staging directory, the quality is unknown, you have been warned.
[ 4.743410] input: ADS7846 Touchscreen as /devices/platform/soc/3f204000.spi/spi_master/spi0/spi0.1/input/input1
[ 4.762299] fbtft_of_value: width = 320
[ 4.772315] fbtft_of_value: height = 480
[ 4.783039] fbtft_of_value: buswidth = 8
[ 4.792916] fbtft_of_value: debug = 32
[ 4.802531] fbtft_of_value: rotate = 270
[ 4.812214] fbtft_of_value: fps = 15
[ 5.239950] random: nonblocking pool is initialized
[ 5.452605] fb_s6d02a1 spi0.0: Display update: 1225 kB/s (244.778 ms), fps=0 (0.000 ms)
[ 5.468757] graphics fb1: fb_s6d02a1 frame buffer, 480x320, 300 KiB video memory, 4 KiB DMA buffer memory, fps=16, spi0.0 at 128 MHz