Using iterative solvers – a few lessons learnt


For the last couple of years I’ve been working on a few projects involving camera and joint calibration on the Nao robot. During the last few months I put more focus on calibrating the joints of this robot, particularly the legs.

Almost all calibration tools I know of rely on non-linear iterative solvers, particularly the venerable Levenberg-Marquardt solver. The usual setting is that the user defines a cost function, or a fitness function in the case of a genetic or similar algorithm. Depending on the field, this can also be framed as an optimization problem.

This post is not meant to be an exhaustive review of the vast field of optimization, nor to give specific recommendations, as I’m not an expert in this area. It is simply meant to share a few issues I came across and how they can hurt you if not taken care of.

Lesson 1: Local or Global solver?

This is one of the most crucial choices in my opinion, as it determines the calculation time among many other factors. Why would this matter? Can’t we just throw a global optimizer at every problem? The simple answer is no!

Extrema example
Figure 1. Local and global minima; Source: KSmrq [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/)] https://commons.wikimedia.org/wiki/File:Extrema_example.svg
Looking at Figure 1, if this plot represents a cost function, then minimizing the cost is the objective, so the solution would be at the global minimum. Another example from robotics is inverse kinematics, where multiple joint configurations can bring the robot to the same pose.

The majority of solvers rely on the gradient of the curve (the derivative, or the Jacobian matrix in the case of a multi-variable function). Why? Knowing the derivative lets the solver quickly tell whether it is heading in the right direction! Some solvers do not use the derivative directly, but instead check whether a step reduced or increased the cost.

The class of solvers called “local minima” solvers terminate as soon as they find a minimum. Since they don’t search the entire solution space, they usually return a solution quickly. The ones that rely on the derivative (a.k.a. the gradient) tend to converge fast, as they usually employ “acceleration factors”. The overall result is much quicker than a brute-force search.

Unfortunately, this behavior also makes local solvers sensitive to the initial position (the initial parameter vector), as the solution depends on where the search started. These solvers also work poorly when the cost function is noisy or discontinuous.
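To make this concrete, here is a tiny, purely illustrative sketch: a double-well cost where plain gradient descent lands in a different minimum depending on where it starts. The function, step size and starting points are my own toy values, not from any real calibration problem:

```python
# Toy double-well cost: two minima, the left one slightly deeper (the global one).
def cost(x):
    return (x * x - 1) ** 2 + 0.3 * x

# Analytical derivative of the cost above.
def grad(x):
    return 4 * x * (x * x - 1) + 0.3

# Plain gradient descent: a very simple local solver.
def gradient_descent(x, step=0.01, iters=2000):
    for _ in range(iters):
        x -= step * grad(x)
    return x

left = gradient_descent(-1.5)   # converges to the global minimum (near x = -1.04)
right = gradient_descent(1.5)   # stuck in the shallower local minimum (near x = 0.96)
print(round(left, 2), round(right, 2))
```

Both runs terminate happily, but only one of them found the global minimum; the other would silently give you a worse calibration.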

Why not use a global solver all the time?

Simply because it consumes a lot of time, as it has to check the entire solution space. Consider a system with 20 variables, a granularity of 0.1 and bounds of ±5: that is (10 / 0.1)^20 possible solutions, and there are huge systems with thousands of parameters. To reduce the workload, bounds for the parameters can be introduced and the model can be simplified. There are also methods such as Particle Swarm Optimization which lend themselves to parallel processing. Yet in a few quick trials the results weren’t that magnificent, and coupled with the calculation time, I would not recommend global solvers unless absolutely necessary.
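For the record, the back-of-the-envelope count above works out like this:

```python
# Brute-force count from the text: 20 parameters, bounds of +-5, granularity 0.1.
grid_points = round((5 - (-5)) / 0.1)   # 100 candidate values per parameter
combinations = grid_points ** 20        # every combination of all 20 parameters
print(combinations)                     # 10^40 candidate solutions
```

Even at a billion evaluations per second, that grid would take vastly longer than the age of the universe, which is why exhaustive search is a non-starter here.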

This is a widely researched field, and most machine learning and neural network training algorithms rely on one or more flavours of this class of solvers, so clearly there are many methods, both old and new.

L1 : Conclusion

Try a local solver first, such as Levenberg-Marquardt, but keep its limitations in mind, particularly the need for a good initial parameter set.

Lesson 2: Derivative-free solvers?

This is another important choice, and it will definitely ruin your day if not taken care of! Ask yourself:

  1. Is the cost/ fitness function differentiable?
  2. Is it noisy?
  3. Is there an analytical solution? Or is the numerical solution good/ stable?

If the answer to the first is yes, the options are wider; otherwise a derivative-free solver has to be used, or the cost function has to be refactored to be differentiable.

If the second answer is also yes, then one might need to employ smoothing functions, or again refactor the cost function.

If the function is differentiable but an analytical derivative is too much effort, numerical differentiation can be used. But numerical differentiation poses additional dangers and can severely degrade performance if not used with care; I would call this the most important lesson learnt in my case. Once I cut failures by 10% just by switching to the central difference method! An alternative is automatic differentiation (available with libraries like Eigen; see http://autodiff.org for more).
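As a quick illustration of why the difference scheme matters, here is a toy comparison of forward vs. central differences on f(x) = sin(x). The function and step size are arbitrary choices for demonstration; the point is the error gap, which is exactly the kind of thing a gradient-based solver is sensitive to:

```python
import math

# Forward difference: error shrinks like O(h)
def forward_diff(f, x, h=1e-4):
    return (f(x + h) - f(x)) / h

# Central difference: error shrinks like O(h^2)
def central_diff(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.0
exact = math.cos(x)  # the true derivative of sin at x
err_fwd = abs(forward_diff(math.sin, x) - exact)
err_ctr = abs(central_diff(math.sin, x) - exact)
print(err_fwd, err_ctr)  # central is orders of magnitude more accurate
```

For the same number of extra function evaluations per parameter, the central scheme hands the solver a far cleaner gradient.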

So far, in my experience, gradient-based (or similarly motivated) solvers converge faster than simple brute-force searches under the right conditions, so use them if possible. Otherwise, genetic algorithms, CMA-ES, pattern search, particle swarm optimization, etc. can be used.

Lesson 3: Think before leap!

Before wasting hours or days, think through which solver is good for the particular case. While it is tempting to use Levenberg-Marquardt (probably its third mention in this post), which works pretty well for a vast range of situations, it might not be the best tool for the job. Also pay attention to differentiability and to the quality of the differentiation if such solvers are used.

Picking the right language for the task..


In the past year I had to juggle multiple programming languages much more intensively. While this helped me improve and refresh my skills, I also came to some conclusions.

Knowing how to select a language for a task, and knowing these languages and their capabilities, is useful in an R&D setting: time and effort can be saved on writing code, so the focus can stay on the actual research.

The views below are my personal opinions based on past experience; they might not be the most correct.

C/ C++

Best for high-performance, real-time work, but can be a hindrance for quick testing or prototyping. My usual setup is CMake + C++, commonly used for OpenCV projects, etc. I’m not very happy with it when it comes to things like REST calls and JSON objects…

IDE support can sometimes be a bit annoying, especially finding an open source one; they tend to hog memory or CPU. I use Qt Creator, VS Code (not to be confused with Visual Studio) and plain text editors.

Yet when written well, C++ is in a class of its own 😛

Python

After a long pause, I got back to using Python. This time I paid more attention to code quality and learnt more about Python 3 specifics.

Quite nice for quick prototyping and can handle many things well. Scientific calculations, matrices, etc. in particular have excellent support through libraries such as NumPy and SciPy. In essence, it is entirely possible to use Python in place of Matlab; it can be quite a bit faster and is, of course, free of charge.

These libraries perform well too. Thanks to the nice support for calling C/C++ APIs from Python, computationally intensive algorithms can be written in C/C++ and wrapped for Python (calling C++ from Java was a nightmare in comparison). The same feature is useful when providing Python APIs for C/C++ libraries, or for calling computationally intensive, optimized algorithms.

I’m not fully convinced about production-level code; it is probably better to convert to Java or C++ (or use a wrapper?) when bundling. These questions mainly arise when dealing with proprietary code that has to be deployed to customers. There is a bytecode-based distribution option, as with Java, but I don’t know much about it.

Python definitely does better than JS for prototyping, especially for matrices and the like.

JavaScript

Alright for quick prototyping with NodeJS. Being a born-in-the-browser language, its support for the web, related things and async programming is nice.

However, for larger projects, consider writing in TypeScript or a similar typed language for better code quality and easier debugging/IDE support.

There are at least two libraries for everything, but finding a feature-complete, good one isn’t that easy. Don’t even bother doing matrix calculations or “big data”/statistical stuff in this 😛

I rather stupidly implemented a camera calibration tool in JS, since the debug framework was NodeJS + browser; I pretty much wrote everything myself, including a (now bug-free) Levenberg-Marquardt solver.

Java

Despite “write once, run anywhere”, C++ is preferable on some fronts (with many FOSS libraries available on most platforms). Java can do pretty much everything C/C++ can, but that doesn’t mean it should be used for everything!

However, deploying to many platforms is easier with Java. Yet I don’t fully buy the “enterprise-grade stuff is written in Java” story.

I believe Java usage in enterprise environments was accelerated by Sun certifications, hardware, the very nature of the JVM, and the many mature tools written in Java. My disagreement comes from the fact that there are better tools out there for certain jobs, but some people cannot do anything without Java, which can become a bottleneck in R&D environments.

Languages and frameworks should be chosen for the task and the situation, not by the fact that the programmer or system architect has __ years of experience with Java-based systems.

PHP

This used to be my de facto choice for server-side web apps, rendering, etc. WordPress, Joomla and friends are written in PHP!

It’s quite fast: according to a recent benchmark I saw, PHP 7.0 is only slower than -O2-compiled C++ code. In terms of variable typing, PHP is similar to JavaScript, and the code can look very messy.

Decent-looking, readable code can be written in PHP with good use of classes and well-made frameworks. For web servers, Java tends to be generally slower in this area, while PHP is optimized for and pretty much made for this job.

Conclusion

While it is possible to do pretty much everything in all of these languages, note that their runtimes are generally implemented in C/C++. Why? Performance. But that doesn’t mean everything should be written in C/C++.

Some people like to write everything in Java, C/C++ or Python. What I wanted to point out in this rather entangled and not-so-verbose post is that tools are there for convenience and to get the job done!!

Don’t use a sledgehammer to crack a nut

Low cost LIDAR experiment. Part 1


Introduction

LIDAR (Light Detection And Ranging) is a technology similar to radar, used to measure distance using light. The measuring principle is generally the same as basic radar: the time-of-flight technique, i.e. measuring the time taken for the light to travel. https://en.wikipedia.org/wiki/Lidar
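The time-of-flight principle boils down to one formula, distance = c · t / 2 (divide by two because the light travels out and back). A quick illustrative calculation shows just how demanding the timing is; the example pulse time is an assumed value, not a measurement:

```python
C = 299_792_458  # speed of light in vacuum, m/s

# Round-trip time -> one-way distance (the pulse goes out and comes back).
def tof_distance(round_trip_seconds):
    return C * round_trip_seconds / 2

# A target at ~1 m returns the pulse in only ~6.67 nanoseconds, which is why
# LIDAR timing circuits need sub-nanosecond resolution for cm-level accuracy.
print(tof_distance(6.67e-9))
```

This is also why the phase-shift method is attractive for low-cost builds: it trades the picosecond-class timing problem for a phase measurement.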

I wanted to build or find a low-cost setup a few years back, but the search was quite fruitless 😛. Building one needs sensitive components, and finding these specialized components in Sri Lanka is almost impossible, so I gave up.

At present there are one or two low-cost products, like the LIDAR-Lite (approx. $150) with a max range of 40 m. There was also a Kickstarter project with similar pricing.

Objectives

For the first stage, I’ll simply experiment with available research and low-cost components. Later on, the optics and range will be improved.

  • Will use PIN photodiodes instead of the commonly used APDs (Avalanche Photo Diodes); the cost and the high-voltage circuit an APD requires make it unattractive.
  • Will attempt both the phase-shift method and the time-of-flight method.

Implementation

  1. Light emitting part
  2. Light receiver and amplification circuit. This is probably the most crucial stage for a low cost setup due to the less sensitive PIN photodiodes.
  3. Timing circuit
  4. Final processing circuit.

Trial 1

My first trial used an Osram SFH 4545 IR emitting diode and an Everlight PD333-3C photodiode.

IR emitter

I used a TI Tiva C LaunchPad to generate the needed pulse output and a BC337 transistor to drive the LED. This setup is not ideal, and I need to use a MOSFET to get good rise/fall times and current output. (This LED can handle pulses of up to 1 A.)

IR Receiver

I started with the components I had available. Judging by the available literature, the needed circuit is called a “transimpedance amplifier”; this simply means the amplifier converts a current into a voltage. The reason is that photodiodes actually generate a current, rather than a voltage, in response to light. After referring to articles mainly from Texas Instruments, I constructed the following circuit. I did not do any formal calculation or analysis; this was just a trial-and-error setup.

The op amp I had at hand was a TLC25L4A. This is a low-power op amp with decent gain, but the bandwidth is not that impressive. Nevertheless, this circuit was the foundation of the next iteration.

amp_circuit
Op amp: TLC25L4A, R2 = 3 MOhm, CR1 = PD333-3C. Image from SBOA060, a TI application note.
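For reference, the ideal first-order behaviour of a transimpedance amplifier is simply Vout = Iphoto × Rfeedback (the sign depends on the configuration). Here is that relationship using the R2 value from the circuit above; the photocurrent figure is just an assumed example, not a measured value:

```python
R_FEEDBACK = 3e6  # ohms; R2 in the circuit above

# Ideal transimpedance amplifier: output voltage is photocurrent times
# the feedback resistance (ignoring bandwidth, offsets and sign).
def tia_output_volts(photocurrent_amps):
    return photocurrent_amps * R_FEEDBACK

# An assumed 1 uA photocurrent becomes roughly a 3 V output.
print(tia_output_volts(1e-6))
```

This also shows the trade-off: a bigger feedback resistor means more gain for the weak PIN photodiode current, but in practice it also reduces the bandwidth of the amplifier.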

Results and limitations

The circuit amplified the signals picked up from the photodiode reasonably well, but the range was severely limited. Measurements were taken with my five-year-old DSO Quad oscilloscope.

The main issue I faced was the lack of an optical filter on the photodiode: it picked up the 50 Hz flicker from the lights in my room. Attempts at filtering were rather futile (I tried RC filters only, as I didn’t have inductors at hand).

At maximum range, approx. 60 cm! The amplifier did not have much gain.

In the next installment, I’ll explore the next set of circuits and other changes, such as the IR pulse width.

File services with Dreamfactory (file creation)


This and the following posts will cover areas of the file API that were not clearly covered in the Dreamfactory documentation. The focus of this article is file and folder creation.

Contents

  1. Create a file
    1. Set content via JSON Request Body
    2. Set content via Multi-part form upload (file upload)
    3. Download to server from URL
  2. Create a folder
  3. Combine folder and file creation

General Information

For all these requests, a JSON request body is used in the following basic format.

POST : each element of the resource array defines a file or folder request.

  • type : file or folder
  • name : name of file
  • path : path to file
  • content_type : media type ie: text/json
  • content : contents of the file
{
    "resource": [
        {
            "name": "folder2",
            "path": "folder2",
            "type":"folder"
        },
        {
            "name": "folder2/file1.txt",
            "path": "folder2/file1.txt",
            "type":"file",
            "content_type": "text",
            "content":"gdfgdgdfgf"
        }
    ]
}

1. Create a file

1.1 One or multiple files with JSON data.

Type : POST

Note: setting the property “is_base64”: true enables uploading images with “content” set to a Base64-encoded string.

Warning: using Base64 encoding for large images is highly discouraged!!! (I tried uploading an 18 MP image and it was not worth the trouble.) For these cases, go for direct file upload via a multi-part form.

{
    "resource": [
        {
            "name": "folder2/file1.txt",
            "path": "folder2/file1.txt",
            "type":"file",
            "content_type": "text",
            "content":"gdfgdgdfgf"
        },
        {
            "name": "folder2/file2.txt",
            "path": "folder2/file2.txt",
            "type":"file",
            "content_type": "text",
            "content":"lsjfvgm"
        }
    ]
}

This will create two files in the folder named “folder2”. If the folder does not exist, an error will occur.

1.2 Multi-part form upload (file upload)

When I tried to upload some big images, Base64 encoding hurt performance at every level, so direct file upload was the best option for my scenario. Unfortunately this was not clearly documented in the wiki.

Type : POST (multi part form)

I have tested and used the following three methods.

Method 1. Plain HTML + minimal JavaScript.

This can be done in a few ways. The example below is from “test_rest.html”, shipped with Dreamfactory to test REST calls. Of course, the JavaScript can be dropped completely if needed.

HTML

<form enctype="multipart/form-data" onsubmit="postForm(this)" action="/api/v2/system/user/" method="POST">
 <input type="hidden" name="app_name" value="admin" />
 <!-- MAX_FILE_SIZE must precede the file input field -->
 <input type="hidden" name="MAX_FILE_SIZE" value="3000000000000" />
 <!-- Name of input element determines name in $_FILES array -->
 Test importing users via file: <br/>
 <input name="files" type="file" />
 <br/>
 <br/>
 <input type="submit" value="Send File" />
 </form>

JS

function postForm(form){
    var jwt = $('#token').val(); //Session Token
    var apiKey= $('#app').val(); // API Key for app
    var url = $('#url').val(); //url for the REST call
    form.action = url+'?session_token='+jwt+"&api_key="+apiKey;
    // the token and api key can be sent as headers (if going for AJAX call)
}

Method 2. JQuery AJAX Based.

TODO

Method 3. Java (okHTTP) based

This method employs the Java SDK example already provided for Dreamfactory. I modified ImageUtils.java and ImageServices.java for my purpose. Original link: https://github.com/dreamfactorysoftware/android-sdk

Note: this method was built for Java 8 on a PC. The method for Android is different and can be found in the API info.

ImageUtils.java

public void addImageFromLocalFile(String fileServiceName, String imageDir, String imageName,
                                  File imgFile, Callback<FileRecord> callBackFileRecord) {
    // Build a multi-part form body; the form field must be named "files"
    RequestBody requestBody = new MultipartBody.Builder()
            .setType(MultipartBody.FORM)
            .addFormDataPart("files", "imageName-1.png",
                    RequestBody.create(MediaType.parse("image/png"), imgFile))
            .build();

    final ImageService imageService =
            DreamFactoryAPI.getInstance(App.SESSION_TOKEN).getService(ImageService.class);

    imageService.addLocalImage(fileServiceName, imageDir, imageName, requestBody)
            .enqueue(callBackFileRecord);
}

ImageService.java

@POST("{file_service_name}/{id}/{name}")
Call<FileRecord> addLocalImage(@Path(value = "file_service_name") String fileServiceName,
                               @Path(value = "id") String contactId,
                               @Path(value = "name") String name,
                               @Body RequestBody file);

1.3 Download to server from URL

This method is quite straightforward and explained in the Dreamfactory wiki:

http://wiki.dreamfactory.com/DreamFactory/Tutorials/Uploading_File#Example_-_Upload_a_JPEG_image_to_a_directory_called_images_using_storage_service_called_.27files.27

2. Create Folder

Creating a directory is similar to creating a file; the format is called a FolderRequest. The difference is that you will be addressing a directory in the API call. From the looks of it, this may even work without specifying the exact folder!

{
    "resource": [
        {
            "name": "folder2",
            "path": "folder2",
            "type":"folder"
        }
    ]
}

3. Combine Folder and File Creation

Use the same URL format as above; the only difference is that you can ask Dreamfactory to create a folder and put files into it in one request. First the folder to be created must be specified, then the files to place inside it.

{
    "resource": [
        {
            "name": "folder2",
            "path": "folder2",
            "type":"folder"
        },
        {
            "name": "folder2/file1.txt",
            "path": "folder2/file1.txt",
            "type":"file",
            "content_type": "text",
            "content":"gdfgdgdfgf"
        }
    ]
}

Conclusion

The file API is quite nice and creates a layer between the filesystem and our applications. You can easily switch to cloud-based storage or a different network drive without users noticing.

 

Dreamfactory – API Automation!


I came across Dreamfactory while a colleague and I were searching for a REST API framework for PHP. In summary, this framework cut down a lot of setup and development time! Notably, it is an open source project and has enterprise support.

The most valuable feature I saw is the ability to connect to almost all major database types and automatically generate the REST API calls. On top of that, the system offers role-based authentication and many other features.

What you can do with Dreamfactory:

  • Connect to a database and get all necessary REST calls
  • User management, role based authentication, application level access control
  • Custom server side scripting with v8JS, PHP, Node.JS, Python
  • Auto generated API documentation with “try now” option (Based on swagger)

Setting up this framework needs some practice and experience with the command line; however, following the wiki articles will certainly do the job.

Performance: not so great! Depending on the server, a REST call may take up to half a second or more to process.

You can try Dreamfactory with their trial accounts, or you can clone the Git repo and set it up on a local machine or a hosted server.

Setting up GCC, CMake, Boost and OpenCV on Windows


Background Story

For a project I’ve been working on, the need arose to build the program for Windows. The project was written in C++ and used the OpenCV and Boost libraries. For ease of configuration I employed CMake.

Despite the target being Windows, I was developing and testing everything under GNU/Linux 😛. Fortunately I had managed to write the code with a minimal number of native Unix API calls; for example, file handling was done via Boost Filesystem, and so on.

Therefore the only consideration was running the CMake script on Windows, using either Visual Studio or GCC (via MinGW). The first attempt was with VS 2013, but compilation failed due to a VS C++ compiler bug related to C++ template classes, and getting a later version was taking time. So I gave GCC on Windows a try!

Objectives

First, I wanted to see whether building C++ on Windows with relative ease is possible. Second, I wanted to avoid depending on Visual Studio. This might be especially useful if you code for commercial work but cannot afford the license, or simply dislike Visual Studio 😛

Step 1. Install MinGW

I went with the MinGW-w64 build (http://mingw-w64.org/doku.php) since it’s more widely accepted and supports 64-bit. (Don’t expect citations justifying this 😀.)

  1. The MinGW-Builds (http://mingw-w64.org/doku.php/download/mingw-builds) distribution was chosen as I didn’t want to install Cygwin or win builds. It’s a simple install; the following combination of settings worked for me.
    • Target architecture – 64 bit (personal choice, depends on the target system)
    • Threads – Win32 (some recommend POSIX over Win32; however, the OpenCV build failed with mysterious problems under POSIX threads)
    • Exception – SEH (I didn’t do much research here, I just kept the first available option)
  2. Once the installation is complete, navigate to the install location and look for “bin” folder.
  3. Add that location to the PATH variable
    1. Control Panel -> System -> Advanced System settings -> Environment Variables -> system variables -> choose Path and click Edit
    2. Append the path to “bin” folder into the PATH variable. (There are tons of guides of how to do this)
  4. Open a command line (Start -> cmd.exe)
  5. Type the following command; it should show the version and other info
    • gcc -v
    • This merely confirms that setting the PATH variable worked.
  6. Once gcc works, navigate to the “bin” folder of the MinGW install, make a copy of “mingw32-make” and rename it to “make”. (This executable provides “make”.) This step is for convenience with CMake 😉

Step 2. Install CMake

  1. Download (https://cmake.org/download/)
  2. Run the installer
    • It’ll ask whether it is okay to add the CMake binary location to the PATH variable – tick yes, preferably system-wide.
  3. Open a command line (cmd.exe) and run the following
    • cmake --version
    • Provided everything works, the output will show the CMake version.

Step 3. Build and Install OpenCV

  1. Download OpenCV from http://opencv.org/downloads.html
    • I built 3.1.0 with default options.
  2. Extract the archive (ie: D:\opencv-3_1_0)
  3. Open CMD and change directory to source folder of opencv. From this step, all commands would be executed through CMD unless otherwise noted.
    • D:
    • cd D:\opencv-3_1_0\source
    • The first command is not needed if OpenCV resides in C:. For other partitions, enter the partition name to switch to it, then use the cd command. (I find this inconvenient 😛)
  4. Run CMake. The main change is the additional “MinGW Makefiles” generator parameter. Other than that, this step pretty much follows the standard OpenCV documentation. Add the arguments fitting your needs!
    • cmake -G "MinGW Makefiles" [other arguments] ../build
    • “../build” at end pointed “build” directory as destination of MakeFile
  5. Once Cmake configures successfully, change to the build directory and execute make.
    • cd ../build
    • make -j5
    • I used -j5 as my computer has 4 logical processors, so 5 threads is enough to fully load it. If your computer has 8 cores, use -j8 or -j9.
    • Use the multithreaded compile option (-j5) with caution, some laptops tend to go into thermal shutdown with maximum load!!
  6. If the build completes successfully, next run make install
    • make install
    • This step will finalize the install.

Step 3. Build and Install Boost

  1. Download and extract the Boost archive from (http://www.boost.org/users/download/)
  2. Make sure to download the source package!
  3. Extract the archive, enter the directory from CMD.exe (example below)
    • cd "D:\Program Files\boost\"
  4. Run the following commands
    • bootstrap.bat gcc
    • b2 --build-dir=build cxxflags="-std=c++11" -j5 --with-filesystem --with-system define=BOOST_SYSTEM_NO_DEPRECATED toolset=gcc stage
    • Note the use of the “toolset=gcc” and “-j5” options. These are self-explanatory!
    • The “--with-<library_name>” flag explicitly includes only the necessary libraries. If you choose to build all libraries, don’t specify it at all.
  5. If everything works fine, the Boost build is complete! Navigate to the “stage” folder and go through the inner folders; there you will find the compiled DLL files (e.g. libboost_system_xxx_mingw_xx.dll).

Step 4. Setting up CMake Script to work with windows

  1. Usage
    • cmake -G "MinGW Makefiles" .
  2. I mashed up the OpenCV and Boost CMake example scripts and some of my own experiments to come up with the following CMake script.
  3. This is a bare-bones CMake script; it has to be modified to suit your own project. I’m not an expert in CMake, so there is room for improvement.
  4. I’ve tested the script with following configurations for the same exact project.
    • Kubuntu (16.04 LTS) with CMake 3.2.2, OpenCV 3.1.0, Boost 1.58
    • Windows 7 Professional, with CMake 3.6.0-rc_2, OpenCV 3.1.0, Boost 1.61
  5. The script is an updated version featured in my previous post on Boost, OpenCV, CUDA and CMake on Linux (https://tuxbotix.net/2016/03/18/nvidia-optimus-bumblebee-and-cuda-on-kbuntu-15-10/)
cmake_minimum_required(VERSION 3.2 FATAL_ERROR)

set(execFiles test.cpp)

if(WIN32)
    set(OpenCV_DIR "D:/opencv-3-10/build") # Change this; use forward slashes, backslashes are escape characters in CMake strings
endif()

set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/Modules/")

if(CMAKE_COMPILER_IS_GNUCXX)
    set(CMAKE_CXX_FLAGS "-std=c++11 -lm")
    if(WIN32)
        set(BOOST_ROOT "F:/Program Files/boost_1_61_0") # change this, critical! (forward slashes again)
    endif()
endif()

find_package( OpenCV REQUIRED )
FIND_PACKAGE( Boost 1.58 COMPONENTS filesystem system REQUIRED )
if(Boost_FOUND)
 include_directories(${Boost_INCLUDE_DIRS})
endif()
message(boost ${BOOST_INCLUDEDIR})

LIST(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}")
set(CMAKE_BUILD_TYPE Release)

add_executable (testApp ${execFiles})

target_link_libraries(testApp ${OpenCV_LIBS} ${Boost_LIBRARIES})

Possible Issues, Observations

I came across a few issues while setting up GNU Toolchain on Windows as well as configuring Boost, Opencv.

  1. CMake complains “CMake was unable to find a build program corresponding to “MinGW Makefiles”. CMAKE_MAKE_PROGRAM is not set.”
    • Cause : CMake does not seem to recognize “mingw32-make” as the make program, despite the CMake documentation saying it should work (https://cmake.org/cmake/help/v3.6/generator/MinGW%20Makefiles.html)
    • The workaround is making a copy of mingw32-make and renaming it “make”
    • This workaround may clash with an existing “make” executable in the PATH, so take care!!
  2. When configuring Boost, “cl is not recognized as an internal command”
    • Cause : not specifying the toolchain when executing bootstrap.bat and b2.exe
  3. Windows shows errors like “libboost_system_xx_mingw_xx.dll is not installed” or “libopencv_imgproc310.dll is not installed”
    • Cause : Windows cannot locate the DLL files.
    • The simplest fix is copying the necessary DLL files alongside the executable when distributing.
    • Warning : always check the legal side (license agreements) before packaging libraries you do not own. Even for open source libraries, the license may restrict binary distribution like this.

Nvidia Optimus, Bumblebee and CUDA on Kbuntu 15.10


I decided to write this post after experiencing a chain of weird events while setting up CUDA with Ubuntu (Kubuntu, etc.)!

I’ve used CUDA with OpenCV on Arch Linux in 2014-2015 and it wasn’t too hard to get working. But the story with Ubuntu is completely different 😛

The first path is the NVIDIA developer repo. That has its own perils, but you get the latest CUDA version (7.5). The second path is the Ubuntu-provided way, which is much safer but not the latest (6.5).

Option 1: CUDA through the NVIDIA repo (NVIDIA proprietary drivers + nvidia-prime)

Follow the guide at http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/#axzz43DrWGyFP

They’ll replace the Ubuntu driver with their version (352.79 instead of 352.68 in my case).

Warning 1: do not reboot right away, otherwise you may end up at a black screen and will have to boot from a live CD and chroot!! From my experience you have to run gpu-manager manually (explained below).

Warning 2: nvidia-prime will set the NVIDIA chipset as the default. This doesn’t work very well and consumes power (it caused KDE to crash with multiple monitors, strange font sizes, etc.).

  • Use nvidia-settings and set intel as the primary chipset.
  • Log in from the command line (Ctrl + Alt + F1).
  • Run the following commands: the first stops the display manager (sddm for Kubuntu; KDE dropped KDM with Plasma 5), the second runs gpu-manager, which goes through the configuration.
  1. sudo systemctl stop sddm
  2. sudo gpu-manager
  • Observe the output of gpu-manager. Now you can reboot and see the results.
  • Make sure everything works fine (multiple monitors, etc).

Option 2: CUDA through the Ubuntu repo (NVIDIA proprietary driver + Bumblebee or nvidia-prime)

Install the NVIDIA drivers. The easiest path is using the “Driver Manager” tool of Ubuntu/Kubuntu; in Kubuntu it’s accessible in System Settings.

Note "Driver Management" icon in hardware section.
Choose the nvidia proprietary driver (352 recommended).

Next install the following packages: nvidia-cuda-dev and nvidia-cuda-toolkit

sudo apt-get install nvidia-cuda-dev nvidia-cuda-toolkit

Now run “nvcc -V” and see whether the compiler runs.

Getting CUDA to work with cmake and gcc

I prefer CMake scripts for OpenCV projects, so I’ll explain that method. Other options are easily found on the internet.

If you look closely at the compatibility matrix shown on the NVIDIA website, the maximum supported GCC version at this time is 4.9 for CUDA 7.5. The issue is that Ubuntu 15.10 ships GCC 5+.

So the first fix was to install GCC 4.9 and point nvcc at it. In CMake scripts, the following declarations worked for me; in addition I had to specify some more info. Some people suggest editing nvcc.profile, but I didn’t bother, as I was already using CMake for my OpenCV projects!

set(CUDA_TOOLKIT_ROOT_DIR "/usr/local/cuda")
set(CUDA_HOST_COMPILER "/usr/bin/gcc-4.9")
set(CUDA_CUDART_LIBRARY "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcudart.so")
find_package(CUDA 7.5)

I found the first two lines in the OpenCV CMake config; the third had to be added after CMake complained about a missing CUDA_CUDART_LIBRARY.
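Putting those declarations in context, a minimal CMakeLists.txt for a CUDA project might look like this sketch (the project name, source file, and paths are illustrative and system-specific):

```cmake
cmake_minimum_required(VERSION 2.8)
project(cuda_demo)

# Point nvcc at the CUDA install and at gcc 4.9 (paths are system-specific)
set(CUDA_TOOLKIT_ROOT_DIR "/usr/local/cuda")
set(CUDA_HOST_COMPILER "/usr/bin/gcc-4.9")
set(CUDA_CUDART_LIBRARY "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcudart.so")
find_package(CUDA 7.5 REQUIRED)

# cuda_add_executable (from the FindCUDA module) compiles .cu files with nvcc
cuda_add_executable(demo main.cu)
```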

Now everything should work fine.

If you get “invalid device ordinal” when running CUDA apps, the driver probably did not reload properly on resume from sleep; dmesg shows this as the GPU “falling off the bus”. I haven’t found a fix for this yet; editing the config of nvidia-prime or bumblebee may help.

CUDA devicequery

Some history without dates 😛

Initially, graphics switching and keeping the nvidia chipset from overheating or acting strangely required some hacking with ACPI calls. Then a project called Bumblebee emerged (http://bumblebee-project.org/). Thanks to the bumblebee daemon, proper power management became possible. Some time later they released a kernel module called “bbswitch”, which makes life even easier by enabling power management automatically.

From the beginning I used bumblebee + the nvidia proprietary drivers, so I went the same way on Ubuntu.

By now, Nvidia has released their own version of Optimus switching for Linux, packaged as nvidia-prime. In the past, the only alternative was bumblebee. I have yet to see which works better, but I’m not much concerned, as my only use for the nvidia chip on Linux is CUDA work. On Windows, however, it sees plenty of action ;).


My Github repository


I started hosting a few projects and libraries on GitHub. My repo is located at https://github.com/tuxbotix.

Why github?

In recent years, GitHub has become the de facto place to host open source projects. Its success is evident from the number of SourceForge projects being moved to GitHub and the closure of Google Code (along with its suggestion to move to GitHub).

Git itself has been around for a long time: it was created by Linus Torvalds, who used it to manage the source of the Linux kernel.

However, I believe the ease of use and clean interface of GitHub led many people to choose it over SourceForge. I have a SourceForge account too, but I almost never used it; even today, browsing the source, downloading, etc. is somewhat inconvenient, and there have been instances where the advertisements misled people.

Converting Keyboard to MIDI with a microcontroller


I’ve had some curiosity about and interest in MIDI devices for some years, and this small project came up after my Yamaha PortaSound PSS-190 ended up with a couple of burnt traces and a dead synthesizer IC.

This particular instrument isn’t really high end, so there was no velocity sensing (velocity matters: try pressing a piano key slowly and then quickly). The keys were wired to form a matrix, hence the name matrix keyboard. This means each key doesn’t connect directly to the microprocessor/synthesizer IC; instead, the wiring is arranged like a matrix/table.

sch_keyboard
A matrix keypad schematic. Same story with the synthesizer keyboard! Image source: http://www.hellspark.com/dm/ebench/sx/chm/topics/ex_4x4_keypad.htm

This means you only need 8 wires to access 16 keys. The Yamaha keyboard I tried to fix had 7 + 6 wires serving 30+ keys. The drawback is that you cannot read all the keys at the same instant (in real time, i.e. microseconds apart), because the matrix needs “scanning”: enable one row, read the values, and so on. But this can be made fast enough for a human. For example, the standard PC keyboard is almost always a matrix keyboard whose internals scan for key presses at least a hundred times per second.

Back to the topic: since I couldn’t source the original synth IC, I decided to build a MIDI synth and install it inside the keyboard! I had an STM32F4 Discovery board with me, so I went for “Goom” (http://www.quinapalus.com/goom.html) ported to MIDIbox (http://ucapps.de). MIDIbox is a platform for building various types of MIDI instruments; I’ll discuss it in a later post.

With the synth running, what I needed was to send MIDI messages on key presses. Some googling revealed that MIDI is pretty much serial communication at 31250 baud, so to test I used an Arduino Mega.

// Pin definitions: columns are driven on PORT A, rows are read on PORT C

const byte Mask = 255;

boolean keyPressed[50];   // tracks which keys are currently held down
int command = 0x90;       // MIDI note-on, channel 1
int noteVelocity = 60;

//#define DEBUG

// Use prepared bit vectors instead of shifting a bit left every time
byte bits[] = {
  B00000001, B00000010, B00000100, B00001000,
  B00010000, B00100000, B01000000, B10000000 };
byte colVals[] = {
  255, 255, 255, 255, 255, 255, 255, 255 };  // last raw reading per column
byte bits1[] = {
  B11111110, B11111101, B11111011, B11110111,
  B11101111, B11011111, B10111111, B01111111 };

void scanColumn(int value) {
  PORTA = value;
}

void setup() {
  DDRA = B11111111;      // PORT A: output (columns)
  DDRC = B00000000;      // PORT C: input (rows)
  PORTC = PORTC | Mask;  // enable the internal pullups

  for (int i = 0; i < 50; i++) {
    keyPressed[i] = false;
  }

  Serial2.begin(31250);  // MIDI out
  Serial.begin(115200);  // debug console
  delay(500);

  // Play a quick scale to verify the MIDI link
  for (int note = 0x1E; note < 0x5A; note++) {
    // Note on channel 1 (0x90), some note value (note), middle velocity (0x45):
    noteOn(0x90, note, 0x45);
    delay(100);
    // Same note with zero velocity, which silences it:
    noteOn(0x90, note, 0x00);
    delay(100);
  }
  delay(100);
}

void loop() {
  for (int col = 0; col < 7; col++) {
    // Shift the scan to the next column: all lines high except one
    scanColumn(bits1[col]);
    delayMicroseconds(3);

    byte rowVal1 = PINC & Mask;
    byte rowVal = ~rowVal1;      // inverted, so a key press reads as 1
    if (colVals[col] == rowVal1) {
      continue;                  // no change in this column since last scan
    } else {
      colVals[col] = rowVal1;
    }

    for (int row = 0; row < 6; row++) {
      if (col == 0 && row > 0) {
        break;                   // column 0 carries only a single key
      }
      int index = row + ((int)col * 6);
      int note = index + 48;     // map key 0 to MIDI note 48 (C3)

      // AND this row's bit with rowVal to detect a note press
      byte k = (bits[row] & rowVal);
      if (k > 0 && keyPressed[index] == false) {
        keyPressed[index] = true;
        noteOn(command, note, noteVelocity);
      }
      if (k == 0 && keyPressed[index] == true) {
        keyPressed[index] = false;
        noteOn(command, note, 0);
      }
    }
  }
}

void noteOn(int cmd, int pitch, int velocity) {
  Serial2.write(cmd);
  Serial2.write(pitch);
  Serial2.write(velocity);
#ifdef DEBUG
  Serial.print("Note: ");
  Serial.print(pitch, DEC);
  Serial.print(" Velocity: ");
  Serial.print(velocity, DEC);
  Serial.println();
#endif
}

First, the setup basics: enable the internal pullups, then set the output port (PORT A) to a given arrangement, one pin turned OFF and the others ON. The reason to do it this way rather than the other way around is the use of pullups instead of pull-down resistors.

Then I read the input at PORT C, where the rows are connected, so if a key is pressed the corresponding pin goes LOW. For ease of processing I invert this reading, and I also keep track of changes of state, which means the code proceeds if and only if the state changed since the previous scan.

Then, depending on whether a key was pressed down or released, the appropriate MIDI command is sent: 0x90 means note ON on channel 1. Pitch is mapped so that 48 = C3 (refer to https://newt.phys.unsw.edu.au/jw/notes.html for detailed mapping information).

With the code tested, all that remains is to wire it up to the keyboard and test!

Stellaris Launchpad – Starting with ARM Microcontrollers


Last year I ordered a Stellaris Launchpad Evaluation Board from Texas Instruments for $12.99. It arrived through FedEx in 3-4 days (free shipping!).

Update 1: They have changed the brand names from Stellaris to Tiva. 

Update 2: Now they offer an updated variant with chip number “TM4C123G”. This version has built-in PWM modules and some other differences, but old Stellaris code can be uploaded directly. Follow the migration document: http://www.ti.com/lit/an/spma050a/spma050a.pdf


stella

Specifications (LM4F120 based Launchpad – original version):

  • Microcontroller architecture: ARM Cortex-M4
  • Maximum clock speed: 80 MHz
  • RAM: 32 kB
  • PWM pins: 16 (using timer interrupts instead of dedicated hardware PWM modules)
  • GPIO pins on the microcontroller: 43 (including PWM; not all GPIO are accessible on the Launchpad board)
  • SSI/SPI ports: 4
  • I2C ports: 4
  • UART ports: 8

Overview

Although the number of GPIO and other available peripherals looks impressive, note that pin muxing is used. In simple terms, the same pin can be configured to serve one of several peripherals, so in practice you cannot use all the SSI/SPI, I2C, UART, and PWM ports at the same time.

To reduce this pin-multiplexing (pinmux) confusion, the “PinMux utility” by Texas Instruments can be used to configure the pin usage. The program generates the necessary code to use in your projects.

Word of caution: note the copyright notice in the generated code. I think the best idea is to use the program to get an idea of the pin config, but not to use the generated C files directly in your code, to reduce copyright trouble in case you are worried about the legal wording!

Similar to Arduino “Shields”, there are “BoosterPacks” that can be plugged into the Launchpad, or you can design your own BoosterPack like we did.

stella-boosterpack

Setting up toolchain

Several options are offered for development, from Texas Instruments’ own Code Composer Studio to ARM’s Keil or the GNU C compiler.

I went down the GNU C compiler path. The IDE I used was Eclipse, though other IDEs can be used without a problem. The link below explains the method in detail.

In essence, the configuration takes three steps:

  • Installing the GNU ARM C compiler (and the rest of the toolchain)
  • Installing the flashing utility, setting up udev rules, etc.
  • Setting up a project template in Eclipse with the needed settings

http://kernelhacks.blogspot.com/2012/11/the-complete-tutorial-for-stellaris.html

For the Tiva C setup, follow the link below. Referring to the above page first is highly recommended.

http://www.scienceprog.com/setting-up-tiva-c-launchpad-project-template-with-sourcery-codebench-and-eclipse/

The language complications

The StellarisWare/TivaWare libraries provide high-level functions to access and control the peripherals without touching the registers directly (the functions do it for you). This helps programmers who are used to encapsulated high-level programming start working on the device without worrying about which register does what. Still, knowing how the low level works is highly recommended if you want to go further with ARM or other embedded architectures.

For me, working with this microcontroller in C helped demystify the somewhat confusing concept of pointers, and I got practical use out of bit shifting and binary operations.

Interrupts

The highlight of this architecture is arguably the interrupts. The NVIC (Nested Vectored Interrupt Controller) lets the programmer define the priority of different interrupts. For example, updating a display panel or responding to a polling query is lower priority than handling an encoder.

The large number of available interrupts is also quite useful for real-time work, since running everything in the while loop is not only inefficient, it cannot guarantee constant time between iterations of the loop.

So do not use the infinite while loop for control loops; use timed interrupts instead (or measure the elapsed time manually, which is messy). I used these interrupts extensively in one of my major projects with the Launchpad, which used UART, GPIO, SysTick and PWM timer interrupts.

Conclusion

Despite having less documentation and fewer libraries (built on StellarisWare/TivaWare) than competitors such as ST Microelectronics’ development boards, this evaluation board is cheap (13 US dollars) and decently fast, which makes it ideal for newcomers to ARM and for hobbyists!

You can also try the easier route of the “Energia” IDE (based on the Arduino project); it looks and works just like the Arduino IDE!

References