Automatic Dog Door (Part 2c) – Software Problems

This post is part of a series looking at the challenges faced when taking the initial idea and prototype I described in Part 1 through to a working implementation. This post focuses on the software problems encountered.

This is the first serious project I’ve undertaken with the Particle Photon. As a result, there has been a lot of research and learning, combined with a good deal of trial and error.

Initial development began using Particle Desktop IDE, an Atom-based environment, but I soon switched across to the newer VS Code based Particle Workbench. Both provide local and cloud-based compilation and deployment to Particle devices, along with a host of development tools, key amongst which is the integrated serial monitor.

The source code for this project is hosted and shared on GitHub: https://github.com/shortbloke/DogDoor

Problem 1: Debugging

Debugging the electrical side of the project often involves measuring voltages with a multimeter or examining signals with an oscilloscope. On the software side, the Particle Debugger (which I don’t have, but definitely would get for any future projects) provides more advanced debugging, including breakpoints. Without it, debugging the software is mainly done through Serial.print() statements, viewable on a connected PC.

Some additional monitoring was possible by using the Particle Cloud to monitor published variables and messages. This doesn’t work where there is a fast rate of change.

Red SOS flash of death

When things go badly wrong, the Particle Photon will flash a red SOS from the onboard LED. There can be a number of different causes; the most common one I hit was a stack overflow. To aid troubleshooting, on each startup of the device I read the reason for the last reset and log it.

    resetReason = (int) System.resetReason();
    resetReasonData = (int) System.resetReasonData();

Stack overflow

One of the most common and problematic issues I hit during the development of the software was an SOS signal followed by 13 further blinks, indicating a stack overflow. It was hard to diagnose because, by the time the system resets and recovers, the cause of the problem is no longer apparent.

I eventually narrowed the problem down to the debug serial messages. If I disabled all the logging, the system was stable. That was a problem in itself, since without additional debugging hardware the only debug method I had was serial messages.

For serial logging, I was using a function that would fire every second, to output various internal variable states. This was to throttle the message rate to something manageable and avoid needing to log every small change.

In addition to the timer for logging messages, I had interrupt handlers that monitored the state of the inputs to the system (sensors, limit switches and manually activated switches). On a state change, they would also log to the serial port.

Resolution: Reading further on how interrupts are handled on the Photon (and other Arduino-like systems) indicated that handlers need to be as fast as possible and avoid blocking calls. Calling out to the relatively slow serial port was a problem. This was compounded because software timers share many of the same properties as interrupts.

In order to avoid logging within the interrupt service routine (ISR) or software timer callback, it was necessary instead to use the ISR to update variables and set a global flag which could be used as an indicator to run some function from the main loop.
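As a rough illustration of this flag pattern, here is a minimal sketch in portable C++ rather than Photon firmware. All names (limitSwitchChanged, onLimitSwitchInterrupt, logPendingChange) are hypothetical and not taken from the project, and a std::vector stands in for the serial port so the behaviour can be checked on a desktop.

```cpp
#include <atomic>
#include <string>
#include <vector>

// Hypothetical names throughout; this models the pattern, not the project's code.
std::atomic<bool> limitSwitchChanged{false};  // flag raised by the "ISR"
std::atomic<int>  limitSwitchState{0};        // latest value seen by the "ISR"

// Stands in for the interrupt handler: record the value, raise the flag, return.
// No logging or other slow work happens here.
void onLimitSwitchInterrupt(int newState) {
    limitSwitchState.store(newState);
    limitSwitchChanged.store(true);
}

// Called from the main loop. Returns true if a change was pending and was logged.
bool logPendingChange(std::vector<std::string>& logSink) {
    // exchange() reads and clears the flag in one step, so a change that
    // arrives between the check and the clear is not lost.
    if (!limitSwitchChanged.exchange(false)) {
        return false;
    }
    logSink.push_back("limit switch -> " + std::to_string(limitSwitchState.load()));
    return true;
}
```

The slow work (formatting and writing the message) only ever runs in the main-loop context, so the handler itself stays short however chatty the logging becomes.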

Problem 2: A need for watchdogs!

Reliability of the system is key to general acceptance by all members of the family. To that end, it is necessary to think about how to detect the system isn’t running correctly. During the development phase, there were times when the system was running fine and then after some unknown period of time would stop operating.

Resolution: Watchdog timers provide a means of ensuring the system is running. The Photon provides a built-in Application Watchdog that is automatically reset at the end of each loop, so that if the process gets stuck for some reason and the main loop is no longer running, the system will automatically reset. With the fast startup time of the Photon, the system is back up and running within seconds of the reset being called.
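The idea behind a watchdog can be modelled in a few lines of portable C++. This is only a sketch of the concept under assumed names (SoftwareWatchdog, checkIn, expired are hypothetical); it is not Particle's Application Watchdog, which runs independently of the main loop and performs the reset itself.

```cpp
#include <cstdint>

// Hypothetical class modelling a watchdog; not Particle's ApplicationWatchdog.
class SoftwareWatchdog {
public:
    SoftwareWatchdog(uint32_t timeoutMs, uint32_t nowMs)
        : timeoutMs_(timeoutMs), lastCheckInMs_(nowMs) {}

    // Called at the end of every healthy pass through the main loop.
    void checkIn(uint32_t nowMs) { lastCheckInMs_ = nowMs; }

    // True once the loop has been silent for longer than the timeout.
    // Unsigned subtraction keeps this correct when the millisecond counter wraps.
    bool expired(uint32_t nowMs) const {
        return (nowMs - lastCheckInMs_) > timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastCheckInMs_;
};
```

In a real system something outside the stuck loop (a separate thread or a hardware timer) has to evaluate expired() and trigger the reset, which is the part the built-in watchdog provides.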

In addition to the Application Watchdog, a second, application-specific watchdog was added to the keep-open timer callback:

void keepOpenTimerCallback() {
    Log.trace("keepOpenTimerCallback - Timer expired");
    if (overriddenDesiredDoorState == STATE_OPEN) {
        // Clear the override on timeout.
        overrideDoorState = false;
    }
    desiredDoorState = getDesiredDoorState();
    if (currentDoorState != STATE_OPEN) {
        keepOpenTimer.changePeriodFromISR(keepOpenTime);
        keepOpenTimerResetCountWaitingToOpen++;
        if (keepOpenTimerResetCountWaitingToOpen > keepOpenTimerMaxResetsWaitingToOpen) {
            runNonCriticalTasksNow = true;
            systemResetRequested = true;
        }
    }
}

Here, if the door takes too long to reach the STATE_OPEN state, a global flag systemResetRequested is set, which is acted upon in the main loop to call System.reset();

Problem 3: Operation Speed

The design of this software requires that the main loop runs quickly, as when the door needs to open or close, a call is made to move the stepper a single step. Delays before calling stepper.run(); limit the maximum speed and smoothness of the operation of the stepper motors.

Whilst the initial implementation worked, I thought it should be able to run more quickly. Adjusting the max speed and acceleration of the AccelStepper library didn’t lead to any improvements.

Resolution: There were quite a few changes made to optimise performance of all the code which ultimately impacted the rate at which stepper.run(); was being called. The starting point was to measure the current performance and understand where the bottlenecks are.

Performance profiling

In order to measure the real performance, additional profiling code was added to each function: first the main loop, then each and every function called from the main loop, and finally each of the ISRs, in order to optimise everything. The outline code for profiling is very simple:

// Global Variable
uint32_t duration = 0;

void loop() {
    uint32_t startTimeTicks;
    startTimeTicks = System.ticks();
    //
    // DO SOMETHING
    //
    uint32_t endTimeTicks = System.ticks();
    duration = (endTimeTicks-startTimeTicks)/System.ticksPerMicrosecond();
}

The duration is exposed to Particle.io cloud via: Particle.variable("LoopDuration", duration); It was also possible to log the value periodically to the serial port, which aided the initial timing.

Note: Profiling code for the main loop and for the time taken to open the door remains in the final code and can be activated through the Particle Cloud web console if needed.

Optimisations from profiling

Constantly reading IO is slow, be it driven by an interrupt on change or on demand. The simple switches are OK as they tend to hold a steady state due to pull-up resistors. But the IR sensors were triggering a change constantly, which would lead to the value being read and evaluated against the threshold. This was addressed by changing the sensors to be polled every 250ms, quick enough to detect presence but infrequent enough to minimise the impact on the main loop.
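Interval-gated polling of this kind can be sketched as follows in portable C++. IntervalGate and presenceDetected are hypothetical names, the caller supplies the current time in milliseconds (on the Photon that would typically come from millis()), and the "greater than threshold means present" direction is an assumption rather than something taken from the project.

```cpp
#include <cstdint>

// Hypothetical helper: reports when a fixed interval has elapsed.
class IntervalGate {
public:
    explicit IntervalGate(uint32_t intervalMs) : intervalMs_(intervalMs) {}

    // Returns true at most once per interval; the caller reads the sensor only then.
    bool due(uint32_t nowMs) {
        if (nowMs - lastRunMs_ < intervalMs_) {
            return false;  // too soon, skip the slow IO this pass
        }
        lastRunMs_ = nowMs;
        return true;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastRunMs_ = 0;
};

// Compares a raw reading against a threshold.
// Assumption: a higher reading means something is in front of the sensor.
bool presenceDetected(int rawReading, int threshold) {
    return rawReading > threshold;
}
```

With a gate constructed as IntervalGate(250), most passes through the main loop cost a single subtraction and comparison, and the sensor read happens roughly four times a second.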

The function calls out of the main loop were also optimised to use flags set by the ISR to determine whether extra functions needed to be called. For example, if the current door state is the same as the desired state, then there is a lot more background work we can afford to do as the motor doesn’t need to move.

A review of the logic of a number of functions showed how code could be eliminated, simplified or only called when specifically needed, e.g. the first time through the function.

Connectivity is expensive

Once the system was up and running and seemingly working well, I wanted to expose the state of the door to my Home Assistant based home automation system. The obvious method for this was the lightweight MQTT protocol. The Particle libraries include a supported MQTT library, so it should be straightforward. The simple example given is:

#include "application.h"
#include "MQTT.h"

void callback(char* topic, byte* payload, unsigned int length);
MQTT client("iot.eclipse.org", 1883, callback);

// receive message
void callback(char* topic, byte* payload, unsigned int length) {
    char p[length + 1];
    memcpy(p, payload, length);
    p[length] = '\0';  // null-terminate so strcmp() can be used

    if (!strcmp(p, "RED"))
        RGB.color(255, 0, 0);
    else if (!strcmp(p, "GREEN"))
        RGB.color(0, 255, 0);
    else if (!strcmp(p, "BLUE"))
        RGB.color(0, 0, 255);
    else
        RGB.color(255, 255, 255);
    delay(1000);
}


void setup() {
    RGB.control(true);

    // connect to the server(unique id by Time.now())
    client.connect("sparkclient_" + String(Time.now()));

    // publish/subscribe
    if (client.isConnected()) {
        client.publish("outTopic/message","hello world");
        client.subscribe("inTopic/message");
    }
}

void loop() {
    if (client.isConnected())
        client.loop();
}

This looks straightforward enough. I needed to add a call to connect to MQTT if not already connected, and then call client.loop() within the main loop.

if (client.isConnected())
    client.loop();

What transpired was that calling connect is slow and blocking and the call to client.loop() when it checks for incoming messages is incredibly slow.

Adding this small amount of code led to the door being very slow to open and not at all smooth. At this stage, I wasn’t sure it would be possible to use MQTT in a system which required maximum speed within the main loop.

Resolution: Having already added profiling code, I could measure the impact of the change and seek to find ways to mitigate it. This was achieved by grouping MQTT and Particle Cloud interactions into a runNonCriticalTasks() function, which is called regularly unless the door is actively moving, at which point we prioritise the stepper movement above all messaging.
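The prioritisation can be pictured with a small model of one pass through the main loop. This is a sketch under assumptions, not the project's code: the enum, LoopStats and loopOnce are hypothetical, "door moving" is approximated as the current state differing from the desired state, and counters stand in for the real stepper.run() and runNonCriticalTasks() calls.

```cpp
#include <cstdint>

// Hypothetical two-state model; the names mirror those used in the post.
enum DoorState { STATE_CLOSED, STATE_OPEN };

// Counters standing in for the real work, so the behaviour can be observed.
struct LoopStats {
    uint32_t stepperCalls = 0;      // stands in for stepper.run()
    uint32_t nonCriticalCalls = 0;  // stands in for runNonCriticalTasks()
};

// One pass of the main loop: while the door still has to move, only the
// stepper is serviced; messaging waits until the door has settled.
void loopOnce(DoorState current, DoorState desired, LoopStats& stats) {
    const bool doorMoving = (current != desired);
    if (doorMoving) {
        ++stats.stepperCalls;
        return;  // skip MQTT and cloud work on this pass
    }
    ++stats.nonCriticalCalls;
}
```

The trade-off is that state updates reach MQTT and the cloud slightly late, after the movement has finished, in exchange for keeping the step rate as high as possible while the door is in motion.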
