Elevator crashes when elevatorserver dies #36

Closed
opened 2026-03-08 14:13:20 +00:00 by sebgab · 0 comments
Owner

Bug description

The elevator program crashes when the elevator server dies.

Justification

From the project specification:

Failure states are anything that prevents the elevator from communicating with other elevators or servicing calls
– This includes losing network connection entirely, software that crashes, doors that won’t close, and losing power - both to the elevator motor and the machine that controls the elevator

The key bit here is that we should tolerate software that crashes. It is slightly ambiguous whether this means only our own program crashing, or whether it also includes external programs.
Because of this, we should assume the worst case: that we need to handle external programs crashing.

Crash log

thread '<unnamed>' (198891) panicked at /home/sebgab/.cargo/git/checkouts/driver-rust-0f0c9d8ec750caeb/6c287b7/src/elevio/elev.rs:63:30:
called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread '<unnamed>' (198892) panicked at /home/sebgab/.cargo/git/checkouts/driver-rust-0f0c9d8ec750caeb/6c287b7/src/elevio/elev.rs:70:43:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }

thread 'tokio-runtime-worker' (198886) panicked at src/system_state_controller/button_handler.rs:90:10:
Failed to recieve call button notification: RecvError

thread '<unnamed>' (198894) panicked at /home/sebgab/.cargo/git/checkouts/driver-rust-0f0c9d8ec750caeb/6c287b7/src/elevio/elev.rs:90:43:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }

thread 'tokio-runtime-worker' (198888) panicked at src/system_state_controller/floor_handler.rs:12:46:
called `Result::unwrap()` on an `Err` value: RecvError

thread '<unnamed>' (198893) panicked at /home/sebgab/.cargo/git/checkouts/driver-rust-0f0c9d8ec750caeb/6c287b7/src/elevio/elev.rs:82:43:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }

thread 'tokio-runtime-worker' (198887) panicked at src/system_state_controller/obstruction_handler.rs:18:53:
called `Result::unwrap()` on an `Err` value: RecvError

thread 'tokio-runtime-worker' (198882) panicked at src/stop_button_handler.rs:24:14:
Failed to get stop_button_value: RecvError
 ERROR elevator                                           > Button handle returned

thread 'main' (198855) panicked at src/main.rs:156:13:
Button handle returned

Potential fix

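The first panic in the log is a BrokenPipe error inside the driver, presumably raised when it writes to the connection to the dead elevator server. The PoisonError panics that follow appear to be a consequence of that first panic poisoning a lock in the driver, and the RecvError panics in our handlers occur because the cbc channels from the driver no longer exist.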
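
The PoisonError and RecvError panics in the log can be reproduced in isolation. The following is a minimal sketch, not the driver's actual code. It assumes the driver guards its connection with a std Mutex (an inference from the PoisonError in the log, not confirmed against the driver source) and uses std::sync::mpsc as a stand-in for the cbc channels. All function names are hypothetical.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// A thread that panics while holding a lock poisons it for every other user.
/// The Mutex<u8> is a stand-in for whatever the driver actually protects (assumption).
fn lock_is_poisoned_after_panic() -> bool {
    let shared = Arc::new(Mutex::new(0u8));
    let for_thread = Arc::clone(&shared);

    // Simulates an unwrap() on a failed write while the guard is still held.
    let joined = thread::spawn(move || {
        let _guard = for_thread.lock().unwrap();
        panic!("simulated broken pipe");
    })
    .join();

    // Any later lock() now returns Err(PoisonError), so lock().unwrap() panics too.
    let poisoned = shared.lock().is_err();
    joined.is_err() && poisoned
}

/// Receiving on a channel whose only sender has been dropped returns an error.
/// std::sync::mpsc is used here in place of the cbc channels (assumption).
fn recv_fails_after_sender_dropped() -> bool {
    let (tx, rx) = mpsc::channel::<u8>();
    drop(tx); // the sending thread died and took its sender with it
    rx.recv().is_err()
}

fn main() {
    println!("lock poisoned: {}", lock_is_poisoned_after_panic());
    println!("recv failed: {}", recv_fails_after_sender_dropped());
}
```

If this matches what happens in the driver, a single failed write is enough to take down every thread that shares the lock, and every handler that unwraps a receive from those threads.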

As such, we need to handle this case. I believe this can be done by moving all tasks that depend on the external driver into a separate new task. Rather than panicking when those tasks return, the new task kills all its tasks, tries to re-establish the connection to the elevator, and re-spawns the tasks on success.
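
A rough sketch of such a supervising loop is shown below. It is only an illustration under stated assumptions: it uses plain std code rather than tokio tasks to stay self-contained, and `DriverLost`, `supervise`, `connect`, `run_tasks`, `max_attempts` and `backoff` are hypothetical names, not part of the driver or of our code base.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical error meaning "the connection to the elevator server is gone".
#[derive(Debug)]
struct DriverLost;

/// Connects, runs the driver-dependent work, and reconnects when that work
/// reports a lost connection. Gives up only after `max_attempts` failed
/// connection attempts in a row.
fn supervise<Conn>(
    mut connect: impl FnMut() -> Result<Conn, DriverLost>,
    mut run_tasks: impl FnMut(Conn) -> Result<(), DriverLost>,
    max_attempts: u32,
    backoff: Duration,
) -> Result<(), DriverLost> {
    loop {
        // Try to (re-)establish the connection, waiting between attempts.
        let mut conn = None;
        for _ in 0..max_attempts {
            match connect() {
                Ok(c) => {
                    conn = Some(c);
                    break;
                }
                Err(DriverLost) => thread::sleep(backoff),
            }
        }
        let conn = match conn {
            Some(c) => c,
            None => return Err(DriverLost),
        };

        // Run the work until it finishes cleanly or loses the connection.
        match run_tasks(conn) {
            Ok(()) => return Ok(()),
            Err(DriverLost) => eprintln!("driver connection lost, reconnecting"),
        }
    }
}

fn main() {
    let mut connects = 0;
    let mut runs = 0;
    // Simulated driver: the first two runs lose the connection, the third ends cleanly.
    let result = supervise(
        || {
            connects += 1;
            Ok(connects)
        },
        |_conn| {
            runs += 1;
            if runs < 3 { Err(DriverLost) } else { Ok(()) }
        },
        5,
        Duration::from_millis(10),
    );
    println!("result: {result:?}, connects: {connects}, runs: {runs}");
}
```

In the real program, `run_tasks` would spawn the button, floor, obstruction and stop button handlers and return an error when any of them sees a closed channel, instead of unwrapping. Since the first panics in the log originate inside the driver itself, this would likely also require the driver to return errors rather than unwrap them.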

sebgab self-assigned this 2026-03-08 14:15:35 +00:00
sebgab reopened this issue 2026-03-09 09:36:43 +00:00
Reference
TTK4145/elevator#36