feat: detect an unresponsive stick and reset it #6244

Merged · 4 commits · Sep 6, 2023

@AlCalzone
Member

AlCalzone commented Sep 5, 2023

This PR uses the previously added Unresponsive controller status to detect an unresponsive controller (one that does not ACK Serial API commands). When this happens, we soft-reset the controller in an attempt to recover. If that does not help either, the driver instance gets destroyed.
Destroying the driver should trigger a restart in most applications, which effectively closes and re-opens the serial port.

fixes: #2723
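
The escalation described above (command not ACKed, soft reset, then destroying the driver) can be sketched roughly as follows. This is a simplified illustration, not the actual zwave-js code: `ControllerLike`, `FakeController`, and `sendWithRecovery` are hypothetical names, and the real driver works asynchronously with timeouts rather than boolean return values.

```typescript
// Hypothetical sketch of the recovery escalation. None of these names
// are part of the zwave-js API.
type Outcome = "ok" | "recovered-after-soft-reset" | "driver-destroyed";

interface ControllerLike {
	/** Sends a Serial API command. Returns false if no ACK arrived in time. */
	sendCommand(): boolean;
	/** Asks the controller to restart itself. */
	softReset(): void;
}

/** Test double: answers each command with the next entry of `acks`. */
class FakeController implements ControllerLike {
	public resets = 0;
	private calls = 0;

	public constructor(private readonly acks: boolean[]) {}

	public sendCommand(): boolean {
		// Once the scripted answers run out, behave like a stuck controller
		return this.acks[this.calls++] ?? false;
	}

	public softReset(): void {
		this.resets++;
	}
}

function sendWithRecovery(
	controller: ControllerLike,
	destroyDriver: (reason: string) => void,
): Outcome {
	if (controller.sendCommand()) return "ok";

	// Step 1: the command was not ACKed, so try a soft reset and retry once
	controller.softReset();
	if (controller.sendCommand()) return "recovered-after-soft-reset";

	// Step 2: still no ACK, so give up and let the application restart
	destroyDriver("Controller is still timing out. Restarting the driver...");
	return "driver-destroyed";
}
```

With `new FakeController([false, true])` the sketch recovers after one soft reset. With `new FakeController([false, false])` it calls `destroyDriver`, which is where an application would typically re-create the driver and thereby re-open the serial port.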

Verified

This commit was signed with the committer’s verified signature.
renaudhartert-db Renaud Hartert

@AlCalzone AlCalzone marked this pull request as ready for review September 5, 2023 12:05
@AlCalzone
Member Author

@zwave-js-bot automerge

@zwave-js-bot zwave-js-bot merged commit 7471dd8 into master Sep 6, 2023
@zwave-js-bot zwave-js-bot deleted the unresponsive-stick branch September 6, 2023 07:16
AlCalzone added a commit that referenced this pull request Sep 6, 2023

### Breaking changes · [Migration guide](https://zwave-js.github.io/node-zwave-js/#/getting-started/migrating-to-v12)
* Remove support for Node.js 14 and 16 (#6245)
* Subpath exports are now exposed using the `exports` field in `package.json` instead of `typesVersions` (#5839)
* The `"notification"` event now includes a reference to the endpoint that sent the notification (#6083)
* Keep separate Supervision session ID counters for each node (#6175)
* Validate the device fingerprint before installing firmware update instead of when checking for updates (#6192)
* Removed some deprecated methods (#6250)
* Managing SUC routes with the non-SUC method variants is no longer allowed (#6251)
* "Heal (network)" was renamed to "rebuild routes" to better reflect what it does (#6252)

### Features
* Detect an unresponsive stick and reset it (#6244)
AlCalzone added a commit that referenced this pull request Sep 26, 2023

### Breaking changes · [Migration guide](https://zwave-js.github.io/node-zwave-js/#/getting-started/migrating-to-v12)
* Remove support for Node.js 14 and 16 (#6245)
* Subpath exports are now exposed using the `exports` field in `package.json` instead of `typesVersions` (#5839)
* The `"notification"` event now includes a reference to the endpoint that sent the notification (#6083)
* Keep separate Supervision session ID counters for each node (#6175)
* Validate the device fingerprint before installing firmware update instead of when checking for updates (#6192)
* Removed some deprecated methods (#6250)
* Managing SUC routes with the non-SUC method variants is no longer allowed (#6251)
* "Heal (network)" was renamed to "rebuild routes" to better reflect what it does (#6252)
* Corrected the argument type for `Driver.constructor`, `updateLogConfig` and `updateOptions` (#6254, #6319)

### Features
* Detect an unresponsive stick and reset it (#6244)
* The default time after which battery-powered devices with no pending commands are sent back to sleep is now `250 ms` (down from `1000 ms`). This timeout is now configurable using the driver option `timeouts.sendToSleep`. This should result in significant battery savings for devices that frequently wake up. (#6312)
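
As an illustration of the `timeouts.sendToSleep` option mentioned above, a driver options fragment might look like the sketch below. The value `500` is an arbitrary example, not a recommendation, and the variable name `driverOptions` is made up for illustration. The object is assumed to be passed as the options argument when constructing the driver.

```typescript
// Options fragment only; assumed to be passed to the Driver constructor
// alongside the serial port path.
const driverOptions = {
	timeouts: {
		// Time in milliseconds after which a battery-powered node with no
		// pending commands is sent back to sleep (new default: 250)
		sendToSleep: 500,
	},
};
```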

### Bugfixes
* A bug in the `7.19.x` SDK has surfaced where the controller gets stuck in the middle of a transmission. Previously this would go unnoticed because the failed commands would cause the nodes to be marked dead until the controller finally recovered. Since `v11.12.0` however, Z-Wave JS would consider the controller jammed and retry the last command indefinitely. This situation is now detected and Z-Wave JS attempts to recover by soft-resetting the controller when this happens. (#6296)
* Removed auto-disabling of soft-reset capability (#6256)
* Default to RF protection state `Unprotected` if not given for `Protection CC` V2+ (#6257)

### Config file changes
* Add Heatit Z-Water 2 (#6299)
* Add Shelly Wave 1PM (#6280, #6317)
* Add Heatit Z-TRM6 (#6263)
* Increase poll delay for ZW500D (#6270)
* Add fingerprint for Simon IO Master Roller Blind (#6262)
* Add HOPPE eHandle ConnectSense (#6269)
* Add parameters to Zooz ZEN17 from firmware 1.30 (#6189)
* Update Zooz ZEN32 config to the latest firmware, include 800 series (#6283)

### Changes under the hood
* Fixed the interpretation of `limit_options` in OpenSmartHouse import script (#6313)
* Some Z-Wave JS specific implementation checks are now done using a custom ESLint plugin (#6276, #6279, #6315)
* Migrated more Z-Wave JS specific checks to the custom ESLint plugin (#6297, #6302)
* Use ESLint to enforce consistent property ordering in config parameters and avoid unnecessary `minValue/maxValue` (#6321, #6322)
* `yarn test` now only runs tests affected by changed files by default. This is also done on CI in PRs to speed up check times (#6274)
* Upgraded lots of dependencies (#6258)
@sashalevin

Hello,
It seems that with this change, I've started seeing the following failure:

06:41:46.117 DRIVER   Controller is still timing out. Restarting the driver...
Error in driver ZWaveError: Controller is still timing out. Restarting the driver... (ZW0100)
    at Driver.destroyWithMessage (/home/homeassistant/zwave-js-server/node_modules/zwave-js/src/lib/driver/Driver.ts:2769:17)
    at fail (/home/homeassistant/zwave-js-server/node_modules/zwave-js/src/lib/driver/Driver.ts:3484:14)
    at Driver.handleUnresponsiveController (/home/homeassistant/zwave-js-server/node_modules/zwave-js/src/lib/driver/Driver.ts:3493:4)
    at Driver.handleFailedTransaction (/home/homeassistant/zwave-js-server/node_modules/zwave-js/src/lib/driver/Driver.ts:5521:13)
    at Driver.drainTransactionQueue (/home/homeassistant/zwave-js-server/node_modules/zwave-js/src/lib/driver/Driver.ts:4613:10) {
  code: 100,
  context: undefined,
  transactionSource: undefined
}
Shutting down
Closing server...
06:41:46.142 DRIVER   destroying driver instance...
06:41:46.154 CNTRLR   [Node 041] Interview attempt 1/5 failed, retrying in 5000 ms...
06:41:46.159 CNTRLR » [Node 041] Assigning SUC return route...
06:41:46.161 CNTRLR » [Node 041] Deleting SUC return route...
06:41:46.167 CNTRLR   [Node 041] Deleting SUC return route failed: The driver is not ready or has be
                      en destroyed (ZW0103)
06:41:46.171 CNTRLR   [Node 041] Assigning SUC return route failed: The driver is not ready or has b
                      een destroyed (ZW0103)
06:41:46.184 DRIVER   all queues idle
Client disconnected
Code 1000:
Server closed
06:41:46.321 DRIVER   driver instance destroyed

I have a ZST10 stick, and the issue happens without running Home Assistant at all (just starting zwave-js-server is enough).
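
For readers matching the log above against application code: the error object carries a numeric `code` (100 here), which is printed as `ZW0100`, and the later failures are printed as `ZW0103`. A minimal sketch of how an application could recognize this failure is shown below. `DriverErrorLike`, `CODE_FROM_LOG`, `formatErrorCode`, and `shouldRestartDriver` are hypothetical helpers, and whether zwave-js-server makes the decision this way is an assumption.

```typescript
// Hypothetical helpers. The only facts taken from the log are the numeric
// code 100 and the "ZW0100" / "ZW0103" display format.
interface DriverErrorLike {
	code: number;
	message: string;
}

// Code reported in the log for "Controller is still timing out"
const CODE_FROM_LOG = 100;

/** Formats a numeric error code the way it appears in the log. */
function formatErrorCode(code: number): string {
	return "ZW" + ("0000" + code).slice(-4);
}

/** Decides whether the application should re-create the driver. */
function shouldRestartDriver(err: DriverErrorLike): boolean {
	return err.code === CODE_FROM_LOG;
}
```

An application that restarts unconditionally on this error may loop if the underlying problem persists, which matches the repeated reloads reported later in this thread.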

@AlCalzone
Member Author

See #6341

@sashalevin

Hm, that's interesting: is it normal for the driver to be reloading every few minutes or so? It makes Home Assistant unusable.

I tried using zwave-js-ui to disable soft resets, but it didn't seem to help with the issue.

@AlCalzone
Member Author

AlCalzone commented Oct 5, 2023

No, it's not normal.

Please make a driver log with loglevel debug and attach it here as a file (drag & drop into the text field).

@sashalevin

Ack, attached.
zwavejs_2023-10-05.log

@AlCalzone
Member Author

Looks like whenever Z-Wave JS tries to interview node 41, requesting the node's info causes the controller to time out and not send the expected callback, triggering the new "unresponsive controller" detection.

The detection first restarts the stick, which doesn't resolve the issue, and then restarts Z-Wave JS.

I'll track that problem in #6371

@trankin

trankin commented Oct 11, 2023

@AlCalzone, does an MQTT event get raised when an unresponsive controller is detected?

@AlCalzone
Member Author

That's a question for https://github.com/zwave-js/zwave-js-ui
