Skip to content

Commit

Permalink
update changelog
Browse files Browse the repository at this point in the history
  • Loading branch information
PSeitz committed Feb 8, 2023
1 parent 517f1da commit b20c518
Show file tree
Hide file tree
Showing 2 changed files with 89 additions and 3 deletions.
87 changes: 87 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,92 @@
0.11.0 (2023-02-08)
==================

### Documentation

- Docs: add decompress block example

### Features

- Feat: allow to pass buffer larger than size, warn on missing docs

### Performance

- Perf: faster duplicate_overlapping [#69](https://github.com/PSeitz/lz4_flex/pull/69)

improve duplicate_overlapping unsafe version. The compiler generates unfavourable assembly for the simple version.
Now we copy 4 bytes, instead of one in every iteration.
Without that the compiler will unroll/auto-vectorize the copy with a lot of branches.
This is not what we want, as large overlapping copies are not that common.

- Perf: simplify extend_from_within_overlapping [#72](https://github.com/PSeitz/lz4_flex/pull/72)

extend_from_within_overlapping is used in safe decompression when
overlapping data has been detected. The prev version had unnecessary
assertions/safe guard, since this method is only used in safe code.
Removing the temporary &mut slice also simplified assembly output.

uiCA Code Analyzer

Prev
Tool Skylake IceLake Tiger Lake Rocket Lake
uiCA Cycles 28.71 30.67 28.71 27.57

Simplified
Tool Skylake IceLake TigerLake Rocket Lake
uiCA Cycles 13.00 15.00 13.00 11.00

- Perf: remove unnecessary assertions

those assertions are only used in safe code and therefore unnecessary

- Perf: improve safe decompression performance 8-18% [#73](https://github.com/PSeitz/lz4_flex/pull/73)

Improve safe decompression speed by 8-18%

Reduce multiple slice fetches. every slice access, also nested ones
,carries some overhead. In the hot loop a fixed &[u8;16] is fetched to
operate on. This is purely done to pass that info to the compiler.

Remove error handling that only carries overhead. As we are in safe
mode we can rely on bounds checks if custom error handling only adds overhead.
In normal operation no error should occur.

The strategy to identify improvements was by counting the lines of
assembly. A rough heuristic, but seems effective.
cargo asm --release --example decompress_block decompress_block::main |
wc -l

- Perf: improve safe frame compression performance 7-15% [#74](https://github.com/PSeitz/lz4_flex/pull/74)

The frame encoding uses a fixed size hashtable.
By creating a special hashtable with a Box<[u32; 4096]> size,
in combination with the bit shift of 4, which is also moved into a constant,
the compiler can remove the bounds checks.
For that to happen, the compiler also needs to recognize the `>> 48` right
shift from the hash algorithm (u64 >> 52 <= 4096), which is the case. Yey

It also means we can use less `unsafe` for the unsafe version

- Perf: switch to use only 3 kinds of hashtable [#77](https://github.com/PSeitz/lz4_flex/pull/77)

use only hashtables with fixed sizes and bit shifts, that allow to
remove bounds checks.


### Refactor

- Refactor: remove VecSink [#71](https://github.com/PSeitz/lz4_flex/pull/71)

remove VecSink since it can be fully replaced with a slice
this will reduce code bloat from generics

### Testing

- Tests: add proptest roundtrip [#69](https://github.com/PSeitz/lz4_flex/pull/69)

0.10.0 (2023-01-30)
==================
### Features
Add support of decoding legacy frames, used by linux kernel (thanks to @yestyle)
* https://github.com/PSeitz/lz4_flex/pull/66

Expand Down
5 changes: 2 additions & 3 deletions cliff.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,13 @@
[changelog]
# changelog header
header = """
# Changelog\n
All notable changes to this project will be documented in this file.\n
"""
# template for the changelog body
# https://tera.netlify.app/docs/#introduction
body = """
{% if version %}\
## [{{ version | trim_start_matches(pat="v") }}] - {{ timestamp | date(format="%Y-%m-%d") }}
{{ version | trim_start_matches(pat="v") }} ({{ timestamp | date(format="%Y-%m-%d") }})
==================
{% else %}\
## [unreleased]
{% endif %}\
Expand Down

0 comments on commit b20c518

Please sign in to comment.