chathistory Support #370

Draft · wants to merge 23 commits into main from feat/chathistory

Conversation

@andymandias (Contributor) commented May 6, 2024

Work in progress PR to add IRCv3 CHATHISTORY support (i.e. #206). Since it's not a widely available feature yet, the goal is to minimize any effects on Halloy's operation when CHATHISTORY is not available. Testing on ircs://irc.ergo.chat:6697/ to start.

Currently (more-or-less) implemented:

  • Request up to the latest 500 messages, or all messages since the latest message in channel history, when joining a channel (see the example requests after this list).
  • Request up to 500 messages to add to channel history when scrolling to the top of a channel buffer.
  • Basic message deduplication.
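
For illustration, the raw requests behind the first two items above might look like the following (hypothetical channel and timestamp; per the IRCv3 draft specification, the message reference is either *, a timestamp=, or a msgid= value, followed by the message limit):

    CHATHISTORY LATEST #halloy * 500
    CHATHISTORY LATEST #halloy timestamp=2024-05-06T12:00:00.000Z 500
    CHATHISTORY BEFORE #halloy timestamp=2024-05-06T12:00:00.000Z 500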

Planned:

  • Enable event-playback to allow replay of channel events (JOIN/PART/QUIT/etc). HistServ appears to be non-standard, so all its messages are filtered out for now for parity of experience. Convert HistServ PRIVMSG messages into the appropriate JOIN/PART/QUIT/etc messages.
  • Ensure history is loaded before requesting latest messages after JOIN.
  • Ensure messages received when starting Halloy are marked as new.
  • Request messages when reconnecting to a server (without closing Halloy; current testing shows this is already covered by the request(s) made on JOIN).
  • Improve UX for loading additional channel history (button at end of history & config option to automatically request when scrolling to top of history).
  • Do not move the backlog divider when adding older messages to a buffer.
  • Test multi-request LATEST functionality (artificially limit the size of batches).
  • Improve performance of deduplication (utilize something like an indexmap for message history?). And/or, avoid the need for deduplication.
  • Use TARGETS to open Query buffers for direct messages sent while disconnected.
  • Add get-latest-messages & load-older-messages support to Query buffers.
  • Get message ids for user-sent messages.
  • Set up logic for the possibility of a chathistory request timeout.

@andymandias force-pushed the feat/chathistory branch 7 times, most recently from 23874d0 to bdd144e on May 14, 2024 02:37
@tarkah (Member) commented May 14, 2024

@andymandias I'll look to review this very soon 👀

@andymandias force-pushed the feat/chathistory branch 2 times, most recently from 3bb4486 to d6f23e3 on May 14, 2024 19:45
@andymandias (Contributor Author) commented May 14, 2024

Thanks @tarkah! It's definitely still a work in progress; in particular, query buffers don't have chathistory support yet. There are also a number of smaller issues to work out (I'll try to add these to the list at the top of the PR). I am currently using it with soju (timestamp-based) and ircs://ergo.chat/#ergo (msgid-based) and it ~works.

Things I think particularly need reviewing:

  • I'm committing some crimes against asynchronicity with load_history_now and make_partial_now (starting in main.rs, going into manager.rs). I believe that functionality should instead be implemented as a chain of asynchronous commands (load history → get latest message), but I wasn't quite sure how to set up the chain. I set it aside to deal with later, but I suspect you may have an even better solution.
  • It's useful to have a Full history in order to have a clear reference message when adding messages to the history. I've been working on making all processes work with a Partial history, but in the meantime I've modified make_partial to allow converting from a Full-with-unread-messages history to a Partial history. I've done my best to respect the existing structure, but it probably warrants review.

@andymandias (Contributor Author)

Brief addendum to note that it looks like my next push will involve a moderate restructuring, so it's probably best to review after that (hopefully tonight).

@andymandias (Contributor Author)

I'll be chipping away at this still, but it's probably in a good spot for a WIP review now.

Comment on lines 243 to 280
Some(buffer::Event::ScrolledToTop) => {
    if config.buffer.chathistory.infinite_scroll {
        if let Some(buffer) = pane.buffer.data() {
            let server = buffer.server();

            if clients.get_server_supports_chathistory(server) {
                if let Some(target) = buffer.target() {
                    let message_reference_type = clients
                        .get_server_chathistory_message_reference_type(
                            server,
                        );

                    let oldest_message_reference = self
                        .get_oldest_message_reference(
                            server,
                            &target,
                            message_reference_type,
                        );

                    clients.send_chathistory_request(
                        ChatHistorySubcommand::Before,
                        server,
                        &target,
                        oldest_message_reference,
                    );
                }
            }
        }
    }
}
Some(buffer::Event::ChatHistoryBeforeRequest) => {
    if let Some(buffer) = pane.buffer.data() {
        let server = buffer.server();

        if clients.get_server_supports_chathistory(server) {
            if let Some(target) = buffer.target() {
                let message_reference_type = clients
                    .get_server_chathistory_message_reference_type(
Member

We can consider infinite_scroll within the scroll_view and conditionally return the event or not. Then we can define a single event variant Event::RequestMoreChatHistory that's used from both the button message & scroll to top message and dedupe all this code.

        }
    }
}
_ => (),
Member

_ is dangerous as the compiler won't notify us when new variants are added. Match on None instead to fully exhaust the match

Contributor Author

Hadn't thought about it that way before, thanks for the suggestion/tip :)
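
A pared-down sketch of the two suggestions above (a single RequestMoreChatHistory variant emitted by both the button and the scroll-to-top path, plus an exhaustive match); the enum and function here are illustrative stand-ins, not Halloy's actual types:

    enum Event {
        RequestMoreChatHistory,
    }

    fn handle(event: Option<Event>) {
        match event {
            Some(Event::RequestMoreChatHistory) => {
                // One code path for both triggers; the duplicated request
                // logic lives in a single helper.
            }
            // Matching None explicitly (rather than `_`) keeps the match
            // exhaustive, so the compiler flags any new Event variants.
            None => {}
        }
    }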

Comment on lines 817 to 770
if config.buffer.chathistory.infinite_scroll {
    if let Some((_, pane)) = self.get_focused_mut() {
        if let Some(buffer) = pane.buffer.data() {
            let server = buffer.server();

            if clients.get_server_supports_chathistory(server) {
                if let Some(target) = buffer.target() {
                    let message_reference_type = clients
                        .get_server_chathistory_message_reference_type(server);

                    let oldest_message_reference = self
                        .get_oldest_message_reference(
                            server,
                            &target,
                            message_reference_type,
                        );

                    clients.send_chathistory_request(
                        ChatHistorySubcommand::Before,
                        server,
                        &target,
                        oldest_message_reference,
                    );
                }
            }
        }
    }
}
Member

3rd place this logic is defined. Time to refactor it into a helper method :)
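
A sketch of what such a helper could look like, built from the repeated logic in the snippets above; the receiver and the Clients/BufferData parameter types are assumptions, not the PR's actual signatures:

    // Hypothetical consolidation of the request logic repeated above.
    fn request_older_chathistory(&self, clients: &mut Clients, buffer: &BufferData) {
        let server = buffer.server();

        if clients.get_server_supports_chathistory(server) {
            if let Some(target) = buffer.target() {
                let message_reference_type =
                    clients.get_server_chathistory_message_reference_type(server);

                let oldest_message_reference = self.get_oldest_message_reference(
                    server,
                    &target,
                    message_reference_type,
                );

                clients.send_chathistory_request(
                    ChatHistorySubcommand::Before,
                    server,
                    &target,
                    oldest_message_reference,
                );
            }
        }
    }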

@tarkah (Member) commented May 16, 2024

I'm a bit unsure about all the API changes to history to support this feature. I need to internalize it all more, but it seems to me we shouldn't have so many new code paths. All we're doing is feeding messages to history like we would any other message that comes in; the only difference is we may want to splice them into place vs. appending them.

So it seems to me the only API change of history needed would be the following:

  • Did the message come in via history?
  • If yes, splice it into place using message reference

It'd be great to sync up some time to discuss this over IRC so I can better understand the model at a high level, as there's a lot to internalize. I'm also seeing async functions getting called from main but without awaiting, which is a no-op.

    true
} else {
    false
};
Member

Why contains_starting_with when we then insert it as the full value? We need to ensure we're requesting the exact same capability that we checked existed. Why doesn't just contains work here, since we check both w/ and w/out the draft prefix?

Contributor Author

I thought I had run into a server that listed the chathistory capability with a value attached (e.g. draft/chathistory=1000), and I used contains_starting_with to sidestep having an issue with it, but I can't find any server where that's the case now. Will switch it back to contains since that's my reading of the specification.

Comment on lines 337 to 347
let events = if let Some(target) = batch_tag
    .as_ref()
    .and_then(|batch| self.batches.get(batch))
    .and_then(|batch| batch.chathistory_target.clone())
{
    if let Some(ChatHistoryRequest {
        subcommand,
        message_reference,
        ..
    }) = self.chathistory_requests.get(&target)
    {
Member

You can collapse these two if let branches together by calling self.chathistory_requests.get in an and_then and pattern matching the Some(ChatHistoryRequest in the root if let
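
Roughly (a sketch based on the snippet above; untested against the real borrow situation):

    let events = if let Some(ChatHistoryRequest {
        subcommand,
        message_reference,
        ..
    }) = batch_tag
        .as_ref()
        .and_then(|batch| self.batches.get(batch))
        .and_then(|batch| batch.chathistory_target.clone())
        .and_then(|target| self.chathistory_requests.get(&target))
    {
        // ...same body as before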

Comment on lines 389 to 394
_ => vec![Event::ChatHistorySingle(
    message,
    self.nickname().to_owned(),
    subcommand.clone(),
    message_reference.clone(),
)],
Member

I assume we're just forwarding all messages within here to history and not calling handle as we don't want to update local state based on history? That makes sense.

Contributor Author

Correct. Particularly since we're requesting draft/event-playback when available, there can be messages that would mess with local state.

Comment on lines 652 to 653
pub enum MessageReference {
    Timestamp(DateTime<Utc>, String),
Member

What's the 2nd field / String value of the Timestamp variant here? I don't see that documented anywhere here:

https://ircv3.net/specs/extensions/chathistory

This seems to be our own internal representation of a message reference value. I don't think this should live in isupport, but probably message::Reference

Contributor Author

I think this warrants discussion, particularly since I'm not 100% convinced the way I've implemented this is what we want long term. Difficulties arise with Timestamp-based chathistory since the DateTime<Utc> is not a unique identifier. All messages within the same second will have the same timestamp, and while that doesn't happen all the time, it happens enough (especially since we're getting JOIN/QUIT/PART/etc messages as part of our requests). Also, because of that lumping together, we need to request messages with a server time before the last message we saw, otherwise we might miss messages that happened in the same interval but that the client didn't receive. That means we will basically have at least one duplicate message per chathistory request when using Timestamp. In order to perform deduplication and proper stitching together of message histories, we need to be able to distinguish between messages with the same timestamp, which is where the String comes in.

Originally I had the String be the text of the message, but that has obvious issues when the message text depends on the current state (i.e. for JOIN/QUIT/etc messages where the user in the text will be represented by what we currently know about them). So I switched the second bit of information to be a client_id, which is a seahash of the Encoded message that comes in. I don't love this, really, since changing the structure of Encoded will cause ids to shift, but it works better than using text.

Client ids are distinguished from server ids by starting with :, which is an invalid character in a message tag (which is how server tags are delivered).
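
For concreteness, the client_id scheme described above could look roughly like this (a sketch, not the PR's code; it assumes the raw bytes of the Encoded message are available and uses the seahash crate):

    // Hypothetical client-side message id: a seahash of the raw encoded
    // message, prefixed with ':' (invalid in a message-tag value) so it can
    // never collide with a server-provided msgid.
    fn client_id(encoded: &[u8]) -> String {
        format!(":{:016x}", seahash::hash(encoded))
    }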

Contributor Author

I swear I was seeing second-rounded server times, but I must have screwed up a debug print somewhere, because I see proper millisecond times looking at it now.

Contributor Author

Pulling out the client_id business for now. Will push that along with other changes to match our discussion tonight.

Contributor Author

@tarkah I figured out why I thought I was seeing server times rounded to the nearest second; Halloy's public soju bouncer is returning message server times rounded to the nearest second. The instance of soju I fired up for personal use (edited:) is v0.8.0 now, and it's returning millisecond server times with messages. Maybe the public bouncer is running an older version?

Comment on lines 210 to 228
pub async fn load_now(&mut self, server: Server, target: &str) {
    let kind = if proto::is_channel(target) {
        history::Kind::Channel(target.to_string())
    } else {
        history::Kind::Query(target.to_string().into())
    };

    let loaded_server = server.clone();
    let loaded_kind = kind.clone();

    let loaded_messages = history::load(&server.clone(), &kind.clone())
        .map(move |result| Message::Loaded(loaded_server, loaded_kind, result))
        .await;

    self.update(loaded_messages);
}

#[tokio::main]
pub async fn make_partial_now(
    &mut self,
    server: Server,
    target: &str,
    message_reference: Option<MessageReference>,
) {
    let kind = if proto::is_channel(target) {
        history::Kind::Channel(target.to_string())
    } else {
        history::Kind::Query(target.to_string().into())
    };

    self.data.map.get_mut(&server).and_then(|map| {
        map.get_mut(&kind)
            .and_then(|history| history.make_partial(message_reference))
    });
}
Member

What are these _now methods used for? They're async but we're calling them from a blocking thread so we aren't actually doing anything.

Contributor Author

My understanding is that the #[tokio::main] attribute allows them to run ~synchronously; the histories are loaded, that much I am certain of. Regardless, this is the crime against asynchronicity I was referring to earlier, and it's a hack that's not really intended to stay. The ultimate goal is to load the history so that it can be used to (a) find a reference message to base the chathistory request on and (b) perform any necessary deduplication on received messages. I've been working on reducing the need for (b), with the aim of ultimately removing that need entirely, but I haven't tested it lately to see if the need is gone yet. I presume that (a) could be done more elegantly than I'm doing here, but I wanted to hash out some of the basic chathistory functionality before trying to work that out.
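
For what it's worth, the command-chain alternative mentioned above might look roughly like this in an iced-style update loop (the message variants, the latest_reference helper, and the exact signatures are assumptions, not Halloy's actual API):

    // Step 1: on join, load the history asynchronously...
    Message::ChannelJoined(server, channel) => {
        Command::perform(
            history::load(server.clone(), channel.clone()),
            move |loaded| Message::HistoryLoaded(server, channel, loaded),
        )
    }
    // Step 2: ...then use the loaded history to pick the reference message
    // and fire the chathistory request, instead of blocking in load_now.
    Message::HistoryLoaded(server, channel, loaded) => {
        let reference = latest_reference(&loaded);
        clients.send_chathistory_request(
            ChatHistorySubcommand::Latest,
            &server,
            &channel,
            reference,
        );
        Command::none()
    }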

Comment on lines 399 to 288
let unread_messages =
    message_reference.and_then(|message_reference| match message_reference {
        isupport::MessageReference::None => Some(messages.split_off(0)),
        _ => messages
            .iter()
            .rev()
            .position(|message| message_reference == *message)
            .map(|reference_position| {
                messages.split_off(messages.len() - reference_position)
            }),
    });
let opened_at = if let Some(ref unread_messages) = unread_messages {
    unread_messages.iter().fold(
        Posix::now(),
        |earliest_received_at, unread_message| {
            std::cmp::min(earliest_received_at, unread_message.received_at)
        },
    )
} else {
    Posix::now()
};
let messages = std::mem::take(messages);

*self = Self::partial(server.clone(), kind.clone(), Posix::now());
*self = Self::partial(server.clone(), kind.clone(), unread_messages, opened_at);
@tarkah (Member) commented May 16, 2024

I'm not sure I follow what unread_messages and opened_at means here when we convert from a full "opened" buffer history to a partial "closed" buffer history.

Why are we calling make_partial from a chat history event? The idea behind our history API and full vs partial is that these are full / partial based on whether the underlying buffer is open / closed. Our buffer state is the source of truth that drives whether history is opened / closed.

  • Partial exists as a way to store messages received on a closed buffer which are then fully flushed to disk after X seconds of no activity.
  • Full for open buffers has all messages in it and we append new messages; we save this after X seconds of no activity but keep it open. When a buffer is closed, we flush this entire history to disk and convert it to partial.

This lifecycle is managed by Manager::track which is called after processing any Dashboard messages, where we see which set of buffers are open.

Shouldn't chathistory messages just feed in to the existing full / partial history that exists for a buffer? I don't think chathistory should break this lifecycle.

Contributor Author

This is mainly a continuation of the hack above, and is the second thing I was trying to highlight needed review. I don't disagree with anything you're saying here, crimes are being committed which need to be rectified. But let me lay out the functionality that the crimes provide because I'm not yet sure how else to get at it.

When joining a channel after launch, we want to get all messages since we last closed Halloy. To do that properly, we need to know what the last message we received for that channel was, but we can't do that without having loaded the message history for the channel. We can request the LATEST however-many messages, but there will often be duplicates in there we don't really want to process, and we won't know they're duplicates until we can compare them to the message history. That messes with our ability to tell the user whether the channel has any new messages or not, and I wasn't able to see any way around it. The process of loading a channel would also become more complicated in that scenario (deduplication would need to be done on channel load, along with any additional requests needed to fill in chat history that wasn't covered by the initial request), but that part is doable.

So, the hack that's being performed now is to load the history for every channel joined, use the loaded history to form the chathistory request for the latest messages, position received messages, and deduplicate, then make the loaded history partial. All this junk right here is a way to convert a loaded history with all the chathistory request messages added into the proper partial history we'd see if the chathistory messages had been loaded into a fresh partial history.

Comment on lines 191 to 197
fn add_chathistory_message(
    &mut self,
    message: Message,
    subcommand: ChatHistorySubcommand,
    message_reference: MessageReference,
) {
    match self {
Member

I haven't tested it, but I think we can simplify this and expose a single add_message API with the following. This is also non-dependent on ChatHistorySubcommand; I'm not sure why that's needed here for splicing the message in.

We basically splice messages in server_time order, allowing us to binary_search on that time, which should be pretty fast.

Binary search returns a Result<usize, usize> where the Ok value means an identical Ordering::Equal match was found, so we can swap out that value with the incoming value.

This is 100% safe when we use the Lookup::MessageId since we return Equal if the ids match; otherwise the binary search continues based on server time.

For Lookup::Timestamp this means the message has an identical timestamp. Since the servertime extension mandates millisecond accuracy, and if we don't have servertime we use Utc::now which is nanosecond accuracy, I think this is a strong match. I doubt we will have too many false positives.

The Err value means no match was found, but the position returned is the index where the element can be safely inserted to preserve ordering (server_time), so we can just safely insert it.

    fn add_message(&mut self, message: Message, reference: Option<MessageReference>) {
        if let History::Partial {
            unread_message_count,
            ..
        } = self
        {
            if message.triggers_unread() {
                *unread_message_count += 1;
            }
        }

        match self {
            History::Partial {
                messages,
                last_received_at,
                ..
            }
            | History::Full {
                messages,
                last_received_at,
                ..
            } => {
                enum Lookup {
                    Timestamp(DateTime<Utc>),
                    MessageId(String),
                }

                let lookup = match reference {
                    Some(MessageReference::Timestamp(timestamp, _)) => Lookup::Timestamp(timestamp),
                    Some(MessageReference::MessageId(id)) => Lookup::MessageId(id),
                    Some(MessageReference::None) | None => {
                        if let Some(id) = message.id.clone() {
                            Lookup::MessageId(id)
                        } else {
                            Lookup::Timestamp(message.server_time)
                        }
                    }
                };

                let search = match lookup {
                    Lookup::Timestamp(timestamp) => {
                        messages.binary_search_by(|stored| stored.server_time.cmp(&timestamp))
                    }
                    Lookup::MessageId(id) => messages.binary_search_by(|stored| {
                        if stored.id.as_ref() == Some(&id) {
                            Ordering::Equal
                        } else {
                            stored.server_time.cmp(&message.server_time)
                        }
                    }),
                };

                match search {
                    // Identical value found, replace it
                    //
                    // Message Id is 100% match
                    // Timestamp (server time) is millisecond accuracy so
                    // a very strong match
                    Ok(pos) => {
                        messages.splice(pos..pos + 1, Some(message));
                    }
                    // No match found, insert it at pos (server time sorted order)
                    Err(pos) => messages.insert(pos, message),
                }

                *last_received_at = Some(Instant::now());
            }
        }
    }

Contributor Author

I had considered something like this (was thinking of using a BTreeMap), but I was worried about the overhead it would put on the non-chathistory variant of add_message.

I'll put this in and test it.

Member

Binary search on 10k messages will be pretty quick, that's like 13-14 iterations before you've exhausted the search.

However, I think I need to make it more robust. I have some ideas and will dig in more tomorrow.

Contributor Author

I believe we'd need to sort the messages by server_time or id for this to work. Sorting by server_time should be okay, that's more or less how we expect messages to be already, but we have no guarantee that sorting by id will match the ordering given by server_time.

Member

Correct, the binary search will insert things in server time sorted order.

I plan to do 2 binary search lookups +- N duration from current message server time.

I can then iterate through those messages and check for an id match. If an id match occurs, replace. If no id match, I can insert after the last position where server time <= current message.

This allows us to handle checking messages with duplicate server times, which the binary search function on its own doesn't handle.

We can also check user + content as well if no id exists for duplicate.

Member

For sure. I'm thinking this add_message can be completely unaware of chat history since we already will have the id and server time in the incoming message, so we can just treat all incoming messages the same for how we insert them.

Contributor Author

Sounds good to me. I will leave it to you :) Edit: Happy to help out with it in any way still, will just be getting out of your way for now.

Member

Ok, check out 0cd37be and let me know what you think.

It inserts items in server time sorted order and uses a fuzz duration of +/- 1 second of the incoming message's server time to check the surrounding messages for "deduplication" (a message id match, or a looser definition of server time + target + content match).

This gives us the benefit of only having to do linear search on a 1 second subset of the messages, and a binary search to find the bounds of that linear search.

Otherwise, if we want to check ALL messages for a clash of message id, it'd be a linear search over 10k messages, which would hurt. I don't think it's reasonable to check messages outside +/- 1 second for any collision.
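
In rough terms, the strategy reads something like this (a sketch of the idea, not the actual commit; Msg is a stand-in for Halloy's message type):

    use chrono::{DateTime, Duration, Utc};

    struct Msg {
        id: Option<String>,
        server_time: DateTime<Utc>,
        text: String,
    }

    // `messages` is kept sorted by server_time.
    fn insert_message(messages: &mut Vec<Msg>, message: Msg) {
        let fuzz = Duration::seconds(1);

        // Binary searches to find the bounds of the +/- 1 second window.
        let start = messages.partition_point(|m| m.server_time < message.server_time - fuzz);
        let end = messages.partition_point(|m| m.server_time <= message.server_time + fuzz);

        // Linear scan of that window only, looking for a duplicate.
        if let Some(offset) = messages[start..end].iter().position(|m| {
            match (&m.id, &message.id) {
                (Some(a), Some(b)) => a == b,
                _ => m.server_time == message.server_time && m.text == message.text,
            }
        }) {
            messages[start + offset] = message; // replace the duplicate
            return;
        }

        // No duplicate found: insert in server_time sorted order.
        let pos = messages.partition_point(|m| m.server_time <= message.server_time);
        messages.insert(pos, message);
    }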

Contributor Author

Looks good to me! I agree that checking all messages for a clash is not necessary (and potentially not desirable, since a non-duplicate clash means something odd has happened; let's cross that bridge when we come to it).

I incorporated it further into the existing PR and made a couple small tweaks:

  1. I think we don't want to mark a message as incrementing the unread count when it is truly a duplicate of an existing message. Timestamp-based requests are fuzzed, so we will frequently get a message back that the user has already seen. So I added a bit of logic to not increment the unread message count in that scenario.
  2. JOIN/PART/QUIT messages frequently, in my experience, do not match well on text due to how the user attributes resolve. E.g. when a new user joins a channel, it's not uncommon for their user attributes to not resolve at all if they weren't in another channel with the Halloy user; but if they're still in the channel with the Halloy user when the chathistory batch is received, then their user attributes will resolve when the chathistory copy of the join is received. So I relaxed the match requirements for those messages to just match on the target, which contains the nick of the user involved in the message.

Member

Sounds good to me!

@tarkah (Member) commented May 17, 2024

@andymandias how does this feature interact w/ bouncers that send a replay buffer on connect? Are the bouncers programmed to only send one or the other, or will both get sent? If both are sent, do we just rely on our dedupe strategy to eliminate dupes, or do we have some other mechanism to handle this?

@andymandias (Contributor Author)

@tarkah the IRCv3 description of the specification says that the replay buffer should not be sent automatically when a client has negotiated chathistory. In my experience with soju that is the case (ZNC, which I've switched off of, does not support chathistory to my knowledge). I think we should be prepared to potentially get dupe messages around join time (even if servers follow the IRCv3 specification, I think we could end up with some dupes if, for example, a message is sent to the channel right after we join but before the server receives our associated chathistory request). But my feeling at the moment is that our dedupe strategy is sufficient for that purpose.

Add `id` component to messages, and record message id when available.
Request all messages since last seen message or latest <=500 messages when joining a channel.
Request <=500 older messages when scrolling to the top of channel buffer.
Drop duplicate messages.
Initial timestamp-based deduplication.
…or of its utilization).

Fix timestamp-based deduplication.
Skip `RPL_TOPIC` and `RPL_TOPICWHOTIME` messages when looking for the oldest/latest message in a channel.
andymandias and others added 17 commits May 20, 2024 17:00
…p-to-date message reference.

Add fuzz interval to timestamp based `chathistory` requests.
Ignore messages received at/after `JOIN` when finding the latest message in a channel (replaces ignoring `RPL_TOPIC` and `RPL_TOPICWHOTIME` messages).
Ignore internal messages when finding the oldest/latest message in a channel.
…when finding oldest/latest message in channel.
…lication, then convert to `History::Partial` for all channels without a pane.
…d when joining a channel.

Filter timestamp-based `chathistory` batches in order to reduce the need for record-time deduplication.
Use hash of `Encoded` for client-side message-id (used to distinguish between timestamp-based `chathistory` messages with the same timestamp).
…ry` messages.

Add preference for automatically loading older `chathistory` messages when scrolled to the top of channel history.
…g at the first message that belongs in the backlog. This way the backlog divider won't be moved when messages are added to the beginning of a message history via `chathistory`.
Renamed configuration option to more typical phrase.
Use `server_time` to position message with respect to unreferenceable messages.
Fix for `opened_at` calculation in `make_partial` (the earliest received message does not always appear at the beginning of the `unread_messages` vector).
Other simplifications and clarifications suggested by review.
Efficient deduplication via the new insertion strategy can be used in place of filtering received `chathistory` batches.
Minor tweaks to insertion strategy: more stringent checking for triggering unread notification (to avoid false positives) and more relaxed matching for `JOIN`/`PART`/`QUIT` messages (to avoid false negatives).
… all `chathistory` subcommands.

Switch from repeated `AFTER` subcommands to `LATEST` → repeated `BETWEEN` commands to receive history when joining a channel (`AFTER` can fail silently, with 0 returned messages, if the message reference is no longer in the history available from the server).
Tweaks to `insert_message` (utilize same fuzz constant as `send_chathistory_request`) and to `is_referenceable_message` (disallow `RPL_TOPIC` and `RPL_TOPICWHOTIME` as message references).
@andymandias (Contributor Author)

Another restructuring, to better allow for using LATEST then repeated BETWEENs to update when joining a channel. The old scheme using repeated AFTER would fail to receive any messages when the reference message was no longer in the server's available message history. The restructuring should also make it easier to utilize the TARGETS subcommand, which I plan to work on next.

Should be a bit clearer in intent and use than before, but it doesn't change any of the problem areas of the PR (History and asynchronicity).
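
To illustrate the flow with hypothetical timestamps: request the latest messages relative to the last one seen, and if the returned batch is full, keep requesting the remaining span with BETWEEN (last seen reference to the oldest message received so far) until a batch comes back short:

    CHATHISTORY LATEST #halloy timestamp=2024-05-20T17:00:00.000Z 500
    CHATHISTORY BETWEEN #halloy timestamp=2024-05-20T17:00:00.000Z timestamp=2024-05-21T09:30:00.000Z 500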

Allow messages with matching content to match as duplicate without matching `server_time` iff the stored message is a sent message and the inserted message is a received message;  reduced strictness to match sent messages where the server's copy will have a slightly different time than the client.
… channel.

Messages retrieved when joining a channel are no longer properly deduplicated.
Channels will be marked as having unread messages even if those messages will ultimately wind up deduplicated.