On Error: Continue to Next Line #136
forking stream-json so that if input files are bad we do not blow up. fork is here: https://github.com/ak--47/stream-json OG comment is here: uhop/stream-json#136
First of all: using the ‘error’ event is a non-starter because it can be fired only once. To wit: https://nodejs.org/api/stream.html#event-error

Restarting a generic JSON parser after a syntax error has proved to be impossible. Trust me, I tried. Potentially we can restart a JSONL parser by ignoring a line and returning some specified value, which indicates an error. That’s probably the best we can do.
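The "ignore a line and return a specified value" idea could be sketched roughly as below. This is a standalone illustration built on Node's `Transform`, not stream-json's API: the names `parseLines`, `TolerantJsonl`, and the `{key, error, line}` marker shape are all hypothetical. A bad line becomes an ordinary data object instead of an 'error' event, so the stream keeps running.

```javascript
const { Transform } = require('node:stream');
const { StringDecoder } = require('node:string_decoder');

// Pure helper (hypothetical name): consume one text chunk and return the
// records found so far, plus the unfinished tail to carry into the next chunk.
function parseLines(rest, counter, chunk) {
  const lines = (rest + chunk).split('\n');
  const nextRest = lines.pop(); // the last piece may be an incomplete line
  const out = [];
  for (const line of lines) {
    if (!line.trim()) continue; // blank lines are ignored and get no key
    const key = counter++;
    try {
      out.push({ key, value: JSON.parse(line) });
    } catch (e) {
      // Marker value for a bad line (assumed shape, not a stream-json format).
      out.push({ key, error: e.message, line });
    }
  }
  return { out, rest: nextRest, counter };
}

// Thin stream wrapper around the helper: bytes in, objects out.
class TolerantJsonl extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    this._decoder = new StringDecoder('utf8'); // handles split multi-byte chars
    this._rest = '';
    this._counter = 0;
  }
  _consume(text) {
    const r = parseLines(this._rest, this._counter, text);
    this._rest = r.rest;
    this._counter = r.counter;
    for (const item of r.out) this.push(item);
  }
  _transform(chunk, encoding, callback) {
    this._consume(typeof chunk === 'string' ? chunk : this._decoder.write(chunk));
    callback(null);
  }
  _flush(callback) {
    // Appending '\n' forces out a final line that had no trailing newline.
    this._consume(this._decoder.end() + '\n');
    callback(null);
  }
}
```

A consumer would then check each item for an `error` property, e.g. `fs.createReadStream(file).pipe(new TolerantJsonl()).on('data', item => ...)`, and decide whether to log, count, or drop the bad lines.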
@uhop thank you for the kind and thoughtful responses. for now, i have my own fork which just swallows errors and continues parsing. it seems to work for what i need... i'll see if i can make progress on something that could actually work via an option passed to the jsonlParser ...
If all you need is the JSONL parser that ignores errors, that can be arranged. Make a good PR, or wait until I have time to add this feature and use a temporary solution in the meantime.
@uhop here's the first real implementation i tried:
_skipErrors_processBuffer(callback) {
  const lines = this._buffer.split('\n');
  this._rest += lines[0];
  if (lines.length > 1) {
    try {
      this._rest && this.push({
        key: this._counter++,
        value: JSON.parse(this._rest, this._reviver)
      });
    } catch (e) {
      //first line is bad json, skip it
    }
    this._rest = lines.pop();
    loopBuffer: for (let i = 1; i < lines.length; ++i) {
      try {
        lines[i] && this.push({
          key: this._counter++,
          value: JSON.parse(lines[i], this._reviver)
        });
      } catch (e) {
        //bad json, skip it
        continue loopBuffer;
      }
    }
  }
  this._buffer = '';
  callback(null);
}

and:

class JsonlParser extends Utf8Stream {
  static make(options) {
    return new JsonlParser(options);
  }
  constructor(options) {
    super(Object.assign({}, options, { readableObjectMode: true }));
    this._rest = '';
    this._counter = 0;
    this._reviver = options && options.reviver;
    if (options && options.checkErrors) {
      this._processBuffer = this._checked_processBuffer;
      this._flush = this._checked_flush;
    }
    //new
    if (options && options.skipErrors) {
      this._processBuffer = this._skipErrors_processBuffer;
    }
  }

so basically haven't written any tests or docs, but curious to get your impression before i go further.
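One side effect of the try/catch pattern in `_skipErrors_processBuffer` may be worth knowing before writing tests. In `{ key: this._counter++, value: JSON.parse(...) }` the `key` property is evaluated first, so the counter advances even when `JSON.parse` then throws. The standalone sketch below reproduces just that pattern with plain variables (`sampleLines`, `counter`, and `pushed` are stand-ins for illustration, not stream-json internals):

```javascript
// Stand-ins for the parser's state; not stream-json code.
const sampleLines = ['{"a":1}', '{oops', '{"b":2}'];
let counter = 0;
const pushed = [];

for (const line of sampleLines) {
  try {
    // Same shape as the snippet: the key is computed before the parse runs.
    pushed.push({ key: counter++, value: JSON.parse(line) });
  } catch (e) {
    // Bad JSON: the record is skipped, but counter was already incremented.
  }
}

console.log(pushed.map((record) => record.key)); // [ 0, 2 ]
```

So skipped lines leave gaps in the emitted keys. That can be handy, since a key then still matches the record's position among the non-empty input lines, but it means keys are not consecutive. If consecutive keys are wanted, parsing into a local variable first and incrementing only on success would be one way to get them.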
hello @uhop ... me again 😇
i understand that stream-json expects to be working with validated json or jsonl, and i've also read many of your comments on why it is not the job of this library to validate json... just to parse it.

my use case involves parsing jsonl of unknown origin, so i don't know ahead of time if it's valid.

i found the checkErrors option, which does throw when a line in the source file is bad json, but i'm wondering if there's a straightforward way, when catching the error, to tell the parser to "skip the bad record and continue to the next line\n" ... something like:

(i may not have a correct mental model of how this library packs values... but i'm trying to avoid blowing up on an entire file just because there's one "bad line")

when exploring the jsonl parser code, i realized i can swallow the errors and continue by calling the callback with null in the catch clause:

but this feels messy...

any pointers are appreciated. thanks again!