I need to scan very large JSONL files efficiently and am considering a parallel grep-style approach over line-delimited text.
Would love to hear how you would design it.
Don’t dedicate a thread to reading the file line by line just to get it into memory. There is already a piece of software optimized for exactly this kind of task: the OS.
Just mmap the file and start processing; the kernel pages the data in on demand as you touch it.
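
To make the mmap suggestion concrete, here is a minimal sketch in Python, assuming a plain substring match is good enough as a first filter and that the pattern itself contains no newline. The function name `find_matching_lines` and the `needle` parameter are made up for illustration; only the standard `mmap` and `os` modules are used.

```python
import mmap
import os


def find_matching_lines(path, needle):
    """Yield (byte_offset, line_bytes) for each line of `path` containing `needle`.

    `needle` is a bytes pattern that is assumed not to contain a newline.
    """
    if os.path.getsize(path) == 0:
        return  # an empty file cannot be mapped
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(needle)
        while pos != -1:
            # Expand the hit to the boundaries of the line that contains it.
            start = mm.rfind(b"\n", 0, pos) + 1  # rfind gives -1 on the first line, so start is 0
            end = mm.find(b"\n", pos)
            if end == -1:
                end = len(mm)  # last line has no trailing newline
            yield start, mm[start:end]
            # Resume after this line so each matching line is reported once.
            pos = mm.find(needle, end)
```

Only the matching lines are copied out of the mapping, so they are the only ones that would need to go through `json.loads`. For the parallel part, one possible approach is to give each worker a byte range of the same file and have it move its start forward to the next newline, so no line is split between workers.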