Hello everyone 🤗🤗
I could use some guidance from experienced developers on a project I'm currently working on that requires using Java to process and analyze large data sets. The data consists of several gigabytes in a mix of structured and semi-structured formats, including CSV and JSON.
I want to make sure that memory issues during processing do not cause the application to crash. Do you think Java Streams are a good option for managing large amounts of data, or are there other approaches I should consider?
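For context, this is roughly the lazy, line-by-line style I had in mind, using Files.lines with the Stream API so the whole file never sits in memory; the file path and column layout are just placeholders:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class CsvStreamDemo {
    public static void main(String[] args) throws IOException {
        Path csv = Path.of("data/records.csv"); // hypothetical file

        // Files.lines reads lazily, so only one line at a time needs to be in memory.
        try (Stream<String> lines = Files.lines(csv)) {
            long matching = lines
                    .skip(1)                          // skip the header row
                    .map(line -> line.split(","))     // naive split; does not handle quoted fields
                    .filter(cols -> cols.length > 2 && !cols[2].isBlank())
                    .count();
            System.out.println("Rows with a non-empty third column: " + matching);
        }
    }
}
```

Is this the right direction, or would you reach for something else (memory-mapped files, a dedicated CSV reader, etc.) at this data volume?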
What libraries can effectively handle large JSON files without having to load the entire file into memory? I've heard of Jackson and Gson, but I'm not sure how well they work with large files.
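For the JSON side, my rough understanding is that Jackson's streaming JsonParser can walk a top-level array element by element instead of loading everything. This is only a sketch with a made-up Measurement record and a placeholder file name, and it assumes Jackson 2.12+ for record support:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;

public class JsonStreamDemo {
    // Hypothetical shape of one element of a large JSON array.
    record Measurement(String id, double value) {}

    public static void main(String[] args) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        JsonFactory factory = mapper.getFactory();

        // Parse a top-level JSON array one element at a time instead of loading it all.
        try (JsonParser parser = factory.createParser(new File("data/measurements.json"))) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("Expected a JSON array at the top level");
            }
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                Measurement m = mapper.readValue(parser, Measurement.class);
                // process one element, then let it become eligible for garbage collection
                System.out.println(m.id() + " -> " + m.value());
            }
        }
    }
}
```

Is this how people actually use Jackson on multi-gigabyte files, or does Gson's JsonReader (or something else entirely) work better in practice?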
I want to use multithreading to speed up processing since the data is easily parallelizable. Do you suggest any specific frameworks or patterns for effective thread management?
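So far I've only sketched something with a plain ExecutorService, one task per chunk of data; the chunks below are dummy placeholders just to show the structure I had in mind:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelChunksDemo {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        // Hypothetical stand-in for chunks of the real data set.
        List<List<String>> chunks = List.of(
                List.of("a", "bb", "ccc"),
                List.of("dddd", "ee"),
                List.of("f", "gggggg"));

        // One worker per CPU core; each task processes one chunk independently.
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Integer>> results = pool.invokeAll(
                    chunks.stream()
                          .map(chunk -> (Callable<Integer>) () ->
                                  chunk.stream().mapToInt(String::length).sum())
                          .toList());

            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            System.out.println("Total characters across all chunks: " + total);
        } finally {
            pool.shutdown();
        }
    }
}
```

Would you stick with an ExecutorService like this, or are parallel streams, the Fork/Join framework, or virtual threads a better fit for I/O-heavy batch processing?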
What procedures do you use to ensure reliable error handling, especially when processing potentially corrupted data in large files?
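My current approach is simply to catch per-record exceptions, log the bad line, and keep going, along these lines (the sample lines are made up):

```java
import java.util.ArrayList;
import java.util.List;

public class LenientParsingDemo {
    public static void main(String[] args) {
        // Hypothetical input with some corrupted rows mixed in.
        List<String> rawLines = List.of("1,ok", "2,ok", "oops", "4,ok", ",missing-id");

        List<Integer> ids = new ArrayList<>();
        int skipped = 0;

        for (String line : rawLines) {
            try {
                String[] cols = line.split(",", -1);
                if (cols.length < 2) {
                    throw new IllegalArgumentException("expected 2 columns, got " + cols.length);
                }
                ids.add(Integer.parseInt(cols[0])); // may throw NumberFormatException
            } catch (RuntimeException e) {
                // Skip the bad record and count it (or log it) instead of failing the whole run.
                skipped++;
                System.err.println("Skipping corrupted line '" + line + "': " + e.getMessage());
            }
        }
        System.out.println("Parsed " + ids.size() + " rows, skipped " + skipped);
    }
}
```

Is skip-and-log good enough, or do you usually write rejected records to a separate "dead letter" file for later inspection?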
Any recommendations for tools, libraries, and design patterns that could improve processing efficiency are also welcome. If anyone has worked on something similar before, I would be interested to hear how you approached the project and what problems you encountered.
I also read this post: https://www.java-forum.org/thema/grundsaetzliche-fragen-zu-geplanter-programmstruktur.191711/#post-1251012, but it did not sufficiently answer my questions.
I appreciate your advice in advance! I'm excited to absorb the knowledge this community has to offer.