Sorting: Internally and Externally
As the amount of data we handle keeps getting larger, standard in memory sorts that are O(n^2) are not longer acceptable. We need them faster and we need sorts that can deal with more data that can be stored in internal memory at one time. This lab is designed to look into both of those issues
Objectives
- Learn to use binary files
- Improving understanding of cohesion and coupling
- Work with interfaces
- Learn and implement the merge sort
- Learn and implement an external sort
- Adapt your programs to report the amount of work
Assignment
Phase 1:
- Design a class called DataGroup that can hold up to 10,000 pieces of data. I will provide a number of data files with a large amount of data. We only care about parts 0, 4, 6, 8, and 9 of each record. I will provide a class for the data itself. These files are in text
- This class should implement the merge sort on its data and track the number of compares and copies whenever it is called upon to sort.
- The data read in might not use the entire array.
- Use the QueueInterface provided when implementing your queue.
- Design another class called BigData that can sort more data than can fit in memory using the method covered in class
- This class should report back the number of reads and number of writes (each divided by 5) done by the program.
- Develop a testing plan that will sufficiently show the amount of work (Big O) for each of your sorting methods.
- Turn in the software design report as a pdf by the due date specified on Canvas.
- Data files to use can be found at: https://excelbianalytics.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/ . Be sure to test your program on large data files. You might need to modify some code to ignore the first line of data.
- Link for GitHub Classroom: https://classroom.github.com/a/fBHMC7kO
Phase 2:
- Implement your designs
- Run your working programs on enough data to demonstrate their Big O
- In a second section of your testing report, demonstrate that your Big O estimates are correct.
Submission instructions for Phase 2
- Complete the SDR.
- Combine your SDR and all of your .java and testing files into one zip folder
- Submit the zip folder on Canvas by the due data specific there.
Restrictions:
- You may not use any Internet resources (other than to look up Java syntax).
- You may use code from this course or last semesters.
- You may not use any online code for any part of the sorting
Challenges
- Learn and implement the heap sort (just using your textbook) for the internal sort.