Partial sorting algorithm


Say I have 50 million features, and each feature is read from disk.

At the beginning of my program, I process every feature and, depending on certain conditions, apply some modifications to it.

At this point in my program, I read one feature from disk, process it, and write it back, because there is not enough RAM to hold all 50 million features at once.

Now say I want to sort these 50 million features. Is there an optimal algorithm for this, given that I cannot load them all at the same time?

Is there a partial sorting algorithm, or something like that?

In general, the class of algorithms you are looking for is called external sorting. Probably the most widely known example of such a sorting algorithm is merge sort.

The idea of this algorithm (in its external version) is that you split the data into blocks small enough to sort in memory (say 100 thousand elements each) and sort each block independently, using any standard in-memory algorithm. You then take the sorted blocks and merge them (so two 100k blocks become one 200k block) by reading elements from both blocks through a buffer, which works because each block is already sorted. Finally, you merge the last two blocks into a single block in which all the elements are in the right order.
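A rough sketch of that idea in Python, in case it helps. It assumes each feature can be reduced to an integer sort key stored one per line in a text file. That is my simplification, not something from your setup, and the names `write_run`, `read_run`, `merge_two` and `external_sort` are made up for illustration. It merges blocks in pairs, as described above, and keeps at most `chunk_size` items in memory while building the blocks.

```python
import os
import tempfile
from itertools import islice


def write_run(values, directory):
    """Write an iterable of ints to a new temp file, one per line; return its path."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for v in values:
            f.write(f"{v}\n")
    return path


def read_run(path):
    """Lazily yield the ints stored in a block file."""
    with open(path) as f:
        for line in f:
            yield int(line)


def merge_two(a, b):
    """Merge two sorted iterators, holding only one item from each at a time."""
    done = object()  # marks an exhausted iterator
    x, y = next(a, done), next(b, done)
    while x is not done and y is not done:
        if x <= y:
            yield x
            x = next(a, done)
        else:
            yield y
            y = next(b, done)
    while x is not done:
        yield x
        x = next(a, done)
    while y is not done:
        yield y
        y = next(b, done)


def external_sort(items, chunk_size, directory):
    """Return a lazy iterator over `items` in sorted order.

    Block files are created in `directory`; the caller cleans that up.
    """
    items = iter(items)
    runs = []
    # Phase 1: cut the input into blocks that fit in memory and sort each one.
    while True:
        chunk = sorted(islice(items, chunk_size))
        if not chunk:
            break
        runs.append(write_run(chunk, directory))
    # Phase 2: merge the sorted blocks in pairs until a single block remains.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs) - 1, 2):
            pair = merge_two(read_run(runs[i]), read_run(runs[i + 1]))
            merged.append(write_run(pair, directory))
            os.remove(runs[i])
            os.remove(runs[i + 1])
        if len(runs) % 2:  # an odd block is carried over to the next pass
            merged.append(runs[-1])
        runs = merged
    return read_run(runs[0]) if runs else iter(())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(list(external_sort([5, 2, 9, 1, 7], 2, tmp)))  # [1, 2, 5, 7, 9]
```

In practice you would probably merge many blocks in a single pass (the standard library's `heapq.merge` does this for any number of sorted iterators) to cut down on passes over the disk, and store whole feature records next to their keys instead of bare integers.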

