python - How does git fetch commits associated with a file?

I am writing a simple parser for the .git/* files. I have covered almost everything: objects, refs, pack files, etc. But I have a problem. Suppose I have a large 300M repository (in a pack file) and I want to find all the commits that changed the file some/deep/inside/file.

I start by finding the file in a commit:

Fetch the commit's root tree
Descend through the subtrees until I reach the file
Additionally, I check the hash of each subfolder on the way to the file; if one of them is the same as in the previous commit, I assume the file was not changed (because its parent directory was not changed)

Then I go to the parent commit and:

Find the file again and check whether its hash changed

If it did, then the original commit (i.e. the one before the parent) changed the file

I repeat this until I reach the very first commit.
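
For reference, the lookup step can be sketched in Python roughly like this. It is a minimal sketch, not my actual parser: `read_tree` is a hypothetical callable that returns the raw body of a tree object (header already stripped) for a 20-byte ID, and `parse_tree` and `resolve_path` are illustrative names. It records the ID of every subfolder on the way down, which is what the comparison against the previous commit needs.

```python
TREE_MODE = b"40000"  # mode string git stores for a subdirectory entry


def parse_tree(body):
    """Yield (mode, name, object_id) for each entry of a raw tree body."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)      # end of the mode field
        nul = body.index(b"\0", space)     # end of the entry name
        yield body[pos:space], body[space + 1:nul], body[nul + 1:nul + 21]
        pos = nul + 21                     # skip the 20-byte binary ID


def resolve_path(read_tree, root_id, path):
    """Return the IDs of every path component, or None if the path is absent.

    `read_tree` is a hypothetical callable: 20-byte ID -> raw tree body.
    """
    parts = path.strip("/").encode().split(b"/")
    ids = []
    current = root_id
    for i, part in enumerate(parts):
        for mode, name, child in parse_tree(read_tree(current)):
            if name == part:
                break
        else:
            return None                    # component not found in this tree
        if i < len(parts) - 1 and mode != TREE_MODE:
            return None                    # a file sits where a directory was expected
        ids.append(child)
        current = child
    return ids
```

The last element of the returned list is the file's own ID; the earlier elements are the subfolder IDs used for the early-exit check.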

This solution works, but it is painfully slow. In the worst case, the first search can take 3 minutes (for a 300M pack).

Is there any way to speed it up? I tried to avoid loading such large objects into memory, but I do not see any other way right now, and even then the initial load into memory would take forever.

That is the basic algorithm that git uses to track changes to a particular file. It is the reason that "git log -- some/path/to/file.txt" is a comparatively slow operation, whereas in many other SCM systems it would be simple (e.g. in CVS, P4 et al. each repo file is a server file holding that file's history).

However, it should not take that long to evaluate: the amount of data you have to keep in memory is quite small. You have already mentioned the main point: remember the tree IDs on the way down the path, so you can quickly skip commits that did not touch that subtree. Tree objects are rarely very large, just like directories on a file system (unsurprisingly).
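
A rough Python sketch of that pruning, under simplifying assumptions: `trees` and `commits` are hypothetical in-memory dicts standing in for already-parsed objects, the function names are illustrative, and only the first parent of each commit is followed, so merges are not handled. The point it shows is that the descent stops at the first level where both commits reference the same ID.

```python
def path_changed(trees, old_root, new_root, parts):
    """Return True if `parts` resolves to different IDs under the two roots.

    `trees` is a hypothetical dict: tree ID -> {entry name: object ID}.
    """
    old, new = old_root, new_root
    for part in parts:
        if old == new:
            return False               # identical subtree: nothing below can differ
        # A missing tree or entry resolves to None on that side.
        old = trees.get(old, {}).get(part)
        new = trees.get(new, {}).get(part)
    return old != new


def commits_touching(commits, trees, head, path):
    """Yield commits, newest first, whose `path` differs from the parent's.

    `commits` is a hypothetical dict: commit ID -> (root tree ID, parent ID or None).
    """
    parts = path.strip("/").split("/")
    commit = head
    while commit is not None:
        root, parent = commits[commit]
        parent_root = commits[parent][0] if parent is not None else None
        if path_changed(trees, parent_root, root, parts):
            yield commit
        commit = parent
```

With this shape, a commit that only touched an unrelated directory costs one tree lookup per side at the first shared level, not a full walk down to the file.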

Are you using the pack index? If you are not, then you essentially have to unpack the entire pack to find this out, because trees can be at the end of a long delta chain. If you have an index, you still have to apply deltas to get your tree objects, but at least you should be able to find them quickly. Keep a cache of applied deltas, because it is very common for trees to reuse the same or similar bases: most tree object changes alter only 20 bytes of a previous tree object. So if, to get tree T1, you have to start with object T8 and apply delta Td7 to get T7, then T6, and so on, it is entirely likely that those other trees T2-T8 will be referenced again.
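
One way such a cache could look in Python is sketched below. `apply_delta` follows the copy/insert instruction encoding of git's pack deltas; `DeltaCache`, its `entries` mapping (key -> (base key or None, payload)) and the `delta_applications` counter are hypothetical names for illustration. In a real pack the keys would be pack offsets or object IDs and the payloads would first have to be zlib-inflated, which is left out here.

```python
from collections import OrderedDict


def _read_size(buf, pos):
    """Read one little-endian base-128 size from the delta header."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base, delta):
    """Rebuild an object from its base and one git-style delta."""
    src_size, pos = _read_size(delta, 0)
    dst_size, pos = _read_size(delta, pos)
    if src_size != len(base):
        raise ValueError("delta does not match base")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:                      # copy a range out of the base
            offset = size = 0
            for bit in range(4):            # optional offset bytes
                if cmd & (1 << bit):
                    offset |= delta[pos] << (8 * bit)
                    pos += 1
            for bit in range(3):            # optional size bytes
                if cmd & (0x10 << bit):
                    size |= delta[pos] << (8 * bit)
                    pos += 1
            size = size or 0x10000          # size 0 means 0x10000
            out += base[offset:offset + size]
        elif cmd:                           # insert the next `cmd` literal bytes
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("reserved delta opcode 0")
    if len(out) != dst_size:
        raise ValueError("delta produced the wrong size")
    return bytes(out)


class DeltaCache:
    """Resolve delta chains, keeping rebuilt objects in a small LRU cache."""

    def __init__(self, entries, max_items=256):
        self.entries = entries              # hypothetical: key -> (base_key, payload)
        self.max_items = max_items          # assumed to be at least 1
        self.cache = OrderedDict()
        self.delta_applications = 0

    def _store(self, key, data):
        self.cache[key] = data
        self.cache.move_to_end(key)
        while len(self.cache) > self.max_items:
            self.cache.popitem(last=False)  # evict the least recently used

    def get(self, key):
        chain = []
        # Walk towards the base until we hit a cached or non-delta object.
        while key not in self.cache:
            base_key, payload = self.entries[key]
            if base_key is None:
                self._store(key, payload)
                break
            chain.append((key, payload))
            key = base_key
        data = self.cache[key]
        self.cache.move_to_end(key)
        for delta_key, delta in reversed(chain):
            data = apply_delta(data, delta)
            self.delta_applications += 1
            self._store(delta_key, data)    # intermediate trees are kept too
        return data
```

Because every intermediate object in the chain is stored, asking for T6 after T7 has been rebuilt costs a single delta application and does not replay the chain from T8.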
