From vincent at nexedi.com  Thu Dec 23 12:20:10 2010
From: vincent at nexedi.com (Vincent Pelletier)
Date: Thu, 23 Dec 2010 11:20:10 +0000
Subject: [Neo-dev] Code status update & short roadmap
In-Reply-To: <201012151320.19156.vincent@nexedi.com>
References: <201012151320.19156.vincent@nexedi.com>
Message-ID: <201012231120.10984.vincent@nexedi.com>

On Wednesday 15 December 2010 13:20:19, Vincent Pelletier wrote:
> Client code will be fixed so this test passes.

I have working code on my machine for this. I need to split it into smaller
patches if possible (it is quite large) and update the NEO tests.

> An iterator test fails on current code, which is because current iterator
> implementation is too superficial. It is being completely reworked.

Done in r2550.

There is a notable recent change, r2564 (some fixes are coming, as we found
bugs after committing). It reduces the lifespan of the storage tpc_finish
lock to increase storage operation parallelism (object-level read-lock taking
& answering master). The lock should now be held for the shortest possible
time, which is expected to reduce the number of deadlocks encountered after
the work on locks described in the previous mail.

For the record, these deadlocks happen in either of these scenarios:
- competing transactions taking the write lock on the same object on
  differently-sorted lists of storage nodes (T1 locks on S1, T2 locks on S2,
  then T1 tries to lock on S2 while T2 tries to lock on S1).
  This can be reduced somewhat, but there is no perfect solution known yet.
- competing transactions taking write locks on a common subset of objects, but
  in a different order (T1 locks O1, T2 locks O2, then T1 tries to lock O2
  while T2 tries to lock O1).
In both cases the lock timeout is reached (30s by default) and at least one
transaction gets rolled back (ConflictError is raised), which frees the locks
for the remaining transaction. This degrades performance significantly, and
rolling back transactions is undesirable in itself.
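
To illustrate the second scenario only, here is a minimal sketch of how taking
locks in one agreed order removes the circular wait. It is not NEO code: the
names lock_all, unlock_all and run_transaction are hypothetical, and plain
threading.Lock objects stand in for storage-side write locks. It does not
address the first scenario, where the same object is locked on several storage
nodes.

```python
import threading


def lock_all(locks, oids):
    """Acquire the locks for oids in sorted order and return that order."""
    order = sorted(set(oids))
    for oid in order:
        locks[oid].acquire()
    return order


def unlock_all(locks, order):
    """Release the locks in the reverse of the acquisition order."""
    for oid in reversed(order):
        locks[oid].release()


def run_transaction(locks, oids, log, name):
    """Hypothetical transaction body: lock, record the order used, unlock."""
    order = lock_all(locks, oids)
    try:
        log.append((name, order))
    finally:
        unlock_all(locks, order)


# Stand-ins for per-object write locks (assumption: one lock per object).
locks = {"O1": threading.Lock(), "O2": threading.Lock()}
log = []

# T1 asks for O1 then O2, T2 asks for O2 then O1, as in the scenario above.
t1 = threading.Thread(target=run_transaction,
                      args=(locks, ["O1", "O2"], log, "T1"))
t2 = threading.Thread(target=run_transaction,
                      args=(locks, ["O2", "O1"], log, "T2"))
t1.start()
t2.start()
t1.join(timeout=5)
t2.join(timeout=5)
```

Because both transactions end up taking O1 before O2, whichever gets O1 first
can always proceed to O2, so neither waits on the other indefinitely and no
timeout or rollback is needed in this case.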

-- 
Vincent Pelletier

