Apparently the TYPO3 version we are using, doesn’t play too nice with the MySQL MasterMaster replication.
Sometimes, something like this is going to happen:
1 2 |
070826 0:44:32 [ERROR] Slave: Error 'Duplicate entry '75-222419149' for key 1' on query. Default database: 't3nb'. Query: 'INSERT INTO cache_pagesection 070826 0:44:32 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'dbc-mysql1.000192' position 611861372 |
Well, as you can see from the last line in the log, the Slave-SQL thread found a duplicate entry and thought it is smart to just turn off the thread instead of disregarding the just made entry. So from now on, both databases drift since there ain’t no replication anymore until someone kick starts the replication again (someone being me).
Anyway, I think I finally traced the fucker down, supposedly one of the problematic cases is located in t3lib/class.t3lib_tstemplate.php on line 362.
1 2 |
$GLOBALS['TYPO3_DB']->exec_DELETEquery('cache_pagesection', 'page_id='.intval($GLOBALS['TSFE']->id).' AND mpvar_hash='.t3lib_div::md5int($GLOBALS['TSFE']->MP)); $GLOBALS['TYPO3_DB']->exec_INSERTquery('cache_pagesection', $insertFields); |
Basically what TYPO3 is doing is a DELETE and an INSERT right afterwards. But apparently, it doesn’t check whether the DELETE even succeeded. I hacked it for now, simply adding this:
1 2 3 4 5 |
- $GLOBALS['TYPO3_DB']->exec_INSERTquery('cache_pagesection', $insertFields); + // Only insert a new cache entry with the same value, if the DELETE succeeded + if ($GLOBALS['TYPO3_DB']->sql_affected_rows() == 1) + $GLOBALS['TYPO3_DB']->exec_INSERTquery('cache_pagesection', $insertFields); + |
Sadly, this looks more and more like a race-condition between the two boxes (as in the replication / UPDATE being too slow), when users visit a edited site, that hasn’t had it’s cache regenerated yet. Problem is, it ain’t just this single spot, but also the search indexing, image cache and the whole page cache. For now we switched the cluster to active/passive load balancing, till we have a chance to see if a newer TYPO3 fixes those issues.