Background
I've been experimenting with using SqlStorageClient to store dataset data within a SQLite database. During my testing, I've seen quite a few database lock errors, like the following, in my logs:
[crawlee.storage_clients._sql._client_mixin] WARN Error occurred during session transaction: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO key_value_store_records (key_value_store_id, "key", value, content_type, size) VALUES (?, ?, ?, ?, ?) ON CONFLICT (key_value_store_id, "key") DO UPDATE SET value = excluded.value, content_type = excluded.content_type, size = excluded.size]
[parameters: ('u4IehavotWGCQRNEj', 'xxxxxxx', <memory at 0x7f73dd7b68c0>, 'application/json; charset=utf-8', 62)]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Looking at the crawlee source code, I see that these errors seem to get "swallowed up" and the transaction is rolled back, causing data loss.
@asynccontextmanager
async def get_session(self, *, with_simple_commit: bool = False) -> AsyncIterator[AsyncSession]:
"""Create a new SQLAlchemy session for this storage."""
async with self._storage_client.create_session() as session:
# For operations where a final commit is mandatory and does not require specific processing conditions
if with_simple_commit:
try:
yield session
await session.commit()
except SQLAlchemyError as e:
logger.warning(f'Error occurred during session transaction: {e}')
await session.rollback()
else:
yield session
I do not see any existing mechanism to retry failed transaction nor any obvious way for me to implement transaction retries in my client code.
Expected behavior
A transient database error during a write operation should either:
- Re-raise the exception after logging, so the caller (and ultimately the crawler's retry logic) can handle it, or
- Retry the transaction internally with backoff, which is especially important for SQLite's common
SQLITE_BUSY / database is locked errors.
Actual behavior
The error is caught, logged as a warning, and silently discarded. The data is lost with no retry.
Background
I've been experimenting with using
SqlStorageClientto store dataset data within a SQLite database. During my testing, I've seen quite a few database lock errors, like the following, in my logs:Looking at the crawlee source code, I see that these errors seem to get "swallowed up" and the transaction is rolled back, causing data loss.
I do not see any existing mechanism to retry failed transaction nor any obvious way for me to implement transaction retries in my client code.
Expected behavior
A transient database error during a write operation should either:
SQLITE_BUSY/database is lockederrors.Actual behavior
The error is caught, logged as a warning, and silently discarded. The data is lost with no retry.