I recently built a new home server (post about this later) and, as part of this reboot, decided to set up Proxmox and use containers to improve security and isolation for the different services it hosts. Everything has worked quite well (another post about this later). However, because I wanted to keep my old server around to prevent disruptions while setting up the new one, I gave the new server a different hostname than the one I normally use for my home server. Little did I know that changing the hostname is one of the most perilous operations in Proxmox. After I changed the hostname, there was a problem with the containers I had created. I “fixed” this by manually copying/moving the container files in /etc/pve/lxc and things seemed to work. However, the next reboot revealed a horror show: I saw this error message in my boot log, and my lovingly crafted Proxmox configuration seemed to be DOA.
Jul 08 20:01:29 <server> pmxcfs[2621]: [database] crit: found entry with duplicate name 'lxc' - A:(inode = 0x00000000000B63FB, parent = 0x00000000000B63F9, v./mtime = 0xB63FB/0x1688859591) vs. B:(inode = 0x00000000000B65A7, parent = 0x00000000000B63F9, v./mtime = 0xB65A7/0x1688860306)
Ugh. I was really not interested in redoing a setup that had taken me a couple of days (granted, the second time should be faster than the first). Some sleuthing on the Internet told me that Proxmox saves its configuration in a SQLite database, so I wanted to check whether I could fix the problem by manually repairing that database. To my surprise, this worked and I got my Proxmox setup working again! Here’s what I did:
- Step 1: Back up /var/lib/pve-cluster/config.db. In fact, make multiple copies. That way, if you keep screwing up your recovery process, you can try again from the “original” state. Essentially, the process for attempting the restore is:
  - Start from a clean copy of config.db
  - Make modifications to it
  - Replace the one in /var/lib/pve-cluster/config.db
  - Check if it worked. If it did, celebrate and go do something else!
  - If it didn’t work, go back to the start of this loop
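As a rough sketch of the backup step, the copies could be made like this. The helper name backup_config_db and the pve-recovery directory are made up for illustration; the config.db.orig.broken name matches the copy queried in the later steps. It assumes nothing is writing to the database while you copy it, e.g. because the pve-cluster service is stopped or has failed.

```shell
#!/bin/sh
# Hypothetical helper: keep one untouched copy and one working copy of the db.
backup_config_db() {
    src="$1"       # database to protect, e.g. /var/lib/pve-cluster/config.db
    dest_dir="$2"  # scratch directory outside /var/lib/pve-cluster
    mkdir -p "$dest_dir" || return 1
    # Pristine copy: never edit this one, only copy from it.
    cp -p "$src" "$dest_dir/config.db.orig.broken" || return 1
    # Working copy: the file the sqlite3 commands will modify.
    cp -p "$src" "$dest_dir/config.db" || return 1
}

# Example call (run as root on the Proxmox host):
#   backup_config_db /var/lib/pve-cluster/config.db "$HOME/pve-recovery"
# To retry after a failed attempt, restore the working copy:
#   cp -p "$HOME/pve-recovery/config.db.orig.broken" "$HOME/pve-recovery/config.db"
```

Keeping the pristine copy under a different name makes it harder to edit it by accident while looping through recovery attempts.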
- Step 2: Check the integrity of your SQLite db for sanity (a healthy database prints ok):
sqlite3 config.db 'PRAGMA integrity_check'
- Step 3: Use the parent inode value in the error to check all the available entries:
sqlite3 config.db.orig.broken 'SELECT inode,mtime,name FROM tree WHERE parent = 0x00000000000B63F9'
14|1688860143|qemu-server
15|1688860143|openvz
16|1688860143|priv
26|1688860143|pve-ssl.key
35|1688860143|pve-ssl.pem
746491|1688859591|lxc
746919|1688860306|lxc
751855|1688867669|lrm_status
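The log line reports inodes in hexadecimal, while the query output above prints them in decimal, so it helps to convert between the two before deciding what to delete. A small sketch using the shell’s printf (hex_to_dec is just an illustrative name, not part of any tool):

```shell
#!/bin/sh
# Illustrative helper: convert a hex inode from the pmxcfs log to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

hex_to_dec 0x00000000000B63FB   # entry A in the log -> 746491
hex_to_dec 0x00000000000B65A7   # entry B in the log -> 746919
hex_to_dec 0x00000000000B63F9   # shared parent      -> 746489
```

The first two values line up with the two lxc rows in the query output, which confirms that the log and the database are talking about the same entries.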
- Step 4: OK, we can see the problem: there’s a duplicate lxc node. However, we don’t know which one of these to keep. I decided to delete the first one named in my log (entry A, inode 0x00000000000B63FB, which is 746491 in decimal), but this is where being careful and following the loop above comes in handy if you’re not lucky with the first attempt. Delete that entry from your working copy, then query the parent again to confirm that only one lxc entry remains:
sqlite3 config.db 'DELETE FROM tree WHERE inode = 0x00000000000B63FB'
sqlite3 config.db 'SELECT inode,mtime,name FROM tree WHERE parent = 0x00000000000B63F9'
14|1688860143|qemu-server
15|1688860143|openvz
16|1688860143|priv
26|1688860143|pve-ssl.key
35|1688860143|pve-ssl.pem
746919|1688860306|lxc
751855|1688867669|lrm_status
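If you would rather not guess, one option is to look at what sits under each duplicate before deleting anything. This is only a sketch: it assumes child entries point at their directory through the same parent column used in the queries above, and list_children is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: list the entries stored under a given parent inode.
# Usage: list_children DB_FILE PARENT_INODE
list_children() {
    db="$1"
    parent="$2"   # decimal or 0x-prefixed inode; only pass trusted numeric input
    sqlite3 "$db" "SELECT inode,mtime,name FROM tree WHERE parent = $parent"
}

# Example, using the two duplicate lxc inodes from the query output:
#   list_children config.db 746491
#   list_children config.db 746919
```

Whichever lxc entry lists your container config files is presumably the one worth keeping, but treat that as a hint rather than a rule and keep working on a copy.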
- Step 5: Copy this version of config.db to /var/lib/pve-cluster, then restart the pve-cluster service (and other failed pve services as necessary). Check if your Proxmox server looks like it should. If it does, you are done. If it doesn’t, start from the beginning (using the copy of the original config.db) and repeat these steps, except this time delete the other duplicate entry instead.
Hi Ram,
thank you very much for the post. I was able to restore my cluster with your information.
Thanks a lot and have a great day!
Stephan