February 8, 2025

Data Practices Thoughts

In my opinion, good data practice means your data satisfies 3 principles:

  • It must always be easily retrievable programmatically via API
  • It must be scalable, organizable, and definitively searchable
  • There must be a single source of truth; duplicates must point back to it through pointers/hierarchies
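
One way to picture the third principle is an index keyed by content hash, where the first copy found is the canonical one and every other copy is recorded as a pointer to it. This is only a minimal sketch: the names `sha256_of` and `build_index` and the "first path in sorted order wins" rule are my assumptions for illustration, not a fixed method.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: Path) -> dict[str, dict]:
    """Map each content hash to one canonical path plus its duplicates.

    Assumption: the first path in sorted order is treated as canonical.
    """
    index: dict[str, dict] = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        entry = index.setdefault(
            sha256_of(path), {"canonical": rel, "duplicates": []}
        )
        if entry["canonical"] != rel:
            # Same bytes seen before: record a pointer, not a second truth.
            entry["duplicates"].append(rel)
    return index
```

Dumping the result with `json.dumps(build_index(Path("some/folder")), indent=2)` gives a small file that answers "which copy is the real one?" for every duplicate.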

I don't even care that much about open source. I am purely pragmatic.

Google Photos/Instagram:

  • ❌ Not easily retrievable programmatically; the API frequently breaks
  • ❌ Not scalable. Google Photos has search, but it is not "definitive". Google Photos keeps a lot of JSON metadata files, but those are only retrievable via Google Takeout. You can only organize and edit old photos through the graphical web interface.
  • ❌ There are no permanent URLs, and there is no "single source of truth" for your photos if you plan to use them elsewhere. You cannot reliably embed a single photo in a markdown file. Google Photos does have URLs that look something like lh3.googleusercontent.com, but they can break. And if you do put a photo into a markdown file, there is no easy way to tell which photo it refers to, unless you keep your "source of truth" somewhere other than Google Photos.

WeChat:

  • ❌ No API, strong censorship. WeChat official accounts do not even have a stable web interface.
  • ❌ Not searchable. Your data is not preserved in the cloud or easily exportable. You can search, but only through the chats stored on your local device. If you search for articles, the results are arbitrarily controlled by an authoritarian state.
  • ❌ Your data is not preserved in the cloud unless you sync it from one device to another inside the app. In fact, WeChat groups frequently disappear, and you cannot log in on several devices simultaneously.

Telegram:

  • ✅ You can use bots. While it is not a complete cloud developer platform, the API is robust.
  • 🟡 Telegram is very scalable for groups, offers endless file storage, and the search works well. However, you still can't organize your chat history or put groups into categories. Anyway, it is already the best among consumer apps.
  • ✅ Telegram is synced across devices, and you can log in on multiple devices simultaneously. However, if you chat with someone else, they can delete the messages.
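
As a rough illustration of the bot route, here is a minimal sketch that builds a Bot API `sendMessage` request with only the standard library. The helper name `build_note_request` is hypothetical, and the token and chat ID are placeholders you would get from your own bot; the function only builds the request and does not send anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "https://api.telegram.org"


def build_note_request(token: str, chat_id: str, text: str) -> Request:
    """Build (but do not send) a Bot API sendMessage request.

    token and chat_id are placeholders: use your own bot's values.
    """
    body = urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return Request(f"{API_ROOT}/bot{token}/sendMessage", data=body, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would actually post the note, so it is worth trying against a private test chat first.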

S3/Object Storage:

  • ✅ Robust API and CDN integration.
  • ✅ Highly scalable and searchable, and you can use JSON files and fake hierarchies (key prefixes) to organize.
  • ✅ Highly stable, and you can just embed your files elsewhere to point back to the S3 bucket.
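
The "fake hierarchies" are just key prefixes: object storage keeps a flat list of keys, and a listing with a prefix and a delimiter makes it look like folders. The sketch below mimics that behaviour on a plain list of keys, without talking to any real bucket. The function names and the base URL `https://files.example.com` are assumptions for illustration; a real bucket would have its own public domain.

```python
from urllib.parse import quote

BASE_URL = "https://files.example.com"  # hypothetical public bucket domain


def list_level(keys, prefix="", delimiter="/"):
    """Mimic a one-level listing: return (objects, subfolders) under prefix."""
    objects, folders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Deeper keys collapse into a single "folder" entry.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(folders)


def markdown_embed(key: str, alt: str) -> str:
    """Return a markdown image tag that points back at the bucket."""
    return f"![{alt}]({BASE_URL}/{quote(key)})"
```

For example, keeping `photos/2025/02/cat.jpg` next to a `photos/2025/02/cat.json` sidecar gives you both the folder-like browsing and the metadata, and `markdown_embed` gives a link whose key tells you exactly which file it refers to.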

Text with Version Control (Git):

  • ✅ Pure text/markdown
  • ✅ Text is very small. As long as you write a good .gitignore and put the blobs into S3, you can do anything with text
  • ✅ You can use one main branch and still have all of the commit history
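
To make the "text in Git, blobs in S3" split concrete, here is a minimal sketch that walks a folder and decides which files stay in the repository and which should be offloaded. The suffix list, the 1 MB threshold, and the function names are all assumptions of mine; adjust them to whatever your repository actually contains.

```python
from pathlib import Path

# Assumed definitions of "text"; tune these for your own repository.
TEXT_SUFFIXES = {".md", ".txt", ".json", ".yaml", ".yml", ".csv"}
KEEP_NAMES = {".gitignore"}
MAX_TEXT_BYTES = 1_000_000


def split_repo(root: Path):
    """Return (keep_in_git, offload_to_object_storage) as relative paths."""
    keep, offload = [], []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel_path = path.relative_to(root)
        if ".git" in rel_path.parts:
            continue  # skip Git's own internal files
        rel = rel_path.as_posix()
        is_text = path.name in KEEP_NAMES or path.suffix.lower() in TEXT_SUFFIXES
        if is_text and path.stat().st_size <= MAX_TEXT_BYTES:
            keep.append(rel)
        else:
            offload.append(rel)
    return keep, offload


def gitignore_lines(offload):
    """Turn offloaded paths into root-anchored .gitignore entries."""
    return ["/" + rel for rel in offload]
```

The generated lines are literal paths, so file names containing characters that are special in .gitignore (such as `#`, `!`, or `*`) would need escaping by hand.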

Remote VPS Server for Storage (Ok, I don't manage my VPS anymore):

  • ✅ Easily accessible, can schedule jobs, can create a VPC for security, though it requires some work and debugging
  • 🟡 For reliability and to preserve the data, or to keep yourself from accidentally damaging the server, you can use several servers or dedicate separate ones to the frontend, search, and database. Raw disk space costs more for storing data than typical object storage with metadata, and the whole setup takes more time and a lot of work to maintain.
  • ✅ You can create snapshots for the server, and you can do anything with your servers

Smartphone File Storage:

  • ❌ Not easily accessible programmatically, and there is no comfortable keyboard. Developer tools like Termux and adb exist, but they do not give you the control of a native terminal. If a script goes wrong, debugging it on a tiny screen takes a massive amount of time, which makes you not want to tinker on smartphones at all. Linux smartphones, on the other hand, are too niche and not fully functional.
  • 🟡 Many smartphones still use USB 2.0, which is slow for file transfer. It is actually very difficult to get a large file off that small screen and onto a laptop even when the phone is plugged in. Even backing things up on Telegram is easier.
  • ❌ Highly confusing file system with unclear privileges for different apps. Many folders are not accessible without rooting or jailbreaking, which is increasingly difficult and may brick your phone completely.

Archive.org:

  • ✅ You can use it like an S3.
  • 🟡 It is highly searchable, has no file limits (though you can't upload an encrypted zip, so some people just zip the encrypted zip again), and you can tag your files. However, Archive.org can sometimes be slow and doesn't guarantee reliability, as it is a free service.
  • ✅ If you upload a file it provides you with a URL to view it.
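
Those URLs follow a predictable pattern built from the item identifier and the file name, so they can be generated rather than copied by hand. A minimal sketch, where `archive_urls` is a hypothetical helper name and the identifier and file name in the usage note are made up:

```python
from urllib.parse import quote


def archive_urls(identifier: str, filename: str) -> dict[str, str]:
    """Build the item page URL and the direct file URL for an uploaded file."""
    item = quote(identifier)
    return {
        "details": f"https://archive.org/details/{item}",
        "download": f"https://archive.org/download/{item}/{quote(filename)}",
    }
```

For example, `archive_urls("my-item", "notes 2025.txt")["download"]` gives a link that can be pasted straight into a markdown file.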

The Complicated, Horrible Solution (Even if you automate things, you have to make sure the automation runs well without errors, and it is just too exhausting. You spend too much time and energy without any real-life happiness.)

Conclusion: My Minimalistic Way to Control My Data

  • Use mainstream services as usual (Telegram, Instagram, YouTube, TikTok, etc.). If you want to take some brief notes, write them in Telegram, then move them to the blog later. If you want to save a file and send it later, save it in Telegram.
  • Put my blog on GitHub
  • Every month:
    • Back up (GitHub) and (Photos/Videos on my Phone) to Cloudflare R2
    • Back up Cloudflare R2 to my second laptop and external hard drive
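
The monthly step stays cheap if you only copy what changed. Below is a minimal sketch of that idea: hash every file into a manifest, compare it with last month's manifest, and upload only the difference. The function names are hypothetical, and the actual upload to R2 or copy to the hard drive is deliberately left out.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative path) to its content hash."""
    return {
        p.relative_to(root).as_posix(): file_hash(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def plan_backup(current: dict[str, str], previous: dict[str, str]):
    """Return (paths to upload, paths that vanished since last time)."""
    upload = sorted(k for k, h in current.items() if previous.get(k) != h)
    missing = sorted(k for k in previous if k not in current)
    return upload, missing
```

Saving each month's manifest as a JSON file next to the backup also gives you a simple record of what the backup is supposed to contain.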


