What is with you kids and DBs?

Discussion in 'Plugin Development' started by hash, Feb 8, 2011.

Thread Status:
Not open for further replies.
  1. Offline

    hash

    No, seriously, what is with you kids and your databases these days?

    Modern database systems are fabulously complex and powerful software tools. However, I think a lot of maturing developers see DBs as a hammer, and once they get one in their hands, everything starts looking like a nail -- even if it's actually a banana.

    Databases have, I think, three main broad categories of features that interest people: the claim of atomicity and durability, the ability to apply relational operators to datasets, and the impression of efficiency over flat files. I'm going to discuss each of these motivations for databases (in reverse order, because I'm that kind of guy), and then put in it context of doing the same thing with a flat file configuration system like YAML (which seems to be the main competitor around here; I'm a fan of JSON, myself, but they're essentially the same).

    3. Databases are essentially designed for situations where you have to deal with significantly more data can fit in the RAM of a machine. That's the number one rule behind their development. And if you have data that is several gigs in size, then a database suddenly becomes a hugely awesome tool for helping you get the most out of managing your data without burning your disk drive on seek operations.

    However, that condition -- if you have data that is several gigs in size -- ought to be heeded! If you DON'T have data that big, bringing a relational database system into the works is using a nuclear bomb to kill a housefly. It's just not necessary. A developer of even a few months of experience in his language of choice should have absolutely no difficulty beating the performance of a database when it comes to configuration data that is only kilobytes or even megabytes in size -- you simply have to remember to cache the data in memory after you've parsed it and not load the whole file again every single time you want to query something.

    2. Relational database systems have all of these cool operators like "SELECT" and "WHERE" and "AVE" and "MAX" and "COUNT" and so on and so forth as a direct result of the fact that they're intended for use with massive datasets that can't all fit in RAM. It's utterly trivial to implement all of these operations if your data can fit entirely in RAM! All of the questions that relational operators let you ask of a database can essentially be implemented by nested loops -- and that's just what the databases do on the inside. The reason the databases offer these operators instead of making you do it yourself is because many of them can be implemented more quickly on data that is maintained in a sorted way, and sometimes by only accessing small disk regions instead of reading entire files (i.e. you only have to read the end of a dataset to find the max if you know it's been sorted).

    But if you can fit all of your data in RAM, why bother using the database as a heavy-weight intermediate between you and your data? Sort it yourself -- it's a one-line call in java, for Christ's sake -- and then this again becomes a situation where even a very mediocre developer can easily beat the performance of a database. The absolute BEST performance a database could offer in a situation like this is by simply loading all of your data into RAM in one go, and thus serving as nothing but an extremely fat middleman between you and a simple read of a single file.

    1. The atomicity of operations and the durability on the filesystem that databases offer is, indeed, formidable.

    But if you can hold all your data in RAM, you can write it all out in one file by yourself too... and while file writes are impossible to make atomic in the presence of machine failures (which IS a valid reason why databases can be valuable)... it IS possible to get yourself cheap atomicity anyway. Write your updated data to a new, temporary file. MOVE the temp file where the old/main file was. That move operation? Atomic.



    I'm tired of seeing database systems thrown at every little problem. There ARE situations where databases can be indispensably valuable tools, yes, but it is not EVERY situation -- databases come with a cost. One cost in particular is very visible in a community like this one: you're taking control away from your clients, because they are no longer able to use plain text configuration.

    That's big.

    That's so big that if you don't immediately understand the seriousness of that issue I'm not even sure how to begin to explain it to you.



    If you're a young developer just getting your feet wet and starting to flex your creative muscle, it can feel like you're facing a massive series of hurdles before the results you want even come near being in sight... been there often enough :) and I think that feeling never really goes away for good. But none of your problems are ever solved by using "uber" tools with heaps of buzzwords attached to them like "scalable atomic durable consistent database!" any more than waving of a magic wand, and just because some dude in a big company jismed on a blog once about how cool it is doesn't mean you need to go out and base your life around it from now on. Good solutions to problems come from good thinking, and the first step in good thinking is realizing what your needs really are.

    Stop hammering my bananas. :)
     
    Professor29, rakiru and SUparJErk like this.
  2. Offline

    Dinnerbone Bukkit Team Member

    May I ask who this is aimed at and why? Databases are there to store persistent data. That's their purpose. Storing configurables, that's bad yes. Data that only you should care about accessing in raw, that's good.
     
  3. Offline

    feverdream

    I fully support the above OP.

    Too many snerts and beginning programmers on the forum are telling us seasoned veterans that we should use a database for everything; Thats bullshit and it only makes things harder on users.

    I have nothing against databases if they are used correctly, but a DB seems to be seen as the proverbial hammer as the OP described, and its really getting annoying as hell when some dimwitted no-forehead slack-jaw keeps saying that using a db is better then using a flat file when you only have to store one or two lines of ascii text that is updated at most once or twice entire year, or even as much as once or twice a month if the server maintainers dont know what they are doing or like to test things.

    This thread should be stickied, imho.
     
  4. Offline

    hash

    This isn't really directed at anyone. I guess I just have a tendency to shout "stop and think about it for a moment!" at empty rooms now and again... but I also am really quite curious. I'd be pleased as punch if some folks could lay out to me what kind of a decision making process it is that makes them want to involve relational databases in data smaller than a megabyte.

    I'm not opposed to beginning programmers having opinions either though, haha. Even choosing a side that's wrong can teach you something if an argument gets you to do some more research. But I do admit that I perceive a definite trend in programmers (some young and some old, and not just in this forum by any means) to hold databases up like bibles and quote passages from marketing departments if challenged instead of applying critical thinking, and that bothers me.

    I reckon the delineation between persistent data versus configurables can make sense, but I'd still only make that admission with the caveat that those things can get awfully blurry if we're speaking in the context of bukkit plugins. Maybe I'm just not stretching my imagination enough, but it's pretty hard for me to come up anything where a server administrator might not want to peek or poke the data now and again... especially if you, say, put things in the context of other tools like MCEdit that let the server administrator slide the entire world one to the right. If you want to make a plugin that's going to be tolerant of the administrator in a situation like that, nothing where you store coordinates is ever allowed to fall into that nonconfigurable data category.

    I'm also supremely impressed by some of the plugins I've seen that manage to use the game world itself to store almost all of their information. CraftBook, for example -- using a line of fence posts to remember the location of an open gate instead of resorting to external storage is frankly pure genius, and I wish more developers would follow that lead. It also reminds me bring up another point of irony about the wanton and thoughtless use of DBs: the entire point of the atomicity guarantees is utterly ridiculous in the context of minecraft (or indeed, any place where you have more than one database). What if minecraft crashes and the world saves in a different state than the last time your precious database committed? Your illusion of atomicity is suddenly revealed as just that -- an expensive illusion.
     
  5. Offline

    Valrix

    Well if people aren't using the DB properly than this is obviously a problem. But you also need to consider more than just the size of the data being stored such as storing it so another programming language and pull it from the database instead of having to read a file possibly deep within the system. It really depends on convenience and if it's really needed. I agree that a few lines of ASCII or something very basic should be done as a flatfile, but it's truly based on common sense. If the developer doesn't know when to use a DB and when to use a FF, they either shouldn't be making plugins or they need to learn when to use and not to use a DB.
     
  6. Offline

    Redecouverte

    basically text/yaml files are good, as long as you don't write lots of data into it that gets updated very often

    imagine a text file with 1000 lines, that needs to be re-written every 10s because one entry changes, that's bad , use databases here,

    in other cases: don't use them
     
  7. Offline

    hash

    Um, a 1000 line file is still probably between 1 and...oh, say 10 filesystem blocks (average of about 40 characters per line sounds fair for a config-ish file?). I'm preeeetty sure my computer can handle one seek every 10s. :) Doesn't matter if I have to rewrite all 10 blocks after that; the whole operation is going to take milliseconds and is still dominated by the disk seek time anyway -- so the 1000 line file is hardly any different than a 10 line file at all at that point. You're going to have to go MUCH bigger before you should really be getting worried.

    And in all cases, remember that the question remains: who said you had to flush to disk on every action? You don't! You can keep that data in memory until you've accumulated as many changes as you see fit or a time window of your choosing has expired.
     
  8. Offline

    feverdream

    Dont forget, the sheer cost of loading and running a SQL interpreter to parse the syntax for even the simplest of DB queries alone is more work then all of the steps needed to update a flat-file or java Properties data store.
     
  9. Offline

    Toasty

    What about implementing a system where people join your forum to gain membership status, so you don't have to add them individually to a flatfile?

    That alone is worth using a databse for, since all modern forum software stores their user's info in a database (correct me if I'm wrong).

    I see your reasoning against using a database for something like bukkit, but if you have a community and you couple it with a forum, having a database for your users suddenly becomes a very enticing feature. And if you're dealing with even dozens of users, it becomes almost necessary (from a management perspective, not memory allocation).
     
  10. Offline

    hash

    Amen, brother!

    Have any of you out there ever rooted through the source for Postgres, for example? The amount of work it puts into making query plans and then optimizing them into query trees and then gathering statistics on the database and then optimizing them into execution trees... it's incredible. I say incredible in two ways: in the amount of power that it can really give you when you start talking about properly huge data that stretches into the terabyte range; and incredible in the amount of overkill and just plain waste it is for the megabyte or below range. (I know postgres isn't what most folks are using around here, but I'm willing to bet the complexity remains the same across any SQL-based or relational system.)
     
  11. Offline

    Redecouverte

    @Toasty
    i agree
    and also whether a bukkit server takes 70,00001% or 70,0000% of you cpu doesn't really matter.. and when it comes to ram, lol, the naked server needs about 100 meg ram per player, so i guess even 10mb per server more won't hurt anyone
    --- merged: Feb 9, 2011 12:03 AM ---
    let's say we have those 1000 lines, & now i want to get every line where a > 500 and b = "hehe"
    without any database engine, we would have to compare each entry (a db engine will not do that)

    i mean.. there must be a reason big companies use SQL servers for their data , right?

    if google would use text files to store it's data, a simple search request would take hours to get processed
     
  12. Offline

    feverdream

    @Toasty
    Thats a special case, a corner case that does not apply to 99.9% of what we are talking about.
     
  13. Offline

    hash

    As to your claim that a database engine will magically operate on data without actually touching it: horseshit. No discussion of this is possible. You're simply wrong. Take a class or read some source, or just use your common sense. You can't operate on data without operating on it.

    The database will probably do some smart things with that comparison if it has a sorted index based on that field, and it might do some smart things with hashmaps if it has a hash index on that field, but those are both completely non-magical things that you can do to! And the funny part about this: the database system won't make those indexes unless you're smart enough to tell it to anyway. :)

    You've ignored the very clear condition I set out at the beginning of this (and also the part about please don't listen to what a dude in a big company jismed on his blog about without thinking about the logic behind what he says first, which is even more disappointing). Google stores PETABYTES of information. Yes, they use databases. Databases fundamentally make sense at that scale, because that's what they're designed for. The problem is with people who insist that there isn't a difference between a MEGABYTE and a PETABYTE -- hint: a petabyte is 1,073,741,824 times bigger.

    And incidentally, what Google uses is bears very, very, very little resemblance to the sqllite dribble that a person is more likely to see in a bukkit plugin. Google doesn't do SQL (because ACID guarantees are fundamentally at odds with scalability, not that that's really relevant to this discussion), and Google does do distributed.

    Oh, and also, you mean "its", not "it's". The one with an apostrophe is a contraction for "it is"; the other is the possessive form that you meant to use.
     
  14. Offline

    Redecouverte

    well, i'll still use dbs when i want to, call the police if you fell like it:)

    and btw, do you know what sorted lists are?
    now if we know that the 2nd entry has a value of 2, and we look for values <2 .... do you notice something?

    about the grammar, yeah english is a pretty ugly language, i prefer french :)
     
  15. Offline

    petteyg359

    I support this. Especially when three different plugins want sqlite just to store playername:coordsx,y,z and every one of them wants their own special version of sqlite-something.jar in a different folder.

    SpawnControl, MinecartMania, and MyHome: I'm glaring at you. At the very least, pick a standard place to put the damn jar. (MinecartMania may have a valid usage, but the others are just being silly.)

    [​IMG]
    Go ahead and use a RDBMS to store the 200 lines of text you might have on an extremely busy MC server. I'll just stop using your stuff as soon as somebody else does it the right way.
     
  16. Offline

    hash

    That's an interesting situation you bring up, and not to be readily dismissed. However, it's still not a real great argument in favor of databases when you get down to the nuts and bolts of it.

    Both of these applications still have to have the exact same concept of how to interact with the database schema. Relational databases can help (with options like "foreign keys", "no null", and so on), but they can't do everything. Even if you take it as given that "all" modern forums store user information in a database, it's still definitely very far from true that they store user data in the same way. Suppose your forum software has a special range of restrictions on usernames, and your other application doesn't. Suddenly the one application inserts a name illegal to the other, and your system either has a meltdown and crashes completely, or worse yet it may interpret that illegal data in some way that creates a security exploit.

    In a nutshell: databases are not some magic wand that makes applications work together effortlessly. You still have to face all of the exact same hurdles with interoperability that you would using any other format whether it be JSON or XML or YAML or whatever.

    (Databases would solve some of the problems with concurrent modification that would show up in that scenario, though, so they're worth considering for that reason. You can still do all the same things without databases with moderate cleverness, though. It's my opinion that you're probably creating the most stable and maintainable system if you only have one of the two applications responsible for persisting the data, and the other one just talks to that application directly over some kind of API -- but this last statement is now firmly in opinion-land and certainly not applicable to all cases.)
     
  17. Offline

    Redecouverte

    lol, this community is so immature

    okay, so now i learned that databases are useless and everyone should use plain text files
    great
    still i am wondering how companies like oracle manage to make money with their databases
     
  18. Offline

    hash

    Now, gents, we can have this discussion without resorting to oversized images and youtube embedding to flame.

    But Redecouverte, it is a little silly of you to ask me if I know how sorted lists work when my first response to you already described the virtues of sorted indexes and when exactly a database system will use them, and my original post actually already dealt with the exact same issue when I was talking about relational operators. Responding to you has become quite redundant and unconstructive, it seems.
     
    Daniel Heppner likes this.
  19. Offline

    Verox

    Redecouverte, you seem to be missing the point entirely, they are not saying databases are useless they are saying databases are overkill in some situations used in Bukkit. I suggest you take your hands away from your eyes and actually read their posts.
     
  20. Offline

    Redecouverte

    @hash:
    well yes, i get your point there, and i do agree
    but i just don't buy the fact that it all comes down to how much ram a problem needs

    let's take for example popular software like phpBB, or joomla which is all PHP/MySQL based:
    most of these software only need very little ram (like let's say the whole db of a small forum is max 100MB which is way less than the normal amount of RAM a server has)

    why do they use a sql database if they would be so much faster with plain text files?
    can you explain that to me?
    --- merged: Feb 9, 2011 1:00 AM ---
    if you would have read my first post in this thread, you would have seen that i have exactly written that fact there
     
  21. Offline

    Byteflux

    Instead of managing potentially large datasets in memory, I generally approach the slightly-less-than-simple storage issues with SQLite.

    I agree people should be more aware about choosing a storage backend and whether or not they would actually benefit from using an RDBMS.
     
  22. Offline

    hash

    Well, PHP is interpreted, first of all, and something in the ballpark of 300 times slower than java. At that point, can you possibly ever outperform the database (which is running in native code) even on something as simple as a sort? Not so much. So, delegating work to the database in PHP makes sense; the same is not true of a full, powerful language like java.

    Second of all, they're designed for being big forums as well as small forums, and that answers the question right there. Things like phpBB were designed to be able to scale to hundreds of thousands if not millions of users -- and they do. This is an acceptable use of a database. The database is still overkill for small sites, but it does mean the design scales.

    This scaling concern doesn't make sense in any of the bukkit plugins I've seen so far. petteyg359 hit the nail on the head in that regard: SpawnControl can't claim to need that kind of scalability unless the plugin designer is anticipating people who are going to make hundreds of thousands of spawn points... which just plain wouldn't make any sense.

    Oh, and now that I think about it a little longer, there IS a third reason that phpBB and the like tend to turn to databases. PHP applications have a nasty secret that usually shocks developers learning the language after a few months (or maybe a few years if they're slow): they get run in parallel by Apache all the time unless you go well out of your way to configure it otherwise. That can create a very interesting situation for concurrency control, since you can't even control how many or when other processes are started that run your code. On linux, this isn't a problem, because you can use "flock" to coordinate what's going on, but historically it's been a major problem on Windows, which lacked even the most rudimentary filesystem concurrency control until quite recently. So, in order to be cross platform, phpBB would have been -forced- to use a database system. This reason has now evaporated since Windoze has finally gotten its shit together, but I suspect we're still feeling some effects of that inertia in the market.
     
  23. Offline

    feverdream

    First of all: Without talking about my resume, lets just say I have a bit of a better understanding then some of you about what google and company does and does not use, from personal professional experience I have personally seen with my eyes some of the things that you guys are talking about.. and your crazy if you think that bullshit and fallacious arguments will work.

    In NO WAY does ANYTHING bukkit does compare to ANYTHING a company like google does, or requre the sort of DB they do. Databases are created and maintained to solve a problem bukkit does not have except in extremely rare cases like the BigBrother plugin.. and even then at the scale you guys are trying to troll the discussion at they dont work due to the mathmatical limitations of the basic SQL JOIN operation.
     
  24. BigBrother :/?
     
  25. Offline

    Redecouverte

    @hash:
    okay you have sort of convinced me, i even consider removing sqlite from my plugins now
    i'm most likely gonna do that tomorrow :)
    --- merged: Feb 9, 2011 1:24 AM ---
    one last thing i want to add:

    in my eyes a good implementation of what sqlite does, is an "optimized plain text file", i do not know how good sqlite is at that, and it seems a bit bloated (~2MB file) so i agree with you there

    but if you would have a good lib that takes >basic< (not the advanced) sql commands and writes the data to pseudo-text files, that's what i thought i was using

    (whether your own functions or a lib write the data to the file should not make a difference, or would it?)
     
  26. Offline

    feverdream

    This is the ONLY plugin I have seen that needs a DB. It is the exception, not the rule.
     
  27. Offline

    hash

    Hmm. Hadn't seen BigBrother. That might be a bukkit plugin where a database actually makes sense. And at the very least, he's not storing data that it would ever make sense to tweak, so he's in the clear there as far as I'm concerned. At a glance it would seem that access is pretty much only ever sequential, though, and he could get away with pure append (i.e. never overwrite) while it's running... but indeed, for that thing I think working without a database could actually begin to be less effortless. I'm wondering how much information that thing ends up storing when you run it in the wild...does it reach the gig ranges?

    Edit: oh dear, I just saw what's been done with that plugin and how he published tutorials on how to access his schema and that there's web interfaces. This -IS- what it looks like when it's done right.

    Moar Edit: oh, and he has an option for flatfile output for folks like me who like to grep? This is what it looks like when it's done -really- right.
     
    Mazetar likes this.
  28. Offline

    Redecouverte

    i think i might switch to StelsXML: http://www.csv-jdbc.com/stels_xml_jdbc.htm

    it provides jdbc sql acccess to java and writes to xml files
    --- merged: Feb 9, 2011 1:35 AM ---
    yes it does, my bigbrother db already has about 800 mb and the server is only running a few weeks
     
  29. Offline

    Afforess

    Hey now, Minecart Mania has no database support, nothing is persistent except the config, which is all in flatfiles I designed! In fact, I took extra effort to ensure that I would need as little persistence as possible.
     
  30. Offline

    Byteflux

    I would consider SQLite to be still pretty viable for a Minecraft server if you end up needing a way to query your data outside of your plugin (e.g., from a web app). One of SQLite's design goals is to be a decent replacement for primitive flat file storage.
     
Thread Status:
Not open for further replies.

Share This Page