Very strange stability issues

Discussion in 'Bukkit Help' started by Hankscorpiouk, Jan 19, 2013.

Thread Status:
Not open for further replies.
  1.

    Hankscorpiouk

    Hi, a few questions for the pros here. :)

    We're having a VERY weird issue with our server: it crashes consistently every 30 to 60 minutes, around the clock. This also means that when I'm not around, such as overnight, it can be down for many hours.

    This is despite setting the "restart-on-crash" option to true in bukkit.yml.
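For the overnight downtime specifically, one stopgap is to launch the server from a small wrapper that starts it again whenever the process exits with an error. What follows is only a sketch under assumptions that have not been checked against this server: that the JVM really exits when it crashes (a server that hangs after the error would not be caught), and that a clean shutdown returns exit code 0. The function name and the retry numbers are made up for illustration.

```python
import subprocess
import time


def run_with_restart(command, max_restarts=3, delay_seconds=5.0):
    """Run command, starting it again after every non-zero exit.

    Returns the exit codes seen, one per launch.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        code = subprocess.call(command)
        exit_codes.append(code)
        if code == 0:
            # Assumed to mean a deliberate, clean shutdown: do not relaunch.
            break
        # Short pause so a server that dies instantly does not spin the CPU.
        time.sleep(delay_seconds)
    return exit_codes


# Hypothetical usage with the command line from this post (not run here):
# run_with_restart(["java", "-Xmx1024M", "-jar",
#                   "craftbukkit-1.4.6-R0.3.jar", "nogui"], max_restarts=100)
```

A wrapper like this only hides the symptom; the returned exit codes at least give a record of how often the server went down.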


    I've tried so many different Java startup switches and -Xmx memory amounts that I've lost count, and I've tried so many different combinations of plugins that I've lost count of those too.


    The error messages are always the same: "OutOfMemoryError" and "unable to create new native thread". Is it worth me including the full crash report or the part of server.log where it fails? Or is this enough?


    I'm pretty sure it's not heap memory that we're running out of, as there's frequently nothing happening on the server at the time of the crash (maybe 1 person online, just AFK'ing, if that). Also, whenever I use the /gc command there are always apparently hundreds of MB free. Anyway, I was under the impression that -Xincgc took care of stuff like that. Would we be better off using one of the other garbage collectors, and if so, which one?

    I don't know if this is the place to mention it, but we also tried Spigot, which threw up even more errors. Getting a bit desperate here :(
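On the heap question: the JVM reports several different conditions as OutOfMemoryError, and the text after the colon says which resource ran out. The sketch below sorts the common HotSpot messages into rough categories; the function name and the category labels are invented for this example. The "unable to create new native thread" variant is raised when the operating system refuses to start another thread, which is consistent with /gc still showing free heap and suggests a different garbage collector may not help.

```python
def classify_oom(message):
    """Map a java.lang.OutOfMemoryError message to a rough category label."""
    text = message.lower()
    if "unable to create new native thread" in text:
        return "native-thread"   # OS refused a new thread; heap may be fine
    if "gc overhead limit exceeded" in text:
        return "gc-overhead"     # heap nearly full, collector thrashing
    if "java heap space" in text:
        return "heap"            # the limit set by -Xmx
    if "permgen space" in text:
        return "permgen"         # class metadata area on Java 7 and earlier
    return "unknown"


print(classify_oom("java.lang.OutOfMemoryError: unable to create new native thread"))
```

Only the "heap" and "gc-overhead" categories are the kind of problem that raising -Xmx or switching collectors is aimed at.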



    Anyway, this is our current startup script:

    screen -dmS minecraft java -Xmx1024M -Xincgc -server -jar craftbukkit-1.4.6-R0.3.jar nogui

    (yes, we've tried with and without -server and -Xincgc. I also tried updating to the new 1.4.7-R0.1, but that caused even more errors, as apparently a lot of our plugins aren't ready for it)




    This is our server.properties:

    #Minecraft server properties
    #Sat Jan 19 18:04:00 GMT 2013
    generator-settings=
    allow-nether=true
    level-name=world
    enable-query=true
    allow-flight=true
    server-port=25565
    query.port=25565
    level-type=DEFAULT
    enable-rcon=false
    level-seed=
    server-ip=
    max-build-height=256
    spawn-npcs=true
    white-list=false
    debug=false
    spawn-animals=true
    texture-pack=
    hardcore=false
    snooper-enabled=false (this has also been true, but with same results)
    online-mode=true (this is best isn't it?)
    pvp=false
    difficulty=3
    enable-command-block=true
    server-name=Unknown Server
    gamemode=0
    max-players=12
    spawn-monsters=true
    view-distance=10
    generate-structures=true
    spawn-protection=0
    motd=[1.4.7] Register @ kewlmcserver.webs.com





    This is our current bukkit.yml:

    # This is the main configuration file for Bukkit.
    # As you can see, there's actually not that much to configure without any plugins.
    # For a reference for any variable inside this file, check out the bukkit wiki at
    # http://wiki.bukkit.org/Bukkit.yml
    settings:
    allow-end: true
    warn-on-overload: true
    permissions-file: permissions.yml
    update-folder: update
    ping-packet-limit: 150 (in desperation I increased this from 100)
    use-exact-login-location: false
    plugin-profiling: false (I've tried true at various times)
    connection-throttle: 4000
    query-plugins: true
    deprecated-verbose: default
    shutdown-message: Server closed - please try again in a few minutes or check our website at http://kewlmcserver.webs.com for more info :) Donating would help us upgrade our server and tools meaning you'd be less likely to see this message. Please donate :o)
    restart-script-location: start.bat
    timeout-time: 400 (this was 300, but again in desperation I raised it to 400)
    restart-on-crash: true
    filter-unsafe-ips: true (also tried false)
    whitelist-message: You are not white-listed on this server!
    log-commands: true
    command-complete: true
    spam-exclusions:
    - /skill
    spawn-limits:
    monsters: 70
    animals: 15
    water-animals: 5
    ambient: 15
    chunk-gc:
    period-in-ticks: 600
    load-threshold: 0
    ticks-per:
    animal-spawns: 400
    monster-spawns: 1
    autosave: 0
    auto-updater:
    enabled: true
    on-broken:
    - warn-console
    - warn-ops
    on-update:
    - warn-console
    - warn-ops
    preferred-channel: rb
    host: dl.bukkit.org
    suggest-channels: false
    database:
    username: bukkit
    isolation: SERIALIZABLE
    driver: org.sqlite.JDBC
    password: walrus
    url: jdbc:sqlite:{DIR}{NAME}.db
    world-settings:
    default:
    growth-chunks-per-tick: 650
    mob-spawn-range: 4
    item-merge-radius: 3.5
    exp-merge-radius: 3.5
    random-light-updates: false
    aggregate-chunkticks: 4
    wheat-growth-modifier: 100
    cactus-growth-modifier: 100
    melon-growth-modifier: 100
    pumpkin-growth-modifier: 100
    sugar-growth-modifier: 100
    tree-growth-modifier: 100
    mushroom-growth-modifier: 100
    world:
    growth-chunks-per-tick: 1000
    world_nether:
    view-distance: 5
    growth-chunks-per-tick: 0
    random-light-updates: true
    water-creatures-per-chunk: 0
    storm-settings:
    strong-electrical-storm:
    chance: 5
    lightning-delay: 10
    lightning-random-delay: 20
    electrical-storm:
    chance: 15
    lightning-delay: 40
    lightning-random-delay: 150
    strong-thunderstorm:
    chance: 30
    lightning-delay: 60
    lightning-random-delay: 250
    thunderstorm:
    chance: 50
    lightning-delay: 100
    lightning-random-delay: 500
    weak-thunderstorm:
    chance: 75
    lightning-delay: 300
    lightning-random-delay: 1000
    rainstorm:
    chance: 100
    lightning-delay: 500
    lightning-random-delay: 2000





    A few days ago I thought I'd made a breakthrough when I read that our "OutOfMemoryError: unable to create new native thread" meant that our stack size was too small and we needed to play with the -Xss option.

    Well, that didn't work either. I tried different values for -Xss, ranging from -Xss2K to -Xss16384. The lower values wouldn't even allow the server to start up, reporting "segmentation fault", and the higher values gave us the same crash as always.
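Since the error is about creating threads, one thing worth measuring is how many threads the java process actually holds as it gets closer to a crash. A minimal sketch, assuming a Linux /proc filesystem where /proc/<pid>/status contains a "Threads:" line; the function names are made up for this example.

```python
import re


def thread_count_from_status(status_text):
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)\s*$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_thread_count(pid):
    """Read the current thread count of a running process by PID."""
    with open("/proc/%d/status" % pid) as handle:
        return thread_count_from_status(handle.read())


# Hypothetical usage: call read_thread_count(<java pid>) once a minute
# and log the result alongside the time.
```

If the number climbs steadily between restarts, something is starting threads and never letting them finish. If it stays low right up to the crash, the limit being hit is more likely imposed from outside the JVM.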


    However, I also discovered "/etc/security/limits.conf" and the ulimit command. Running ulimit -a at the command line usually reveals the following:


    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 138240
    max locked memory (kbytes, -l) 32
    max memory size (kbytes, -m) unlimited
    open files (-n) 1024
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    real-time priority (-r) 0
    stack size (kbytes, -s) 10240
    cpu time (seconds, -t) unlimited
    max user processes (-u) 138240
    virtual memory (kbytes, -v) unlimited
    file locks (-x) unlimited




    I'd read it made sense to increase our open files limit to 4096 and our stack size to 16,384, so I did both, separately and together. I also updated the limits.conf file with:


    root hard stack stack 16384
    root soft stack stack 16384
    root hard nofile 4096
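Each entry in limits.conf is expected to have four whitespace-separated fields: domain, type (soft, hard or -), item and value. A quick check like the following, which is only a sketch with a made-up function name, would flag the two stack lines as quoted above, since they contain the word stack twice and so have five fields. Whether the real file contains that duplication or it crept in while pasting is not known.

```python
VALID_TYPES = {"soft", "hard", "-"}


def check_limits_line(line):
    """Return None if a limits.conf line looks well-formed, else a problem description."""
    content = line.split("#", 1)[0].strip()
    if not content:
        return None  # blank line or comment
    fields = content.split()
    if len(fields) != 4:
        return "expected 4 fields (domain type item value), found %d" % len(fields)
    if fields[1] not in VALID_TYPES:
        return "type should be soft, hard or -, found %r" % fields[1]
    return None


for entry in ("root hard stack stack 16384",
              "root soft stack stack 16384",
              "root hard nofile 4096"):
    print(entry, "->", check_limits_line(entry) or "ok")
```

The limits in this file are applied by pam_limits when a new login session starts, so they do not change shells that are already open, and any process started from an old shell keeps the old values.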


    Problem is when I lose my SSH connection and log back in again, the old defaults are back; 1024 and 10240. *tearing hair out*

    Going to try increasing stack size to 65536 *clutching at straws*

    I found I could get PuTTY to send keepalives to the server to prevent the loss of the SSH connection, but the Bukkit crashes still persist.





    We're running on CentOS 64 bit on a 1and1 VPS and have recently updated to Java 7, update 11, yes the 64 bit one, but to no avail :(




    These are the plugins we're currently running:


    PlugMan,
    SimpleTips,
    VoxelSniper,
    Buycraft,
    Vault,
    Multiverse-Core,
    SimpleWarnings,
    MCDocs,
    RocketBoots,
    DisguiseCraft,
    Register,
    MultiCommand,
    WhatIsIt,
    TCPack,
    ColorPortals,
    mcMMO,
    FoundDiamonds,
    Citizens,
    NoCheatPlus,
    WorldEdit,
    Towny,
    Courier,
    Multiverse-Inventories,
    PermissionsEx,
    Questioner,
    LWC,
    DispenserRefill,
    SlotReserve,
    WorldGuard,
    boosCooldowns,
    FakeMessager,
    InfinitePlots,
    Lottery,
    CoreProtect,
    FakePlayersOnline,
    ReferGift,
    CraftBukkitUpToDate,
    Modifyworld,
    InfiniteClaims,
    FalseBookCore,
    Vote4Diamondz,
    Freeze,
    InfoMan,
    Essentials,
    MagicCarpet,
    floAuction,
    VillagerBlock,
    ChestShop,
    MultiSpawn,
    FalseBookExtra,
    MobArena,
    FalseBookCart,
    ChatManager,
    FalseBookBlock,
    FalseBook-IC,
    EssentialsGeoIP





    As you can see we have CraftBukkitUpToDate in there and just today it notified us of 4 plugins that needed updating, which we did.


    A lot of those plugins are indispensable to us, and there just aren't enough hours in the day to try removing them 1 by 1 and restarting. As I said, it can take anywhere between 30 and 60 minutes for the server to fail, and occasionally it will even take a few hours. If it's a plugin issue, surely there has to be a better way to quickly diagnose the culprit?

    This also happens with far fewer plugins.
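One faster alternative to removing plugins 1 by 1 is to halve the list each time: run with only half of the plugins, see whether the crash still occurs, then keep whichever half contains the problem. This is a sketch of that idea and not something taken from Bukkit itself. It assumes exactly one plugin is responsible, that the crash shows up reliably within the waiting period, and that each half is still able to load (plugins that depend on another one, such as Vault, would need that dependency kept in). The function and the crashes_with callback are hypothetical; in practice the callback is a person restarting the server and waiting.

```python
def find_culprit(plugins, crashes_with):
    """Narrow a plugin list down to one suspect by repeated halving.

    crashes_with(subset) must return True if the server crashes when
    only the plugins in subset are installed.
    """
    candidates = list(plugins)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if crashes_with(half):
            candidates = half
        else:
            # Assumption: if this half is clean, the culprit is in the rest.
            candidates = candidates[len(half):]
    return candidates[0] if candidates else None
```

For the 56 plugins listed above this needs at most six test runs instead of up to 56. If both halves of a split crash, or neither does, the single-culprit assumption is wrong and the cause may not be a plugin at all.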




    This is what the top command is currently showing (immediately after crash):

    top - 18:37:12 up 50 min, 1 user, load average: 0.00, 0.00, 0.01
    Tasks: 24 total, 1 running, 23 sleeping, 0 stopped, 0 zombie
    Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 2097152k total, 1471360k used, 625792k free, 0k buffers
    Swap: 0k total, 0k used, 0k free, 0k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    5917 root 15 0 12624 1200 932 R 7.1 0.1 0:00.01 top
    1 root 15 0 10364 752 628 S 0.0 0.0 0:00.62 init
    1127 root 17 -4 12620 672 360 S 0.0 0.0 0:00.00 udevd
    1403 root 15 0 5924 624 508 S 0.0 0.0 0:00.00 syslogd
    1414 root 18 0 62640 1204 644 S 0.0 0.1 0:00.00 sshd
    1423 root 23 0 21660 940 724 S 0.0 0.0 0:00.00 xinetd
    1436 root 25 0 13180 612 476 S 0.0 0.0 0:00.00 couriertcpd
    1438 root 25 0 3672 384 316 S 0.0 0.0 0:00.00 courierlogger
    1446 root 20 0 13180 612 476 S 0.0 0.0 0:00.00 couriertcpd
    1448 root 25 0 3672 380 316 S 0.0 0.0 0:00.00 courierlogger
    1454 root 25 0 13180 612 476 S 0.0 0.0 0:00.00 couriertcpd
    1456 root 25 0 3672 380 316 S 0.0 0.0 0:00.00 courierlogger
    1463 root 24 0 13180 608 476 S 0.0 0.0 0:00.00 couriertcpd
    1465 root 24 0 3672 380 316 S 0.0 0.0 0:00.00 courierlogger
    1518 root 22 0 54164 2308 1760 S 0.0 0.1 0:00.26 master
    1521 postfix 15 0 54228 2276 1764 S 0.0 0.1 0:00.00 pickup
    1522 postfix 15 0 54288 2328 1808 S 0.0 0.1 0:00.01 qmgr
    1579 named 20 0 116m 4040 1884 S 0.0 0.2 0:00.01 named
    1838 root 18 0 46752 816 420 S 0.0 0.0 0:00.00 saslauthd
    1839 root 18 0 46752 556 160 S 0.0 0.0 0:00.00 saslauthd
    3533 root 18 0 97376 4072 3200 S 0.0 0.2 0:00.32 sshd
    3535 root 16 0 12080 1760 1320 S 0.0 0.1 0:00.13 bash
    4028 root 15 0 22844 1312 748 S 0.0 0.1 0:00.04 screen
    4029 root 18 0 1483m 785m 12m S 0.0 38.4 2:06.20 java




    and this is immediately after server start:

    5943 root 18 0 1423m 760m 12m S 97.9 37.1 1:36.23 java
    1 root 15 0 10364 752 628 S 0.0 0.0 0:00.63 init
    1127 root 17 -4 12620 672 360 S 0.0 0.0 0:00.00 udevd
    1403 root 15 0 5924 624 508 S 0.0 0.0 0:00.00 syslogd
    1414 root 18 0 62640 1204 644 S 0.0 0.1 0:00.00 sshd
    1423 root 23 0 21660 940 724 S 0.0 0.0 0:00.00 xinetd
    1436 root 25 0 13180 612 476 S 0.0 0.0 0:00.00 couriertcpd
    1438 root 25 0 3672 384 316 S 0.0 0.0 0:00.00 courierlogger
    1446 root 20 0 13180 612 476 S 0.0 0.0 0:00.00 couriertcpd
    1448 root 25 0 3672 380 316 S 0.0 0.0 0:00.00 courierlogger
    1454 root 25 0 13180 612 476 S 0.0 0.0 0:00.00 couriertcpd
    1456 root 25 0 3672 380 316 S 0.0 0.0 0:00.00 courierlogger
    1463 root 24 0 13180 608 476 S 0.0 0.0 0:00.00 couriertcpd
    1465 root 24 0 3672 380 316 S 0.0 0.0 0:00.00 courierlogger
    1518 root 15 0 54164 2308 1760 S 0.0 0.1 0:00.28 master
    1521 postfix 18 0 54228 2276 1764 S 0.0 0.1 0:00.00 pickup
    1522 postfix 15 0 54288 2328 1808 S 0.0 0.1 0:00.01 qmgr
    1579 named 20 0 116m 4040 1884 S 0.0 0.2 0:00.01 named
    1838 root 18 0 46752 816 420 S 0.0 0.0 0:00.00 saslauthd
    1839 root 18 0 46752 556 160 S 0.0 0.0 0:00.00 saslauthd
    3533 root 15 0 97376 4072 3200 S 0.0 0.2 0:00.38 sshd
    3535 root 15 0 12080 1764 1320 S 0.0 0.1 0:00.15 bash
    5942 root 15 0 23008 1516 748 S 0.0 0.1 0:00.01 screen
    7173 root 15 0 12620 1124 856 R 0.0 0.1 0:00.00 top
    7178 root 19 0 9848 1228 1048 R 0.0 0.1 0:00.00 sh
    7179 root 21 0 0 0 0 Z 0.0 0.0 0:00.00 stty <defunct>



    (37.1% mem for Java)




    The regularity of the crashes made me wonder if it could be something related to cron, but we stop that service every time we restart our VPS container, along with httpd, before launching CraftBukkit.


    This problem has been going on for months now, and I'm at a loss to explain it. Does anyone have any ideas? Also, if anyone with any expertise in these matters would like to join our server as staff, please feel free to let us know - we need all the help we can get :)

    I'm also willing to give VIP memberships to anyone who can help us finally resolve these issues.

    Apologies for the length of this OP, but I wanted to make sure my report was as comprehensive as possible and I didn't leave anything out.

    Regards,
    Hank.
     
  2.

    LaxWasHere

    This is a Bukkit support forum, not Spigot.
     
  3.

    Hankscorpiouk

    Fine, let's pretend I didn't mention Spigot. Thank you for your contribution.

    As I said we're running CraftBukkit R0.3 (build 2586)

    Just found this in the logs...

    19:11:52 [INFO] quant99 lost connection: disconnect.quitting
    19:11:52 [INFO] Connection reset
    >mem
    19:12:05 [INFO] Uptime: 10 minutes 6 seconds
    19:12:05 [INFO] Current TPS = 20.0
    19:12:05 [INFO] Maximum memory: 990 MB
    19:12:05 [INFO] Allocated memory: 700 MB
    19:12:05 [INFO] Free memory: 385 MB
    19:12:05 [INFO] World "world": 256 chunks, 82 entities
    19:12:05 [INFO] Nether "world_nether": 0 chunks, 0 entities
    19:12:05 [INFO] The End "world_the_end": 0 chunks, 0 entities
    19:12:05 [INFO] World "HankCreative1": 256 chunks, 0 entities
    19:12:05 [INFO] World "NewSpawn": 256 chunks, 0 entities
    19:12:05 [INFO] World "HanksCreativeWorld": 256 chunks, 44 entities
    19:12:05 [INFO] World "HankCreativeWorld": 256 chunks, 0 entities
    19:12:05 [INFO] World "creative_World": 256 chunks, 34 entities
    19:12:05 [INFO] World "spawncreativemain": 272 chunks, 21 entities
    19:12:49 [SEVERE] Exception in thread "Listen thread"
    19:12:49 [SEVERE] java.lang.OutOfMemoryError: unable to create new native thread
    19:12:49 [SEVERE] at java.lang.Thread.start0(Native Method)
    19:12:49 [SEVERE] at java.lang.Thread.start(Unknown Source)
    19:12:49 [SEVERE] at net.minecraft.server.v1_4_6.NetworkManager.<init>(NetworkManager.java:68)
    19:12:49 [SEVERE] at net.minecraft.server.v1_4_6.PendingConnection.<init>(PendingConnection.java:33)
    19:12:49 [SEVERE] at net.minecraft.server.v1_4_6.DedicatedServerConnectionThread.run(DedicatedServerConnectionThread.java:86)



    So less than a minute after running /gc and apparently having about 300 MB free, we get the OutOfMemoryError again...

    EDIT by Moderator: merged posts, please use the edit button instead of double posting.
     
    Last edited by a moderator: May 30, 2016
  4.

    LaxWasHere

    You see any people asking Apple for Windows support? I thought so.
     
  5.

    Hankscorpiouk

    How am I doing anything like that? This forum is for Bukkit, right? Well, that's what I'm running and asking for help with!

    Now would you please stop trolling my thread?
     
  6.

    Elite6809

    The only thing I could suggest doing is starting from scratch with the plugins. Keep the config files and work from the ground up with the plugins. It may be one plugin that hasn't been updated in a while which is having trouble with Bukkit. When I was a developer on a server we kept getting weird JNI exceptions in the log which caused everyone to stop receiving any packets for 30 seconds (that's what seemed to be happening - I'm not the best with Wireshark). When we deleted all of the plugin.jar files and re-downloaded them the problem went away - and in the process we realized that we weren't using at least 10 of our plugins, so we cut our plugin count from nearly 40 to 24 or so.
     
  7.

    TnT

    Locked. You are not running CraftBukkit.

    If you wish to use other server software, seek out that community for assistance.
     