Tuesday, 21 June 2016

SMB sharing

Recently while at home I thought a good way to use one of my idle systems would be as storage space for Windows clients, for things like backups, media sharing or whatever.

Following http://www.oracle.com/technetwork/articles/servers-storage-admin/solaris-zfssmb-sharing-2390458.html is largely simple. The only thing I have trouble with is changing the path that appears client side using the syntax of the sharesmb property. I've also been wanting to know more about the 'vscan' property: if Windows clients are keeping documents in this centralized location, we may as well run some virus checks on them.

The docs on vscan don't give enough information on how to use it. You can install the packages, configure and enable the service, but you need some scan engine IDs and I cannot find any reference on how to get them. I'll have to look into that missing part.

Still waiting on my replacement X99 board; I've been told at least a few weeks to a month. Once I get it I'll probably do a shadow migration or simply send a snapshot over to the rebuilt system.

I also set up File History on the Windows 10 client side with the SMB volumes, and it was a pain to do. For some reason it kept on crashing, and I had to disconnect and re-map the network drive, reset the File History config etc. before I managed to get it to automatically back up Windows files and specific paths/folders to the drive. While at it I set copies=2 on some of the more important data server side. Perhaps I should also find out how to set up auto-snapshots in addition. Next task could be to configure some headless vbox Windows hosts for people to use when needed.

Sunday, 22 May 2016

Unusual disk failures

Drives always seem to be failing or having problems, more often than I gave credit for. I had 3 drives do this at the same time during a simple archive creation process. I wonder if I had a bad batch; then again, the more drives you have, the higher the chance something will fail.

        NAME                         STATE     READ WRITE CKSUM
        data                         DEGRADED     0     0     0
          raidz2-0                   DEGRADED     0     0     0
            c0t50004CF210AD1C22d0    ONLINE       0     0     0
            spare-1                  DEGRADED     0     0   249
              c0t50004CF210BE51F1d0  DEGRADED     0     0    70
              c4t0d0                 ONLINE       0     0     0
            spare-2                  DEGRADED     1     0     2
              c0t50004CF210BE51F3d0  UNAVAIL      0     0     0
              c4t1d0                 ONLINE       0     0     0
            c0t50004CF210BE5214d0    ONLINE       0     0     0
            c5t3d0                   ONLINE       0     0     0
            c4t3d0                   ONLINE       0     0     0
          c4t1d0                     INUSE  
          c4t0d0                     INUSE  

        NAME                       STATE     READ WRITE CKSUM
        rpool                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            c0t500A0751F0096E9Ed0  DEGRADED     0     0   196
            c0t500A0751F0097DA7d0  ONLINE       0     0     0

I attempted reading some more and ...

        NAME                       STATE     READ WRITE CKSUM
        rpool                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            c0t500A0751F0096E9Ed0  DEGRADED     0     0 1.00K
            c0t500A0751F0097DA7d0  ONLINE       0     0     0

So 2 are degraded due to checksum errors when attempting to read data back, and the other drive just seems to not be powered on at all (LED on the front inactive). Why?
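To pick the problem disks out of a status listing like the ones above, a small parser helps. This is only a sketch: the sample rows are copied from the data pool output, the disk-name pattern (cNtNdN) and the function names are my own, and I'm assuming the "K" style suffixes on the counters are rounded binary multiples, so the converted numbers are approximate.

```python
import re

# Rows copied from the data pool listing above (header line left out).
SAMPLE = """\
data                         DEGRADED     0     0     0
  raidz2-0                   DEGRADED     0     0     0
    c0t50004CF210AD1C22d0    ONLINE       0     0     0
    spare-1                  DEGRADED     0     0   249
      c0t50004CF210BE51F1d0  DEGRADED     0     0    70
      c4t0d0                 ONLINE       0     0     0
    spare-2                  DEGRADED     1     0     2
      c0t50004CF210BE51F3d0  UNAVAIL      0     0     0
      c4t1d0                 ONLINE       0     0     0
"""

# Assumption: suffixed counters such as 1.00K are rounded binary multiples.
SUFFIX = {"K": 1024, "M": 1024**2, "G": 1024**3}

# Assumption: leaf disks are named like c4t0d0 or c0t50004CF210BE51F1d0.
DISK_NAME = re.compile(r"c\d+t[0-9A-Fa-f]+d\d+")


def to_count(text):
    """Convert a counter such as '70' or '1.00K' to an (approximate) int."""
    if text[-1] in SUFFIX:
        return int(float(text[:-1]) * SUFFIX[text[-1]])
    return int(text)


def problem_disks(status_text):
    """Return (name, state, read, write, cksum) for disks that look unhealthy."""
    problems = []
    for line in status_text.splitlines():
        fields = line.split()
        # Only rows with all five columns are of interest; this also skips
        # the two-column spare rows (INUSE) and blank lines.
        if len(fields) != 5 or not DISK_NAME.fullmatch(fields[0]):
            continue
        name, state = fields[0], fields[1]
        read, write, cksum = (to_count(f) for f in fields[2:])
        if state != "ONLINE" or read or write or cksum:
            problems.append((name, state, read, write, cksum))
    return problems


for row in problem_disks(SAMPLE):
    print(row)
```

Pool and vdev rows (data, raidz2-0, spare-1) are skipped on purpose so that only physical disks are reported.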

I'll re-architect the data pool. First I'll test autoreplace and find out how it works (I assume you simply take the disk out, put a new one in and it's all done). It will depend on HW support as well, so best to test this. Made a comment in the Oracle Community - https://community.oracle.com/message/13836284#13836284

zpool get autoreplace data
data  autoreplace  on     local

In the end the mobo was simply faulty: it went on fire beside some small chips by the LSI SAS controller, next to the heat-sink.

Wednesday, 11 May 2016

OpenStack meeting

Came back from Fujitsu HQ in London after hearing some of the problems faced by the community, and made a few contacts.

A couple of presentations were made by various customers/companies, and it looks to be used in various ways. One Ubuntu guy showed it running on his laptop with KVM + LXD, also known as "LXC 2.0", with ZFS underneath. I noticed he had 28% fragmentation on his zpool (only 1 vdev), which seems a little odd to me, and I'm wondering why that is. Running many of these LXD "containers", he used some lxc command to take a snapshot (ZFS underneath), then ran something like rm -rf / and was afterwards able to recover from the snapshot. https://insights.ubuntu.com/2016/03/22/lxd-2-0-your-first-lxd-container/

So: Ubuntu - KVM - Systemd - LXD - ZFS - OpenStack

An unusual viewpoint from this Ubuntu guy that seems backwards to me: "create lxd containers, each with different OpenStack service running per container to then run OpenStack on top of this?" - The reason was that if you have more systems running and want to do a distribution upgrade, you can migrate the OpenStack services within those containers to another machine, do the upgrade, then migrate everything back... or likewise if a disk fails, memory fails etc...

I get the feeling this is probably being overcomplicated, for one. Maybe it can be done more easily, but I haven't really played with how Ubuntu is doing these things.

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators could be handy for discussing this with some of the developers and looking into more use cases. Many people around the world are working on this and contributing. I reckon it'll be easier to see what they've done and go from there.

I heard of problems after mass deployment across 1,000+ Linux hosts with KVM when attempting upgrades across systems. From what I heard described, it could be people rushing to use and implement this without first thinking longer term. What problems could we encounter? How do we manage upgrades? How will this scale?

It appears it might be possible to use something other than Neutron for SDN (software defined networking). We were shown a demo but it didn't tell us much (it was more about a pretty, bubbly GUI). http://www.nuagenetworks.net/  https://www.youtube.com/watch?v=OjXII11hYwc&feature=youtu.be

Apparently it was a big problem that there is no clean direct upgrade from nova-network to Neutron; it required an "entire rebuild". nova-network is deprecated - http://docs.openstack.org/openstack-ops/content/nova-network-deprecation.html

Also Fujitsu are creating a new type of software based on OpenStack called "K5", doing "IaaS + PaaS" with 200+ more APIs than base OpenStack. They are taking this on as an internal global product to attempt to save millions in the process. (All done on Red Hat & CentOS, nothing else.)

It looks like everyone is thinking along the same lines in this Cloud and Container space, and then about how to make money out of it at larger scale across various customers. If so much focus is directed at this, are we missing out on other things that are happening around us? I will be following up on this more.

Monday, 2 May 2016

OpenStack beginnings

Lately I've been looking into using OpenStack. From what I have gathered it looks to be better integrated into Solaris than Linux, although more up to date versions are easier to get hold of on Linux.

Part of the OpenStack service "Glance" requires .uar (Unified Archives) for host deployments and so it is probably a preferred method to use .uar for installation of the zone/kernel zone across systems as well to keep this the same everywhere.

I'm thinking that it'd be good to practise the re-install a few times and reverse engineer what is inside the publicly available .uar from Oracle, which we're using as a test bed. It was generated using 11.3 GA and the further steps haven't been described in much detail. I want to customise the install to be lightweight and contain only what we need, to make deployments faster, so it'll be easier to scale when and if we get further down that road.

I will also have to get a bit more used to the front end interface and think about what kind of "flavours" we could configure for use (type of zone + resources). I was surprised to discover a bug that no-one looks to have found when using the Archives for installation. In the manifest file for the install you have a section like

<software_data action="install">
<name>{deployable system name}</name>

I was unsure what this <name> tag is for. I figured maybe the zone name, but it turns out it is for the Deployable System name. The bug means this must match the name in the manifest for the archive, otherwise the install fails with no useful output in the install log about why, just "list index out of range". It should work with any name, but it is recommended to match the name in the .uar file, which is simple to check with archiveadm info <name of .uar archive file>. I also found other documents on archives with minor typos, which I've had Oracle correct.
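Since the failure gives no useful log output, a quick pre-check of the manifest before kicking off an install can save some time. The sketch below is an assumption-heavy illustration: the fragment is a closed-off, hypothetical version of the section quoted above, "example-system" is a made-up name, and the expected name is whatever you read off the archive yourself. A real manifest has more around it than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the section quoted above with a closing tag added
# and a made-up deployable system name.
FRAGMENT = """
<software_data action="install">
  <name>example-system</name>
</software_data>
"""


def install_names(xml_text):
    """Collect every <name> found under <software_data action="install">."""
    root = ET.fromstring(xml_text)
    names = []
    # iter() also visits the root element itself, so this works whether the
    # section is the whole document or nested deeper in a manifest.
    for section in root.iter("software_data"):
        if section.get("action") != "install":
            continue
        for name in section.findall("name"):
            names.append((name.text or "").strip())
    return names


def mismatched_names(xml_text, deployable_system):
    """Return the manifest names that differ from the archive's system name."""
    return [n for n in install_names(xml_text) if n != deployable_system]


# The expected name would be copied by hand from the archive information.
print(mismatched_names(FRAGMENT, "example-system"))
print(mismatched_names(FRAGMENT, "some-other-name"))
```

An empty list means the names line up; anything else is a name that would likely trip the bug described above.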

I need to find out which packages and services are required and how to configure them for the different node types, and how to have this prepared out of the box. I also want to set some options on the FS, like compression on and atime off, where possible. One simple problem atm: trying to install from an archive I get an "ERROR: Archived zone oscn-uar-kz has no AI media" hmm... and "Archive creation failed: Failed to locate AI media, --exclude-media may be used", but I cannot create one without -e due to that... tried another zone and got the same error...

Friday, 26 February 2016

Ed Oates, from Oracle

Useful insights from one of the Oracle Co-founders https://vimeo.com/30929523

I think this part of the slide has a few good key points, as he describes in the vid. I found the first half the best and the second half a bit more business orientated on some specifics (such as patenting).

Wednesday, 3 February 2016

Great Server hardware

Feast your eyes on these:

This 90 bay 4U monster http://www.supermicro.co.uk/products/chassis/4U/946/SC946ED-R2KJBOD.cfm  and it is just over 102 kg!

Going back around 10 years, the first one I know of is this: http://techreport.com/blog/13849/behold-thumper-sun-sunfire-x4500-storage-server

48 NVMe? - https://www.supermicro.com/products/system/2U/2028/SSG-2028R-NR48N.cfm


Thursday, 7 January 2016

Nvidia, raising standards

Recently I've been looking into the hardware and architectural changes between the card I've currently got (Nvidia GTX 680) and the newer ones. It's interesting to see where the next phases lead.

This public doc, Kepler-GK110-Architecture, is a good showcase for some of the main reasons, and compares the goals and the changes made from Fermi to Kepler. I quite like some of this, including the Atomic operation improvements for one.

The next one gives some simplified points on going from Kepler to Maxwell - Top 5 things to know about Maxwell

Overview - Maxwell

This is great because you can buy a card now with almost 1,000 cores for under £100, without requiring separate power connectors either. So it saves money on buying a PSU to support it, as you had to before, and the electric bill will be lower. On a large scale this simply means many more people will find upgrading their older systems/desktops a viable option, whilst HPC and other distributed computing projects can spend less £ and save running costs on a larger infrastructure. On top of this, Micron, with 8GB GDDR5, are working to produce greater amounts of memory that can be used with a graphics card. Together with Nvidia's improvements this will also benefit those who love high end games and will probably want to transition to 4K monitors, where much more processing is required across many more pixels. We also have things like the new video compression/decompression standard HEVC being introduced. Just need the GPU to decode...

GeForce 1000 series  - the next gen, named "Pascal"

Also this NVLINK is great! - nvlink. What is NVLink? "5-12x higher bandwidth" etc.

Excellent news for HPC, I'd like to see progress made with nuclear fusion reactors for defo. - exascale-supercomputing

The other point to mention is that the clock speeds are lower in the newer architecture whilst still giving better performance. As the clock frequency gets higher, the amount of heat generated gets higher too, so cooling in this case will be less noisy (if using a fan) as the card won't be as hot. A CPU or GPU that runs at a higher clock rate requires substantially more power, i.e. a 4GHz core will use much more than double the power of a 2GHz core (assuming the cores are of the same architecture). You can always look up "cpu clock speed vs power consumption" along with the charts, docs etc. if anyone disagrees.
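The "much more than double" claim can be put into rough numbers. Dynamic (switching) power in CMOS goes roughly as capacitance x voltage squared x frequency. The sketch below additionally assumes the supply voltage has to rise in proportion to the clock, which is a simplification of mine and not something real chips follow exactly; it also ignores leakage power. The function name is made up for illustration.

```python
def relative_power(freq_ratio, voltage_ratio=None):
    """Dynamic power ratio between two clock speeds, using P ~ C * V^2 * f.

    freq_ratio    -- new clock / old clock (e.g. 2.0 for 4GHz vs 2GHz)
    voltage_ratio -- new voltage / old voltage; if left out, assume the
                     voltage scales linearly with the clock (a simplification)
    """
    if voltage_ratio is None:
        voltage_ratio = freq_ratio
    # Capacitance cancels out because the architecture is assumed identical.
    return voltage_ratio ** 2 * freq_ratio


# 4GHz vs 2GHz with voltage rising alongside the clock
print(relative_power(2.0))       # prints 8.0

# Same clocks but with the voltage somehow held constant
print(relative_power(2.0, 1.0))  # prints 2.0
```

So under these assumptions, doubling the clock costs somewhere between 2x and 8x the dynamic power, depending on how much extra voltage is needed to get there.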

Forgot to mention that it is also helpful to programmers & developers, and it is recommended you read all the way through this doc due to some key considerations - Is parallel programming hard?