Thursday, February 23, 2012

Scrum can lead to BAD software and engineering too....

Scrum


Scrum, if you didn't know by NOW, is essentially micro-management of software engineering at a high level and of software engineers: work on small prioritized tasks within estimated time constraints ranging only in hours, and report daily on progress and any issues so that they can immediately be addressed.  Cue up a variety of metrics that attempt to get more out of the process than is put into it; that violates the fundamentals of thermodynamics, but keep trying: Velocity is killing Agile! This all falls out from several principles of what is known as the "Agile Manifesto", where Scrum was formed as an empirical process control model covering each principle in its own way.  There is of course much more to it, but boiled down that's the 1000 ft view on purpose and history.  I say micro-management in a good way here; it's tough to build complex software while forecasting delivery under changing customer requirements if everything is allowed to run amok.  The rest of what Scrum promises, however (product management with a more accurate gauge of release and feature development control), and BETTER ENGINEERING, is highly debatable.

I'm pro Agile and Scrum, really, but I don't think it's the answer to every problem, or in fact even an SDLC, though many treat it as one.  There are also agile engineering practices to consider which Scrum does NOT address, though it's great at measuring SOME of the results.

Scrum evangelism,  borderline idiocracy


Most of what you hear from Scrum advocates (especially newly minted ones) are some pretty broad characterizations of more traditional SW engineering practices that fall outside of what they have come to perceive as "agile".  Scrum offers a relatively easy-to-understand and easy-to-implement process that can work.  The downside is the evangelism behind Scrum, which must somehow hold up, by way of contrast, the myriad bad engineering practices and flawed management models that must have existed before, or anything else used today for that matter.  The truth is that MANY very large, robust, and successful projects have been built and sustained using Waterfall, Spiral, or something simply done with solid systems engineering practices; RUP comes to mind.

What's worse is that many Scrum practitioners confuse engineering activity and practices with Scrum's process control, NOT THE SAME!  It should be clear that "value" in the Scrum sense relies primarily on practices not outlined in any manner by Scrum itself.  A lot of what is in the engineering domain commonly attributed to Scrum actually comes from XP (Extreme Programming), another Agile methodology.


Scrum is better than....


Usually it's the dreaded waterfall, bane of all human endeavors! I challenge anyone who claims "Waterfall" is a failed model to describe it and point out how it is anything other than the employment of proven design, analysis, and development techniques that can be iterative and "agile" in many respects. I'll explain....

Waterfall is an SDLC, and Scrum is not!!  Scrum is neither an SDLC nor a project management methodology!

The proof of this is that it simply does not inform design, architecture, release planning, QA process, or maintenance.

In fact, take a look at Wikipedia's page on waterfall and you will find that the model described, requirements->design->implementation->verification, is essentially what ALL development models adhere to, no matter what exactly they call themselves.  It's only the level of scope and the iteration between phases that defines a more agile approach.  Despite some shuffling around verification, you can't very well implement software without having some design that must have come from requirements, no? Even breaking a goal or user story down into something that can be developed is in effect design, though it may not involve specific technology.

Here is an SDLC that looks pretty much Waterfallish, and I think it's easy to see how Scrum could be used inside such a lifecycle: Design->Implement->Test.



If I were to make those arrows bi-directional and collaborative across roles, I'd end up with something akin to Kanban, another agile process for software development.

Scrum is a tool for realizing Agile principles, but not the only one.


There are a fair number of largely nondescript software development methods in industry that borrow from various methodologies while avoiding what doesn't fit.  I was recently turned on to Steve Yegge's (Google) blog post good-agile-bad agile demonstrating successful emergent practices.  In another more recent article, agile-hybridization, Christopher Goldsbury discusses not only hybrid approaches using Agile philosophy, but also some great examples where Scrum is not a good fit for adoption or success.


Scrum succeeds in being easy to understand and fairly rapid to put into place.  At the same time, the failure of Scrum to achieve desired results can be more nuanced and difficult to pinpoint.  On considering the adoption of Scrum: I work in a highly matrixed environment where we move between projects ranging from pure research to near production level.  Thank God, because it keeps me away from full-time Scrum and endless Sprints.  The diversity of people's roles and their locations is also in flux at MIT; no one-size-fits-all methodology is going to succeed there.  Ken Schwaber's book, "Agile Project Management With Scrum", has some great stories about the failures encountered in bringing Scrum to new clients.  However, it seemed that every failure listed was attributed ONLY to mistakes in implementing Scrum, with no exploration of other possibilities; admittedly, it's a book about Scrum after all.

Monday, February 20, 2012

Accumulo on Mac OSX

Quick point of interest for anyone wishing to run Apache Accumulo on a Mac.  You need to have three Apache projects (Accumulo, Hadoop, ZooKeeper) installed for this to work, and luckily they all run flawlessly on a Mac.

Download the Accumulo incubation src here and build it using Maven as directed in the README. Move the distro someplace you will run everything from: /usr/local, /opt, or whatever you prefer.  A hadoop user is usually created to run all the services, but I'm just using my local account and directories.

accumulo
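For reference, the build itself is a standard Maven run from the top of the unpacked source tree. The README is the authority on the exact goals and flags to use; a plain package build is roughly:

$ cd accumulo     # wherever you unpacked the source
$ mvn package     # check the README for any additional goals/flags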

Next, get Hadoop version 0.20.2, as recommended by the Accumulo team in the README.

For the user you select to run Hadoop, ZooKeeper, and Accumulo, you will be setting the following common variable in the shell that will be used to start all the services. There is also the option of adding it to each service's config, so you need to know the value in any event.

Add this line (or its output) to the file $HADOOP_HOME/conf/hadoop-env.sh:


export JAVA_HOME=$(/usr/libexec/java_home)
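If you'd also like it set in the shell you start everything from (as mentioned above), the same line can go in your shell profile. Assuming your login shell is bash, that would be something like:

$ echo 'export JAVA_HOME=$(/usr/libexec/java_home)' >> ~/.bash_profile
$ source ~/.bash_profile
$ echo $JAVA_HOME     # sanity check, should print your JDK home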

I chose to run in pseudo-distributed mode as it somewhat mirrors a cluster setup (a cluster of 1). For that you will need to modify three files in the Hadoop config directory $HADOOP_HOME/conf.

*NOTE: This caused a race condition that ran my processor at 90%; if this occurs for you, I suggest using the non-distributed (standalone) approach instead.


core-site.xml
======================

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Put site-specific property overrides in this file. -->


<configuration>
   <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:9000</value>
   </property>
</configuration>


mapred-site.xml
=======================

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Put site-specific property overrides in this file. -->


<configuration>
  <property>
     <name>mapred.job.tracker</name>
     <value>localhost:9001</value>
  </property>
</configuration>



hdfs-site.xml
========================

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Put site-specific property overrides in this file. -->


<configuration>
   <property>
      <name>dfs.replication</name>
       <value>1</value>
   </property>


</configuration>



Make sure both the masters and slaves files (also in $HADOOP_HOME/conf) list localhost as the only entry.
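Just to spell it out, I'd expect both files to contain the single line localhost:

$ cat $HADOOP_HOME/conf/masters
localhost
$ cat $HADOOP_HOME/conf/slaves
localhost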

You will need password-less ssh configured for your "nodes" to talk to one another. Since the only node is the same host, we enable that here only. Open up System Preferences and, under Sharing, turn on Remote Login (this is what enables the ssh server on the Mac). I am running all this as myself for dev purposes, but normally there is a hadoop user as previously mentioned; if you went that route, obviously set up ssh for that user instead.
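If you prefer the command line over clicking around in System Preferences, the same setting can be flipped with systemsetup (needs sudo):

$ sudo systemsetup -setremotelogin on
$ sudo systemsetup -getremotelogin     # should report: Remote Login: On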



Now that that is done, create the keys for your user in the user's home directory and add the public key to authorized_keys as follows.


$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /Users/cwyse/.ssh/id_dsa.
Your public key has been saved in /Users/cwyse/.ssh/id_dsa.pub.
The key fingerprint is:
c6:60:39:90:d7:d4:08:43:7e:d0:f1:5e:00:e4:1d:f5 cwyse@Chris-Wyses-MacBook-Pro-2.local
The key's randomart image is:
+--[ DSA 1024]----+
|    .o=*==o..    |
|    .o.=+o.o .   |
|     .* o o . E  |
|     . = . .     |
|        S .      |
|       .         |
|                 |
|                 |
|                 |
+-----------------+

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys



Now when you run $ ssh localhost you will be connected without being prompted for a password.
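As a quick sanity check (the very first connection may still ask you to accept the host key, which is fine):

$ ssh localhost     # should drop you into a shell with no password prompt
$ exit              # and back out again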

Next format the name-node using the following command from $HADOOP_HOME

$ $HADOOP_HOME/bin/hadoop namenode -format


You'll see some positive output listing your storage directory and a shutdown; mine was:


Storage directory /tmp/hadoop-cwyse/dfs/name  


Now run 
$HADOOP_HOME/bin/start-all.sh


Some more good happy things happen. Then go to this URL: http://localhost:50070/ and you will see the HDFS NameNode status page.
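Another quick way to confirm everything came up is jps (the stock JDK tool that lists running Java processes, nothing Hadoop-specific). In pseudo-distributed mode you'd expect to see the NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker; the PIDs will of course differ:

$ jps
12001 NameNode
12002 DataNode
12003 SecondaryNameNode
12004 JobTracker
12005 TaskTracker
12006 Jps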






Congrats, we are half way there!


Now you will need ZooKeeper version 3.3.0 or later.


zookeeper


Installing ZooKeeper out of the box is easy. I'm sure there are a ton of configuration options in both Hadoop and ZooKeeper, but this is just to get up and running with an Accumulo shell so that you can begin basic development on a cluster-like setup.


In $ZOOKEEPER_HOME/conf there is a sample config file; I copied it to zoo.cfg and ran with it unchanged.
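That's just this; the sample file ships with sane standalone defaults (client port 2181 and a dataDir under /tmp), so nothing needs editing just to kick the tires:

$ cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg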


Start the server, then do a quick connect to it with the CLI client to verify it's up; you can then quit the client.



$ZOOKEEPER_HOME/bin/zkServer.sh start
$ZOOKEEPER_HOME/bin/zkCli.sh -server 127.0.0.1:2181

ZooKeeper will only need to be running in standalone mode.
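One more optional sanity check I like: ruok is one of ZooKeeper's built-in four-letter-word commands, and the server answers imok if it's healthy.

$ echo ruok | nc 127.0.0.1 2181
imok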

The accumulo setup itself is straight out of the README.
Create $ACCUMULO_HOME/conf/accumulo-env.sh by copying the *.example file, then edit it so JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME point at your JDK and at the locations where you installed Hadoop and ZooKeeper. Copy the accumulo-site.xml, masters, and slaves examples into place as well. Make sure Hadoop and ZooKeeper are both running as described above.
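Roughly, that amounts to something like the following. The exact example file names are whatever ships in your conf directory, and the install paths below are just my assumptions; adjust to wherever you actually unpacked Hadoop and ZooKeeper:

$ cd $ACCUMULO_HOME/conf
$ cp accumulo-env.sh.example accumulo-env.sh
$ cp accumulo-site.xml.example accumulo-site.xml
$ cp masters.example masters
$ cp slaves.example slaves

Then in accumulo-env.sh, something along these lines:

export JAVA_HOME=$(/usr/libexec/java_home)
export HADOOP_HOME=/usr/local/hadoop-0.20.2
export ZOOKEEPER_HOME=/usr/local/zookeeper-3.3.3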

Run
$ACCUMULO_HOME/bin/accumulo init
to initialize the Accumulo structure in HDFS and set up the instance name and credentials.
Then start everything; the output should look like this:
$ACCUMULO_HOME/bin/start-all.sh 
Starting tablet servers and loggers .... done
Starting tablet server on localhost
Starting logger on localhost
Starting master on localhost
Starting garbage collector on localhost
Starting monitor on localhost
Starting tracer on localhost
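From there you should be able to connect to the shell with the root user and the password you chose during init, and poke around. Something like this (the instance name in the prompt and the table name are just examples):

$ $ACCUMULO_HOME/bin/accumulo shell -u root
(enter the root password you chose during init)
root@myinstance> createtable test
root@myinstance test> insert row1 colfam colqual myvalue
root@myinstance test> scan
row1 colfam:colqual []    myvalue
root@myinstance test> exit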
I've yet to play with this much beyond writing this out so please let me know if there is something amiss.

Credits really go out to the Accumulo, Hadoop, and ZooKeeper documenters, as well as Chuck Lam's excellent "Hadoop in Action" from Manning.