Practical Cloud Computing

Sunday, February 2, 2014

Metromap Layout Engine for Interactive Browser Visualizations

Finally finished my own right-angle metromap layout for interactive browser visualizations.

For now it is part of a Chrome extension: the code at this GitHub Project Page is actually a working Chrome extension. The project's README also explains what to feed the engine. The data structure is actually very simple.

Now I can finally finish my ReBot, properly this time.

Wednesday, January 1, 2014

PHP5 as the Current Best Mini Web Server

Quite often today, especially in cloud environments or cloud-like settings (I will not explain the latter term), you want to do away with the traditional Apache (or lighttpd, etc.) + PHP setup in favor of a web server that is:

-- small,
-- portable, and
-- PHP-capable.

The following figure shows a standard use case for such a setting.

You have 2 locations that need to talk to each other over HTTP. Since basically all cloud (and other) APIs today are HTTP-based (or RESTful, if we use a fancy word), this is quite a common case. The easiest way to do this is:

(1) use wget (here is a link to the native Windows version, but I personally use the Linux and Cygwin versions) when you need to initiate a request to the other side, and
(2) listen for requests from the other side with your mini web server.

Never mind the ports and all such details. What's important is the model used by the pair to communicate in both directions.
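To make the two-way model concrete, here is a minimal sketch of the same message flow in JavaScript (Node.js 18+, which ships a global fetch), with both roles in one process: startListener stands in for the PHP mini server and sendRequest stands in for wget. Both names are hypothetical, and the JSON reply format is an assumption made for the example; in the setup described above, the two roles are played by `php -S` and wget.

```javascript
// Hypothetical sketch of the two-way HTTP model (Node.js 18+ for global fetch).
const http = require("http");

// Listening side: plays the role of the mini web server. `onMessage` receives
// the request path and the query parameters (roughly what PHP exposes as $_GET).
function startListener(port, onMessage) {
  const server = http.createServer((req, res) => {
    const url = new URL(req.url, "http://localhost");
    const params = Object.fromEntries(url.searchParams);
    const reply = onMessage(url.pathname, params);
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(reply));
  });
  // Resolve once the server is actually accepting connections.
  return new Promise((resolve) => server.listen(port, "127.0.0.1", () => resolve(server)));
}

// Initiating side: plays the role of wget, sending a GET with query parameters.
async function sendRequest(baseUrl, path, params) {
  const url = new URL(path, baseUrl);
  for (const [key, value] of Object.entries(params)) url.searchParams.set(key, value);
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```

Each location would run one listener and call sendRequest toward the other side's listener; passing port 0 to startListener lets the OS pick a free port.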

Now, specifically about PHP. This is a crucial element. I hear some people still use Java. I have nothing against these people, but my personal opinion is that PHP is a much better programming environment, in flexibility and otherwise. I migrated to PHP 7-8 years ago and have never doubted that decision.

Now, you might think that there is a whole bunch of mini web servers out there. Nope. Even nope-ier when it comes to Windows. I personally went through:

- TinyWeb Server -- it has a PHP version which simply does not work. It does run, but you cannot get $_GET, $_POST and some other important stuff to work. I was very disappointed, because this was the most promising option.
- XAMPP Portable. There is nothing portable or mini about this server, as it is half a gigabyte when decompressed. I could not get it to work either. First Apache complained, then PHP did not run, etc. I had to quit on that one, too.
- Lighty2Go. Very promising name. Very disappointing results. Specifically, no PHP support.

To cut a long story short, the solution has been in plain sight since PHP 5.4 was released. That version of PHP has a built-in web server mode. I have been using it for a week now and have had zero problems. After all, it is about time PHP found a way to serve its scripts without going through Apache.

To start the server, simply run:

php -S localhost:<port> -t /path/to/web/root

where <port> is any free port, such as 8001.

Some side notes on the convenience of this solution:
(1) I already have PHP everywhere: Windows native, Windows/Cygwin, Linux, virtual machines, XCP hosts, etc. That means I get the server without any additional work.
(2) It is easy to start and stop. A temporary solution is exactly what we need in cloud applications, where you never have anything permanent.
(3) The PHP server becomes a command-line tool, just like wget. Again, it is simpler to understand and use. I also enjoy having access to the Cygwin environment while the server is running.

Strongly recommended.

Friday, December 27, 2013

How to Build a Robot in Browser

Robot, bot, browserbot, botser, botowser? I have actually used the name ReBot (REcommendation Bot) before. Many possible names, same meaning: you need a software robot running in your browser.

Why browser? Some of the reasons are:

REASON 1. The stuff you will feed to your robot is on the web: something you view in your tabs, or possibly something that your robot will open and read on its own (possible! I have done that already!).

REASON 2. The final results of your robot's digestive tract (I avoid saying poop, obviously) are also web-based. The closest example is storing your stuff in a cloud drive; I personally use my own Dropbox JS API wrapper, which I wrote myself (it comes with the code below).

REASON 3. You need your robot to have the maximum achievable compatibility, and web technologies are obviously the way to go. What did the Firefox guy recently say about the Firefox OS release? "All other platforms are beautiful rose gardens surrounded by unreasonably high fences".

REASON 4. ... fill in with your personal reasons ... of which I have a couple but will not write them here to stay focused on the main topic.

So, what do you need to write a robot? Not much, as it turns out. See the points below.

POINT 1. Use Chrome. See the illustration below for how Chrome structures its extensions. Firefox and other browsers have their own designs, but I find Chrome the easiest to use. In fact, I failed to get even an example Firefox extension to work. That does not indicate my stupidity... it indicates how clumsy the design is; take my word for it. Besides, Chrome makes it easy to debug extensions: you can open a console for each of the three components below (float, inpage and background).

POINT 2. About extension components. Use them wisely, meaning "in accordance with their purpose". *.bg.js (background) starts running immediately when the extension is loaded. *.inpage.js runs on each page that matches your prefix. *.float.js shows the (hopefully) pretty GUI when the user clicks the extension's icon in the browser toolbar. Yes, you can create a pretty icon for your extension.

POINT 3. If you work with one URL prefix, it is easy: just write the matching rule in your manifest.json. However, if your robot wants to digest many different pages (like mine does), then you need to write the matching rule as "matches":["http://*/*"] (the exact line in manifest.json) and fork into individual processors from inside your *.inpage.js. This increases complexity, but is totally worthwhile given the increased scope/coverage: ultimately you are building a Swiss Army knife of web page parsing.
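A minimal sketch of that fork inside *.inpage.js: a table of processors, each with a URL test, and a dispatcher that picks the first match. The table entries, their tests and what they extract are hypothetical placeholders, not the processors from the repository. Note also that the pattern "http://*/*" only matches plain-HTTP pages; Chrome's "*://*/*" (or "<all_urls>") covers HTTPS pages as well.

```javascript
// Hypothetical processor table for an *.inpage.js content script.
// Each entry decides from the page URL whether it applies, then extracts something.
const PROCESSORS = [
  {
    name: "github",
    test: (url) => url.hostname === "github.com",
    // "owner/repo" from a path such as /owner/repo/issues
    run: (url) => url.pathname.split("/").slice(1, 3).join("/"),
  },
  {
    name: "wikipedia",
    test: (url) => url.hostname.endsWith(".wikipedia.org"),
    // Article title from a path such as /wiki/Metro_map
    run: (url) => decodeURIComponent(url.pathname.replace(/^\/wiki\//, "")),
  },
];

// Return { name, result } from the first matching processor, or null if none match.
function dispatch(href, doc, processors = PROCESSORS) {
  let url;
  try {
    url = new URL(href);
  } catch {
    return null; // not an absolute URL
  }
  const processor = processors.find((p) => p.test(url));
  return processor ? { name: processor.name, result: processor.run(url, doc) } : null;
}
```

Inside the content script this would be called as dispatch(location.href, document); the sample processors ignore doc, but real ones would parse the page through it. Adding a new site then means adding one table entry rather than touching the manifest.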

POINT 4. Do not be afraid to throw all the web technology you have at your problem. Specifically, note the following unique aspects of Chrome extensions. (1) You have a DOM in all three components. It would seem that there should be no DOM in *.bg.js, but there is one (!). The DOM matters when you need timers to pace things such as Dropbox accesses, Google Maps requests, etc.; I use jQuery's Timer extension, which needs a DOM to work. (2) You are free of the Same Origin Policy inside your extensions (as long as the hosts are covered by the permissions in manifest.json), so contact whatever service you want; I use Dropbox for cloud-side storage, for example. (3) You can use all the advanced features, such as jQuery (load its JS from manifest.json), CANVAS/SVG, local storage, etc.
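The pacing itself can also be done without jQuery. Below is a sketch of a queue that runs tasks one at a time, starting each at least intervalMs after the previous one, using only setTimeout and Promises. PacedQueue is a hypothetical name, and the right interval is an assumption that depends on each service's own rate limits.

```javascript
// Runs queued tasks one at a time, waiting at least `intervalMs`
// between the starts of consecutive tasks (e.g. to pace API calls).
class PacedQueue {
  constructor(intervalMs) {
    this.intervalMs = intervalMs;
    this.tasks = [];
    this.running = false;
    this.lastStart = -Infinity;
  }

  // Enqueue a task (a function returning a value or a Promise).
  // The returned Promise settles with that task's result.
  push(task) {
    return new Promise((resolve, reject) => {
      this.tasks.push({ task, resolve, reject });
      if (!this.running) this._drain();
    });
  }

  async _drain() {
    this.running = true;
    while (this.tasks.length > 0) {
      const wait = this.lastStart + this.intervalMs - Date.now();
      if (wait > 0) await new Promise((r) => setTimeout(r, wait));
      const { task, resolve, reject } = this.tasks.shift();
      this.lastStart = Date.now();
      try {
        resolve(await task());
      } catch (err) {
        reject(err); // one failing task does not stop the queue
      }
    }
    this.running = false;
  }
}
```

Each Dropbox or Google Maps call would be wrapped as q.push(() => ...), so bursts of requests are spread out automatically instead of being fired all at once.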

That's pretty much it. You can see my working testcases in this GitHub repository. It is a ready-to-use Chrome extension which you can load in its current form: just point Chrome to this folder and tell it to load the extension (Load unpacked extension in chrome://extensions, with Developer mode on). The icon should pop up in the toolbar immediately. For now there are no background or inpage scripts, but the testcases show what I am aiming at.

I will not go into details about the serverless.js file, which contains all my custom components, including the GUI ones. The extension has several working testcases. Some of them are not finished yet (like Stringex and possibly CloudStorage), but the rest will work. I am particularly proud of the NICECOVERring and SidePaneStack components, which I will use as low-level GUI components. Those definitely work, and you are free to play around with them.

Tuesday, December 17, 2013

StackEdit for Markdown (.md) in Google Drive ... kind of works

UPDATE 2014/01/22, later that day: it looks like it still works, but there is now a checkbox in the StackEdit settings that says "Markdown Extra/GitHub Flavored Markdown syntax". If it is ON, the HTML inside .md is ignored. If you UNCHECK it, everything works as described below.

UPDATE 2014/01/22: This kind-of-works has recently turned into does-not-work. StackEdit evidently stopped recognizing HTML mixed in with your .md text. I used to have pretty-looking pages, and now they have all reverted to StackEdit's default formatting. There could be a workaround, possibly adding the STYLE tag to the templates in the StackEdit settings. I will try that later.

====

Yep. Kind of works. Meaning that it works entirely satisfactorily (is that a word?), but has some flaws when it comes to sharing.

Specifically, PROS:
--------------
- you can work with content which is viewable in browsers. Literally with HTML. You do not have to type HTML, because .md turns certain plain-text syntax into formatting (do a search on Markdown), but if you do type HTML, it is interpreted correctly;
- it is easier for writing notes (meeting minutes, manuals, etc.) than its closest rival, Word, including the one in Google Drive. I hate typing docs I create on the fly in Word. See below for an example of a *pretty doc*.
- Print-to-PDF is excellent! Keeps all the formats, colors, etc. These docs are pretty!

Now, CONS:
----------
- a bit sluggish because it is not native to Google Drive
- viewing mode is one button away but the default is a split screen (input versus WYSIWYG) -- not good when you share your doc with others
- sharing is painful. Even if your share-ees have Google accounts, they have to walk through the installation process to enable StackEdit on their accounts. The biggest pain is that people who do not have Google accounts cannot view your .md docs ... at ALL. I still have a couple of those among my friends. I ended up creating a PDF of the file and sharing that.

Now, this post is about a specific topic related to .md files. The default .md rendering is not what you might expect. For example, H1 is too big with huge margins, there is no italic, no underlined text, etc. However, including a small STYLE section at the beginning of your .md file takes care of all that. Think of it as a FORMAT stub which you can copy across a subset of your documents.

Here is mine:

<style>
/* *stress*: larger bold text instead of italics */
em { font-size: larger; font-weight: bold; font-style: normal; text-decoration: none; }
/* **more stress**: larger, bold and underlined */
strong { font-size: larger; font-weight: bold; text-decoration: underline; }
/* # header: 20px bold on a grey band */
h1 { margin: 20px 0px 5px 0px; font-size: 20px; font-weight: bold; background-color: #ccc; padding: 3px 0px; }
/* ## header: same size, white text on a red band */
h2 { margin: 20px 0px 5px 0px; font-size: 20px; font-weight: bold; background-color: #f00; padding: 3px 0px; color: #fff; }
/* `code`: grey background */
code { background-color: #ccc; }
hr { border: 1px solid #999; margin: 5px 0px 10px; }
</style>

It looks like this on the left (editor) side of StackEdit:

And like this in the WYSIWYG view on the right side (one of my current docs):

The syntax elements whose style I altered are:
1. `code`
2. *stress*
3. **more stress**
4. # normal header/section
5. ## red header / section

For the rest of default syntax see a Markdown howto.

Tuesday, October 1, 2013

Markdown as a Native Google Drive Doctype

I have been maintaining some presence on GitHub for a while now. As those who use it know, GitHub (among many other programming-related portals) uses .md, aka Markdown, format for readme files. Markdown is really handy for structured text. In fact, I prefer it to a Word file.

Now, let's say that you want to work on a file within a community. For this you would normally share a file -- say, make it public in your Google Drive -- and let other people edit it on their side. Works just fine with all the traditional Google Drive formats. Not with .md files until now.

Actually, this is a lie. Apparently, Google Drive DOCTYPES can be extended. See this:

The StackEdit doctype is what I found in the list under "Connect More Apps". The list of special doctypes is very long, actually. I did not really look at all the others.

It is a bit weird on the first run: that's when you have to be careful to allow popups and let StackEdit initiate the three-legged OAuth authorization to become an authorized app for your Google Drive. After that, it works the same as any other application, except that it is closer to a WYSIWYG HTML editor, since the input is, by definition, Markdown.

The problem of sharing your .md files with people who do not have Google accounts remains unsolved. It is possible with native doctypes, but .md (actually, the x-markdown MIME type) requires viewers to log in and install the application, because the OAuth authorization needs an account.

So, to summarize:

(1) StackEdit as native doctype in Google Drive -- OK
(2) .md files shared publicly for community edits (sharing) -- OK
(3) Sharing with people who do not have Google accounts -- FAILED

Hope (3) becomes possible in the future.

Friday, September 13, 2013

NiceCover: A Serverless Webapp for Crowdsourcing Data Extraction and Knowledge Generation on Top of Scientific Portals

The title is a mouthful, I know. But it takes that many words to describe the idea.

There is a story behind it. I have first-hand experience of how major scientific portals (won't tell you which ones exactly) are upgraded. The processes and routines these people use are not good, my friends, not good.

Anyway, NiceCover is literally a nice cover for the scientific portals. The target is:

(1) Build a social collaboration layer on top of scientific portals.
(2) Write serverless webapps.
(3) Make sure you are within terms of use, but go wild otherwise.

These slides report on the first step.

There will be more soon.

Wednesday, September 4, 2013

StopWastingFoodAPI

Now, is it really that difficult to solve the problem of 2/3 of the product wasted at our supermarkets?

Saturday, August 31, 2013

Social Applications that Live in Clouds and Help People Link Their Work

Social applications are not social networks, especially not the ones that are popular today. Social apps are simple applications that do work socially, normally by facilitating social collaboration. Note that there are no such apps today.

You can see my previous posts about social collaboration here and here. In fact, one app that welcomes social collaboration is still running in the wild, wide open to the public, with not a single client other than myself in these few weeks. But let's put social collaboration itself aside.

Consider this app:

It lives in the cloud because it is hosted by, say, Google Drive (or Dropbox, or another service) and keeps its output in the same place in the cloud it came from. To make it work, you need two things:

(1) A solution to the Stringex Problem, which I wrote about before, preferably as a working software client.

(2) People willing to participate and form the tree in the figure in a bottom-up fashion. Note that there are Miners and Mappers.

The problem is that (2) creates security problems. Specifically, people might be afraid to open up their data to the public. This fear is not completely ungrounded. The Dropbox API, for example, leaves your public spaces wide open, which means that anyone with the keys can overwrite, erase, add, or do anything else with your data. This is why most such apps limit their scope to the user's own Dropbox account: you basically write to your own account. Social apps, on the other hand, NEED to share their public spaces.

It can still be done, as long as you provide a separate PRIVATE space which no one else can touch and develop a SMART SYNC. The smartness of the sync should be judged the way Wikipedia judges it: it should be much easier to revert malicious changes than to make them.
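One way to read the Wikipedia criterion is to keep every revision in the public space, so that undoing all of a vandal's edits is a single operation no matter how many edits they made. Here is a minimal in-memory sketch of that idea; RevisionedSpace and revertAuthor are hypothetical names, the separate PRIVATE space is not modeled, and how authors are identified is left as an assumption.

```javascript
// Hypothetical sketch: a shared (public) key-value space that keeps every
// revision, so undoing one contributor's edits is a single call.
class RevisionedSpace {
  constructor() {
    this.history = new Map(); // key -> array of { value, author }, oldest first
  }

  write(key, value, author) {
    if (!this.history.has(key)) this.history.set(key, []);
    this.history.get(key).push({ value, author });
  }

  // Current value: the latest surviving revision, or undefined.
  read(key) {
    const revs = this.history.get(key);
    return revs && revs.length ? revs[revs.length - 1].value : undefined;
  }

  // Drop every revision made by `author`. Each key falls back to the latest
  // revision by someone else, or disappears if only `author` ever wrote it.
  revertAuthor(author) {
    for (const [key, revs] of this.history) {
      const kept = revs.filter((r) => r.author !== author);
      if (kept.length) this.history.set(key, kept);
      else this.history.delete(key);
    }
  }
}
```

The asymmetry is the point: an attacker pays one write per change, while the revert costs one call regardless of how many changes were made.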

Open problem which I am currently trying to close...

Tuesday, August 27, 2013

Passwordless SSH Login in Fedora 18+

I had been having this problem for a while. Most of my machines used to be either cloud stacks (like XCP) or FC16 machines (traffic capture), and I had no problems with them. For someone who runs automation across machines, the inability to log in over SSH without a password is a huge problem. Not because of SSH per se, but because RSYNC over SSH is probably the safest way to sync files across machines, and if every RSYNC run asks you for a password, you cannot get anything done.

I found several places where this was discussed, but the actual solution was much simpler.

> vi /etc/ssh/sshd_config

#MaxAuthTries 6
#MaxSessions 10

#RSAAuthentication yes
PubkeyAuthentication yes

# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
#AuthorizedKeysFile .ssh/authorized_keys

#AuthorizedKeysCommand none

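The key is the authorized_keys2 file. It looks like Fedora 18's stock sshd_config overrides the default and makes sshd read only .ssh/authorized_keys, not .ssh/authorized_keys2. Since I was using the customary authorized_keys2 file, I could not log in without a password. As soon as I commented out the AuthorizedKeysFile line (as in the excerpt above), I got my passwordless login. Note that sshd has to be restarted (e.g. systemctl restart sshd) for the change to take effect.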

Weird how things change sometimes.

Thursday, August 22, 2013

The Stringex Problem: Client-Side Indexing in Clouds

Hadoop and Lucene-style indexing all sound nice, until you stumble upon a new practical use case (read: problem) and need to build a webapp that does indexing on the client side (read: browser) in realtime (read: continuously). That's when you run into the Stringex Problem. Hint: when you need to access the data you keep in the cloud, you need to mind the size of the hole through the membrane (read: API capability).
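For a sense of what client-side indexing means in the simplest case, here is an inverted index that is updated in place as documents arrive, so queries never need a server round trip. This is a generic sketch, not a solution to the Stringex Problem itself; the InvertedIndex class, its ASCII-only tokenizer and the AND-only search are simplifying assumptions.

```javascript
// Minimal incremental inverted index: term -> Set of document ids.
class InvertedIndex {
  constructor() {
    this.postings = new Map(); // term -> Set of doc ids
    this.docTerms = new Map(); // doc id -> Set of terms, so a doc can be re-indexed
  }

  // Lowercase and split on anything that is not an ASCII letter or digit.
  static tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9]+/g) || [];
  }

  // Index (or re-index) a document; old terms for the same id are dropped first.
  add(docId, text) {
    this.remove(docId);
    const terms = new Set(InvertedIndex.tokenize(text));
    for (const term of terms) {
      if (!this.postings.has(term)) this.postings.set(term, new Set());
      this.postings.get(term).add(docId);
    }
    this.docTerms.set(docId, terms);
  }

  remove(docId) {
    const terms = this.docTerms.get(docId);
    if (!terms) return;
    for (const term of terms) {
      const ids = this.postings.get(term);
      ids.delete(docId);
      if (ids.size === 0) this.postings.delete(term);
    }
    this.docTerms.delete(docId);
  }

  // AND query: sorted ids of documents containing every query term.
  search(query) {
    const terms = InvertedIndex.tokenize(query);
    if (terms.length === 0) return [];
    let result = null;
    for (const term of terms) {
      const ids = this.postings.get(term) || new Set();
      result = result === null ? new Set(ids) : new Set([...result].filter((id) => ids.has(id)));
    }
    return [...result].sort();
  }
}
```

Re-adding a document under the same id replaces its old terms, which is what continuous (realtime) updates need; the open question the post raises is how much of the underlying data can be pulled through the cloud API to feed such an index.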