Negative matching on multiple ip addresses in SSH

In sshd_config, you can use the Match directive to apply different configuration parameters to ssh connections depending on their characteristics.

In particular, you can match on ip address, both positively and negatively.

You can specify multiple conditions in the Match statement. All conditions must match before the configuration in the Match block is applied.

To negatively match an ip address, that is, to apply configuration if the connection is not from a particular ip address, use the following syntax

Match Address *,!62.29.1.162/32
ForceCommand /sbin/sample_script

To negatively match more than one ip address, that is, to apply configuration if the connection is not from any of several ip addresses, use the following syntax

Match Address *,!62.29.1.162/32,!54.134.118.96/32
ForceCommand /sbin/sample_script
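If you want to verify which settings sshd will apply for a given source address without making a real connection, recent versions of OpenSSH can dump the effective configuration for a test connection specification. A quick sketch (the user and host values are just placeholders for the test spec):

# the excluded address: should not pick up the ForceCommand from the Match block
sshd -T -C user=deploy,host=client.example.com,addr=62.29.1.162 | grep -i forcecommand

# any other address: should show the ForceCommand from the Match block
sshd -T -C user=deploy,host=client.example.com,addr=198.51.100.7 | grep -i forcecommand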

Is Skype an appropriate tool in corporate environments?

This is a question that has plagued me for several years, in that I have never been able to establish a consistent level of Skype quality in a corporate environment, despite having lots of bandwidth and having engaged the consultancy services of CCIE-level network experts.

The answer to the question is ultimately, no.

Let me explain by running through the questions.

1. How does Skype work at a network level?

Skype is a “Peer To Peer” (P2P) application. That means that when 2 people are having a Skype conversation, their computers *should* be directly connected, rather than connected via a 3rd computer. For the sake of comparison, Google Hangouts is not a P2P application. Google Hangout participants connect to each other via Google Conference Servers.

2. Does Skype work with UDP or TCP?

Skype’s preference is for UDP, and when Skype can establish a direct P2P connection using UDP, which is typically the case for residential users, call quality is very good. This is because UDP avoids TCP’s retransmission and ordering overhead, which makes it much better suited than TCP to streaming audio and video.

3. What’s the difference between residential and corporate users?

Residential internet connections are typically allocated a public ip address of their own (it may change from time to time, but at any given moment it belongs to that one connection). This IP gets registered to a Skype user on Skype’s servers, so when someone needs to contact that user, Skype knows where to direct the call, and can use UDP to establish a call between the participating users.

In corporate environments, where there are lots of users using the same internet connection, a single public IP address has to be shared between those users (Port Address Translation). That means that the Skype servers will have registered the same public ip address for all the users in that organisation. This means that Skype is not able to establish a direct UDP P2P connection between a user outside that organisation and a user inside it, and has to use other means to make that connection.

4. What are those other means?

When direct connectivity between clients is not possible, Skype uses a process called “UDP hole punching”. In this mechanism, 2 computers that cannot communicate directly with each other communicate with one or more third party computers that can communicate with both computers.

Connection information is passed between the computers in order to try and establish a direct connection between the 2 computers participating in the Skype call.

If ultimately a direct connection cannot be established, Skype will use the intermediary computers to relay the connection between the 2 computers participating in the conversation.

In Skype terminology, these are known as “relay nodes”, which are basically just computers running Skype that have direct UDP P2P capability (typically residential users with good broadband speeds).

From the Skype Administrators Manual:

http://download.skype.com/share/business/guides/skype-it-administrators-guide.pdf

2.2.4 Relays

If a Skype client can’t communicate directly with another client, it will find the appropriate relays for the connection and call traffic. The nodes will then try connecting directly to the relays. They distribute media and signalling information between multiple relays for fault tolerance purposes. The relay nodes forward traffic between the ordinary nodes. Skype communication (IM, voice, video, file transfer) maintains its encryption end-to-end between the two nodes, even with relay nodes inserted.

As with supernodes, most business users are rarely relays, as relays must be reachable directly from the internet. Skype software minimizes disruption to the relay node’s performance by limiting the amount of bandwidth transferred per relay session. 

5. Does that mean that corporate Skype traffic is being relayed via anonymous third party computers?

Yes. The traffic is encrypted, but it is still relayed through other unknown hosts if a direct connection between 2 Skype users is not possible.

6. Is this why performance in corporate environments is sometimes not good?

Yes. If a Skype conversation is dependent on one or more relay nodes, and one of these nodes experiences congestion, this will impact on the quality of the call.

7. Surely, there is some solution to this?

A corporate network can deploy a proxy server, which is directly mapped to a dedicated public ip address. Ideally, this should be a UDP-enabled SOCKS5 server, but a TCP HTTP Proxy server can also be used. If all Skype connections are relayed through this server, Skype does not have to use relay nodes, as Port Address Translation is not in use.

8. So what’s the catch?

The problem with this solution is that it is not generally possible to force the Skype client to use a Proxy Server. When the client is configured to use a Proxy Server, it will only use it if there is no other way to connect to the Internet. So, if you have a direct Internet connection, even one based on Port Address Translation, which impacts on Skype quality, Skype will continue to use this, even if a better solution is available via a Proxy Server.

9. Why would Skype do this?

Skype is owned by Microsoft. Skype have a business product that attaches to Microsoft Active Directory and allows you to force a Proxy connection. So if you invest in a Microsoft network, Microsoft will give you a solution to enable better Skype performance in corporate networks. If you don’t want to invest in a Microsoft network, you’re stuck, and your only option is to block all outbound Internet access from your network and divert it via your Proxy server.

For a lot of companies, particularly software development companies who depend on 3rd party web services, this is not a practical option.

10. What is the solution?

At this time the primary options for desktop Audio/Video conferencing are either Skype or Google Hangouts.

When Skype can be used in an environment where P2P UDP connectivity is “always on”, it provides a superior audio/video experience to Google Hangouts, which is not P2P, and which communicates via central Google Servers.

Where an environment uses Port Address Translation, Skype performance will depend on the ability of the Skype client to establish connections via relays, which means Skype performance becomes dependent on the resources available to those relays.

In this instance, Google Hangouts may be a better choice where consistent quality is required, as consistent quality can be maintained by providing sufficient bandwidth between the corporate network and Google.

 

How to use DJ Bernstein’s daemontools

When I first started working in IT, one of the first projects I had to undertake was to set up a QMail server, which first brought me into contact with DJ Bernstein and his various software components.

One of these was daemontools, which is “a collection of tools for managing UNIX services”, and which is most frequently used in connection with Qmail.

The daemontools website is from another time. Flat HTML files, no CSS, horizontal rules…it’s like visiting some sort of online museum. In fact, the website hasn’t changed in over 20 years, and daemontools itself, which has been around for that long, hasn’t changed much in the interim either.

The reason for daemontools’ longevity is quite simple. It works. And it works every time, all the time, which isn’t something you can say about every software product.

So if you need to run a process on a UNIX/Linux server, and that process needs to stay up for a very long time, without interruption, there probably isn’t any other software that can offer the same reliability as daemontools.

Here’s a quick HOWTO:

Firstly, install it, exactly as described here:

http://cr.yp.to/daemontools/install.html

If you get an error during the installation about a TLS reference, edit the file src/conf-cc, and add

-include /usr/include/errno.h

to the gcc line.
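If you prefer to script the edit, a quick sketch (this assumes the stock conf-cc, whose first line is simply the gcc command):

sed -i '1s|$| -include /usr/include/errno.h|' src/conf-cc

After the change, the first line of src/conf-cc should read something like gcc -O2 -include /usr/include/errno.h.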

Once installed, check:

1. That you have a /service directory
2. That the command /command/svscanboot exists

If this is the case, daemontools is successfully installed

Now, you can create the process/service that you want daemontools to monitor.

Create a directory under /service, with a name appropriate to your service, eg

/service/growfile

(you can also use a symbolic link for this directory, to point to an existing service installation)

In that directory, create a file called run, and give it 755 permission


touch /service/growfile/run
chmod 755 /service/growfile/run

Next, update the run file with the shell commands necessary to run your service


#!/bin/sh

while :
do
    echo "I am getting bigger..." >> /tmp/bigfile.txt
    sleep 1
done

Your service is now set up. To have daemontools monitor it, run the following command:


/command/svscanboot &

(To start this at boot, add /command/svscanboot to /etc/rc.local, if the install hasn’t done this already)

To see this in action, run ps -ef and have a look at your process list. You will see

1. A process called svscan, which is scanning the /service directory for new processes to monitor
2. A process called “supervise growfile”, which is keeping the job writing to the file alive

Also, run


tail -f /tmp/bigfile.txt

Every 1 second, you should see a new line being appended to this file:


I am getting bigger...
I am getting bigger...
I am getting bigger...
I am getting bigger...

To test daemontools, delete /tmp/bigfile.txt


rm -f /tmp/bigfile.txt

It should be gone, right?

No! It’s still there!


tail -f /tmp/bigfile.txt


I am getting bigger...
I am getting bigger...
I am getting bigger...
I am getting bigger...

Finally, if you want to control, or actually kill, your process, you can use the “svc” command supplied with daemontools:

svc -h /service/yourdaemon: sends HUP
svc -t /service/yourdaemon: sends TERM, and automatically restarts the daemon after it dies
svc -d /service/yourdaemon: sends TERM, and leaves the service down
svc -u /service/yourdaemon: brings the service back up
svc -o /service/yourdaemon: runs the service once
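For example, to take the growfile service created above down and bring it back up again, checking its state with the companion svstat command as you go (a sketch; the svstat output should look something like "/service/growfile: up (pid 2417) 86 seconds"):

svstat /service/growfile    # should report the service as up, with a pid and uptime
svc -d /service/growfile    # take the service down; supervise itself keeps running
svstat /service/growfile    # should now report the service as down
svc -u /service/growfile    # bring it back up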

This is the basic functionality of daemontools. There is a lot more on the website.

Managing Logstash with the Redis Client

Users of Logstash will be familiar with the stack of technologies required to implement a logstash solution:

The client that ships the logs to Redis

Redis which queues up the files for indexing

Logstash which creates the indices

Elasticsearch which stores the indices

Kibana which queries Elasticsearch

When you’re dealing with multiple components like this, things will inevitably go wrong.

For instance, say for some reason your client stops, and then you start it again 4 days later, and now the stack has to process 4 days of old log files before letting you search the latest ones.

One of the best ways to deal with this is to setup the Redis queue (“list” is the correct term) so that you can selectively remove entries from the list, so that chunks of old logs can be skipped.

Take a look at this config from the logstash shipper:


output {
  stdout { debug => false debug_format => "json"}
  redis { host => "172.32.1.172" data_type => "channel" key => "logstash-%{@type}-%{+yyyy.MM.dd.HH}" }
}

You’ll see here that I’ve modified the default key value for logstash, by adding the log file type and date stamp to the key. The default key value in the Logstash documentation is “logstash”, which means every entry goes into Redis with the same key value.

You will also notice that I have changed the data_type from the default “list” to “channel”, more of which in a moment.

To see what this means, you should now log in to your Redis server with the standard redis-cli command line interface.

To list all available keys, just type


KEYS *logstash*

and you will get something like


redis 127.0.0.1:6379> keys *logstash*
 1) "logstash-nodelog-2014.03.07.17"
 2) "logstash-javalog-2014.03.07.15"
 3) "logstash-applog-2014.03.07.14"
 4) "logstash-catalina-2014.03.08.23"
 5) "logstash-applog-2014.03.08.23"
 6) "logstash-catalina-2014.03.07.15"
 7) "logstash-nodelog-2014.03.07.14"
 8) "logstash-javalog-2014.03.07.14"
 9) "logstash-nodelog-2014.03.08.23"
10) "logstash-applog-2014.03.07.15"
11) "logstash-javalog-2014.03.08.23"

This shows that your log data are now stored in Redis according to log file type, date and hour, rather than all under the default “logstash” key. In other words, there are now multiple keys, rather than just the default “logstash” key.

You also need to change the indexer configuration at this point, so that it looks for multiple keys in Redis rather than just the “logstash” key


input {
  redis {
    host => "127.0.0.1"
    type => "redis-input"
    # these settings should match the output of the agent
    data_type => "pattern_channel"
    key => "logstash*"

    # We use json_event here since the sender is a logstash agent
    format => "json_event"
  }
}

For data_type here, I am using “pattern_channel”, which means the indexer will ingest the data from any key where the key matches the pattern “logstash*”.

If you don’t change this, and you have changed your shipper, none of your data will get to Elasticsearch.

Using Redis in this way also requires a change to the default Redis configuration. When Logstash keys are stored in Redis in a List format, the List is constantly popped by the Logstash indexer, so it remains in a steady state in terms of memory usage.

When the Logstash Indexer pulls data from a Redis channel, the data isn’t removed from Redis, so Redis memory usage grows.

To deal with this, you need to set up memory management in Redis, namely:

maxmemory 500mb
maxmemory-policy allkeys-lru

What this means is that when Redis reaches a limit of 500mb of used memory, it will drop keys according to a “Least Recently Used” algorithm. The default algorithm is volatile-lru, which depends on a TTL being set on the key, but as Logstash doesn’t set a TTL on Redis keys, you need to use the allkeys-lru alternative instead.
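If you don’t want to restart Redis, the same settings can also be applied to a running instance with redis-cli (a quick sketch):

redis-cli config set maxmemory 500mb
redis-cli config set maxmemory-policy allkeys-lru
redis-cli config get maxmemory*

Note that settings applied this way don’t survive a restart, so keep them in redis.conf as well.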

Now, if you want to remove a particular log file type from a particular date and time from the Logstash process, you can simply delete that data from Redis


DEL logstash-javalog-2014.03.08.23

You can also check the length of individual lists by using LLEN, to give you an idea of which logs from which dates and times will take the longest to process


redis 127.0.0.1:6379> llen logstash-javalog-2014.03.08.23
(integer) 385460

You can also check your memory consumption in Redis with:

redis 127.0.0.1:6379> info
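The INFO output is quite long; if memory is all you care about, you can filter it:

redis-cli info | grep used_memory_human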

Command line tool for checking status of instances in Amazon EC2

I manage between 10 and 15 different Amazon AWS accounts for different companies.

When I needed to find out information about a particular instance, it was a pain to have to log into the web interface each time. Amazon do provide an API that allows you to query data about instances, but to use that, you need to store an Access Key and Secret on your local computer, which isn’t very safe when you’re dealing with multiple accounts.

To overcome this, I patched together Tim Kay’s excellent aws tool with GPG and a little PHP, to create a tool which allows you to query the status of all instances in a specific region in an Amazon EC2 account, using access credentials that are locally encrypted, so that storing them locally isn’t an issue.

Output from the tool is presented on a line by line basis, so you can use grep to filter the results.

Sample output:

ec2sitrep.sh aws.account1 us-east-1

"logs-use"  running  m1.medium  us-east-1a  i-b344b7cb  172.32.1.172  59.34.113.133
"adb2-d-use"  running  m1.small  us-east-1d  i-07d3e963  172.32.3.54  67.45.139.235
"pms-a-use"  running  m1.medium  us-east-1a  i-90852ced  172.32.1.27  67.45.108.146
"s2-sc2-d-use"  running  m1.medium  us-east-1d  i-3d40b442  172.32.3.26  67.45.175.244
"ks2-sc3-d-use"  running  m1.small  us-east-1d  i-ed2ed492  172.32.3.184  67.45.163.141
"ks1-sc3-c-use"  running  m1.small  us-east-1c  i-6efb9612  172.32.2.195  67.45.159.221
"adb1-c-use"  running  m1.small  us-east-1c  i-98cf44e4  172.32.2.221  67.45.139.196
"s1-sc1-c-use"  running  m1.medium  us-east-1c  i-956a76e8  172.32.2.96  67.45.36.97
"sms2-d-use"  running  m1.medium  us-east-1d  i-a86ef686  172.32.3.102  34.90.28.159
"uatpms-a-use"  running  m1.small  us-east-1a  i-b8cf5399  172.32.1.25  34.90.163.110
"uatks1-sc3-c-use"  running  t1.micro  us-east-1c  i-de336dfe  172.32.2.26  34.90.99.226
"uats1-sc1-c"  running  m1.medium  us-east-1c  i-35396715  172.32.2.217  34.90.183.23
"uatadb1-c-use"  running  t1.micro  us-east-1c  i-4d316f6d  172.32.2.29  34.90.109.171
"sms1-c-use"  running  m1.medium  us-east-1c  i-31b29611  172.32.2.163  34.90.100.25

(Note that public ips have been changed in this example)
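For example, to pick out only the m1.small instances from the listing above:

ec2sitrep.sh aws.account1 us-east-1 | grep m1.small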

You can obtain the tool from Bitbucket:

https://bitbucket.org/garrethmcdaid/amazon-ec2-sitrep/

How to monitor the Amazon Linux AMI Security Patch RSS feed with Nagios

People who use Amazon AWS will be familiar with the Amazon Linux AMI, which is a machine image provided by Amazon with a stripped down installation of Linux.

The AMI acts as a starting point for building up your own AMIs, and has its own set of repos maintained by Amazon for obtaining software package updates.

Amazon also maintains an RSS feed, which announces the availability of new security patches for the AMI.

One of the requirements of PCI DSS V2 compliance is as follows:

6.4.5 Document process for applying security patches and software updates

That means you have to have a written down process for being alerted to and applying software patches to servers in your PCI DSS scope.

You could of course commit to reading the RSS feed every day, but that’s human intervention, which is never reliable. You could also set up your Amazon servers to simply take a system wide patch update every day, but if you’d prefer to review the necessity and impact of patches before applying them, that isn’t going to work.

Hence, having your monitoring system tell you if a new patch has been released for a specific software component would be a nice thing to have, and here it is, in the form of a Nagios plugin.

The plugin is written in PHP (I’m an ex-Web Developer) but is just as capable when it comes to Nagios as Perl or Python (without the need for all those extra modules).

I’ve called it check_rss.php, as it can be used on any RSS feed. There is another check_rss Nagios plugin, but it won’t work in this instance, as it only checks the most recent post in the RSS stream, and doesn’t include any way to retire alerts.

You can obtain the Plugin from Bitbucket:

https://bitbucket.org/garrethmcdaid/nagios-rss-checker/src

The script takes the following arguments:

“RSS Feed URL”

“Quoted, comma Separated list of strings you want to match in the post title”

“Number of posts you want to scan”

“Number of days for which you want the alert to remain active”

eg

commands.cfg

define command {
    command_name check_rss
    command_line $USER1$/check_rss.php $ARG1$ $ARG2$ $ARG3$ $ARG4$
}

<sample>.cfg

<snip>
check_command   check_rss!http://aws.amazon.com/rss/amazon-linux-ami.rss!"openssl"!30!3
<snip>

You need to tell Nagios how long you want the alert to remain active, as you have no way of resolving the alert (ie you can’t remove it from the RSS feed)

This mechanism allows you to “silence” the alert after a number of days. This isn’t a feature of Nagios, rather of the script itself.

The monitor will alert if it finds *any* patches, and include *all* matching patches in its alert output.
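Before wiring the plugin into Nagios, you can run it by hand to check its output. A quick sketch, using the same arguments as the service definition above (it assumes the script is executable and takes its arguments positionally, as in the command definition):

./check_rss.php "http://aws.amazon.com/rss/amazon-linux-ami.rss" "openssl" 30 3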

How to install and setup Logstash

So you’ve finally decided to put a system in place to deal with the tsunami of logs your web applications are generating, and you’ve looked here and there for something Open Source, and you’ve found Logstash, and you’ve had a go at setting it up…

…and then you’ve lost all will to live?

And maybe too, you’ve found that every trawl through Google for some decent documentation leads you to this video of some guy giving a presentation about Logstash at some geeky conference, in which he talks in really general terms about Logstash, and doesn’t give you any clues as to how you go about bringing it into existence?

Yes? Well, hopefully by landing here your troubles are over, because I’m going to tell you how to set up Logstash from scratch.

First, let’s explain the parts and what they do. Logstash is in fact a collection of different technologies, in which the Java programme, Logstash, is only a part.

The Shipper

This is the bit that reads the logs and sends them for processing. This is handled by the Logstash Java programme.

Grok

This is the bit that takes logs that have no uniform structure and gives them a structure that you define. This occurs prior to the logs being shipped. Grok is a standalone technology. Logstash uses its shared libraries.

Redis

This is a standalone technology that acts as a broker. Think of it like a turnstile at a football ground. It allows multiple events (ie lines of logs) to queue up, and then spits them out in a nice orderly line.

The Indexer

This takes the nice ordered output from Redis, which is neatly structured, and indexes it, for faster searching. This is handled by the Logstash Java programme.

Elasticsearch

This is a standalone technology, into which The Indexer funnels data, which stores the data and provides search capabilities.

The Web Interface

This is the bit that provides a User Interface to search the data that has been stored in Elasticsearch. You can run the web server that is provided by the Logstash Java programme, or you can run the Ruby HTML/Javascript based web server client, Kibana. Both use the Apache Lucene structured query language, but Kibana has more features, a better UI and is less buggy (IMO).

(Kibana 2 was a Ruby based server side application. Kibana 3 is a HTML/Javascript based client side application. Both connect to an ElasticSearch backend).

That’s all the bits, so let’s talk about setting it up.

First off, use a server OS that has access to lots of RPM repos. CentOS and Amazon Linux (for Amazon AWS users) are a safe bet, Ubuntu slightly less so.

For Redis, Elasticsearch and the Logstash programme itself, follow the instructions here:

http://logstash.net/docs/1.2.1/

(We’ll talk about starting services at bootup later)

Re. the above link, don’t bother working through the rest of the tutorial beyond the installation of the software. It demos Logstash using STDIN and STDOUT, which will only serve to confuse you. Just make sure that Redis, Elasticsearch and Logstash are installed and can be executed.

Now, on a separate system, we will set up the Shipper. For this, all you need is the Java Logstash programme and a shipper.conf config file.

Let’s deal with 2 real-life, practical scenarios:

1. You want to send live logs to Logstash
2. You want to send old logs to Logstash

1. Live logs

Construct a shipper.conf file as follows:

input {

   file {
      type => "apache"
      path => [ "/var/log/httpd/access.log" ]
   }

}

output {
   stdout { debug => true debug_format => "json"}
   redis { host => "" data_type => "list" key => "logstash" }
}

What this says:

Your input is a file, located at /var/log/httpd/access.log, and you want to record the content of this file as the type “apache”. You can use wildcards in your specification of the log file, and type can be anything.

You want to output to 2 places: firstly, your terminal screen, and secondly, to the Redis service running on your Logstash server

2. Old logs

Construct a shipper.conf file as follows:

input {

   tcp {
      type => "apache"
      port => 3333
   }

}

output {
   stdout { debug => true debug_format => "json"}
   redis { host => "" data_type => "list" key => "logstash" }
}

What this says:

Your input is whatever data is received on TCP port 3333, and you want to record that data as the type “apache”. As before, the type can be anything.

You want to output to 2 places: firstly, your terminal screen, and secondly, to the Redis service running on your Logstash server.

That’s all you need to do for now on the Shipper. Don’t run anything yet. Go back to your main Logstash server.

In the docs supplied at the Logstash website, you were given instructions how to install Redis, Logstash and Elasticsearch, including the Logstash web server. We are not going to use the Logstash web server, and use Kibana instead, so you’ll need to set up Kibana (3, not 2. Version 2 is a Ruby based server side application).

https://github.com/elasticsearch/kibana/

Onward…

(We’re going to be starting various services in the terminal now, so you will need to open several terminal windows)

Now, start the Redis service on the command line:

./src/redis-server --loglevel verbose

Next, construct an indexer.conf file for the Indexer:

input {
   redis {
      host => "127.0.0.1"
      type => "redis-input"
      # these settings should match the output of the agent
      data_type => "list"
      key => "logstash"

      # We use json_event here since the sender is a logstash agent
      format => "json_event"
   }
}

output {
   stdout { debug => true debug_format => "json"}

   elasticsearch {
      host => "127.0.0.1"
   }
}

This should be self-explanatory: the Indexer is taking input from Redis, and sending it to Elasticsearch.

Now start the Indexer:

java -jar logstash-1.2.1-flatjar.jar agent -f indexer.conf

Next, start Elasticsearch:

./elasticsearch -f

Finally, crank up Kibana.
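How you do that depends on which Kibana you installed. If it is the client-side Kibana 3, its static files just need to be served by a web server; a minimal sketch, assuming the files were unpacked to ./kibana-3:

cd kibana-3 && python -m SimpleHTTPServer 5601 &

Note that Kibana 3 queries Elasticsearch directly from the browser, so port 9200 on the Elasticsearch server must also be reachable from your workstation.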

You should now be able to access Kibana at:

http://yourserveraddress:5601

Now that we have all the elements on the Logstash server installed and running, we can go back to the shipping server and start spitting out some logs.

Regardless of how you’ve set up your shipping server (live logs or old logs), starting the shipping process involves the same command:

java -jar logstash-1.2.1-flatjar.jar agent -f shipper.conf

If you’re shipping live logs, that’s all you will need to do. If you are shipping old logs, you will need to pipe them to the TCP port you opened in your shipper.conf file. Do this in a separate terminal window.

nc localhost 3333 < /var/log/httpd/old_apache.log

Our shipping configuration is setup to output logs both to STDOUT and Redis, so you should see lines of logs appearing on your terminal screen. If the shipper can’t contact Redis, it will tell you it can’t contact Redis.

Once you see logs being shipped, go back to your Kibana interface and run a search for content.

IMPORTANT: if your shipper is sending old logs, you need to search for logs from a time period that exists in those logs. There is no point in searching for content from the last 15 mins if you are injecting logs from last year.

Hopefully, you’ll see results in the Kibana window. If you want to learn the ins and outs of what Kibana can do, have a look at the Kibana website. If Kibana is reporting errors, retrace the steps above, and ensure that all of the components are running, and that all necessary firewall ports are open.

2 tasks now remain: using Grok and setting up all the components to run as services at startup.

Init scripts for Redis, ElasticSearch and Kibana are easy to find through Google. You’ll need to edit them to ensure they are correctly configured for your environment. Also, for the Kibana init script, ensure you use the kibana-daemon.rb Ruby script rather than the basic kibana.rb version.

Place the various scripts in /etc/init.d, and, again on CentOS, set them up to start at boot using chkconfig, and control them with the service command.
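For example, on CentOS, assuming the init scripts are named redis, elasticsearch and kibana:

chkconfig --add redis && chkconfig redis on
chkconfig --add elasticsearch && chkconfig elasticsearch on
chkconfig --add kibana && chkconfig kibana on
service redis start
service elasticsearch start
service kibana start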

Grok isn’t quite so easy.

The code is available from here:

https://github.com/jordansissel/grok/

You can download a tarball of it from here:

https://github.com/jordansissel/grok/archive/master.zip

Grok has quite a few dependencies, which are listed in its docs. I was able to get all of these on CentOS using yum and the EPEL repos:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/$(uname -i)/epel-release-5-4.noarch.rpm

then

yum install -y gcc gperf make libevent-devel pcre-devel tokyocabinet-devel

Also, after you have compiled grok, make sure you run ldconfig, so that its libraries are shared with Logstash.

How to explain Grok?

In the general development of software over the last 20-30 years, very little thought has gone into the structure of log files, which means we have lots of different structures in log files.

Grok allows you to "re-process" logs from different sources so that you can give them all the same structure. This structure is then saved in Elasticsearch, which makes querying logs from different sources much easier.

Even if you are not processing logs from different sources, Grok is useful, in that you can give the different parts of a line of a log field names, which again makes querying much easier.

Grok "re-processing", or filtering, as it is called, occurs in the same place as your Shipper, so we add the Grok config to the shipper.conf file.

This involves matching the various components in your log format to Grok data types, or patterns as they are referred to in Grok. Probably the easiest way to do this is with this really useful Grok debugger:

http://grokdebug.herokuapp.com/

Cut and paste a line from one of your logs into the input field, and then experiment with the available Grok patterns until you see a nice clean JSON object rendered in the output field below.
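Once you have a pattern that works, the grok filter goes into shipper.conf alongside the input and output blocks (the order of the blocks doesn’t matter). A minimal sketch, assuming Logstash 1.2-style filter syntax and an Apache access log matched with the stock COMBINEDAPACHELOG pattern (swap in whatever pattern you worked out in the debugger):

cat >> shipper.conf <<'EOF'
filter {
  grok {
    # break each access log line into named fields (clientip, verb, request, response, etc.)
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
}
EOF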

How to remove absolute image srcs in WordPress

I use WordPress for this site but to be honest, I absolutely hate WordPress.

Comment spam, Malware hacks, hostname is part of the config, answer to everything is a plugin, etc, etc, yuk, yuk, yuk.

I use it because all I want to do is make posts, and for that, it is fine. A lot of other people use it for Content Management on complex websites, and that’s where it starts to fall apart.

For instance, let’s say you have a pretty important website, where you want changes to be verified by the site owner before they go live. So you do the sensible thing, and create a staging environment (staging.mysite.com), and promote code and DB changes to your live site (www.mysite.com) directly from your staging site, rather than manually updating your live site.

That’s the way things are done in the grown up Internet, but if you try to do this with WordPress, you’ve entered a world of pain.

When you use WordPress’s Media Library to put an image in a post, WordPress insists on creating an absolute src attribute for the image, including the protocol and the hostname.

eg

<img src="http://staging.mysite.com/wp-content/upload/catpic.jpg" />

when ideally it should be using a root-relative URL, without the protocol and hostname

ie

<img src="/wp-content/upload/catpic.jpg" />

This means that you can’t port your site directly from staging to live, as your live site will contain images that are being pulled from your staging site. The same problem applies if you are changing your site to a completely different domain name.

Consulting the WP forums you will find all manner of solutions to this, ranging from (yes, you’ve guessed it) plugins, to running MySQL queries every time you want to update your site.

A much simpler way to do this is to use JQuery, which you insert into your code once (in script tags in the footer of your theme), and then never worry about again.

The necessary JQuery is as follows:


jQuery('img').each(function () {
       var curSrc = jQuery(this).attr('src');
       if (curSrc.substring(0, curSrc.indexOf('/', 4)) == 'http:') {
          var baseURL = curSrc.substring(0, curSrc.indexOf('/', 14));
          jQuery(this).attr("src",curSrc.replace(baseURL,""));
       }
});

What this does:

Each time your page loads, JQuery will cycle through all the images on that page. If it finds an image src that includes the protocol specifier ‘http:’, it will check what the base URL of your page is, and remove that value from the src of the image.

This will leave you with a page on which all the images are loaded relative to the root of your web server, rather than a Fully Qualified Domain Name.

How to change the hostname on an Amazon linux system without rebooting

I use the following config in the .bash_profile file for the root account on Linux systems:

PS1="${HOSTNAME}:\${PWD##*/} \$ "

This prints the server’s hostname in the shell command prompt, which is handy if you are working on lots of servers simultaneously.

However, I also do a lot of cloning of servers in Amazon. When the command prompt is carried over into the clone, you end up with the hostname of the clone source in the clone itself. Normally, you would solve this by changing /etc/sysconfig/network and rebooting, but this isn’t always practical.

Instead, just change /etc/sysconfig/network as usual, and then issue the following command:

echo "server name" > /proc/sys/kernel/hostname

Then log out and open a new shell. New hostname sorted.
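A quick sketch of the whole sequence, assuming the new name is web01 and that /etc/sysconfig/network contains the usual HOSTNAME= line:

sed -i 's/^HOSTNAME=.*/HOSTNAME=web01/' /etc/sysconfig/network
echo "web01" > /proc/sys/kernel/hostname
hostname    # should now print web01, without a reboot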

How to use custom element attributes in JQuery

Have a look at the following snippet of very basic HTML.

<input type="button" class="my_button" value="Submit" product_id="1234">

You’ll recognise this as a FORM button, with a value of “Submit” and a DOM element class of “my_button”.

Nothing strange in that. But what’s this 3rd attribute in the element, “product_id”? That’s not a valid HTML attribute, is it?

No, it’s not. It’s a custom attribute that I’ve added. It could be anything, and it will have no impact on the output of the button in the browser.

So why add it?

Custom HTML attributes, blended with a Javascript Framework like JQuery, are an excellent way to develop fluid UIs in web applications.

Let’s use the above example. Let’s say we have a list of products in a shopping cart application, and we want to give the user the option to add products to a cart, but we want to do this with AJAX, so we don’t have to refresh the page each time.

Our web application renders the list of products, each with a Submit button that has a custom HTML attribute “product_id” with a value of the product’s catalogue ID.

So, in our Javascript, we use JQuery to trigger our required Javascript when the document loads. Standard stuff.

$(document).ready(js_Init);

In our initialisation function, we then attach another function, js_addToCart(), to the Click event on any element with class, “my_button”.

function js_Init() {
    $(".my_button").live("click", function(event) {
        js_addToCart($(this).attr("product_id"));
    });
}

Now, notice the argument that we pass to this function.

It’s a JQuery object first and foremost, that is, the element on which the Click event occurred. The argument we want is the value of the attribute of that element called “product_id”.

Our js_addToCart() function now has the catalogue ID of the product the user wants to add to their cart, which we can process with JQuery without ever having to do a FORM submission.

You can add as many custom attributes to an element as you wish, and pass as many of them back to JQuery as you wish, which means you should never really have to use an HTML form again.