Focus on Collections
We write programs to deal with collections of data. Let’s look at some of the tools you’ll need to manage and manipulate them effectively.
Enumerable
The granddaddy of it all. The Enumerable module can be mixed into any class that implements an each method. Almost every class that acts like a collection mixes in Enumerable, so its methods are some of the most valuable you can study.
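As a rough sketch of that mixin relationship, here is a small class that gains the Enumerable methods just by defining each. The Playlist class and its song names are hypothetical, invented only for this illustration.

```ruby
# Hypothetical collection class, used only to illustrate the mixin.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # Enumerable only requires each, which must yield every element in turn.
  def each(&block)
    return enum_for(:each) unless block_given?

    @songs.each(&block)
    self
  end
end

playlist = Playlist.new("Intro", "Verse", "Outro")

upcased   = playlist.map(&:upcase)       # => ["INTRO", "VERSE", "OUTRO"]
has_verse = playlist.include?("Verse")   # => true
ordered   = playlist.sort                # => ["Intro", "Outro", "Verse"]
```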
You’ll find that most Enumerable methods work with blocks. The method generally iterates through the collection, runs the block once for each element, then does something with the results.
The full API is at http://ruby-doc.org/core-1.9.3/Enumerable.html, but below are a few of the best.
all?
Is the block true for all elements in the collection?
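A small illustration, assuming a made-up list of words:

```ruby
words = %w[ant bear cat]

# Every word has at least three characters, so the block is true for all of them.
all_three_plus = words.all? { |word| word.length >= 3 }   # => true

# "ant" and "cat" fail this stricter test, so all? returns false.
all_four_plus = words.all? { |word| word.length >= 4 }    # => false
```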
any?
Is the block true for at least one element?
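For instance, with an assumed word list:

```ruby
words = %w[ant bear cat]

# "bear" has four letters, so at least one element satisfies the block.
has_four_letter_word = words.any? { |word| word.length == 4 }   # => true

# No word is longer than ten letters.
has_long_word = words.any? { |word| word.length > 10 }          # => false
```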
collect / map
Among the most commonly used methods, these synonyms go through each element of the collection, run the block, gather the return values, then return a new array of those values.
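A short sketch showing both names on an assumed word list:

```ruby
words = %w[ant bear cat]

# map gathers each block result into a new array.
lengths = words.map { |word| word.length }       # => [3, 4, 3]

# collect is the same method under another name.
shouted = words.collect { |word| word.upcase }   # => ["ANT", "BEAR", "CAT"]
```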
each_slice
Slice the collection into smaller collections, then iterate through each of those:
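A sketch with an assumed range of numbers; note that the final slice is shorter when the collection does not divide evenly:

```ruby
numbers = (1..7).to_a
slices = []

# Yield the collection three elements at a time.
numbers.each_slice(3) { |slice| slices << slice }

# slices is now [[1, 2, 3], [4, 5, 6], [7]]
```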
each_with_index
Uses two block parameters to iterate through the collection while also exposing each element’s position in the original collection:
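For example, with an assumed word list (positions start at 0):

```ruby
words = %w[ant bear cat]
labels = []

# The block receives the element first, then its zero-based position.
words.each_with_index do |word, index|
  labels << "#{index}: #{word}"
end

# labels is now ["0: ant", "1: bear", "2: cat"]
```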
detect / find
These synonymous methods return the first element for which the block is true, or nil if no element qualifies.
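A sketch using an assumed list of numbers:

```ruby
numbers = [3, 8, 12, 15]

# Iteration stops at the first match, so 12 is never returned here.
first_even = numbers.find { |n| n.even? }       # => 8

# When nothing matches, the result is nil.
over_twenty = numbers.detect { |n| n > 20 }     # => nil
```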
select / find_all
These synonymous methods return an array of the elements for which the block is true. The result is still an array even if only one element matches.
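For instance, with an assumed list of numbers:

```ruby
numbers = [3, 8, 12, 15]

# All matching elements come back, in their original order.
evens = numbers.select { |n| n.even? }      # => [8, 12]

# A single match is still wrapped in an array.
big = numbers.find_all { |n| n > 12 }       # => [15]
```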
grep
This method is a searching chainsaw. It’s a little different from select in that it takes a "pattern" argument, such as a Regular Expression or a Range, rather than relying on a block. It returns an array of the matching elements.
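A sketch with both kinds of pattern; the animal names and numbers are assumed sample data:

```ruby
animals = %w[ant bear bird cat]

# A regular expression pattern keeps the strings it matches.
with_b = animals.grep(/b/)               # => ["bear", "bird"]

# A Range pattern keeps the values that fall inside it.
in_range = [1, 5, 12, 20].grep(5..15)    # => [5, 12]
```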
Here the regular expression matches all words containing the letter b. Where it gets interesting is when you also pass in a block: the result is similar to calling select to pick elements, then collect to run an operation and gather the results.
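With an assumed list of animal names, the block form might look like this:

```ruby
animals = %w[ant bear bird cat]

# Only the matching elements are passed to the block, and the
# block's return values become the result.
lengths = animals.grep(/b/) { |word| word.length }   # => [4, 4]
```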
The results, [4, 4], are the lengths of "bear" and "bird".
group_by
This method runs a block for each element and returns a Hash in which each key is a block result and each value is an array of all the elements that produced that result.
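For example, grouping an assumed word list by length:

```ruby
words = %w[ant bear bird cat]

# Each distinct block result becomes a key in the returned Hash.
by_length = words.group_by { |word| word.length }

# by_length[3] is ["ant", "cat"]
# by_length[4] is ["bear", "bird"]
```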
include?
Does this collection contain the specified object?
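A quick sketch with assumed sample data; the method works on other Enumerable objects such as a Range as well:

```ruby
words = %w[ant bear cat]

has_bear = words.include?("bear")    # => true
has_dog  = words.include?("dog")     # => false

# A Range answers the same question.
has_seven = (1..10).include?(7)      # => true
```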
inject / reduce
The most lauded and most hated method in Enumerable, you can use inject to create magical rainbows while simultaneously confusing your colleagues.
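A sketch of summing word lengths, assuming a made-up word list. The argument to inject is the starting value, and the block's return value becomes sum on the next pass.

```ruby
words = %w[ant bear cat]

# sum starts at 0 and accumulates each word's length: 3 + 4 + 3.
total_length = words.inject(0) { |sum, word| sum + word.length }   # => 10

# reduce is the same method; a symbol can name the combining operation.
total = [1, 2, 3, 4].reduce(:+)                                    # => 10
```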
This effectively says "Start with the value 0, put that in a variable sum, go through the collection and for each element add the length of the element to sum, returning sum".
max / min
These methods are basically shortcuts. max returns the same element as .sort.last, while min returns the same element as .sort.first.
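The comparison can be sketched with an assumed list of numbers:

```ruby
numbers = [17, 4, 42, 8]

largest  = numbers.max          # => 42
via_sort = numbers.sort.last    # => 42

smallest       = numbers.min         # => 4
smallest_sorted = numbers.sort.first # => 4
```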
max_by / min_by
Similarly, these are like shortcuts for using sort_by:
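A sketch that compares words by their highest and lowest letter. The word list is an assumption, chosen so that each comparison has a single winner.

```ruby
words = %w[ant bird frog]

# Highest letters are "t", "r" and "r", so "ant" wins.
highest = words.max_by { |word| word.chars.max }             # => "ant"

# The sort_by spelling of the same question.
highest_sorted = words.sort_by { |word| word.chars.max }.last  # => "ant"

# Lowest letters are "a", "b" and "f", so "ant" wins again.
lowest = words.min_by { |word| word.chars.min }              # => "ant"
```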
With max_by, here, we get "ant" because it has the "maximum" letter of all the words, "t".
With min_by, we get "ant" because it has the "minimum" letter of all the words, "a".
partition
This method is used to divide a collection into two sets based on whether the block is true or false:
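For example, splitting an assumed range of numbers into evens and odds:

```ruby
numbers = (1..6).to_a

# The first nested array holds the elements for which the block was true,
# the second holds the rest.
split = numbers.partition { |n| n.even? }   # => [[2, 4, 6], [1, 3, 5]]
```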
It returns an Array which contains two nested arrays. This is often used with Ruby’s automatic decomposition to store the results into separate variables:
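A sketch of that decomposition, with assumed variable names:

```ruby
numbers = (1..6).to_a

# Assigning to two variables unpacks the two nested arrays.
evens, odds = numbers.partition { |n| n.even? }

# evens is [2, 4, 6]
# odds  is [1, 3, 5]
```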
reject
The opposite of select, reject returns the elements for which the block is false:
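For instance, with an assumed range of numbers:

```ruby
numbers = (1..6).to_a

# Elements for which the block is true are left out of the result.
odds = numbers.reject { |n| n.even? }   # => [1, 3, 5]

# The original array is not modified.
# numbers is still [1, 2, 3, 4, 5, 6]
```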
The alternate form reject! is often used to remove, in place, the elements for which the block is true:
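A sketch of the in-place form on an assumed array. One detail worth knowing: Array#reject! returns nil when nothing was removed, so its return value is best not chained.

```ruby
numbers = (1..6).to_a

# The receiver itself loses the even numbers.
numbers.reject! { |n| n.even? }
# numbers is now [1, 3, 5]

# Nothing to remove here, so the return value is nil.
unchanged = [1, 3].reject! { |n| n.even? }   # => nil
```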
sort / sort_by
The simple sort will delegate responsibility to the "spaceship method", <=>, defined for the object. For String objects, spaceship sorts them into alphabetical order with all capital letters coming before any lowercase letters.
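A sketch with an assumed word list containing one capitalized entry:

```ruby
words = %w[bear ant Cat]

# Uppercase letters compare as smaller than lowercase ones.
sorted = words.sort   # => ["Cat", "ant", "bear"]
```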
Note how "C" comes first.
The sort_by method accepts a block to run on each element, then uses spaceship to sort the resulting values:
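For example, sorting an assumed word list by length. The order among words of equal length is not guaranteed, because sort_by is not a stable sort, so only the lengths are shown.

```ruby
words = %w[bird ant bear cat]

# Three-letter words come before four-letter words.
by_length = words.sort_by { |word| word.length }

sorted_lengths = by_length.map(&:length)   # => [3, 3, 4, 4]
```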
But among words of the same length, 3 or 4 here, how do you determine the sorting order? A common technique is to have the block return an array with multiple criteria for sorting:
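A sketch of the array technique on an assumed word list. Arrays are compared element by element, so the second entry only matters when the first entries tie.

```ruby
words = %w[bird ant bear cat]

# Sort key is [length, word]: length decides first, the word breaks ties.
sorted = words.sort_by { |word| [word.length, word] }
# => ["ant", "cat", "bear", "bird"]
```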
Now it sorts by length first, then by alphabetical order within a common length.
More than just an Array
The most common object implementing Enumerable that you’ll work with is certainly Array. The abstract data type we call an array can be thought of as a contiguous list of elements, like a set of train cars on a track, each with its own payload, that can be walked from beginning to end or accessed at an arbitrary place in the middle. Because it is laid out as connected cells, each side by side, it can be easy to add an item to the end, but it is difficult to add items in the middle. (Think about how much physical effort it would take to add a car into the middle of a long train!)
In Ruby, the Array class actually plays more roles than just the traditional array data type. It can also act like a linked list, a stack, or a queue, among others. We’ll take a look at how the latter two data types can be simulated.
Making a stack with push/pop
A stack is a collection where the last element added to it will be the first element to be removed, a concept known as last-in, first-out, or LIFO. The usual example for thinking about this is a stack of food trays in a cafeteria: when a tray is placed on top of the stack, it will be the next tray to be picked up and used, unless another tray is put on top of it beforehand.
The operations for manipulating a stack are called push and pop, and Ruby Array objects implement them. We push elements onto the Array instance and we pop elements off of it. It looks like this:
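A minimal sketch of those two operations, borrowing the cafeteria trays from above (the tray labels are made up):

```ruby
stack = []

# Each push places a tray on top of the stack (the end of the array).
stack.push("tray 1")
stack.push("tray 2")
stack.push("tray 3")

# last peeks at the top without removing it.
on_top = stack.last        # => "tray 3"

# pop removes and returns the most recently pushed element.
first_taken  = stack.pop   # => "tray 3"
second_taken = stack.pop   # => "tray 2"

# stack is now ["tray 1"]
```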
Stacks, among other handy uses, allow us to give priority to the things that happened most recently, and they can be useful for recursive operations.
Queueing it up with shift/unshift
A queue is a pretty recognizable data structure because it shows up frequently in everyday life. While an American may say they’re "standing in line" for a movie, a Briton would likely say they’re "standing in a queue" (or "queuing"). A queue is a first-in, first-out, or FIFO, structure.
Ruby Arrays act as queues when we add elements at one end and remove them from the other. We can unshift elements onto the left side of the array and pop them from the right. Alternatively, we can push elements onto the right side and shift them from the left.
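Both pairings can be sketched as follows; the names of the people in line are invented for the example.

```ruby
# Pairing 1: push onto the right, shift from the left.
line = []
line.push("Ana")
line.push("Ben")
line.push("Cy")

first_served  = line.shift   # => "Ana"
second_served = line.shift   # => "Ben"
# line is now ["Cy"]

# Pairing 2: unshift onto the left, pop from the right.
mirrored = []
mirrored.unshift("Ana")      # mirrored is ["Ana"]
mirrored.unshift("Ben")      # mirrored is ["Ben", "Ana"]

oldest = mirrored.pop        # => "Ana"
```

Either way, the element that has waited longest is the one that comes out.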
With a queue, we can give priority to the oldest element.
Ruby does have an explicit Queue class, but it’s built for sharing data between Ruby threads, not for the more general-purpose use cases here.
Exercises
Given the following set of data:
Use collection operations to…
- Find all the even numbers
- Find the square of each number
- Determine if there is a number evenly divisible by 31 (you’ll need the modulo operator: http://www.ruby-forum.com/topic/181880)
- Split the numbers into two sets: ones below 500 and ones above
- Print them in ascending order with a place marker, like this:
  1. 17
  2. 20
  3. 22
- Find the sum of all numbers between 600 and 700
- Create groups by hundreds (100s, 200s, 300s), where each set is sorted in increasing order
- Find all numbers which have the digit 6