Channel: Drupal Blog by graham.taylor

Importing Nodes Using The Batch API

For a recent custom module, I needed to create a bunch of nodes from data stored in an XML file.

I decided not to use the Feeds or Node Import modules for a couple of reasons:

  1. The XML structure was fairly custom (it was coming out of one of our internal .NET databases, euugh!)
  2. I wanted the import to run on a user action. The nodes should only be imported once an admin user has gone through an administration form and chosen a couple of config options; the import from the XML file is then performed on form submit.

With that in mind, I decided that a good approach would be to use the Batch API http://api.drupal.org/api/group/batch. I had never used the Batch API before, so this was a good chance to learn some more Drupal goodness.

After looking at the handbook example here http://drupal.org/node/180528 I got down to business in my custom module. I defined my admin form with the fields I needed as normal and had it display on a URL path via hook_menu(). The Batch API magic happens in the form's submit function, where I define the following:

<?php
// Define the batch API process.
$batch = array(
  'operations' => array(),
  'finished' => 'mymodule_batch_finished',
  'title' => t('Processing nodes for @title', array('@title' => $feed->name)),
  'init_message' => t('Starting...'),
  'progress_message' => t('Batch @current out of @total'),
  'error_message' => t('An error occurred and some or all of the batch has failed.'),
);
?>

This $batch array is required; it tells the Batch API what you are planning to do. The only required field is 'operations' and we'll get on to that part in a bit, but just to briefly explain the rest:

  • 'finished': the name of a callback function to call when the batch processing is finished.
  • 'title', 'init_message' and 'progress_message' are all displayed to the user on the batch processing screen to indicate what is happening and how far we are through the batch import process. A full list of all available options is here - http://api.drupal.org/api/function/batch_set/6

Now onto 'operations', the most important part. This is an array of operations to run during batch processing, each one an array holding a callback function name and the arguments to pass to it. If you define a single operation, its function is called repeatedly (not recursively; each call is a fresh invocation) until it reports that it has finished. With several operations, the Batch API runs function 1 to completion, then function 2, and so on.
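To make the shape of 'operations' concrete, here is a minimal sketch with two operations. The callback names (mymodule_step_one, mymodule_step_two) and the sample $items are hypothetical, invented purely for illustration; the snippet only builds the array and lists the callbacks in the order they would run, so it does not need Drupal.

```php
<?php
// Hypothetical sample data and callback names, for illustration only.
$items = array(array('title' => 'First'), array('title' => 'Second'));

$batch = array(
  'operations' => array(
    // Each operation is array(callback name, array of arguments).
    array('mymodule_step_one', array($items)),
    array('mymodule_step_two', array()),
  ),
  'finished' => 'mymodule_batch_finished',
);

// Operations run in the order they are listed; collect the callback names.
$order = array();
foreach ($batch['operations'] as $operation) {
  list($callback, $arguments) = $operation;
  $order[] = $callback;
}
```

Note that the arguments array does not include $context; the Batch API appends that itself when it calls each function.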

For my example I decided that I would only need one function, called repeatedly until all my nodes were imported from the XML file, so my 'operations' part of the array looked like this:

<?php
// The callback name must match the function defined below.
$batch['operations'][] = array('mymodule_save_nodes', array($items));
?>

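Now that our $batch array is set up, we're ready to start batch processing. To hand the batch over to the Batch API you simply need to call batch_set() as follows. (Because we are inside a form submit handler, the Form API starts the processing for us; elsewhere you would also need to call batch_process().)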

<?php
batch_set($batch);
?>

The Batch API will kick in and call the function(s) in the 'operations' part of the array above. I'll simplify it here for example purposes, but for my node import job the start of that function looks like so:

<?php
function mymodule_save_nodes($items, &$context) {
  // Number of nodes to import per call.
  $limit = variable_get('mymodule_nodes_to_process', 50);

  // First call only: initialise the progress counters.
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($items);
  }

  // First call only: keep a working copy of the items in the sandbox.
  if (!isset($context['sandbox']['items'])) {
    $context['sandbox']['items'] = $items;
  }
?>

Notice the use of the $context argument. This is something that the Batch API adds, and you have to declare it as an extra, final argument on any function listed in the 'operations' part of the $batch array above. Because the $context array is passed in by reference, whatever the function writes to it is still there the next time the function is called during the batch.

This makes it ideal for storing some data about the batch process.

First time round, I'm setting $context['sandbox']['progress'] = 0 (i.e. we're at the start of the batch), and the maximum number of items we are processing ($context['sandbox']['max']) is the total number of nodes to import, which are stored in the $items array passed into the function.

I'm also copying the array of all the nodes ($items) into the 'sandbox', as we'll be using the sandbox copy during the processing rather than the $items array passed into the function.

The reason for using $context['sandbox']['items'] instead of $items is that $items carries the full set of nodes every time we come back into the function to process the next batch. $context, on the other hand, persists between calls, so we can trim the nodes we have already imported off the sandbox copy, and the next time the function gets called we know how many nodes are left to process.
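The difference between the two arrays can be seen in a small standalone sketch. demo_callback is a hypothetical stand-in for an operation callback (it is not part of my module and needs no Drupal); it simply removes one element from each array per call.

```php
<?php
// Hypothetical callback: $items arrives by value, $context by reference.
function demo_callback(array $items, array &$context) {
  if (!isset($context['sandbox']['items'])) {
    $context['sandbox']['items'] = $items;
  }
  // Changes the local copy only; the change is lost when the function returns.
  array_shift($items);
  // Changes the caller's array; the change is still there on the next call.
  array_shift($context['sandbox']['items']);
}

$items = array('a', 'b', 'c');
$context = array('sandbox' => array());

demo_callback($items, $context);
demo_callback($items, $context);

$full_count = count($items);
$remaining = count($context['sandbox']['items']);
```

After two calls $items still holds all three entries, while the sandbox copy is down to one.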

Next up we need to do the actual processing, which means creating a new node and saving it as follows –

<?php
  // Begin saving XML feed items as nodes.
  $counter = 0;
  if (!empty($context['sandbox']['items'])) {
    // On every call after the first, drop the items imported last time.
    if ($context['sandbox']['progress'] != 0) {
      array_splice($context['sandbox']['items'], 0, $limit);
    }
    foreach ($context['sandbox']['items'] as $item_id => $item) {
      if ($counter != $limit) {
        // Build the node object.
        $node = new stdClass();
        $node->uid = 1;
        $node->type = 'my_node_type';
        $node->status = 1;
        $node->title = $item['title_field_from_xml'];
        $node->body = $item['body_field_from_xml'];

        // Save the node.
        node_save($node);
?>

As you can see above, there's an internal counter which runs over the array and imports nodes up to the limit you have set per batch (in this case 50). So if we had a total of 250 nodes to import, and we're on the first run, the above code would import 50 nodes, then the Batch API would call the function again.

Still inside the foreach loop, while we're running through our first 50 nodes, you need to update the progress (so the user can see what's happening on the screen):

<?php
        $counter++;
        $context['sandbox']['progress']++;
        $context['message'] = t('Now processing node %node of %count', array('%node' => $context['sandbox']['progress'], '%count' => $context['sandbox']['max']));
        $context['results']['nodes'] = $context['sandbox']['progress'];
      } // End if ($counter != $limit).
    } // End foreach.
  } // End if (!empty(...)).
?>

Finally, at the end of the function, check whether we're at the end of our processing or need to continue (in this example we've only done 50 out of 250, so we continue). Setting $context['finished'] to a value below 1 tells the Batch API to call the function again.

<?php
  if ($context['sandbox']['progress'] != $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
} // End of mymodule_save_nodes().
?>

To complete the batch process of 250 nodes, this function will get called another 4 times (4 x 50 = 200 nodes), making 5 calls in total.
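The call count can be checked without Drupal using a rough simulation. Everything here is an assumption made for illustration: demo_save_nodes is a cut-down stand-in for the import function that counts items instead of saving nodes, and demo_run_batch is a simplified imitation of the Batch API's loop, not its real code. The stand-in also takes each chunk off the front of the sandbox with array_splice() rather than using the counter approach shown above, which is a slightly different technique with the same result.

```php
<?php
// Stand-in for an 'operations' callback: counts items instead of saving nodes.
function demo_save_nodes(array $items, array &$context, $limit = 50) {
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($items);
    $context['sandbox']['items'] = $items;
  }
  // Take the next chunk off the front of the remaining items.
  $chunk = array_splice($context['sandbox']['items'], 0, $limit);
  foreach ($chunk as $item) {
    $context['sandbox']['progress']++;
  }
  $context['results']['nodes'] = $context['sandbox']['progress'];
  if ($context['sandbox']['progress'] != $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}

// Simplified imitation of the batch loop (an assumption, not Drupal's code).
function demo_run_batch($callback, array $items) {
  $context = array('sandbox' => array(), 'results' => array());
  $calls = 0;
  do {
    // Assume the operation is done unless the callback says otherwise.
    $context['finished'] = 1;
    $callback($items, $context);
    $calls++;
  } while ($context['finished'] < 1);
  return array('calls' => $calls, 'results' => $context['results']);
}

$outcome = demo_run_batch('demo_save_nodes', range(1, 250));
```

With 250 items and a limit of 50, $outcome['calls'] comes out as 5 and $outcome['results']['nodes'] as 250.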

On the final turn the batch has finished processing so the batch API calls the callback function defined in the ‘finished’ part of the $batch array. Mine looks something like this –

<?php
/**
 * Batch 'finished' callback.
 */
function mymodule_batch_finished($success, $results, $operations) {
  if ($success) {
    // Here we do something meaningful with the results.
    $message = t('%nodes nodes processed', array('%nodes' => $results['nodes']));
    watchdog('mymodule', '%nodes nodes processed', array('%nodes' => $results['nodes']), WATCHDOG_NOTICE);
  }
  else {
    // An error occurred.
    // $operations contains the operations that remained unprocessed.
    $error_operation = reset($operations);
    $message = t('An error occurred while processing %error_operation with arguments: @arguments', array('%error_operation' => $error_operation[0], '@arguments' => print_r($error_operation[1], TRUE)));
    watchdog('mymodule', 'An error occurred while processing %error_operation with arguments: @arguments', array('%error_operation' => $error_operation[0], '@arguments' => print_r($error_operation[1], TRUE)), WATCHDOG_ERROR);
  }
  drupal_set_message($message);
}
?>

That's it, we're done. We now have a form which, when submitted, imports nodes retrieved from an XML file using Drupal's Batch API. Yay!

