Robot Conductor with Baxter

Reading music & Moving Baxter

Project Goals

Make Baxter accurately read sheet music and execute the arm motions required for conducting an orchestra or ensemble with correct timing of the beats.

Interesting Aspects

This Baxter conductor is able to recognize the following:

  • Time signature: how many beats are in a measure
  • Rhythm: duration of notes
  • Tempo: speed of the song
  • Dynamics: when a portion of the song changes volume
In practice, Baxter is able to conduct songs in 4/4 time, recognize the duration of the ending note and adjust its ending motion accordingly, keep a consistent speed when conducting songs at 75 BPM or slower, and indicate crescendos with its left arm.

Potential Applications

It can be very beneficial for younger students playing slower songs. The robot conductor can teach students in band or orchestra to look up in order to keep a consistent speed as well as make dynamic changes when necessary. For example, with students who are just learning music, it might be easier for them to have a robot conductor as a visual aid rather than staring at the music sheet itself. Visual cues are a lot easier to process in general, and we hope that the Baxter conductor can be useful!

Design

Our project's desired functionality is to control a robot conductor that reads a sheet of music from an image, understands what cues need to be communicated to the musicians, and conveys those cues by executing arm motions at the proper time.



Components


  • Sensing: Optical Music Recognition (OMR) software reads the music for the cues that need to be conducted
  • Planning: Convert the music cues into robot arm motions
  • Actuation: Execute the motions with the robot arms with the required timing
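The three stages above can be sketched as a simple pipeline. This is a minimal illustration with hypothetical function names and a made-up cue format, not the project's actual module interfaces:

```python
# Hypothetical sketch of the sense -> plan -> actuate pipeline.
# The cue dictionary format here is illustrative, not Orchestra's output.

def sense(image_path):
    """Run OMR on the sheet-music image and return music cues."""
    # In the real system this calls the Orchestra OMR software.
    return {"time_signature": "4/4", "tempo": 60,
            "measures": [["quarter"] * 4]}

def plan(cues):
    """Convert music cues into a list of named arm motions, one per beat."""
    motions = []
    for measure in cues["measures"]:
        for beat, _note in enumerate(measure, start=1):
            motions.append("beat_%d" % beat)
    return motions

def actuate(motions, tempo):
    """Execute each motion, allocating one beat's duration to it."""
    seconds_per_beat = 60.0 / tempo
    for m in motions:
        print("executing %s for %.2f s" % (m, seconds_per_beat))

cues = sense("sheet.png")
actuate(plan(cues), cues["tempo"])
```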


Design Choices for Motion

We require our robot conductor to repeat a sequence of motions many times, so reproducibility of target poses is critical. To accomplish this, we chose to set target poses as robot joint positions instead of end effector poses. By doing this, we reduce the drift from desired positions that can accumulate over many iterations, and reduce the probability of the MoveIt path planner choosing an undesirable, roundabout path.

Conductor Motion
Diagram of Conductor Motion in 4/4 Time

When a human conductor speeds up the tempo, the conducting motion pattern becomes more compact; for example, the 4/4 pattern in the image is proportionally scaled down. In recreating this on our robot conductor, we quickly realized that guiding the end effector poses to trace out a pattern would be very tedious. Instead, we used joint positions to create target poses for the endpoint of each beat, which are the numbered corners in the image. At the same arm velocity, allocating a shorter time to reach each target position makes the pattern smaller and increases the tempo of the conducting.
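The timing relationship is simple arithmetic: one beat of the pattern gets 60/BPM seconds, and at fixed arm speed the reachable pattern size shrinks in proportion. The `pattern_scale` baseline of 60 BPM below is an illustrative assumption, not a value from our implementation:

```python
def seconds_per_beat(bpm):
    # One beat of the conducting pattern gets 60/BPM seconds.
    return 60.0 / bpm

def pattern_scale(bpm, reference_bpm=60):
    # Hypothetical proportional scaling relative to a 60 BPM baseline:
    # at the same arm speed, the arm can only cover a distance proportional
    # to the beat duration before the next beat arrives.
    return reference_bpm / float(bpm)

print(seconds_per_beat(60))  # one second per beat at 60 BPM
print(seconds_per_beat(75))  # 0.8 seconds per beat at 75 BPM
```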


To see specific implementation details, please see the Sensing, Planning, and Actuation sections.

Sensing

Music Recognition

Computer vision / Optical Music Recognition using the Orchestra software package

Input

Take sheet music as input

Image Processing

Noise Removal, Binarization, Staff Line Removal, Cutting into Buckets, Segmentation and Detection, and Recognition

Text Representation

Produce a text representation of the music which is machine-readable

Below is a visual summary of what Orchestra does.

Orchestra Summary

The Orchestra software takes in an image file of sheet music as input, cleans up the image using image processing techniques, and then outputs the sheet music into a readable text format.


First, the software takes an image file of the sheet music and removes any potential sources of noise it can find, such as markings or discoloration not related to the music. Then it converts the cleaned image to black and white so that the CV algorithm can parse the image without dealing with extra colors (e.g., a gray background). Orchestra removes the staff lines, and from there breaks each line of music into separate arrays so that it is easier to iterate through the song. Next, the time signature, note durations, and measure bars are detected in order to determine where each line starts and ends. From here, the specific time signature, the pitches and durations of the notes, and the chords are recognized and output into separate arrays. Orchestra is also able to recognize sharps, flats, and dotted notes.
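The binarization step can be illustrated with a global threshold that separates ink from paper. This is a hedged sketch in plain Python; Orchestra's actual implementation may use a different technique (e.g., adaptive thresholding):

```python
# Illustrative binarization: global thresholding of a grayscale scan so that
# later OMR stages only see ink (0) vs. paper (255). The threshold value and
# list-of-lists image format are assumptions for this sketch.

def binarize(image, threshold=128):
    """image: 2D list of grayscale values 0-255. Returns 0 (ink) / 255 (paper)."""
    return [[0 if px < threshold else 255 for px in row] for row in image]

scan = [[250, 240, 30],
        [245, 20, 25],
        [240, 235, 230]]
print(binarize(scan))
```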


For the conductor motion, we only need the time signature, the note durations, and the measure endings. Unfortunately, the OMR software produced inconsistent output for songs that weren't in 4/4 time, so given the time constraints of the project we only considered songs in 4/4. Also, due to the nature of the Baxter hardware, songs with faster tempos were much harder to execute in practice, so we chose songs slower than 75 BPM so that the Baxter arms could execute the full movements.

Planning

System Design and Operational Diagrams

Parser

Given the input of the text representation of the sheet music from Orchestra, we had to make some fundamental assumptions about the song to conduct to maximize fluidity of the Baxter arm movements. For starters, the time signature was assumed to be in 4/4 time for the entire duration of the song, and the tempo was assumed to be 75 BPM or slower.

From here, the parser iterates through each measure, generates the 4/4 conducting motion for the right arm, and appends it to right_arm_motions. The left arm mirrors right_arm_motions for the majority of the song, and its motions are appended to left_arm_motions. When there are dynamic changes, the left arm indicates them by either raising its arm (crescendo) or lowering its arm (decrescendo). For the ending measure, depending on the duration of the last note, both arms are told to make a “hold” motion for the duration of the last note before finishing with an “end” motion.
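The parsing logic above can be sketched as follows. This is a simplified illustration, assuming the 4/4 time signature from our constraints; the motion names and the `parse_measures` signature are hypothetical stand-ins for our actual parser, which reads Orchestra's text output:

```python
# Hypothetical sketch of the parser: build per-arm motion lists from measures,
# with left-arm dynamics cues and a hold/end sequence for the final note.

def parse_measures(measures, dynamics=None, last_note_beats=4):
    """measures: list of measures (contents unused in this sketch);
    dynamics: optional {measure_index: "crescendo" or "decrescendo"}."""
    dynamics = dynamics or {}
    right_arm_motions, left_arm_motions = [], []
    for i, _measure in enumerate(measures):
        beats = ["beat_1", "beat_2", "beat_3", "beat_4"]  # 4/4 assumed
        right_arm_motions += beats
        if i in dynamics:
            # Left arm rises (crescendo) or lowers (decrescendo) over the measure.
            left_arm_motions += [dynamics[i]] * 4
        else:
            left_arm_motions += beats  # mirror the right arm
    # Ending: hold for the duration of the last note, then cut off.
    ending = ["hold"] * last_note_beats + ["end"]
    return right_arm_motions + ending, left_arm_motions + ending

r, l = parse_measures([None] * 2, dynamics={1: "crescendo"}, last_note_beats=2)
```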


Convert

Take the text output of the music and convert it into robot commands

Message definitions

ROS

Publish and subscribe system

Publish and Subscribe Diagram
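The publish/subscribe flow can be sketched in plain Python. The real system uses ROS topics; the broker class below is purely illustrative, while the `conductor_commands` topic name and the `music_commands` fields (tempo plus per-arm motion lists) come from the project:

```python
# Illustrative publish/subscribe broker standing in for ROS topics.

class Topic:
    def __init__(self):
        self.subscribers = []
    def subscribe(self, callback):
        self.subscribers.append(callback)
    def publish(self, msg):
        # Deliver the message to every registered subscriber callback.
        for cb in self.subscribers:
            cb(msg)

conductor_commands = Topic()
received = []
conductor_commands.subscribe(received.append)

# music_commands message: tempo plus per-arm motion lists.
msg = {"tempo": 60,
       "right_arm_motions": ["beat_1", "beat_2", "beat_3", "beat_4"],
       "left_arm_motions":  ["beat_1", "beat_2", "beat_3", "beat_4"]}
conductor_commands.publish(msg)
```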

Planner

The purpose of the planner is to plan the arm trajectory between positions at the correct speed.

The PathPlanner class instantiates the MoveIt commander MoveGroupCommander with the both_arms group of the Baxter robot. The plan_to_joint_goal function takes the target joint positions of both arms as input and uses MoveIt to plan a trajectory from the current joint position to the target. The plan is computed using the inverse kinematics (IK) solver TRAC-IK and returned as a RobotTrajectory message. To increase the trajectory speed, we scale the joint velocities within RobotTrajectory.JointTrajectory by a user-determined factor.
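The speed-scaling step can be sketched as follows. Plain dictionaries stand in for the ROS RobotTrajectory/JointTrajectory messages here; the field names mirror trajectory_msgs/JointTrajectoryPoint, but this is an illustration rather than our exact implementation:

```python
# Sketch of trajectory speed scaling: to speed a planned trajectory up by a
# factor, divide each waypoint's time_from_start by the factor, multiply
# velocities by it, and multiply accelerations by its square.

def scale_trajectory_speed(trajectory, factor):
    for point in trajectory["points"]:
        point["time_from_start"] /= factor
        point["velocities"] = [v * factor for v in point["velocities"]]
        point["accelerations"] = [a * factor ** 2 for a in point["accelerations"]]
    return trajectory

plan = {"points": [{"time_from_start": 2.0,
                    "velocities": [0.5, 0.2],
                    "accelerations": [0.1, 0.0]}]}
scale_trajectory_speed(plan, 2.0)
```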


Configuration

The main issue we encountered in implementing the conductor was getting accurate timing of the motions. We found that the TRAC-IK solver is significantly faster than the default KDL solver, drastically reducing the lag between successive motions. We include the TRAC-IK package and modify the kinematics.yaml file in the Baxter MoveIt configuration to switch IK solvers. Additionally, we turn off the joint velocity limits in the joint_limits.yaml configuration file, so the velocity limits default to Baxter's maximum joint velocities.
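A minimal sketch of the kinematics.yaml change is below. The plugin name comes from the trac_ik_kinematics_plugin package; the group names and timeout value shown are typical for a Baxter MoveIt configuration but may differ from ours:

```yaml
# kinematics.yaml -- switch the arm groups from the default KDL solver to
# TRAC-IK. Group names and timeout are illustrative.
left_arm:
  kinematics_solver: trac_ik_kinematics_plugin/TRAC_IKKinematicsPlugin
  kinematics_solver_timeout: 0.005
right_arm:
  kinematics_solver: trac_ik_kinematics_plugin/TRAC_IKKinematicsPlugin
  kinematics_solver_timeout: 0.005
```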

Actuation

Baxter Conductor Motion

Setup

To set up the execution of our Baxter motions, we use Baxter’s joint trajectory controller and start an action server with “rosrun baxter_interface joint_trajectory_action_server.py”. To set up the planning of our motions, we start MoveIt with “roslaunch baxter_conductor_moveit_config demo_baxter.launch”. Here, MoveIt is configured using our modified kinematics.yaml and joint_limits.yaml to use the TRAC-IK solver and Baxter joint velocity limits. This starts the move_group node which communicates with the JointTrajectoryActionServer to actuate the motors given a planned trajectory. The use of an action server allows us to cancel a plan during execution, which is important for timing.

Getting Goal Position Specifics

For conducting, there is a set of positions that the hands/arms are in at certain timings of the music in order to give visual cues to the musicians. For the right arm, these are the beats for a given time signature. For the left arm, in addition to the beats, there are additional movements for controlling the loudness/dynamics of the music. Finally, there is an ending motion to coordinate the stop of the music.


These positions were recreated by hand on the Baxter robot, and a helper function save_joint_vals.py was used in order to name and save the joint positions as text files in the motion/positions/ folder. These positions were recorded for each arm individually, so different sequences of left and right arm motions were supported.
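A save_joint_vals.py-style helper can be sketched as below. This is a hypothetical reimplementation: the real script reads the angles from the Baxter arms (e.g., via baxter_interface), and the JSON file format here is an assumption:

```python
# Hypothetical helper for naming and saving joint positions to text files,
# in the spirit of save_joint_vals.py. File format (JSON) is assumed.
import json
import os

def save_joint_vals(name, joint_angles, folder="motion/positions"):
    """Write a named joint-position dict to a text file for later playback."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name + ".txt")
    with open(path, "w") as f:
        json.dump(joint_angles, f)
    return path

def load_joint_vals(name, folder="motion/positions"):
    """Read a previously saved joint-position dict back from disk."""
    with open(os.path.join(folder, name + ".txt")) as f:
        return json.load(f)

# Example: record a made-up right-arm pose for beat 1.
save_joint_vals("right_beat_1", {"right_s0": 0.1, "right_e1": 1.2})
```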


Care was taken when setting each joint so that each connecting motion used efficient joint movements, and so that poses were symmetrical for both arms when applicable.


Execution and Timing

Execution of the robot commands sent by the music reader is done by the function conductor_motion. This node subscribes to the conductor_commands topic and executes the motions given in music_commands messages with music_commands.tempo setting the time allocated to each motion.


The planner is initialized and we iterate through the list of motions given for each arm. At each step, the joint position indicated by the motion name is retrieved from a text file stored in motion/positions/. The joint positions for each arm are combined to form a goal joint position for both arms, which is input into the planner, and a plan is output.


With this plan, we use the execute function in MoveGroupCommander with asynchronous execution (wait = False) so that we can run a timer and stop the execution once the time allocated by the tempo has passed. From wherever the arm stopped, a plan for the next step is calculated. This results in a more compact right-arm beat pattern when the tempo is increased.
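The timed-execution loop can be sketched as follows. Since MoveGroupCommander needs a running ROS/MoveIt stack, a stand-in class models the asynchronous execute/stop behaviour so the timing logic is visible on its own; the loop structure is the point, not the stand-in:

```python
# Sketch of the timed-execution loop: start each motion asynchronously, let
# it run for one beat, then cut it off. FakeGroup stands in for MoveIt's
# MoveGroupCommander (execute with wait=False returns immediately; stop
# cancels the running trajectory).
import time

class FakeGroup:
    def __init__(self):
        self.stopped = False
    def execute(self, plan, wait=True):
        self.stopped = False  # motion is (notionally) running
    def stop(self):
        self.stopped = True   # cancel the running trajectory

def run_beats(group, plans, tempo):
    seconds_per_beat = 60.0 / tempo
    for plan in plans:
        group.execute(plan, wait=False)  # start the motion asynchronously
        time.sleep(seconds_per_beat)     # let it run for one beat...
        group.stop()                     # ...then stop, wherever the arm is

group = FakeGroup()
run_beats(group, plans=["plan_beat_1", "plan_beat_2"], tempo=600)  # fast tempo keeps the demo short
```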

Results

Hot Cross Buns



Canon in D

Orchestra was able to successfully read music in 4/4 time, accurately recognize the pitch and duration of the notes, and parse them into a readable format with no errors. From here, we fed the text format into our planning parser for further processing.


We were able to read the output text representation of the sheet music and send the appropriate information through ROS topics using a publish and subscribe system. Both the OMR software and the planner were configured to work reliably before the project showcase in order for Baxter to make the appropriate executions.


Baxter was able to execute the conducting motions for a simple case (Hot Cross Buns) and a more complex case (Canon in D). Tempos up to 75 beats per minute (BPM) were achieved.


Hot Cross Buns requires four sets (measures) of the 4/4 conducting pattern with both arms, with the last set being an ending sequence of two beats, a hold, and a cutoff. Baxter performs these actions at the correct tempo of 60 BPM, staying synced with the music.


Canon in D has a more complex pattern which includes a left arm crescendo cue (where the arm rises over the course of a measure). To set up this crescendo motion, the left arm rests in a neutral position for the prior measure, and following the crescendo, resumes the 4/4 conducting pattern. While Baxter followed the music at 60 BPM well through the crescendo, the transition into resuming the 4/4 conducting pattern lagged behind the beat. The ending motion over a whole note (4 beats) was well executed.


At a faster tempo of 75 BPM, Hot Cross Buns was still able to keep time with the music. However, the faster cutoffs of the motions caused a right-arm motion to fail for one beat. Although the planner was unable to find a path for that beat, it recovered on the following motion. This recovery is a good sign of the robustness of the planning and execution design: the conductor does not stop after a single error, which becomes even more important when conducting a long piece.

Conclusion

Overall, our group was very satisfied with our results. We were able to meet all of our milestone requirements that we initially set, and we were able to come up with a solution that was overall quite presentable. All of the smooth movements that we tried to implement, such as the cutoff, hold, crescendo, and transition motions, were successfully done.


While we were testing the Orchestra OMR software with different types of sheet music, with the goal of accepting an image file from a phone as input, we realized that the OMR software only works consistently with images that contain only music, without words such as the title and lyrics. Also, many musical symbols, such as tempo markings, fermatas, and time signatures other than 4/4, were unrecognized and led to gibberish output. So we had to use MuseScore and manually type out the cleanest versions of the sheet music that we possibly could for Hot Cross Buns and Canon in D so that the software could read them properly. If we had more time, we would modify more of the Orchestra OMR software itself and use more diverse sheet music as training data so that the CV algorithm could detect and read more symbols accurately.


Currently, the speed of the arms can be specified anywhere in the range from 10 BPM to 80 BPM. The robot adjusts its limb speed in real time to ensure that it always hits the next pose exactly when the next beat occurs. The robot also executes smaller movements when the beats per minute are very high (this replicates human conductors, who move their hands shorter distances when they need to conduct very fast).


Above 75 BPM, we observed that the conductor would get repeatedly “stuck” at certain positions, as in the faster Hot Cross Buns example. Since this is caused by bad paths between the cut-off joint position and the next target, implementing a new set of joint positions for a smaller conducting pattern at faster tempos may resolve this issue. The normal and smaller patterns would each be used over a range of tempos, and within each range the timing mechanism would shrink the pattern further.

The entire application can be run with a single command that executes every required script. The user can run ./FullScript.sh and specify a beats-per-minute value, and the robot will execute all the required steps for conducting (recognizing the sheet music image, converting it to conductor motions, and executing those motions). Currently, the robot conducts an image of sheet music stored in src/sensing/camera_output. Ideally, a future improvement would be to let the robot use its head camera to recognize a real piece of paper with sheet music (probably by placing an AR tag next to it), then store the cropped image from the head camera in src/sensing/camera_output. This addition would connect perfectly to the code that we have already written.



Our project goal in terms of application was in teaching/providing practice for young musicians. At a higher level, the conductor interprets the music and can add small nuances as well as adjust the sound of the orchestra in real time. Further improvements to our project may include adding more left hand motions:

  • Cues for entrances of certain instruments
  • Emphasizing accented notes
  • Fermatas - hold until the conductor moves to next note


With a full orchestra and sensors, a real time feedback component can be added to listen to each section and adjust their volume with left hand gestures.


Team

Cooper Collier

Cooper Collier is a 4th year CS major. He worked on improving the movement engine for the robot, in particular by controlling the speed of movements depending on the tempo of the song.

Dean Zhang

Dean Zhang is a 4th year EECS major. He worked on the sensing and planning portions of the project. He migrated and modified the OMR software, wrote the text representation parser and helped with the ROS publisher and subscriber nodes to send robot commands.

Fei Du

Fei Du is a 4th year EECS major. He worked on building the website and testing the package used in this project.

Ryu Akiba

Ryu Akiba is a 4th year applied mathematics major with a concentration in fluid mechanics. He enjoys working with data and mathematical models. He worked on the motion and timing of the robot conductor, as well as figuring out the MoveIt configuration.

Additional Materials