Parsing citations is essential for integrating bibliographic information published on the Internet. Most citation management techniques assume that the main components of a citation, such as the authors’ names, title, publication venue, date, and page numbers, can be identified correctly. In this paper, we propose BibPro, a citation parser based on sequence alignment that extracts the components of citations in any given format. The basic idea of BibPro is to capture the structural properties of the semi-structured citation format and transform them into a sequence template. The structural properties of a citation string include the order of its punctuation marks and the local structure within each field. We use an encoding table and reserved words, both trained automatically from the dataset, to represent each semantic unit as a unique symbol, and we use a blocking process to capture the local structure of each citation field. After building a sufficient number of encoded sequence templates, BibPro applies sequence alignment software, e.g., BLAST (Basic Local Alignment Search Tool), to match the query citation string against the templates.
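The encode-then-align idea can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the punctuation table `PUNCT_TABLE`, the reserved words in `RESERVED`, the block symbols `W`, `N`, and `X`, and the helpers `encode`, `local_align_score`, and `best_template` are all hypothetical. BibPro trains its encoding table and reserved words from data, whereas the sketch hard-codes them. Its "blocking" step simply collapses runs of plain words or numbers into one symbol, a simplified stand-in for BibPro's blocking process. In place of BLAST it uses a plain Smith-Waterman local alignment, which computes the exact local alignment score that BLAST approximates heuristically.

```python
import re

# Hypothetical encoding table: each punctuation mark maps to one symbol.
PUNCT_TABLE = {",": "C", ".": "D", ":": "L", ";": "S",
               "(": "O", ")": "P", '"': "Q", "-": "H"}

# Hypothetical reserved words (BibPro learns these from data; fixed here).
RESERVED = {"in": "I", "proceedings": "R", "vol": "V", "pp": "G"}


def encode(citation: str) -> str:
    """Encode a citation as a symbol string, blocking runs of plain tokens."""
    tokens = re.findall(r"\d+|\w+|[^\w\s]", citation)
    out = []
    for tok in tokens:
        if tok in PUNCT_TABLE:
            sym = PUNCT_TABLE[tok]
        elif tok.lower() in RESERVED:
            sym = RESERVED[tok.lower()]
        elif tok.isdigit():
            sym = "N"  # number block
        elif re.fullmatch(r"\w+", tok):
            sym = "W"  # plain-word block
        else:
            sym = "X"  # punctuation missing from the table
        # Simplified blocking: consecutive words (or numbers) become one symbol.
        if sym in ("W", "N") and out and out[-1] == sym:
            continue
        out.append(sym)
    return "".join(out)


def local_align_score(a: str, b: str, match: int = 2,
                      mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def best_template(query: str, templates: dict[str, str]) -> str:
    """Return the name of the template whose encoding aligns best with the query."""
    q = encode(query)
    return max(templates, key=lambda name: local_align_score(q, templates[name]))


# Hypothetical templates, each encoded from one example citation.
TEMPLATES = {
    "journal": encode('Smith, J. "A Title." Journal of Things, vol. 3, pp. 1-10, 2001.'),
    "conference": encode("Doe, A. Some Title. In Proceedings of Conf, pages 5-9, 2005."),
}

query = 'Lee, K. "Parsing Citations." Information Systems, vol. 12, pp. 100-120, 2010.'
print(best_template(query, TEMPLATES))  # prints "journal"
```

Because titles and venues collapse into single block symbols, two citations in the same style produce the same symbol string even when their wording differs, so alignment compares citation structure rather than content.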