Контекстно-бесплатная грамматика с неоднозначностью - небольшой компилятор, использующий генератор Ply lex/yacc
Я пишу небольшой компилятор, использующий генератор Ply lex/yaac для Python, для класса компилятора. Прямо сейчас мы погружаемся в фазу семантического анализа и вскоре к генерации кода. Но моя грамматика еще не завершена, так как я не могу написать правило для вызова функции, поскольку все, что я до сих пор пробовал, заканчивается конфликтом сдвига / уменьшения, что делает правило игнорируемым.
Есть следующие правила: function_call : ID function_call_args
а также:
function_call_args : expression
| function_call
также:
program : function_definition program
| class_definition program
| function_call
| empty
Когда я добавляю правило expression : function_call
это оказывается неоднозначным и дает сдвиг / уменьшение конфликта. Я хотел, чтобы function_call в моей грамматике мог быть выражением, тогда можно было бы написать некоторый код, например, a + b + f(2) * g(a/x) - h().
Независимо от того, добавляю ли я пустое в качестве альтернативы function_call_args или logic_expression, это также оказывается неоднозначным. Или я просто говорю, что мой оператор return для возврата function_call в качестве альтернативы также get уменьшает конфликт. Последнее не имеет значения, если бы function_call мог стать альтернативой выражению, этого было бы достаточно, чтобы иметь возможность вызывать функцию внутри любого места функции и даже если она возвращает вызов функции.
Ниже следует полный код лексера, парсера и полной грамматики:
Полная грамматика
Rule 0 S' -> program
Rule 1 program -> function_definition program
Rule 2 program -> class_definition program
Rule 3 program -> function_call
Rule 4 program -> empty
Rule 5 function_call -> ID function_call_args
Rule 6 function_call_args -> expression
Rule 7 function_call_args -> function_call
Rule 8 class_definition -> DEFINE CLASS ID LPAREN RPAREN LBRACE program RBRACE
Rule 9 function_definition -> DEFINE data_type ID LPAREN function_definition_args RPAREN block
Rule 10 function_definition_args -> var_list
Rule 11 function_definition_args -> empty
Rule 12 var_list -> var_declaration COMMA var_list
Rule 13 var_list -> var_declaration
Rule 14 var_declaration -> data_type ID
Rule 15 data_type -> VOID
Rule 16 data_type -> INT
Rule 17 data_type -> FLOAT
Rule 18 data_type -> STRING
Rule 19 data_type -> CHAR
Rule 20 block -> LBRACE statements RBRACE
Rule 21 statements -> statement statements
Rule 22 statements -> empty
Rule 23 statement -> matched_statement
Rule 24 statement -> unmatched_statement
Rule 25 matched_statement -> IF LPAREN logical_expression RPAREN matched_statement ELSE matched_statement
Rule 26 matched_statement -> WHILE LPAREN logical_expression RPAREN matched_statement
Rule 27 matched_statement -> FOR LPAREN var_declaration SEMI_COLON logical_expression SEMI_COLON attribution RPAREN matched_statement
Rule 28 matched_statement -> return_statement
Rule 29 matched_statement -> compound_statement
Rule 30 matched_statement -> simple_statement
Rule 31 matched_statement -> block
Rule 32 unmatched_statement -> IF LPAREN logical_expression RPAREN matched_statement
Rule 33 unmatched_statement -> IF LPAREN logical_expression RPAREN matched_statement ELSE unmatched_statement
Rule 34 logical_expression -> logical_term OR logical_term
Rule 35 logical_expression -> logical_term
Rule 36 logical_term -> logical_factor AND logical_factor
Rule 37 logical_term -> logical_factor
Rule 38 logical_factor -> boolean_statement
Rule 39 logical_factor -> NOT logical_factor
Rule 40 logical_factor -> LPAREN logical_expression RPAREN
Rule 41 boolean_statement -> value comparison_op value
Rule 42 boolean_statement -> value
Rule 43 value -> ID
Rule 44 value -> number
Rule 45 number -> int
Rule 46 number -> float
Rule 47 int -> INTCONST
Rule 48 float -> FLOATCONST
Rule 49 comparison_op -> LESS_THAN
Rule 50 comparison_op -> LESS_EQUAL
Rule 51 comparison_op -> GREATER_THAN
Rule 52 comparison_op -> GREATER_EQUAL
Rule 53 comparison_op -> EQUAL
Rule 54 comparison_op -> NOT_EQUAL
Rule 55 compound_statement -> var_list
Rule 56 compound_statement -> attribution
Rule 57 simple_statement -> expression
Rule 58 attribution -> var_declaration EQUALS expression
Rule 59 attribution -> var_declaration EQUALS string
Rule 60 attribution -> var_declaration EQUALS char
Rule 61 attribution -> ID EQUALS expression
Rule 62 attribution -> ID EQUALS STRINGCONST
Rule 63 attribution -> ID EQUALS CHARCONST
Rule 64 string -> STRINGCONST
Rule 65 char -> CHARCONST
Rule 66 expression -> expression PLUS term
Rule 67 expression -> expression MINUS term
Rule 68 expression -> term
Rule 69 term -> term TIMES factor
Rule 70 term -> term DIVIDE factor
Rule 71 term -> term MOD factor
Rule 72 term -> factor
Rule 73 factor -> number
Rule 74 factor -> ID
Rule 75 factor -> factor_expr
Rule 76 factor_expr -> LPAREN expression RPAREN
Rule 77 uminus_expression -> MINUS LPAREN expression RPAREN
Rule 78 empty -> <empty>
Rule 79 return_statement -> RETURN expression
Rule 80 function_definition -> DEFINE data_type ID LPAREN function_definition_args RPAREN error
лексер:
import ply.lex as lex
import ply.yacc as yacc
# Reserved words
reserved = {
'if' : 'IF',
'else' : 'ELSE',
'while' : 'WHILE',
'for' : 'FOR',
'switch' : 'SWITCH',
'case' : 'CASE',
'class' : 'CLASS',
'define' : 'DEFINE',
'int' : 'INT',
'float' : 'FLOAT',
'char' : 'CHAR',
'string' : 'STRING',
'void' : 'VOID',
'equal' : 'EQUAL',
'and' : 'AND',
'or' : 'OR',
'not' : 'NOT',
'do' : 'DO',
'return' : 'RETURN',
}
# The tokens declaration is made here.
tokens = [
# Literals (identifier, integer constant, float constant, string constant,
# char const)
# TODO add constants' Types
'ID',
#'NUMBER',
# Operators +,-,*,/,%
'PLUS',
'MINUS',
'TIMES',
'DIVIDE',
'MOD',
# Logical Operators
'LESS_THAN',
'LESS_EQUAL',
'GREATER_THAN',
'GREATER_EQUAL',
'NOT_EQUAL',
# Delimeters such as (),{},[],:
'LPAREN',
'RPAREN',
'LBRACE',
'RBRACE',
'COLON',
'COMMA',
'SEMI_COLON',
#Assignment Operators
'EQUALS',
# Integer literal
'INTCONST',
# Floating literal
'FLOATCONST',
# String literal
'STRINGCONST',
# Character constant 'c' or L'c'
'CHARCONST'
]
tokens += list(reserved.values())
# Regular expression rules for simple tokens
#t_NUMBER = r'\d+'
# Operators
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_MOD = r'\%'
# Logical Operators
t_LESS_THAN = r'<'
t_LESS_EQUAL = r'<='
t_GREATER_THAN = r'>'
t_GREATER_EQUAL = r'>='
t_NOT_EQUAL = r'!='
# Delimiters
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_COLON = r'\:'
t_LBRACE = r'\{'
t_RBRACE = r'\}'
t_COMMA = r','
t_SEMI_COLON = R';'
# Assignment
t_EQUALS = r'='
# Integer literal
t_INTCONST = r'\d+'
# Floating literal
t_FLOATCONST = r'\d+[.]\d+'
# String literal
t_STRINGCONST = r'\"([^\\\n]|(\\.))*?\"'
# Character constant 'c' or L'c'
t_CHARCONST = r'\'([^\\\n]|(\\.))*?\''
# Rule for identificator
def t_ID(t):
r'[a-zA-Z_][a-zA-Z0-9_]*'
t.type = reserved.get(t.value,'ID') # Check for reserved words
return t
# Define a rule so we can track line numbers
def t_newline(t):
r'\n+'
t.lexer.lineno += len(t.value)
# Compute column.
# input is the input text string
# token is a token instance
def find_column(input, token):
line_start = input.rfind('\n', 0, token.lexpos) + 1
return (token.lexpos - line_start) + 1
# A string containing ignored characters (spaces and tabs)
t_ignore = ' \t'
# Error handling rule
def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)
# One line comments Python alike
def t_comment(t):
r"[ ]*\043[^\n]*" # \043 is '#'
pass
# Build the lexer
#lexer = lex.lex()
lex.lex()
if __name__ == '__main__':
lex.runmain()
Parser:
import sys
sys.path.append("..")
import ply.yacc as yacc
from lexical_analyzer import tokens
from ast import *
from semantical_analyser.semantical_analyzer import SymbolTable, Var, Scope
from semantical_analyser.stack import Stack
symbolTable = SymbolTable()
stack = Stack()
stack.push(Scope())
precedence = (
('nonassoc', 'LESS_THAN', 'GREATER_THAN'), # Nonassociative operators
('nonassoc', 'LESS_EQUAL', 'GREATER_EQUAL'), # Nonassociative operators
('left', 'PLUS', 'MINUS'),
('left', 'TIMES', 'DIVIDE', 'MOD'),
('right', 'UMINUS'), # Unary minus operator
)
# Initial Rule
def p_program(p):
'''program : function_definition program
| class_definition program
| function_call
| empty'''
if len(p) == 3:
p[0] = Program(p[1], p[2])
else:
p[0] = p[1]
stack.push(Scope())
pass
#def p_function_call_expr(p):
# '''expression : function_call'''
def p_function_call(p):
'''function_call : ID LPAREN function_call_args RPAREN'''
#p[0] = Node("FUNCTION_CALL", [p[1], p[2], p[3]])
p[0] = FunctionCall(p[1], p[2])
pass
def p_function_call_args(p):
'''function_call_args : expression
| function_call'''
#p[0] = Node("FUNCTION_CALL_ARGS", [p[1]])
p[0] = p[1]
pass
def p_class_definition(p):
''' class_definition : DEFINE CLASS ID LPAREN RPAREN LBRACE program RBRACE'''
#p[0] = ClassDefinition("CLASS_DEFINITION", [p[1], p[2], p[3], p[4], p[5], p[6]], p[7], p[8]])
p[0] = ClassDefinition(p[3], p[7])
pass
# Function definition
def p_function_definition(p):
'''function_definition : DEFINE data_type ID LPAREN function_definition_args RPAREN block'''
#p[0] = Node("FUNCTION_DEFINITION", [p[1], p[2], p[3], p[4], p[5], p[6], p[7]])
p[0] = FunctionDefinition(p[2], p[3], p[5], p[7])
pass
# Arguments that goes inside a function definition
def p_function_definition_args(p):
'''function_definition_args : var_list
| empty'''
#p[0] = Node("FUNCTION_DEFINITION_ARGS", [p[1]])
p[0] = p[1]
pass
# list of declaration of variables or a single variable
def p_var_list(p):
'''var_list : var_declaration COMMA var_list
| var_declaration'''
if len(p) >= 3:
p[0] = VarList(p[1], p[3])
else:
p[0] = p[1]
pass
# declaration of var
def p_var_declaration(p):
'var_declaration : data_type ID'
p[0] = VarDeclaration(p[1], p[2])
var = Var("ID", p[2], p[1])
symbolTable.addVar(var)
topOfStack = stack.peek()
topOfStack.addObj(var)
pass
# data types for variables
def p_data_type(p):
'''data_type : VOID
| INT
| FLOAT
| STRING
| CHAR'''
p[0] = DataType(p[1])
pass
# block - a block is enclosed by braces {stmt}
def p_block(p):
'block : LBRACE statements RBRACE'
#p[0] = Node("BLOCK", [p[1], p[2], p[3]])
p[0] = Block(p[2])
#stack.push(Scope())
pass
def p_statements(p):
'''statements : statement statements
| empty'''
if len(p) >= 3:
p[0] = Statements(p[1], p[2])
else:
p[0] = Node("EMPTY", [])
pass
# statement is either a compound statement or a simple statement
def p_statement(p):
'''statement : matched_statement
| unmatched_statement'''
#p[0] = Node("STATEMENT", [p[1]])
p[0] = p[1]
pass
# Matched Statement - This is used to deal with If ambiguity
def p_matched_statement(p):
''' matched_statement : IF LPAREN logical_expression RPAREN matched_statement ELSE matched_statement
| WHILE LPAREN logical_expression RPAREN matched_statement
| FOR LPAREN var_declaration SEMI_COLON logical_expression SEMI_COLON attribution RPAREN matched_statement
| return_statement
| compound_statement
| simple_statement
| block'''
if len(p) == 9:
#p[0] = Node("FOR-CLAUSE", [p[1], p[2], p[3], p[4], p[5], p[6], p[7], p[8], p[9]])
p[0] = ForStatement(p[3], p[5], p[7], p[9])
elif len(p) == 8:
#p[0] = Node("IF-ELSE-CLAUSE", [p[1], p[2], p[3], p[4], p[5], p[6], p[7]])
p[0] = IfElseStatement(p[3], p[5], p[7])
elif len(p) == 6:
#p[0] = Node("WHILE-CLAUSE", [p[1], p[2], p[3], p[4], p[5]])
p[0] = WhileStatement(p[3], p[5])
else:
#p[0] = Node("MATCHED_STATEMENT", [p[1]])
p[0] = p[1]
pass
# Unmateched statement - This is used to deal with If ambiguity
def p_unmatched_statement(p):
''' unmatched_statement : IF LPAREN logical_expression RPAREN matched_statement
| IF LPAREN logical_expression RPAREN matched_statement ELSE unmatched_statement '''
if len(p) == 7:
#p[0] = Node("IF-ELSE-CLAUSE", [p[1], p[2], p[3], p[4], p[5], p[6]])
p[0] = IfElseStatement(p[3], p[5], p[6])
else:
# p[0] = Node("IF-CLAUSE", [p[1], p[2], p[3], p[4], p[5]])
p[0] = IfStatement(p[3], p[5])
#stack.push(Scope())
#currentScope = stack.peek()
#newScope = Scope()
#currentScope.addObj(newScope)
pass
# Logical expression OR
def p_logical_expression_or(p):
'logical_expression : logical_term OR logical_term'
p[0] = Node("LOGICAL_EXPRESSION", [p[1], p[2]], p[3])
pass
# Logical expression term
def p_logical_expression_term(p):
'logical_expression : logical_term'
p[0] = Node("LOGICAL_EXPRESSION_TERM", [p[1]])
pass
# Logical expression AND
def p_logical_term_and(p):
'logical_term : logical_factor AND logical_factor'
#p[0] = Node("LOGICAL_EXPRESSION_AND", [p[1]])
p[0] = BinaryOp(p[1], p[2], p[3])
pass
# Logical expression factor
def p_logical_term_factor(p):
'logical_term : logical_factor'
#p[0] = Node("LOGICAL_EXPRESSION_FACTOR", [p[1]])
p[0] = p[1]
pass
# Logical expression NOT, boolean or another logical expression enclosed by parens
def p_logical_factor(p):
'''logical_factor : boolean_statement
| NOT logical_factor
| LPAREN logical_expression RPAREN'''
if len(p) == 4:
#p[0] = Node("LOGICAL_EXPRESSION_FACTOR", [p[1], p[2], p[3]])
p[0] = BinaryOp(p[1], p[2], p[3])
elif len(p) == 3:
#p[0] = Node("LOGICAL_EXPRESSION_FACTOR", [p[1], p[2]])
p[0] = NotExpression(p[2])
else:
#p[0] = Node("LOGICAL_EXPRESSION_FACTOR", [p[1]])
p[0] = p[1]
pass
# boolean_statement
def p_boolean_statement(p):
'''boolean_statement : value comparison_op value
| value'''
if len(p) == 4:
#p[0] = Node("BOOLEAN_STATEMENT", [p[1], p[3]], p[2])
p[0] = BinaryOp(p[1], p[2], p[3])
else:
#p[0] = Node("BOOLEAN_STATEMENT", [p[1]])
p[0] = p[1]
pass
def p_value(p):
''' value : ID
| number'''
#p[0] = Node("ID_OR_NUMBER", [p[1]])
p[0] = p[1]
pass
def p_number(p):
'''number : int
| float'''
p[0] = Number(p[1])
pass
def p_int(p):
'int : INTCONST'
p[0] = Const('int', p[1])
pass
def p_float(p):
'float : FLOATCONST'
p[0] = Const('float', p[1])
pass
def p_comparison_op(p):
'''comparison_op : LESS_THAN
| LESS_EQUAL
| GREATER_THAN
| GREATER_EQUAL
| EQUAL
| NOT_EQUAL'''
#p[0] = Node("COMPARISON_OP", [p[1]])
p[0] = p[1]
pass
# Compound statement is a while or for or do while or switch case
def p_compound_statment(p):
'''compound_statement : var_list
| attribution'''
#p[0] = Node("COMPOUND_STATEMENT", [p[1]])
p[0] = p[1]
pass
# A simple statement is a declaration, or
def p_simple_statement(p):
'''simple_statement : expression'''
#p[0] = Node("SIMPLE_STATEMENT", [p[1]])
p[0] = p[1]
pass
# Production for atrribution
def p_atrribution(p):
'''attribution : var_declaration EQUALS expression
| var_declaration EQUALS string
| var_declaration EQUALS char
| ID EQUALS expression
| ID EQUALS STRINGCONST
| ID EQUALS CHARCONST'''
#p[0] = Node("ATTRIBUTION", [p[1], p[3]], p[2])
#p[0] = BinaryOp(p[1], p[2], p[3])
p[0] = Attribution(p[1], p[3])
pass
def p_string(p):
'string : STRINGCONST'
p[0] = Const('string', p[1])
pass
def p_char(p):
'char : CHARCONST'
p[0] = Const('char', p[1])
pass
# Gramatical rules for expressions
def p_expression_plus(p):
'expression : expression PLUS term'
#p[0] = Node("binop_plus", [p[1],p[3]], p[2])
p[0] = BinaryOp(p[1], p[2], p[3])
pass
def p_expression_minus(p):
'expression : expression MINUS term'
#p[0] = Node("BINOP_MINUS", [p[1], p[3]], p[2])
p[0] = BinaryOp(p[1], p[2], p[3])
pass
def p_expression_term(p):
'expression : term'
#p[0] = Node("BINOP_TERM", [p[1]])
p[0] = p[1]
pass
def p_term_times(p):
'term : term TIMES factor'
#p[0] = Node("BINOP_TERM_TIMES", [p[1], p[3]], p[2])
p[0] = BinaryOp(p[1], p[2], p[3])
pass
def p_term_div(p):
'term : term DIVIDE factor'
#p[0] = Node("BINOP_DIVIDE", [p[1], p[3]], p[2])
p[0] = BinaryOp(p[1], p[2], p[3])
pass
def p_term_mod(p):
'term : term MOD factor'
#p[0] = Node("BINOP_MOD", [p[1], p[3]], p[2])
p[0] = BinaryOp(p[1], p[2], p[3])
pass
def p_term_factor(p):
'term : factor'
#p[0] = Node("BINOP_FACTOR", [p[1]])
p[0] = p[1]
pass
def p_factor_num(p):
'''factor : number
| ID
| factor_expr'''
#p[0] = Node("BINOP_FACTOR", [p[1]])
p[0] = p[1]
pass
def p_factor_expr(p):
'factor_expr : LPAREN expression RPAREN'
#p[0] = Node("BINOP_EXPR", [p[1], p[3]], p[2])
p[0] = p[1]
pass
def p_uminus_expr(p):
'uminus_expression : MINUS LPAREN expression RPAREN %prec UMINUS'
pass
# The empty production
def p_empty(p):
'empty :'
p[0] = None #Node("EMPTY", [])
pass
# Return statement1
def p_return_statement(p):
'return_statement : RETURN expression'
#p[0] = Node("RETURN_STATEMENT", [p[1], p[2]])
p[0] = ReturnStatement(p[2])
pass
# Error rule for syntax errors
def p_error(p):
if p:
print("Syntax error at token", p.type)
print("Syntax error at '%s'" % p.value)
print("line : '%s'" % p.lineno)
print("column: '%s'" % p.lexpos)
#print("Syntax error in input!")
#parser.errok()
else:
print("Syntax error at EOF")
def p_function_definition_error(p):
'''function_definition : DEFINE data_type ID LPAREN function_definition_args RPAREN error'''
print("Syntax error in function statement at '%s'" % p)
#print('Line' + str(p.linespan(1)))
parser = yacc.yacc()
from read_file_into_buffer import readFileIntoBuffer
data = readFileIntoBuffer('test.fpl')
ast = parser.parse(data, tracking=True)
print(ast)
1 ответ
Предполагая, что ваш язык C-подобен, f 2
не является допустимым вызовом функции; это должно быть написано f(2)
, Но ваша грамматика говорит:
Rule 5 function_call -> ID function_call_args
Rule 6 function_call_args -> expression
что, безусловно, позволяет f 2
(который является ID
с последующим expression
). С другой стороны, эта грамматика не позволяет f()
ни f(1,2)
,
(Третье производство function_call_args -> function_call
избыточно, так как у вас есть function_call_args -> expression
а также expression -> function_call
, Избыточное производство создает конфликты при разборе, поэтому я не учел это.)
Что вам нужно function_call
быть что-то очень похоже на function_definition
, с явными скобками и необязательным списком выражений.